Fp32 sq8 dist functions L2Sqr [MOD-13392] #885
- Implemented inner product and cosine distance functions for SQ8-to-SQ8 vectors in SVE, NEON, and AVX512 architectures.
- Added corresponding distance function selection logic in IP_space.cpp and function headers in IP_space.h.
- Created benchmarks for SQ8-to-SQ8 distance functions to evaluate performance across different architectures.
- Developed unit tests to validate the correctness of the new distance functions against expected results.
- Ensured compatibility with existing optimization features for various CPU architectures.
…mproving performance
… SQ8-to-SQ8 calculations
… NEON and AVX512 headers
- Implemented NEON, SVE, and AVX512F optimized functions for calculating L2 squared distance between SQ8 (scalar quantized 8-bit) vectors.
- Introduced helper functions for processing vector elements using NEON and SVE intrinsics.
- Updated L2_space.cpp and L2_space.h to include the new distance function for SQ8-to-SQ8.
- Enhanced AVX512F, NEON, and SVE function selectors to choose the appropriate implementation based on CPU features.
- Added unit tests to validate the correctness of the new L2 squared distance functions.
- Updated benchmark tests to include performance measurements for the new implementations.
…ocumentation accordingly
…tance assertion tolerance
…om/RedisAI/VectorSimilarity into dorer-sq8-dist-functions-l2
… using AVX512 VNNI; add benchmarks and tests for new functionality
…VE, and AVX512; add corresponding selection functions and update tests for consistency.
…update benchmarks and tests for new functionality
- Updated distance function declarations in IP_space.h to clarify that SQ8-to-SQ8 functions use precomputed sum/norm.
- Removed precomputed distance function implementations for AVX512F, NEON, and SVE architectures from their respective source files.
- Adjusted benchmark tests to remove references to precomputed distance functions and ensure they utilize the updated quantization methods.
- Modified utility functions to support the creation of SQ8 quantized vectors with precomputed sum and norm.
- Updated unit tests to reflect changes in the quantization process and removed tests specifically for precomputed distance functions.
…nsistency
- Updated include paths in AVX512F_BW_VL_VNNI.cpp to reflect new naming conventions.
- Modified unit tests in test_spaces.cpp to streamline vector initialization and quantization processes.
- Replaced repetitive code with utility functions for populating and quantizing vectors.
- Enhanced assertions in tests to ensure optimized distance functions are correctly chosen and validated.
- Removed unnecessary parameters from utility functions to simplify their interfaces.
- Improved test coverage for edge cases, including zero and constant vectors, ensuring accuracy across various scenarios.
…tion parameters and clean up code formatting
…to dorer-fp32-sq8-dist-functions-ip-cosine
- Implemented algebraic identity for L2 squared distance to avoid dequantization in hot loops across AVX2, AVX512, NEON, SSE4, SVE implementations.
- Updated L2 distance functions to utilize precomputed sum and sum of squares, improving efficiency.
- Modified unit tests to validate the new implementations and ensure consistency with previous non-optimized calculations.
- Enhanced test utilities to support preprocessing of float vectors for SQ8 L2 space.
…to dorer-fp32-sq8-dist-functions-l2
Pull request overview
This PR optimizes SQ8 L2 squared distance calculations using an algebraic identity to eliminate dequantization operations from hot loops. The optimization leverages the mathematical identity ||x - y||² = ||x||² + ||y||² - 2*IP(x, y) to compute L2 distance by reusing optimized inner product implementations and precomputed sum-of-squares values.
Changes:
- Refactored SQ8 inner product implementation to extract a common SQ8_InnerProduct_Impl function that returns raw inner product values (not distance form)
- Implemented the algebraic L2 distance formula across all SIMD architectures (SSE4, AVX2, AVX2+FMA, AVX512, NEON, SVE)
- Updated query and storage vector layouts to include precomputed sum and sum_squares metadata for L2 distance calculations
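The query-side layout described in the last bullet can be sketched as follows. This is a minimal illustration, not the PR's code: `make_l2_query` is a hypothetical helper name, and the only detail taken from the PR is the layout comment in the tests, `[float values (dim)] [sum] [sum_squares]`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PR): appends the two L2 metadata values
// to a float query so the layout is [values (dim)] [sum] [sum_squares].
std::vector<float> make_l2_query(const std::vector<float> &values) {
    assert(!values.empty());
    std::vector<float> query(values);
    float sum = 0.0f;
    float sum_squares = 0.0f;
    for (float v : values) {
        sum += v;
        sum_squares += v * v;
    }
    query.push_back(sum);         // index dim
    query.push_back(sum_squares); // index dim + 1
    return query;
}
```

Computing both values once per query keeps them out of the per-candidate distance loop, which is presumably why they are stored alongside the vector.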
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/utils/tests_utils.h | Added helper functions for L2 query preprocessing and reference L2Sqr implementation |
| tests/unit/test_spaces.cpp | Updated tests to use new query/storage layouts and validate optimized implementations |
| src/VecSim/spaces/L2/L2.cpp | Replaced element-wise dequantization with algebraic formula using inner product |
| src/VecSim/spaces/L2/L2_SSE4_SQ8.h | SSE4 implementation using algebraic identity |
| src/VecSim/spaces/L2/L2_AVX2_SQ8.h | AVX2 implementation using algebraic identity |
| src/VecSim/spaces/L2/L2_AVX2_FMA_SQ8.h | AVX2+FMA implementation using algebraic identity |
| src/VecSim/spaces/L2/L2_AVX512F_BW_VL_VNNI_SQ8.h | AVX512 implementation using algebraic identity |
| src/VecSim/spaces/L2/L2_NEON_SQ8.h | NEON implementation using algebraic identity |
| src/VecSim/spaces/L2/L2_SVE_SQ8.h | SVE implementation using algebraic identity |
| src/VecSim/spaces/IP/IP.h | Added declaration for SQ8_InnerProduct_Impl common implementation |
| src/VecSim/spaces/IP/IP.cpp | Extracted inner product logic into reusable SQ8_InnerProduct_Impl function |
Codecov Report
✅ All modified and coverable lines are covered by tests.

| | main | #885 | +/- |
|---|---|---|---|
| Coverage | 97.13% | 97.11% | -0.02% |
| Files | 129 | 129 | |
| Lines | 7615 | 7497 | -118 |
| Hits | 7397 | 7281 | -116 |
| Misses | 218 | 216 | -2 |
meiravgri
left a comment
beautiful
tests/unit/test_spaces.cpp
Outdated
    params[2] = inv_norm;
    // Create V1 fp32 query with precomputed sum and sum_squares
    // Query layout: [float values (dim)] [sum] [sum_squares]
    std::vector<float> v1_orig(dim + 2);
consider using storage_metadata_count and query_metadata_count
    ASSERT_NEAR(dist, baseline, 0.01) << "SQ8_InnerProduct failed to match expected distance";
    }

    TEST_F(SpacesTest, SQ8_l2sqr_no_optimization_func_test) {
is there a test to compare the 0 dist?
…paces and optimize memory allocation
    test_utils::populate_fp32_sq8_query(v1, dim, true, 1234);
    size_t quantized_size =
        dim * sizeof(uint8_t) + sq8::storage_metadata_count<VecSimMetric_L2>() * sizeof(float);
    v2 = new uint8_t[quantized_size];
different seed
Describe the changes in the pull request
This PR optimizes SQ8 L2 squared distance calculations by leveraging an algebraic identity to avoid dequantization in hot loops. The key optimization uses:
||x - y||² = Σx_i² - 2*IP(x, y) + Σy_i² = x_sum_squares - 2 * IP(x, y) + y_sum_squares

where the inner product IP(x, y) is computed via the existing SQ8 inner product implementations.
This allows reusing the common inner product code across L2, IP, and Cosine distance functions, resulting in cleaner code and improved performance.
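As a scalar illustration of the identity, the sketch below compares a dequantize-then-subtract loop with the algebraic form. Everything in it is an assumption made for illustration: the `Sq8Vector` struct, the min/delta quantization scheme, and the function names are hypothetical and do not reflect the PR's actual storage layout or SIMD kernels. It assumes each stored value dequantizes as `min_val + delta * code`, so that IP(x, y) = min_val * Σy_i + delta * Σ(code_i * y_i), which is one plausible reason the query carries a precomputed sum.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical SQ8 container for illustration; the PR's real layout may differ.
struct Sq8Vector {
    std::vector<uint8_t> codes; // quantized values in [0, 255]
    float min_val;              // dequantized x_i = min_val + delta * codes[i]
    float delta;
    float sum_squares;          // sum of dequantized x_i^2, computed at quantization time
};

Sq8Vector quantize_sq8(const std::vector<float> &v) {
    assert(!v.empty());
    auto mm = std::minmax_element(v.begin(), v.end());
    Sq8Vector out;
    out.min_val = *mm.first;
    // Avoid a zero step for constant vectors.
    out.delta = (*mm.second > *mm.first) ? (*mm.second - *mm.first) / 255.0f : 1.0f;
    out.sum_squares = 0.0f;
    for (float x : v) {
        auto code = static_cast<uint8_t>(std::lround((x - out.min_val) / out.delta));
        out.codes.push_back(code);
        float deq = out.min_val + out.delta * code;
        out.sum_squares += deq * deq;
    }
    return out;
}

// Baseline: dequantize every element inside the loop.
float l2sqr_dequantize(const std::vector<float> &y, const Sq8Vector &x) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < y.size(); ++i) {
        float diff = y[i] - (x.min_val + x.delta * x.codes[i]);
        acc += diff * diff;
    }
    return acc;
}

// Algebraic form: the loop only accumulates code_i * y_i.
float l2sqr_identity(const std::vector<float> &y, float y_sum, float y_sum_squares,
                     const Sq8Vector &x) {
    float dot = 0.0f;
    for (std::size_t i = 0; i < y.size(); ++i) {
        dot += y[i] * x.codes[i];
    }
    float ip = x.min_val * y_sum + x.delta * dot;
    return x.sum_squares + y_sum_squares - 2.0f * ip;
}
```

Both functions should agree up to floating-point rounding. Because the algebraic form subtracts quantities of similar magnitude, its result can be a tiny negative number when the two vectors are nearly identical, so a zero-distance test is a natural check to pair with it.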