Fp32 sq8 dist functions L2Sqr [MOD-13392] #885

dor-forer · 2026-01-11T07:43:37Z

Describe the changes in the pull request

This PR optimizes SQ8 L2 squared distance calculations by leveraging an algebraic identity to avoid dequantization in hot loops. The key optimization uses:

||x - y||² = Σx_i² - 2*IP(x, y) + Σy_i² = x_sum_squares - 2 * IP(x, y) + y_sum_squares
where IP(x, y) is computed by the existing SQ8 inner product implementations, and x_sum_squares and y_sum_squares are precomputed and stored with each vector.
This lets the L2, IP, and Cosine distance functions share a common inner product implementation, which simplifies the code and improves performance.
Which issues this PR fixes

MOD-13392

Main objects this PR modified

...
...

Mark if applicable

This PR introduces API changes
This PR introduces serialization changes

- Implemented inner product and cosine distance functions for SQ8-to-SQ8 vectors in SVE, NEON, and AVX512 architectures.
- Added corresponding distance function selection logic in IP_space.cpp and function headers in IP_space.h.
- Created benchmarks for SQ8-to-SQ8 distance functions to evaluate performance across different architectures.
- Developed unit tests to validate the correctness of the new distance functions against expected results.
- Ensured compatibility with existing optimization features for various CPU architectures.
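For the SQ8-to-SQ8 case, expanding the product of two dequantized vectors shows why per-vector precomputed sums are useful. This is a sketch of the algebra under stated assumptions, not a copy of the PR's kernels: the names are hypothetical, and each vector is assumed to store its min, delta, and the sum of its dequantized values. Every term except the integer dot product of the codes is a per-vector constant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical SQ8 record: element i is min + delta * code[i];
// `sum` is the precomputed sum of the dequantized values.
struct SQ8 {
    std::vector<uint8_t> code;
    float min;
    float delta;
    float sum;
};

// With a_i = a.min + a.delta * p_i and b_i = b.min + b.delta * q_i:
// sum(a_i * b_i) = a.min * b.sum + b.min * a.sum - n * a.min * b.min
//                + a.delta * b.delta * sum(p_i * q_i).
// Only the integer dot product sum(p_i * q_i) depends on the codes.
float sq8_sq8_ip(const SQ8 &a, const SQ8 &b) {
    assert(a.code.size() == b.code.size());
    const std::size_t n = a.code.size();
    uint32_t dot = 0;  // 255 * 255 * n fits in 32 bits for n up to 66051
    for (std::size_t i = 0; i < n; ++i) {
        dot += static_cast<uint32_t>(a.code[i]) * b.code[i];
    }
    return a.min * b.sum + b.min * a.sum - static_cast<float>(n) * a.min * b.min +
           a.delta * b.delta * static_cast<float>(dot);
}
```

The integer accumulation is the part that maps naturally onto byte dot-product instructions; the remaining terms are a handful of scalar multiply-adds per distance call.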

- Implemented NEON, SVE, and AVX512F optimized functions for calculating L2 squared distance between SQ8 (scalar quantized 8-bit) vectors.
- Introduced helper functions for processing vector elements using NEON and SVE intrinsics.
- Updated L2_space.cpp and L2_space.h to include the new SQ8-to-SQ8 distance function.
- Enhanced AVX512F, NEON, and SVE function selectors to choose the appropriate implementation based on CPU features.
- Added unit tests to validate the correctness of the new L2 squared distance functions.
- Updated benchmark tests to include performance measurements for the new implementations.

- Updated distance function declarations in IP_space.h to clarify that SQ8-to-SQ8 functions use precomputed sum/norm.
- Removed precomputed distance function implementations for AVX512F, NEON, and SVE architectures from their respective source files.
- Adjusted benchmark tests to remove references to precomputed distance functions and ensure they utilize the updated quantization methods.
- Modified utility functions to support the creation of SQ8 quantized vectors with precomputed sum and norm.
- Updated unit tests to reflect changes in the quantization process and removed tests specifically for precomputed distance functions.
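A sketch of a quantizer that also emits a precomputed sum and sum of squares, as the utility changes above describe. It assumes min/max scaling to 255 levels and computes the metadata from the dequantized values; both are illustrative choices, and `SQ8Quantized` / `quantize_sq8` are hypothetical names (C++17). The `delta == 0` branch covers the constant-vector edge case, including an all-zero vector, where dividing by a zero range would otherwise produce NaN.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical quantizer output; field names and layout are illustrative.
struct SQ8Quantized {
    std::vector<uint8_t> code;
    float min = 0.0f;
    float delta = 0.0f;
    float sum = 0.0f;          // sum of dequantized values
    float sum_squares = 0.0f;  // sum of squared dequantized values
};

SQ8Quantized quantize_sq8(const std::vector<float> &v) {
    assert(!v.empty());
    SQ8Quantized out;
    auto [lo, hi] = std::minmax_element(v.begin(), v.end());
    out.min = *lo;
    // For a constant vector hi == lo, so delta is 0: every code becomes 0
    // and every element dequantizes back to min.
    out.delta = (*hi - *lo) / 255.0f;
    out.code.resize(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) {
        float scaled = out.delta > 0.0f ? (v[i] - out.min) / out.delta : 0.0f;
        out.code[i] = static_cast<uint8_t>(std::lround(std::clamp(scaled, 0.0f, 255.0f)));
        // Accumulate metadata from the dequantized value so the stored sums
        // match what a distance kernel reconstructs from the codes.
        float y = out.min + out.delta * static_cast<float>(out.code[i]);
        out.sum += y;
        out.sum_squares += y * y;
    }
    return out;
}
```

Whether the sums are taken over the original or the dequantized values is a design choice; using the dequantized values keeps the identity exact with respect to what the kernel sees.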

…nsistency
- Updated include paths in AVX512F_BW_VL_VNNI.cpp to reflect new naming conventions.
- Modified unit tests in test_spaces.cpp to streamline vector initialization and quantization processes.
- Replaced repetitive code with utility functions for populating and quantizing vectors.
- Enhanced assertions in tests to ensure optimized distance functions are correctly chosen and validated.
- Removed unnecessary parameters from utility functions to simplify their interfaces.
- Improved test coverage for edge cases, including zero and constant vectors, ensuring accuracy across various scenarios.

- Implemented the algebraic identity for L2 squared distance to avoid dequantization in hot loops across the AVX2, AVX512, NEON, SSE4, and SVE implementations.
- Updated L2 distance functions to use the precomputed sum and sum of squares, improving efficiency.
- Modified unit tests to validate the new implementations and ensure consistency with previous non-optimized calculations.
- Enhanced test utilities to support preprocessing of float vectors for SQ8 L2 space.
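A sketch of the L2 query preprocessing step, following the query layout quoted in the test_spaces.cpp review comment ([float values (dim)] [sum] [sum_squares]). The function name `preprocess_l2_query` is hypothetical, and the real utility may allocate and align the blob differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds a query blob laid out as [float values (dim)] [sum] [sum_squares].
// Hypothetical helper; it only illustrates the layout.
std::vector<float> preprocess_l2_query(const std::vector<float> &query) {
    std::vector<float> blob(query.size() + 2);
    float sum = 0.0f;
    float sum_squares = 0.0f;
    for (std::size_t i = 0; i < query.size(); ++i) {
        blob[i] = query[i];
        sum += query[i];
        sum_squares += query[i] * query[i];
    }
    blob[query.size()] = sum;              // used by the IP term: min * sum(x_i)
    blob[query.size() + 1] = sum_squares;  // the sum(x_i^2) term of the L2 identity
    return blob;
}
```

The distance kernel can then read `sum` and `sum_squares` at fixed offsets `dim` and `dim + 1` instead of recomputing them on every call.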

Copilot

Pull request overview

This PR optimizes SQ8 L2 squared distance calculations using an algebraic identity to eliminate dequantization operations from hot loops. The optimization leverages the mathematical identity ||x - y||² = ||x||² + ||y||² - 2*IP(x, y) to compute L2 distance by reusing optimized inner product implementations and precomputed sum-of-squares values.

Changes:

- Refactored SQ8 inner product implementation to extract a common SQ8_InnerProduct_Impl function that returns raw inner product values (not distance form)
- Implemented the algebraic L2 distance formula across all SIMD architectures (SSE4, AVX2, AVX2+FMA, AVX512, NEON, SVE)
- Updated query and storage vector layouts to include precomputed sum and sum_squares metadata for L2 distance calculations

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

File	Description
tests/utils/tests_utils.h	Added helper functions for L2 query preprocessing and reference L2Sqr implementation
tests/unit/test_spaces.cpp	Updated tests to use new query/storage layouts and validate optimized implementations
src/VecSim/spaces/L2/L2.cpp	Replaced element-wise dequantization with algebraic formula using inner product
src/VecSim/spaces/L2/L2_SSE4_SQ8.h	SSE4 implementation using algebraic identity
src/VecSim/spaces/L2/L2_AVX2_SQ8.h	AVX2 implementation using algebraic identity
src/VecSim/spaces/L2/L2_AVX2_FMA_SQ8.h	AVX2+FMA implementation using algebraic identity
src/VecSim/spaces/L2/L2_AVX512F_BW_VL_VNNI_SQ8.h	AVX512 implementation using algebraic identity
src/VecSim/spaces/L2/L2_NEON_SQ8.h	NEON implementation using algebraic identity
src/VecSim/spaces/L2/L2_SVE_SQ8.h	SVE implementation using algebraic identity
src/VecSim/spaces/IP/IP.h	Added declaration for SQ8_InnerProduct_Impl common implementation
src/VecSim/spaces/IP/IP.cpp	Extracted inner product logic into reusable SQ8_InnerProduct_Impl function

src/VecSim/spaces/L2/L2.cpp

codecov · 2026-01-11T08:17:27Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.11%. Comparing base (bdcbf80) to head (00a6ed0).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #885      +/-   ##
==========================================
- Coverage   97.13%   97.11%   -0.02%     
==========================================
  Files         129      129              
  Lines        7615     7497     -118     
==========================================
- Hits         7397     7281     -116     
+ Misses        218      216       -2

meiravgri

beautiful

meiravgri · 2026-01-12T04:57:45Z

tests/unit/test_spaces.cpp

-    params[2] = inv_norm;
+    // Create V1 fp32 query with precomputed sum and sum_squares
+    // Query layout: [float values (dim)] [sum] [sum_squares]
+    std::vector<float> v1_orig(dim + 2);


consider using storage_metadata_count and query_metadata_count

meiravgri · 2026-01-12T04:59:55Z

tests/unit/test_spaces.cpp

    ASSERT_NEAR(dist, baseline, 0.01) << "SQ8_InnerProduct failed to match expected distance";
 }

 TEST_F(SpacesTest, SQ8_l2sqr_no_optimization_func_test) {


is there a test to compare the 0 dist?

meiravgri · 2026-01-12T08:47:10Z

tests/benchmark/spaces_benchmarks/bm_spaces_sq8.cpp

+        test_utils::populate_fp32_sq8_query(v1, dim, true, 1234);
+        size_t quantized_size =
+            dim * sizeof(uint8_t) + sq8::storage_metadata_count<VecSimMetric_L2>() * sizeof(float);
+        v2 = new uint8_t[quantized_size];


different seed

dor-forer added 30 commits December 28, 2025 09:37

Add SQ8-to-SQ8 benchmark tests and update related scripts

8697a3e

Format

e0ce268

Organizing

ab6b077

Add full sq8 benchmarks

931e339

Optimize the sq8 sq8

a56474d

Optimize SQ8 distance functions for NEON by reducing operations and improving performance

a25f45c

format

0ad941e

Add NEON DOTPROD-optimized distance functions for SQ8-to-SQ8 calculations

68cd068

PR

0b4b568

Remove NEON DOTPROD-optimized distance functions for INT8, UINT8, and SQ8-to-SQ8 calculations

d0fd2e4

Fix vector layout documentation by removing inv_norm from comments in NEON and AVX512 headers

9de6163

Remove 'constexpr' from ones vector declaration in NEON inner product function

63a46a1

Change the name

5bef023

Add full range tests for SQ8 distance functions with SIMD optimizations

72053af

Refactor distance functions to remove inv_norm parameter and update documentation accordingly

525f8da

Update SQ8 Cosine test to normalize both input vectors and adjust distance assertion tolerance

13a477b

Rename 'compressed' to 'quantized' in SQ8 functions for clarity and consistency

c18000e

Merge branch 'dorer-sq8-dist-functions-ip-cosine' of https://github.com/RedisAI/VectorSimilarity into dorer-sq8-dist-functions-l2

b58f8ef

Rename 'compressed' to 'quantized' in SQ8 distance tests for clarity

286990a

Refactor quantization function to remove unused normalization calculations

8cdc3fc

Add TODO to store vector's norm and sum in L2 squared distance calculation

189290e

Implement SQ8-to-SQ8 distance functions with precomputed sum and norm using AVX512 VNNI; add benchmarks and tests for new functionality

bbf810e

Add edge case tests for SQ8-to-SQ8 precomputed cosine distance functions

dbbb7d9

Refactor SQ8 test cases to use CreateSQ8QuantizedVector for vector population

36ab068

Implement SQ8-to-SQ8 precomputed distance functions using ARM NEON, SVE, and AVX512; add corresponding selection functions and update tests for consistency.

00617d7

Implement SQ8-to-SQ8 precomputed inner product and cosine functions; update benchmarks and tests for new functionality

4331d91

dor-forer added 9 commits January 7, 2026 19:43

Refactor SQ8 inner product implementations to use structured quantization parameters and clean up code formatting

5506f55

Fix SQ8 EdgeCases test by adjusting vector size for constant vector test

8cbc649

Fix formatting in SQ8_EdgeCases test by adjusting vector initialization

7b34dc4

Merge branch 'main' of https://github.com/RedisAI/VectorSimilarity into dorer-fp32-sq8-dist-functions-ip-cosine

053411e

Refactor SQ8 inner product implementations to use precomputed y_sum from query blob

8186454

Fix formatting in SQ8_EdgeCases test for better readability

37600e1

Refactor SQ8 cosine distance calculation to use optimized function

77d03be

Merge branch 'main' of https://github.com/RedisAI/VectorSimilarity into dorer-fp32-sq8-dist-functions-l2

aff60ae

dor-forer requested a review from Copilot January 11, 2026 07:43

Copilot started reviewing on behalf of dor-forer January 11, 2026 07:44

Copilot AI reviewed Jan 11, 2026

src/VecSim/spaces/L2/L2.cpp (conversation resolved)

Fix formatting in IP.cpp and IP.h documentation for better readability

bcd8ca3

dor-forer added the bm-spaces-sq8-full label Jan 11, 2026

dor-forer requested a review from meiravgri January 11, 2026 07:54

dor-forer marked this pull request as ready for review January 11, 2026 07:54

Remove unused CreateSQ8CompressedVector helper function from test_spaces.cpp

1f9f9eb

meiravgri reviewed Jan 12, 2026

dor-forer added 2 commits January 12, 2026 10:07

Add self-distance L2 test for SQ8 edge cases with optimization checks

050da95

Refactor SQ8 query handling to unify preprocessing for IP/Cosine/L2 spaces and optimize memory allocation

e894935

dor-forer requested a review from meiravgri January 12, 2026 08:42

meiravgri reviewed Jan 12, 2026

Fix query population seed in SQ8 benchmark for consistency

00a6ed0

dor-forer requested a review from meiravgri January 12, 2026 09:00

meiravgri approved these changes Jan 12, 2026

dor-forer added this pull request to the merge queue Jan 12, 2026

Merged via the queue into main with commit 32d0279 Jan 12, 2026
24 checks passed

dor-forer deleted the dorer-fp32-sq8-dist-functions-l2 branch January 12, 2026 11:26

3 participants
