Commit 849bfe1
Optimize AVX2 Haswell DGEMM SUP Kernels for Improved FMA Throughput (#894)
Details:
- This commit enhances the performance of AVX2 DGEMM SUP edge kernels by addressing FMA instruction latency
issues in low-computation scenarios, particularly in corner cases handled by edge kernels.
- Key Improvements:
  - Reduced FMA Latency:
    Previously, edge kernels reused a limited set of vector registers to hold FMA results, creating dependencies that
    forced the CPU to wait for prior FMA instructions to complete before issuing new ones. This bottleneck was
    especially pronounced in small-sized matrix multiplications.
  - Register Set Expansion:
    The updated implementation utilizes two distinct sets of vector registers to hold intermediate FMA results.
    This allows subsequent FMA instructions to proceed without waiting for previous ones, improving
    instruction-level parallelism and throughput.
  - Final Accumulation Strategy:
    At the end of the unrolled K-loop, the two register sets are summed to produce the final result, ensuring
    correctness while maintaining performance gains.
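The dual-accumulator idea described above can be illustrated with a minimal scalar sketch in C. This is not the commit's assembly: the function names `dot_one_acc` and `dot_two_acc` are hypothetical, and plain `double` variables stand in for the vector registers. The first function carries one dependency chain through the K-loop; the second alternates between two independent accumulators and sums them once at the end.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration: one accumulator, so every multiply-add
 * must wait for the previous one to finish. */
static double dot_one_acc(const double *a, const double *b, size_t k)
{
    assert(a != NULL && b != NULL);
    double acc = 0.0;
    for (size_t p = 0; p < k; ++p)
        acc += a[p] * b[p];
    return acc;
}

/* Hypothetical illustration: two accumulators form two independent
 * dependency chains, so consecutive multiply-adds can overlap. */
static double dot_two_acc(const double *a, const double *b, size_t k)
{
    assert(a != NULL && b != NULL);
    double acc0 = 0.0, acc1 = 0.0;
    size_t p = 0;

    /* K-loop unrolled by two: even steps feed acc0, odd steps feed acc1. */
    for (; p + 1 < k; p += 2) {
        acc0 += a[p] * b[p];
        acc1 += a[p + 1] * b[p + 1];
    }

    /* Leftover iteration when k is odd. */
    if (p < k)
        acc0 += a[p] * b[p];

    /* Final accumulation: combine the two partial sums once. */
    return acc0 + acc1;
}
```

Because the two versions add terms in a different order, their results can differ in the last bits for general floating-point inputs. They agree exactly when every intermediate value is exactly representable, as with small integers.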
- Modified Kernels:
  - m_left edge kernels:
    bli_dgemmsup_rv_haswell_asm_6x8m
    bli_dgemmsup_rv_haswell_asm_6x6m
    bli_dgemmsup_rv_haswell_asm_6x4m
    bli_dgemmsup_rv_haswell_asm_6x2m
  - mn_left edge kernels:
    bli_dgemmsup_rv_haswell_asm_{1..6}x{2,4,6,8}
- Newly Added Kernels:
  - m_left kernels:
    bli_dgemmsup_rv_haswell_asm_6x{1,3,5,7}m
  - mn_left kernels:
    bli_dgemmsup_rv_haswell_asm_{1..6}x{1,3,5,7}
  These additions ensure comprehensive coverage for all edge-case matrix sizes, improving robustness
  and performance consistency across the DGEMM SUP microkernel suite.

1 parent 5c2b22d commit 849bfe1
File tree
11 files changed (+17217, -2347 lines changed)
- frame/include
- kernels/haswell
- 3/sup
- d6x8