Commit eac5ee9

Optimize bli_gemm_haswell_asm_d6x8: reduce latency & improve throughput (#895)
Details:
- Reorder instructions to reduce pipeline stalls and dependencies
- Adjust register usage to improve data reuse
- Tweak prefetching so cache warming is more effective
- Since gemm kernels now support general strides, corrected macro to enable optimized assembly kernel for general stride inputs
1 parent e23d936 commit eac5ee9
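
For readers unfamiliar with what "general stride inputs" means for a 6x8 double-precision micro-kernel, here is a minimal plain-C reference sketch. It is not the assembly kernel from this commit. The function name `ref_dgemm_6x8`, the packed-panel layouts, and the beta-equals-zero shortcut are assumptions made for illustration only. It computes `C := beta*C + alpha*A*B` on a 6x8 tile, where `rs_c` and `cs_c` are the row and column strides of C, so row-major, column-major, and general-stride storage all go through the same update.

```c
#include <stddef.h>

enum { MR = 6, NR = 8 };  /* tile shape implied by the "d6x8" kernel name */

/*
 * Hypothetical reference micro-kernel (illustration only, not BLIS code).
 * Assumed layouts:
 *   a: packed MR x k micro-panel, element (i, p) stored at a[p*MR + i]
 *   b: packed k x NR micro-panel, element (p, j) stored at b[p*NR + j]
 *   c: MR x NR tile, element (i, j) stored at c[i*rs_c + j*cs_c]
 */
static void ref_dgemm_6x8(size_t k, double alpha,
                          const double *a, const double *b,
                          double beta, double *c,
                          ptrdiff_t rs_c, ptrdiff_t cs_c)
{
    double ab[MR][NR] = {{0.0}};

    /* Accumulate k rank-1 updates into a local 6x8 accumulator. An
       optimized kernel would hold this accumulator in vector registers. */
    for (size_t p = 0; p < k; ++p)
        for (int i = 0; i < MR; ++i)
            for (int j = 0; j < NR; ++j)
                ab[i][j] += a[p * MR + i] * b[p * NR + j];

    /* Scale and write back using both strides, so any storage of C works. */
    for (int i = 0; i < MR; ++i) {
        for (int j = 0; j < NR; ++j) {
            double *cij = c + i * rs_c + j * cs_c;
            if (beta == 0.0)
                *cij = alpha * ab[i][j];  /* assumed: do not read C when beta is 0 */
            else
                *cij = beta * (*cij) + alpha * ab[i][j];
        }
    }
}
```

With a contiguous row-major tile the call would use `rs_c = 8, cs_c = 1`; with a contiguous column-major tile, `rs_c = 1, cs_c = 6`. Any other pair of strides is the general-stride case that the commit message says the optimized kernel is now enabled for.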

1 file changed: +1262 -707 lines changed


0 commit comments