Commit eac5ee9
Optimize bli_gemm_haswell_asm_d6x8: reduce latency & improve throughput (#895)
Details:
- Reorder instructions to reduce pipeline stalls and dependencies
- Adjust register usage to improve data reuse
- Tweak prefetching so cache warming is more effective
- Since gemm kernels now support general strides, corrected macro to
  enable optimized assembly kernel for general stride inputs
1 parent e23d936 commit eac5ee9
1 file changed: +1262 -707 lines changed