cpu: aarch64: add ASIMD softmax JIT implementation #4441
Description
This commit introduces an f32 ASIMD `softmax` JIT implementation using the `exp` eltwise injector added in #4376, while also improving performance for the existing `sve_*` implementations (primarily by increasing the unrolling factor `unroll_regs_` and skipping the multiplication with default dequantization / requantization factors `src_scales` / `dst_scales`). For `jit:asimd` and `jit:sve_128`, the `exp` function is also effectively inlined by setting `preserve_vmm = false`, whereas `jit:sve_256` did not benefit from such a change.

As the previous softmax implementation heavily relied on predicated instructions, `jit_softmax_base_t` was refactored to only include common logic for SVE and non-SVE implementations alike. At the same time, two derived constructs were added to handle ISA-specific work: `jit_softmax_sve_t` and `jit_softmax_asimd_t`.

In addition, the JIT eltwise injector was changed to support storing/loading preserved vectors on non-SVE targets.
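
For readers unfamiliar with the scale-skipping optimization mentioned above, here is a minimal scalar sketch of the idea in plain C++. It is not the JIT code from this PR: the function name `softmax_sketch` and the single combined `out_scale` factor are hypothetical, and the way oneDNN actually combines `src_scales` and `dst_scales` is not reproduced here. The only point is that the per-element multiplication can be dropped when the factor has its default value of 1.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical scalar reference, NOT the oneDNN JIT kernel.
// out_scale stands in for a combined quantization factor (assumption).
std::vector<float> softmax_sketch(
        const std::vector<float> &src, float out_scale = 1.0f) {
    std::vector<float> dst(src.size());
    if (src.empty()) return dst;

    // Pass 1: find the maximum for numerical stability.
    float vmax = src[0];
    for (float v : src)
        vmax = std::max(vmax, v);

    // Pass 2: exponentiate the shifted values and accumulate their sum.
    float sum = 0.0f;
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = std::exp(src[i] - vmax);
        sum += dst[i];
    }

    // Pass 3: normalize. With the default factor the extra multiply
    // is skipped entirely, which is the idea described above.
    const float inv_sum = 1.0f / sum;
    if (out_scale == 1.0f) {
        for (float &d : dst)
            d *= inv_sum;
    } else {
        for (float &d : dst)
            d = d * inv_sum * out_scale;
    }
    return dst;
}
```

In a JIT kernel the check would presumably be made once at code-generation time, so the default path would contain no scale instruction at all rather than a runtime branch.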
Performance improvements (f32)
c6g
c7g
c8g