Optimize the rotm kernel with RVV intrinsic. #5038
Closed
tingboliao wants to merge 3 commits into OpenMathLib:develop from
added 2 commits
December 31, 2024 10:32
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
Collaborator
Thanks - the numbers are very compelling, but I'm not entirely sure having that much architecture-specific code at the interface level is a good idea. At least I don't think we've done this before, and if every architecture ifdef'd their specific intrinsics implementation into it, the file would get unwieldy rather quickly. (Need some time to think about alternatives though - not sure if it's easy to add a kernel mapping for just riscv64 either...)
Author
Thanks, we will further consider new alternatives, and submit a new Pull Request (PR) later if possible.
Starting from the scalar implementation of rotm, we optimized the kernel using RVV 1.0 intrinsics.
We then wrote test cases to verify functionality and performance on the K230 and K1.
The performance data are shown below:
Parameter setting: OPENBLAS_LOOPS = 10000.
K230 [C908, VLEN = 128] @ 1.6 GHz:
| Case | Scalar (MFlops) | Optimized RVV (MFlops) |
| --- | --- | --- |
| srotm.goto | 875.57 | 1536.78 |
| drotm.goto | 799.77 | 1408.70 |
K1 [X60, VLEN = 256] @ 1.6 GHz:
| Case | Scalar (MFlops) | Optimized RVV (MFlops) |
| --- | --- | --- |
| srotm.goto | 880.02 | 1490.44 |
| drotm.goto | 811.13 | 1541.92 |
Higher values indicate better performance. In these runs the RVV kernel is roughly 1.7x to 1.9x faster than the scalar one.
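
For readers unfamiliar with rotm, the following is a minimal sketch of what such a kernel computes and how a strip-mined RVV 1.0 loop might look. It is not the code from this PR. The function name `srotm_unit_stride` is hypothetical, and the sketch assumes unit stride, single precision, and the standard BLAS `param` layout (`flag, h11, h21, h12, h22`). The vector path is only compiled when the toolchain defines `__riscv_vector`; otherwise a scalar loop with the same semantics is used.

```c
#include <stddef.h>
#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

/* Hypothetical helper (not the PR's code): applies the modified Givens
 * rotation H to n pairs (x[i], y[i]), assuming unit stride:
 *   x[i] <- h11 * x[i] + h12 * y[i]
 *   y[i] <- h21 * x[i] + h22 * y[i]
 */
static void srotm_unit_stride(size_t n, float *x, float *y, const float *param)
{
    float flag = param[0];
    float h11, h21, h12, h22;

    if (flag == -2.0f) {
        return;                       /* H is the identity: nothing to do */
    } else if (flag < 0.0f) {         /* flag == -1: all four entries stored */
        h11 = param[1]; h21 = param[2]; h12 = param[3]; h22 = param[4];
    } else if (flag == 0.0f) {        /* implicit unit diagonal */
        h11 = 1.0f; h21 = param[2]; h12 = param[3]; h22 = 1.0f;
    } else {                          /* flag == 1: implicit off-diagonal */
        h11 = param[1]; h21 = -1.0f; h12 = 1.0f; h22 = param[4];
    }

#if defined(__riscv_vector)
    /* Strip-mined loop: vsetvl picks how many elements this pass handles. */
    while (n > 0) {
        size_t vl = __riscv_vsetvl_e32m4(n);
        vfloat32m4_t vx = __riscv_vle32_v_f32m4(x, vl);
        vfloat32m4_t vy = __riscv_vle32_v_f32m4(y, vl);

        vfloat32m4_t nx = __riscv_vfmul_vf_f32m4(vx, h11, vl);  /* h11*x        */
        nx = __riscv_vfmacc_vf_f32m4(nx, h12, vy, vl);          /* + h12*y      */
        vfloat32m4_t ny = __riscv_vfmul_vf_f32m4(vx, h21, vl);  /* h21*x        */
        ny = __riscv_vfmacc_vf_f32m4(ny, h22, vy, vl);          /* + h22*y      */

        __riscv_vse32_v_f32m4(x, nx, vl);
        __riscv_vse32_v_f32m4(y, ny, vl);
        x += vl;
        y += vl;
        n -= vl;
    }
#else
    /* Scalar fallback with the same semantics. */
    for (size_t i = 0; i < n; i++) {
        float xi = x[i], yi = y[i];
        x[i] = h11 * xi + h12 * yi;
        y[i] = h21 * xi + h22 * yi;
    }
#endif
}
```

A real kernel would also need to handle non-unit and negative increments (for example with strided loads and stores) and a double-precision variant, both of which this sketch leaves out.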