@chfast, let me know what you find out, please :-)
|
Preliminary benchmarks: this looks solid for the comparison operators themselves and also seems to help the shifts, although that may be a code layout effect.
|
Thanks, glad to hear :-)
|
Confirmed, this does not affect the assembly of the shift operators. Proof: https://godbolt.org/z/4b6cT4rje.
|
Yes, that makes sense. When I played with this repo a while back, I noticed that the benchmark results varied from one run to the next, so I'm not sure how reliable the comparison really is.
Codecov Report

@@            Coverage Diff            @@
##            master     #269    +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           10        10
  Lines         1941      1937     -4
=========================================
- Hits          1941      1937     -4

Flags with carried forward coverage won't be shown.
@chfast, should this be merged?
|
On the latest compilers I don't see much difference, so I'm not very much in favor: this implementation has 4 branches, while the original one is branchless. But people are welcome to post their own benchmark results (use the "compare" filter).
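To make the "4 branches vs branchless" trade-off concrete, here is a minimal sketch of the two shapes of a 256-bit unsigned less-than. It is not intx's code: the `u256` alias (four 64-bit words, least significant first) and the function names `less_branchy` and `less_sub` are hypothetical. The early-exit version decides at the first differing word from the top, so its cost and branch pattern depend on the input; the subtract-with-borrow version always processes all four words and keeps only the final borrow.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for a 256-bit unsigned integer:
// four 64-bit words, least significant word first.
using u256 = std::array<std::uint64_t, 4>;

// Early-exit comparison: scan from the most significant word down and
// decide at the first word that differs (up to 4 data-dependent branches).
bool less_branchy(const u256& a, const u256& b) noexcept
{
    for (int i = 3; i >= 0; --i)
    {
        if (a[i] != b[i])
            return a[i] < b[i];
    }
    return false;  // all words equal, so a < b is false
}

// "sub"-style comparison: compute a - b word by word and keep only the
// borrow out of the top word; a < b exactly when the full subtraction borrows.
// The source has no data-dependent branches; the final code is up to the compiler.
bool less_sub(const u256& a, const u256& b) noexcept
{
    std::uint64_t borrow = 0;
    for (int i = 0; i < 4; ++i)
    {
        const std::uint64_t d = a[i] - b[i];
        // Borrow out of this word: a[i] < b[i], or d == 0 with a borrow coming in.
        borrow = (a[i] < b[i]) | (d < borrow);
    }
    return borrow != 0;
}
```

Both functions must agree on every input; which one is faster depends on the compiler and CPU, which is exactly what the benchmarks in this thread try to settle.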
|
Why does it matter that it is branchless if it is slower?
|
I will try to clarify the situation.

Here you can see the assembly output for each: https://godbolt.org/z/WY3qqzrET

Analysis

In the worst case every variant needs to load all 8 words. Additionally, "sub" executes 4 subtract instructions (independent of the input). The "ne" executes 4

Benchmarks

I added all implementation variants in #277 and benchmarked them on two CPUs. There are 5 benchmark cases:

Haswell

Skylake

Zen3

This is a noisy machine with an AMD CPU.

For this case I used a Xeon Platinum 8272CL.

From the results, we should rather consider switching back to the "sub" implementation. I may run some other benchmarks if I have time.
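The worst-case argument in the Analysis paragraph (two 256-bit operands of four 64-bit words each, so 8 words in total) can be illustrated with a simple load-counting model. This is a hypothetical sketch, not intx code: `u256`, `words_loaded_early_exit` and `words_loaded_sub` are illustrative names, and the count models the source-level algorithm only; a compiler may still load more words than strictly needed.

```cpp
#include <array>
#include <cstdint>

// Hypothetical 256-bit layout: four 64-bit words, least significant first.
using u256 = std::array<std::uint64_t, 4>;

// Models an early-exit comparison: counts how many 64-bit words are loaded
// (one from each operand per step) before the result is decided.
int words_loaded_early_exit(const u256& a, const u256& b) noexcept
{
    int loads = 0;
    for (int i = 3; i >= 0; --i)
    {
        loads += 2;  // a[i] and b[i]
        if (a[i] != b[i])
            break;  // decided at the first differing word
    }
    return loads;
}

// Models a subtract-with-borrow comparison: every word of both operands
// feeds the borrow chain, so the count does not depend on the input.
constexpr int words_loaded_sub() noexcept
{
    return 2 * 4;
}
```

In this model the early-exit scan loads only 2 words when the top words differ, but 8 words when the operands are equal or differ only in the lowest word, which matches the "sub" variant's fixed cost of 8 words.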
|
Kudos, SonarCloud Quality Gate passed!
|