[rlsw] Micro-optimizations, tighter pipeline and cleanup #5673
Open
Bigfoot71 wants to merge 20 commits into raysan5:master
This adds a macro system that generates one function for each possible combination of blending factors: 11 × 11 = 121 functions. Blending then needs only one indirection and function call instead of two previously (assuming the first call was inlined).
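As a rough illustration of the idea, here is a minimal sketch of generating one specialized blend function per factor pair with X-macros, then selecting it through a single table lookup. It is not the PR's actual code: the factor set is cut down to three (so 3 × 3 = 9 functions instead of 121), and every name here (`Factor`, `SCALE_*`, `DEFINE_BLEND`, `BLEND_TABLE`, `select_blend`) is hypothetical.

```c
/* Hypothetical reduced factor set (3 factors instead of the PR's 11). */
typedef enum { F_ZERO, F_ONE, F_SRC_ALPHA, F_COUNT } Factor;

typedef void (*BlendFn)(float dst[4], const float src[4]);

/* Scale applied by each factor; `s` is the source color, `d` the destination. */
#define SCALE_F_ZERO(s, d)      0.0f
#define SCALE_F_ONE(s, d)       1.0f
#define SCALE_F_SRC_ALPHA(s, d) ((s)[3])

/* Defines one specialized function for a (source, destination) factor pair. */
#define DEFINE_BLEND(SF, DF)                                         \
    static void blend_##SF##_##DF(float dst[4], const float src[4])  \
    {                                                                \
        const float fs = SCALE_##SF(src, dst);                       \
        const float fd = SCALE_##DF(src, dst);                       \
        for (int i = 0; i < 4; i++) dst[i] = src[i]*fs + dst[i]*fd;  \
    }

/* Expands X(SF, DF) once for every pair of factors. */
#define FOR_EACH_DST(X, SF) X(SF, F_ZERO) X(SF, F_ONE) X(SF, F_SRC_ALPHA)
#define FOR_EACH_PAIR(X)      \
    FOR_EACH_DST(X, F_ZERO)   \
    FOR_EACH_DST(X, F_ONE)    \
    FOR_EACH_DST(X, F_SRC_ALPHA)

/* Emit all nine function definitions. */
FOR_EACH_PAIR(DEFINE_BLEND)

/* Build the dispatch table from the same pair list. */
#define TABLE_ENTRY(SF, DF) [SF][DF] = blend_##SF##_##DF,
static const BlendFn BLEND_TABLE[F_COUNT][F_COUNT] = {
    FOR_EACH_PAIR(TABLE_ENTRY)
};

/* One lookup when the blend state changes; one indirect call per blend. */
static BlendFn select_blend(Factor sf, Factor df)
{
    return BLEND_TABLE[sf][df];
}
```

The function pointer would typically be resolved once when the blend state is set, so the per-pixel path only pays for the indirect call.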
Simplifies the validation of blend functions. This can let `SW_SRC_ALPHA_SATURATE` through as a dst factor, but hey
Removes `float screen[2]`; each step now stores the transformed coordinates in `float coord[4]`. This also simplifies vertex interpolation during triangle rasterization.
My mistake in a previous commit
This removes the per-pixel switch; it's slightly more efficient on my hardware, but probably a poor prediction. Should remain profitable or at worst the same.
…ipping + a little cleanup
This PR focuses on micro-optimizations and cleanup work made possible by the previous refactor.
Improvements
- `screen` member has been removed; `coord` now carries data through every stage of the pipeline
- `uint8_t` <-> `float` SIMD conversion paths are now reused across more texture formats
- `uint8_t` -> `float` color conversion as a non-SIMD fallback
- `glDrawArrays` and `glDrawElements` have been cleaned up and simplified

Plus various other minor adjustments and some code reorganization.
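The description does not say how the non-SIMD `uint8_t` -> `float` fallback is implemented. One common scalar approach, shown here purely as an assumption, is a 256-entry lookup table that replaces a per-channel conversion and divide with a single load. The names `u8_to_float_lut`, `init_u8_to_float_lut` and `rgba8_to_float` are hypothetical and not taken from rlsw.

```c
#include <stdint.h>

/* Hypothetical table mapping each 8-bit channel value to [0.0, 1.0]. */
static float u8_to_float_lut[256];

/* Fill the table once at startup. */
static void init_u8_to_float_lut(void)
{
    for (int i = 0; i < 256; i++) {
        u8_to_float_lut[i] = (float)i / 255.0f;
    }
}

/* Convert one RGBA8 pixel to four normalized floats with table loads only. */
static void rgba8_to_float(float dst[4], const uint8_t src[4])
{
    for (int i = 0; i < 4; i++) {
        dst[i] = u8_to_float_lut[src[i]];
    }
}
```

Whether a table beats a plain convert-and-multiply depends on the target, so this kind of fallback is worth measuring rather than assuming.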
Profiling results
There is no longer a single dominant bottleneck; costs are now fairly evenly distributed across the pipeline.
With these changes I can finally hit (on my machine) a stable 60 FPS in O2 without manual SIMD in `models_first_person_maze` (including in high overdraw areas).

With O3 + SSE2 this goes up to ~200 FPS in `models_first_person_maze` and ~2800 bunnies in `textures_bunnymark` before hitting 30 FPS.

What's next
The next meaningful architectural improvement in the current state, while preserving current capabilities, would be to accumulate vertices and render per scanline rather than in fully immediate mode. This would also open the door to parallelization, which would likely be the biggest remaining gain.
Beyond that, it may be worth exploring alternative rasterization methods alongside the current scanline approach. This would require some design thought but could open up better paths depending on the target hardware. A tile-based rasterizer in particular would also make lazy clearing more natural than it is now (in addition to the other benefits it would bring).
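To make the lazy clearing remark concrete, here is a minimal sketch of how a tiled framebuffer could defer a clear: `fb_clear` only flags tiles, and a tile is actually filled the first time it is written to (or at present time if it was never touched). This is an assumption about one possible design, not something that exists in rlsw; the fixed 64x64 size, the 16-pixel tiles and all names (`TiledFramebuffer`, `fb_clear`, `fb_resolve_tile`, `fb_write`, `fb_present`) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define FB_W    64
#define FB_H    64
#define TILE    16
#define TILES_X (FB_W / TILE)
#define TILES_Y (FB_H / TILE)

typedef struct {
    uint32_t pixels[FB_W * FB_H];
    bool     pending_clear[TILES_X * TILES_Y]; /* tile still holds stale data */
    uint32_t clear_color;
} TiledFramebuffer;

/* Lazy clear: cost is proportional to the tile count, not the pixel count. */
static void fb_clear(TiledFramebuffer *fb, uint32_t color)
{
    fb->clear_color = color;
    for (int i = 0; i < TILES_X * TILES_Y; i++) fb->pending_clear[i] = true;
}

/* Perform the deferred fill for one tile, if it is still pending. */
static void fb_resolve_tile(TiledFramebuffer *fb, int tx, int ty)
{
    const int t = ty * TILES_X + tx;
    if (!fb->pending_clear[t]) return;
    for (int y = ty * TILE; y < (ty + 1) * TILE; y++) {
        for (int x = tx * TILE; x < (tx + 1) * TILE; x++) {
            fb->pixels[y * FB_W + x] = fb->clear_color;
        }
    }
    fb->pending_clear[t] = false;
}

/* Writing a pixel resolves its tile first, so reads never see stale data. */
static void fb_write(TiledFramebuffer *fb, int x, int y, uint32_t color)
{
    fb_resolve_tile(fb, x / TILE, y / TILE);
    fb->pixels[y * FB_W + x] = color;
}

/* Before display, resolve every tile that was never touched this frame. */
static void fb_present(TiledFramebuffer *fb)
{
    for (int ty = 0; ty < TILES_Y; ty++) {
        for (int tx = 0; tx < TILES_X; tx++) fb_resolve_tile(fb, tx, ty);
    }
}
```

In a real tile-based rasterizer the resolve would more likely happen once per tile when its binned triangles are processed, rather than being checked on every pixel write as in this sketch.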