Skip to content

Use Bool for advection bias instead of enum or structs#5451

Closed
glwagner wants to merge 5 commits intomainfrom
bool-bias
Closed

Use Bool for advection bias instead of enum or structs#5451
glwagner wants to merge 5 commits intomainfrom
bool-bias

Conversation

@glwagner
Copy link
Copy Markdown
Member

Summary

Replaces the @enum Bias LeftBias RightBias from #5318 (and the original struct LeftBias end / struct RightBias end) with a plain Bool:

@inline bias(u::Number) = u > 0  # true = left-biased, false = right-biased

This preserves the performance optimization from #5318 — a single concrete type means Julia compiles one method instead of two specialized methods with call-site branching — while being Reactant-safe. Bool is a ReactantPrimitive that Reactant can trace natively, unlike Julia enums which are not supported by Reactant's tracing infrastructure and may cause memory corruption (see #5449).

Builds on #5450 (reinstate of #5318) to preserve Jack Franklin's original contribution in the git history.

Test plan

🤖 Generated with Claude Code

glwagner and others added 3 commits March 29, 2026 11:05
…#5318)""

This reverts commit 7da56015a19ce0cad13a219cabb80f0046d55ecb.
The enum bias change (#5318) commented out the sharding_tests CI job.
This re-enables them so that Reactant compatibility is tested.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace `@enum Bias LeftBias RightBias` with a plain Bool:
`bias(u) = u > 0` (true = left, false = right).

This preserves the performance optimization from #5318 (single
concrete type, so Julia compiles one method instead of two) while
being Reactant-safe: Bool is a ReactantPrimitive that Reactant
can trace natively, unlike Julia enums which are not supported
by Reactant's tracing infrastructure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@simone-silvestri simone-silvestri added the benchmark performance runs preconfigured benchamarks and spits out timing label Mar 30, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Benchmark Comparison

Benchmark Comparison: PR vs Main

Benchmark PR (pts/s) Main (pts/s) Change
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 47159080.659 35922470.089 +31.3%
EarthOcean_tripolar_180x90x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 26086040.133 24917910.964 +4.7%
EarthOcean_tripolar_720x360x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 49455624.075 36858805.343 +34.2%
EarthOcean_tripolar_360x180x50_F32_WENOVectorInvariantDefault_WENO7_CATKE_2tr 54472973.284 51481912.352 +5.8%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_nothing_2tr 96386670.708 65379536.555 +47.4%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE+Biharmonic_2tr 34439262.305 27387694.407 +25.7%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE+GM+Biharmonic_2tr 11005618.301 10806244.475 +1.8%
EarthOcean_tripolar_360x180x50_F64_nothing_nothing_CATKE_2tr 67080188.089 67234990.957 -0.2%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariant5_WENO5_CATKE_2tr 52301982.533 42545699.825 +22.9%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariant9_WENO9_CATKE_2tr 32450809.172 25166560.390 +28.9%
EarthOcean_lat_lon_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 46260575.740 32211588.917 +43.6%
EarthOcean_immersed_lat_lon_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 44817642.626 33316352.612 +34.5%
EarthOcean_tripolar_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 40793796.951 33887255.130 +20.4%
EarthOcean_lat_lon_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 56595448.012 37479992.472 +51%
EarthOcean_immersed_lat_lon_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 49790065.314 36558706.733 +36.2%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_3tr 43947400.681 32518527.254 +35.1%

NSYS Kernel Profiling

EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr

Kernel Median (ms) Main (ms) Change Instances Avg (ms) Min (ms) Max (ms)
gpu__rk_substep_turbulent_kinetic_energy_ 4.012 4.012 +0.0% 315 4.017 4.005 4.333
gpu_compute_hydrostatic_free_surface_Gu_ 2.609 3.953 -34.0% 318 2.614 2.599 2.856
gpu_compute_hydrostatic_free_surface_Gv_ 2.439 3.781 -35.5% 318 2.443 2.430 2.653
gpu_compute_CATKE_closure_fields_ 1.615 1.618 -0.2% 318 1.618 1.608 1.737
gpu_compute_TKE_diffusivity_ 1.270 1.271 -0.0% 315 1.271 1.262 1.370
gpu_compute_hydrostatic_free_surface_Gc_ 1.043 2.520 -58.6% 315 1.044 1.034 1.129
gpu_compute_hydrostatic_free_surface_Gc_ 1.040 2.520 -58.7% 315 1.041 1.030 1.121
gpu_compute_hydrostatic_free_surface_Gc_ 1.035 2.520 -58.9% 315 1.036 1.025 1.121
gpu__compute_w_from_continuity_ 0.339 0.341 -0.6% 633 0.339 0.334 0.347
gpu_solve_batched_tridiagonal_system_kernel_ 0.527 0.526 +0.2% 315 0.527 0.519 0.540

@jackdfranklin
Copy link
Copy Markdown
Collaborator

I don't think this will actually solve the issue. After speaking to @giordano, it seems that the compiler uses too much memory when trying to compile the kernel - this will likely be the case if we use enums or bools. On top of that, enums are just structs containing ints (after macros) and run fine with all the other Reactant tests. I think it is a sharding issue specifically. Moreover, the test runs fine on MacOS, so it is a Linux specific issue with the code generation.

@glwagner
Copy link
Copy Markdown
Member Author

glwagner commented Mar 31, 2026

I don't think this will actually solve the issue. After speaking to @giordano, it seems that the compiler uses too much memory when trying to compile the kernel - this will likely be the case if we use enums or bools. On top of that, enums are just structs containing ints (after macros) and run fine with all the other Reactant tests. I think it is a sharding issue specifically. Moreover, the test runs fine on MacOS, so it is a Linux specific issue with the code generation.

Can you explain how this is different from using the enum? It looks exactly the same to me but I think you are implying this is a different approach. The performance difference is identical as well?

@glwagner
Copy link
Copy Markdown
Member Author

glwagner commented Apr 1, 2026

@giordano may be able to help with an explanation too I suspect.

@giordano
Copy link
Copy Markdown
Collaborator

giordano commented Apr 1, 2026

I don't have a great explanation besides what Jack said above. In practice, somehow the sharding attributes to the MLIR modules disappear when doing this, and that causes as a side effect memory usage to blow up, which is only the visible symptom, but not the root (😏) problem. Why this is happening, is wholly unclear. I have the feeling it's bug in the x86-64 XLA backend, but haven't investigated too much in that direction.

The problem is independent of the specific implementation (enum vs boolean), once the sharding problem is resolved, that's a matter of style preference.

@glwagner
Copy link
Copy Markdown
Member Author

glwagner commented Apr 1, 2026

Ok I think you are saying that the difference between enum and Bool is style preference only? This is my question: what is the difference between enums (#5450) and bools (this PR)?

@giordano
Copy link
Copy Markdown
Collaborator

giordano commented Apr 1, 2026

Ok I think you are saying that the difference between enum and Bool is style preference only?

I think so. The generated code, at least for sharding, is the same.

@glwagner
Copy link
Copy Markdown
Member Author

glwagner commented Apr 1, 2026

Which approach is preferred aesthetically? This one or enums? I am agnostic.

@giordano
Copy link
Copy Markdown
Collaborator

giordano commented Apr 1, 2026

Which approach is preferred aesthetically? This one or enums? I am agnostic.

I'm not sure to be honest. I personally find the enum approach cute 😅 But I don't know if that's better.

@simone-silvestri
Copy link
Copy Markdown
Collaborator

If they are the same in terms of performance and maintainability I also prefer the enum because it is more explicative

@glwagner
Copy link
Copy Markdown
Member Author

glwagner commented Apr 3, 2026

ill close this then

@glwagner glwagner closed this Apr 3, 2026
@glwagner glwagner deleted the bool-bias branch April 3, 2026 04:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

benchmark performance runs preconfigured benchamarks and spits out timing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants