Skip to content

Reinstate enum bias (#5318) with sharding tests enabled#5450

Open
glwagner wants to merge 7 commits intomainfrom
reinstate-enum-bias
Open

Reinstate enum bias (#5318) with sharding tests enabled#5450
glwagner wants to merge 7 commits intomainfrom
reinstate-enum-bias

Conversation

@glwagner
Copy link
Copy Markdown
Member

@glwagner glwagner commented Mar 29, 2026

Summary

This PR tests whether the enum bias change passes sharding/Reactant tests.
Sharding tests currently fail with an IFRT rendezvous timeout — the same failure
occurs on the bool-bias branch (#5451), which replaces enums with Bool.
Since Bool is a valid ReactantPrimitive, this rules out enum-specific memory
corruption. The issue appears to be related to compiling a single method with
ifelse branching (single concrete type) vs two specialized methods (two struct types).

Test plan

🤖 Generated with Claude Code

glwagner and others added 2 commits March 29, 2026 11:05
…#5318)""

This reverts commit 7da56015a19ce0cad13a219cabb80f0046d55ecb.
The enum bias change (#5318) commented out the sharding_tests CI job.
This re-enables them so that Reactant compatibility is tested.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@simone-silvestri simone-silvestri added the benchmark performance runs preconfigured benchamarks and spits out timing label Mar 30, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Benchmark Comparison

Benchmark Comparison: PR vs Main

Benchmark PR (pts/s) Main (pts/s) Change
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 47149058.568 35922470.089 +31.3%
EarthOcean_tripolar_180x90x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 25662789.725 24917910.964 +3%
EarthOcean_tripolar_720x360x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 49455667.138 36858805.343 +34.2%
EarthOcean_tripolar_360x180x50_F32_WENOVectorInvariantDefault_WENO7_CATKE_2tr 56953819.921 51481912.352 +10.6%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_nothing_2tr 96483155.768 65379536.555 +47.6%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE+Biharmonic_2tr 34423227.404 27387694.407 +25.7%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE+GM+Biharmonic_2tr 11011880.750 10806244.475 +1.9%
EarthOcean_tripolar_360x180x50_F64_nothing_nothing_CATKE_2tr 66671125.442 67234990.957 -0.8%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariant5_WENO5_CATKE_2tr 52114124.287 42545699.825 +22.5%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariant9_WENO9_CATKE_2tr 32452710.578 25166560.390 +29%
EarthOcean_lat_lon_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 46254368.559 32211588.917 +43.6%
EarthOcean_immersed_lat_lon_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 45050198.560 33316352.612 +35.2%
EarthOcean_tripolar_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 40790123.013 33887255.130 +20.4%
EarthOcean_lat_lon_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 56480374.239 37479992.472 +50.7%
EarthOcean_immersed_lat_lon_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 49800425.387 36558706.733 +36.2%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_3tr 43875411.010 32518527.254 +34.9%

NSYS Kernel Profiling

EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr

Kernel Median (ms) Main (ms) Change Instances Avg (ms) Min (ms) Max (ms)
gpu__rk_substep_turbulent_kinetic_energy_ 4.013 4.012 +0.0% 315 4.020 4.007 4.334
gpu_compute_hydrostatic_free_surface_Gu_ 2.613 3.953 -33.9% 318 2.620 2.605 2.846
gpu_compute_hydrostatic_free_surface_Gv_ 2.442 3.781 -35.4% 318 2.448 2.434 2.649
gpu_compute_CATKE_closure_fields_ 1.618 1.618 -0.0% 318 1.621 1.611 1.740
gpu_compute_TKE_diffusivity_ 1.272 1.271 +0.1% 315 1.274 1.265 1.381
gpu_compute_hydrostatic_free_surface_Gc_ 1.042 2.520 -58.7% 315 1.043 1.034 1.128
gpu_compute_hydrostatic_free_surface_Gc_ 1.039 2.520 -58.8% 315 1.040 1.030 1.127
gpu_compute_hydrostatic_free_surface_Gc_ 1.035 2.520 -58.9% 315 1.037 1.026 1.125
gpu__compute_w_from_continuity_ 0.341 0.341 -0.3% 633 0.341 0.336 0.349
gpu_solve_batched_tridiagonal_system_kernel_ 0.526 0.526 +0.0% 315 0.526 0.520 0.541

@glwagner
Copy link
Copy Markdown
Member Author

The gains seem greater here than on #5451

@github-actions
Copy link
Copy Markdown
Contributor

Benchmark Comparison

Benchmark Comparison: PR vs Main

Benchmark PR (pts/s) Main (pts/s) Change
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 47009265.781 35895220.121 +31%
EarthOcean_tripolar_180x90x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 26149442.876 24771489.364 +5.6%
EarthOcean_tripolar_720x360x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 49447473.933 36838133.539 +34.2%
EarthOcean_tripolar_360x180x50_F32_WENOVectorInvariantDefault_WENO7_CATKE_2tr 54347176.274 50909934.439 +6.8%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_nothing_2tr 96535669.651 65757236.055 +46.8%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE+Biharmonic_2tr 34432317.920 27313210.271 +26.1%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE+GM+Biharmonic_2tr 11011079.855 10806949.060 +1.9%
EarthOcean_tripolar_360x180x50_F64_nothing_nothing_CATKE_2tr 67148155.583 67253236.274 -0.2%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariant5_WENO5_CATKE_2tr 52296647.498 42570437.956 +22.8%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariant9_WENO9_CATKE_2tr 32454761.103 25142754.033 +29.1%
EarthOcean_lat_lon_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 46267872.462 32186832.440 +43.7%
EarthOcean_immersed_lat_lon_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 45050478.579 33214670.740 +35.6%
EarthOcean_tripolar_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 40790400.094 33765740.677 +20.8%
EarthOcean_lat_lon_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 56600109.775 37466015.934 +51.1%
EarthOcean_immersed_lat_lon_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 49797265.097 36538748.422 +36.3%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_3tr 43851008.790 32497799.113 +34.9%

NSYS Kernel Profiling

EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr

Kernel Median (ms) Main (ms) Change Instances Avg (ms) Min (ms) Max (ms)
gpu__rk_substep_turbulent_kinetic_energy_ 4.012 4.012 +0.0% 315 4.021 4.006 4.332
gpu_compute_hydrostatic_free_surface_Gu_ 2.609 3.952 -34.0% 318 2.617 2.601 2.841
gpu_compute_hydrostatic_free_surface_Gv_ 2.438 3.779 -35.5% 318 2.445 2.429 2.648
gpu_compute_CATKE_closure_fields_ 1.616 1.618 -0.2% 318 1.619 1.608 1.741
gpu_compute_TKE_diffusivity_ 1.270 1.271 -0.1% 315 1.273 1.264 1.377
gpu_compute_hydrostatic_free_surface_Gc_ 1.044 2.522 -58.6% 315 1.045 1.033 1.131
gpu_compute_hydrostatic_free_surface_Gc_ 1.040 2.522 -58.7% 315 1.043 1.031 1.127
gpu_compute_hydrostatic_free_surface_Gc_ 1.035 2.522 -58.9% 315 1.037 1.026 1.125
gpu__compute_w_from_continuity_ 0.338 0.340 -0.7% 633 0.338 0.334 0.346
gpu_solve_batched_tridiagonal_system_kernel_ 0.527 0.526 +0.1% 315 0.527 0.520 0.540

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 1, 2026

Codecov Report

❌ Patch coverage is 38.46154% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.36%. Comparing base (b71a2d7) to head (e1545a6).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...on/bounds_preserving_tracer_advection_operators.jl 0.00% 12 Missing ⚠️
src/Advection/upwind_biased_reconstruction.jl 0.00% 6 Missing ⚠️
src/Advection/weno_interpolants.jl 70.00% 6 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #5450       +/-   ##
===========================================
+ Coverage   29.97%   73.36%   +43.38%     
===========================================
  Files         394      402        +8     
  Lines       22289    22886      +597     
===========================================
+ Hits         6682    16790    +10108     
+ Misses      15607     6096     -9511     
Flag Coverage Δ
buildkite 68.97% <44.11%> (?)
julia 68.97% <44.11%> (?)
reactant_1 6.45% <0.00%> (ø)
reactant_2 11.23% <0.00%> (ø)
reactant_3 9.56% <41.66%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@simone-silvestri
Copy link
Copy Markdown
Collaborator

How close are we to get this merged?

@jackdfranklin
Copy link
Copy Markdown
Collaborator

I'm not sure, I think it may be a case of waiting for Reactant/whatever compiler backend is breaking to be fixed... The possible short term solutions I have thought of are

  1. Moving the sharding CI to run on buildkite, where we could give the runners more RAM
  2. Use a Metal/MacOS runner for the sharding, since apparently the tests pass fine on metal

The issue is that these will still not resolve the underlying issue so may be an issue for some downstream workflows?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 7, 2026

Benchmark Comparison

Benchmark Comparison: PR vs Main

Benchmark PR (pts/s) Main (pts/s) Change
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 53311354.365 39462017.660 +35.1%
EarthOcean_tripolar_180x90x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 26346803.024 26467335.728 -0.5%
EarthOcean_tripolar_720x360x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 55740216.419 40216954.241 +38.6%
EarthOcean_tripolar_360x180x50_F32_WENOVectorInvariantDefault_WENO7_CATKE_2tr 62400230.625 56327038.284 +10.8%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_nothing_2tr 95949237.252 65549385.656 +46.4%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE+Biharmonic_2tr 37781775.797 29389126.879 +28.6%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE+GM+Biharmonic_2tr 11382343.802 11115104.849 +2.4%
EarthOcean_tripolar_360x180x50_F64_nothing_nothing_CATKE_2tr 80695008.977 80421904.710 +0.3%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariant5_WENO5_CATKE_2tr 60133973.749 47630378.741 +26.3%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariant9_WENO9_CATKE_2tr 34945213.953 26835778.078 +30.2%
EarthOcean_lat_lon_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 46266450.211 32171696.927 +43.8%
EarthOcean_immersed_lat_lon_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 50694569.337 36290432.423 +39.7%
EarthOcean_tripolar_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 45706310.696 37136451.947 +23.1%
EarthOcean_lat_lon_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 56496743.721 37474477.295 +50.8%
EarthOcean_immersed_lat_lon_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 56242850.576 39842364.133 +41.2%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_3tr 49472831.441 35340200.872 +40%

NSYS Kernel Profiling

EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr

Kernel Median (ms) Main (ms) Change Instances Avg (ms) Min (ms) Max (ms)
gpu_compute_hydrostatic_free_surface_Gu_ 2.614 3.957 -33.9% 318 2.619 2.604 2.859
gpu_compute_hydrostatic_free_surface_Gv_ 2.442 3.782 -35.4% 318 2.447 2.433 2.657
gpu__rk_substep_turbulent_kinetic_energy_ 1.988 1.988 -0.0% 315 1.992 1.983 2.173
gpu_compute_CATKE_closure_fields_ 1.584 1.581 +0.2% 318 1.587 1.575 1.718
gpu_compute_hydrostatic_free_surface_Gc_ 1.042 2.519 -58.6% 315 1.043 1.034 1.129
gpu_compute_hydrostatic_free_surface_Gc_ 1.039 2.519 -58.8% 315 1.040 1.030 1.121
gpu_compute_hydrostatic_free_surface_Gc_ 1.036 2.519 -58.9% 315 1.037 1.026 1.119
gpu__compute_w_from_continuity_ 0.339 0.339 +0.1% 633 0.339 0.334 0.344
gpu_compute_TKE_diffusivity_ 0.642 0.643 -0.1% 315 0.643 0.636 0.693
gpu_solve_batched_tridiagonal_system_kernel_ 0.526 0.526 +0.0% 315 0.527 0.519 0.542

@simone-silvestri
Copy link
Copy Markdown
Collaborator

simone-silvestri commented Apr 7, 2026

We can try moving the tests on buildkite, a 30 - 40% improvement seems quite juicy to pass off :)
I will open a PR to move them and see what happens

@giordano
Copy link
Copy Markdown
Collaborator

giordano commented Apr 7, 2026

Large memory usage is just a side effect of sharding being completely broken with this change for some reason (which unfortunately no one has the time to investigate right now), giving the tests more memory would only hide the brokenness.

@simone-silvestri
Copy link
Copy Markdown
Collaborator

simone-silvestri commented Apr 7, 2026

Ok then we can easily extend the relevant functions in the reactant extension until the issue is fixed?
Edit, it might be a bit difficult because we do not pass the architecture inside kernels

@glwagner
Copy link
Copy Markdown
Member Author

glwagner commented Apr 7, 2026

Edit, it might be a bit difficult because we do not pass the architecture inside kernels

We do have the architecture through the grid

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 8, 2026

Benchmark Comparison

Benchmark Comparison: PR vs Main

Benchmark PR (pts/s) Main (pts/s) Change
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 52667825.978 39037407.198 +34.9%
EarthOcean_tripolar_180x90x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 28104941.747 27318209.221 +2.9%
EarthOcean_tripolar_720x360x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 55244107.530 39940139.064 +38.3%
EarthOcean_tripolar_360x180x50_F32_WENOVectorInvariantDefault_WENO7_CATKE_2tr 63949710.743 55912431.191 +14.4%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_nothing_2tr 94969383.109 64682661.228 +46.8%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE+Biharmonic_2tr 37412738.891 29303716.863 +27.7%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE+GM+Biharmonic_2tr 11335197.329 11085542.777 +2.3%
EarthOcean_tripolar_360x180x50_F64_nothing_nothing_CATKE_2tr 79636643.088 77186398.559 +3.2%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariant5_WENO5_CATKE_2tr 59513652.317 47231600.042 +26%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariant9_WENO9_CATKE_2tr 35423914.717 26685856.411 +32.7%
EarthOcean_lat_lon_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 45783262.768 31979802.621 +43.2%
EarthOcean_immersed_lat_lon_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 49769005.134 35837301.173 +38.9%
EarthOcean_tripolar_zstar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 44934839.720 36533538.256 +23%
EarthOcean_lat_lon_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 56048222.332 37227046.995 +50.6%
EarthOcean_immersed_lat_lon_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr 55766682.372 39616094.034 +40.8%
EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_3tr 49041625.927 35085493.014 +39.8%

NSYS Kernel Profiling

EarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr

Kernel Median (ms) Main (ms) Change Instances Avg (ms) Min (ms) Max (ms)
gpu_compute_hydrostatic_free_surface_Gu_ 2.608 3.960 -34.1% 318 2.615 2.599 2.851
gpu_compute_hydrostatic_free_surface_Gv_ 2.439 3.780 -35.5% 318 2.445 2.431 2.658
gpu__rk_substep_turbulent_kinetic_energy_ 1.989 1.988 +0.1% 315 1.993 1.985 2.175
gpu_compute_CATKE_closure_fields_ 1.582 1.585 -0.2% 318 1.585 1.572 1.718
gpu_compute_hydrostatic_free_surface_Gc_ 1.046 2.529 -58.6% 315 1.048 1.037 1.134
gpu_compute_hydrostatic_free_surface_Gc_ 1.040 2.529 -58.9% 315 1.041 1.031 1.128
gpu_compute_hydrostatic_free_surface_Gc_ 1.034 2.529 -59.1% 315 1.036 1.026 1.118
gpu__compute_w_from_continuity_ 0.340 0.338 +0.5% 633 0.340 0.335 0.347
gpu_compute_TKE_diffusivity_ 0.643 0.643 -0.1% 315 0.644 0.637 0.695
gpu_solve_batched_tridiagonal_system_kernel_ 0.527 0.526 +0.2% 315 0.527 0.521 0.542

@simone-silvestri
Copy link
Copy Markdown
Collaborator

We do have the architecture through the grid

but we set it to nothing in the adapt. Maybe we can just pass it I am not sure it will add too much extra parameter weight in the kernels. I think we will find more places where we want to extend internal functions based on the architecture

@glwagner
Copy link
Copy Markdown
Member Author

glwagner commented Apr 8, 2026

We do have the architecture through the grid

but we set it to nothing in the adapt. Maybe we can just pass it I am not sure it will add too much extra parameter weight in the kernels. I think we will find more places where we want to extend internal functions based on the architecture

The parameter space should be minimal, I agree, because they are singletons. Distributed will need a good adapt though to remove the non-singleton info.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

benchmark performance runs preconfigured benchamarks and spits out timing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants