Reinstate enum bias (#5318) with sharding tests enabled#5450
Reinstate enum bias (#5318) with sharding tests enabled#5450
Conversation
…#5318)"" This reverts commit 7da56015a19ce0cad13a219cabb80f0046d55ecb.
The enum bias change (#5318) commented out the sharding_tests CI job. This re-enables them so that Reactant compatibility is tested. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Benchmark ComparisonBenchmark Comparison: PR vs Main
NSYS Kernel ProfilingEarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr
|
|
The gains seem greater here than on #5451 |
Benchmark ComparisonBenchmark Comparison: PR vs Main
NSYS Kernel ProfilingEarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #5450 +/- ##
===========================================
+ Coverage 29.97% 73.36% +43.38%
===========================================
Files 394 402 +8
Lines 22289 22886 +597
===========================================
+ Hits 6682 16790 +10108
+ Misses 15607 6096 -9511
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
How close are we to get this merged? |
|
I'm not sure, I think it may be a case of waiting for Reactant/whatever compiler backend is breaking to be fixed... The possible short term solutions I have thought of are
The issue is that these will still not resolve the underlying issue so may be an issue for some downstream workflows? |
Benchmark ComparisonBenchmark Comparison: PR vs Main
NSYS Kernel ProfilingEarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr
|
|
We can try moving the tests on buildkite, a 30 - 40% improvement seems quite juicy to pass off :) |
|
Large memory usage is just a side effect of sharding being completely broken with this change for some reason (which unfortunately no one has the time to investigate right now), giving the tests more memory would only hide the brokenness. |
|
Ok then we can easily extend the relevant functions in the reactant extension until the issue is fixed? |
We do have the architecture through the grid |
Benchmark ComparisonBenchmark Comparison: PR vs Main
NSYS Kernel ProfilingEarthOcean_tripolar_360x180x50_F64_WENOVectorInvariantDefault_WENO7_CATKE_2tr
|
but we set it to |
The parameter space should be minimal, I agree, because they are singletons. Distributed will need a good adapt though to remove the non-singleton info. |
Summary
sharding_testsCI job that Switch biases to be enums rather than distinct types. #5318 had commented outThis PR tests whether the enum bias change passes sharding/Reactant tests.
Sharding tests currently fail with an IFRT rendezvous timeout — the same failure
occurs on the
bool-biasbranch (#5451), which replaces enums withBool.Since
Boolis a validReactantPrimitive, this rules out enum-specific memorycorruption. The issue appears to be related to compiling a single method with
ifelsebranching (single concrete type) vs two specialized methods (two struct types).Test plan
🤖 Generated with Claude Code