Fix boundary condition in SMCK #2524
Conversation
|
I approved workflow runs to see if your change has any unexpected effects in the test suite. Thanks for the PR!z |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2524 +/- ##
=======================================
Coverage 93.83% 93.83%
=======================================
Files 20 20
Lines 12104 12104
Branches 2242 2242
=======================================
Hits 11358 11358
Misses 567 567
Partials 179 179
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
The tests pass, but the question is if >= 0 is in fact correct and underflow is the root cause of the bug or if the correct bound is indeed > 0. Sadly, we'd need someone w/deeper knowledge of this model to weigh in. |
|
This is excellent, thanks @currocam. I think your fix is almost certainly correct, but I want to add a test to the C suite so that we can see why the condition wasn't being hit before. I'll take a close look in the coming days. |
| @@ -7690,7 +7690,7 @@ msp_smc_k_common_ancestor_event( | |||
|
|
|||
| /* find second hull */ | |||
| search = &x_hull->left_avl_node; | |||
There was a problem hiding this comment.
I don't know quite what's going on, but I'd want to check that we're not getting search == NULL here and remaining_mass == 0 before the loop even starts. LMK if you'd like me to get my head around all this enough to double-check the correctness.
There was a problem hiding this comment.
In the problematic example I posted, I think that’s not the case.
Before entering the loop, search is not null and remaining_mass is exactly 1.0 (up to floating-point error, I guess).
At the end of the first iteration, remaining_mass is decreased by one (which makes remaining_mass==0) but search is still not null. That's the hull we chose when doing >0 instead of >=0.
The assertion error is triggered in the 3070th iteration when the loop finishes traversing the linked list and finds the NULL.
|
Closing in favour of #2525 |
Hi again,
Related to #2523
I cloned the source code and added a few print statements around the error.
msprime/lib/msprime.c
Lines 7691 to 7699 in 1b14a8a
In the failing case, at the iteration the
tsk_bug_asserttriggers,remaining_massis exactly zero (!!!)I don't have a good understanding of the source code (nor C, really). Is this some sort of boundary condition error that is almost never triggered?
After changing the condition into
remaining_mass > 0rather thanremaining_mass >= 0, the reproducible example of #2523 runs without error on my machine: