Skip to content

Fix boundary condition in SMCK #2524

Closed
currocam wants to merge 1 commit into
tskit-dev:mainfrom
currocam:main
Closed

Fix boundary condition in SMCK #2524
currocam wants to merge 1 commit into
tskit-dev:mainfrom
currocam:main

Conversation

@currocam
Copy link
Copy Markdown

@currocam currocam commented May 4, 2026

Hi again,

Related to #2523

I cloned the source code and added a few print statements around the error.

msprime/lib/msprime.c

Lines 7691 to 7699 in 1b14a8a

/* find second hull */
search = &x_hull->left_avl_node;
for (search = search->prev; remaining_mass >= 0; search = search->prev) {
tsk_bug_assert(search != NULL);
y_hull = (hull_t *) search->item;
if (y_hull->left == x_hull->left || y_hull->right > x_hull->left) {
remaining_mass--;
}
}

In the failing case, at the iteration the tsk_bug_assert triggers, remaining_mass is exactly zero (!!!)

I don't have a good understanding of the source code (nor C, really). Is this some sort of boundary condition error that is almost never triggered?

After changing the condition into remaining_mass > 0 rather than remaining_mass >= 0, the reproducible example of #2523 runs without error on my machine:

Platform: macOS-26.4.1-arm64-arm-64bit
Python: 3.12.12 (main, Dec  9 2025, 19:05:33) [Clang 21.1.4 ]
msprime: 0.1.dev4250
tskit: 1.0.0
numpy: 2.4.2

2026-05-04 17:17:40,038 [41544] INFO     msprime.ancestry: Sampling 50 individuals with ploidy 2 in population 0 (name='pop0') at time 0
2026-05-04 17:17:40,042 [41544] INFO     msprime.ancestry: Starting replicate 0
2026-05-04 17:17:40,042 [41544] INFO     msprime.ancestry: model[0] {'name': 'smc_k', 'k': 1.0} started at time=0 nodes=100 edges=0
2026-05-04 17:17:40,042 [41544] INFO     msprime.ancestry: Running model {'name': 'smc_k', 'k': 1.0} until max time: inf
2026-05-04 17:17:40,089 [41544] DEBUG    msprime.ancestry: time=554.851 ancestors=3239 ret=1
2026-05-04 17:17:40,131 [41544] DEBUG    msprime.ancestry: time=1491.18 ancestors=3753 ret=1
2026-05-04 17:17:40,164 [41544] DEBUG    msprime.ancestry: time=3131.51 ancestors=3985 ret=1
2026-05-04 17:17:40,192 [41544] DEBUG    msprime.ancestry: time=6154.73 ancestors=4027 ret=1
2026-05-04 17:17:40,218 [41544] DEBUG    msprime.ancestry: time=11301.7 ancestors=4099 ret=1
2026-05-04 17:17:40,240 [41544] DEBUG    msprime.ancestry: time=19977.2 ancestors=4015 ret=1
2026-05-04 17:17:40,261 [41544] DEBUG    msprime.ancestry: time=34372.7 ancestors=3771 ret=1
2026-05-04 17:17:40,279 [41544] DEBUG    msprime.ancestry: time=71819.1 ancestors=1739 ret=1
2026-05-04 17:17:40,282 [41544] DEBUG    msprime.ancestry: time=227877 ancestors=0 ret=0
2026-05-04 17:17:40,282 [41544] DEBUG    msprime.ancestry: Skipping remaining 0 models
2026-05-04 17:17:40,306 [41544] INFO     msprime.ancestry: Completed at time=227877 nodes=35505 edges=124253
35283
``

@molpopgen
Copy link
Copy Markdown
Member

I approved workflow runs to see if your change has any unexpected effects in the test suite.

Thanks for the PR!z

@codecov
Copy link
Copy Markdown

codecov Bot commented May 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.83%. Comparing base (1b14a8a) to head (61522ca).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2524   +/-   ##
=======================================
  Coverage   93.83%   93.83%           
=======================================
  Files          20       20           
  Lines       12104    12104           
  Branches     2242     2242           
=======================================
  Hits        11358    11358           
  Misses        567      567           
  Partials      179      179           
Flag Coverage Δ
C 84.10% <100.00%> (ø)
c-python 72.89% <100.00%> (ø)
python-tests 98.58% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
Python API 98.61% <ø> (ø)
Python C interface 92.85% <ø> (ø)
C library 91.04% <100.00%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@molpopgen
Copy link
Copy Markdown
Member

The tests pass, but the question is if >= 0 is in fact correct and underflow is the root cause of the bug or if the correct bound is indeed > 0. Sadly, we'd need someone w/deeper knowledge of this model to weigh in.

@jeromekelleher
Copy link
Copy Markdown
Member

This is excellent, thanks @currocam. I think your fix is almost certainly correct, but I want to add a test to the C suite so that we can see why the condition wasn't being hit before. I'll take a close look in the coming days.

Comment thread lib/msprime.c
@@ -7690,7 +7690,7 @@ msp_smc_k_common_ancestor_event(

/* find second hull */
search = &x_hull->left_avl_node;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know quite what's going on, but I'd want to check that we're not getting search == NULL here and remaining_mass == 0 before the loop even starts. LMK if you'd like me to get my head around all this enough to double-check the correctness.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the problematic example I posted, I think that’s not the case.

Before entering the loop, search is not null and remaining_mass is exactly 1.0 (up to floating-point error, I guess).

At the end of the first iteration, remaining_mass is decreased by one (which makes remaining_mass==0) but search is still not null. That's the hull we chose when doing >0 instead of >=0.

The assertion error is triggered in the 3070th iteration when the loop finishes traversing the linked list and finds the NULL.

@jeromekelleher jeromekelleher mentioned this pull request May 8, 2026
@jeromekelleher
Copy link
Copy Markdown
Member

Closing in favour of #2525

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants