Replace the quadratic paired-mapping search with linear-time sweep by NicolasBuchin · Pull Request #565 · ksahlin/strobealign

NicolasBuchin · 2026-02-27T13:56:14Z

The previous paired-end mapping logic formed pairs by testing combinations of chains from read1 and read2 in score order, with a hard cap on the number of trials. This gave worst-case runtime close to O(MAX_PAIRS²) and could not guarantee finding the best pair of chains.

This PR instead splits the chains of each read into forward and revcomp sets, sorts each set by ref_id and then ref_start in O(N*log(N)), and runs a ~O(N+M) two-pointer sweep over each forward/revcomp pair of sets. Credit to @ksahlin for the idea.
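To make the sort-then-sweep idea concrete, here is a minimal sketch. It is not the code in this PR: the `Chain` struct, the function names, and the `max_dist` proper-pair criterion (a forward chain followed by a revcomp chain within `max_dist` on the same reference) are all assumptions for illustration. This greedy version puts each chain in at most one pair and runs in O(N+M) after sorting, but it does not attempt to reproduce the best-scoring-pair guarantee.

```cpp
#include <algorithm>
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical, simplified chain record; not strobealign's actual type.
struct Chain {
    int ref_id;
    int ref_start;
    bool is_revcomp;
    float score;
};

using ChainPair = std::pair<Chain, Chain>;  // (read1 chain, read2 chain)

// Sort by ref_id, then ref_start: O(N*log(N)).
static void sort_by_position(std::vector<Chain>& chains) {
    std::sort(chains.begin(), chains.end(), [](const Chain& a, const Chain& b) {
        return std::tie(a.ref_id, a.ref_start) < std::tie(b.ref_id, b.ref_start);
    });
}

// Two-pointer sweep over one sorted forward set and one sorted revcomp set.
// Each iteration advances at least one pointer, so this is O(N+M).
static std::vector<ChainPair> sweep_pairs(const std::vector<Chain>& fwd,
                                          const std::vector<Chain>& rev,
                                          int max_dist) {
    std::vector<ChainPair> pairs;
    std::size_t i = 0, j = 0;
    while (i < fwd.size() && j < rev.size()) {
        const Chain& f = fwd[i];
        const Chain& r = rev[j];
        if (f.ref_id < r.ref_id) {
            ++i;  // f lies on an earlier reference than every remaining r
        } else if (f.ref_id > r.ref_id) {
            ++j;  // r lies on an earlier reference than every remaining f
        } else if (r.ref_start < f.ref_start) {
            ++j;  // r is upstream of f and of every later f
        } else if (r.ref_start - f.ref_start > max_dist) {
            ++i;  // f is too far upstream of r and of every later r
        } else {
            pairs.emplace_back(f, r);  // accept; each chain is used once
            ++i;
            ++j;
        }
    }
    return pairs;
}

std::vector<ChainPair> pair_chains(const std::vector<Chain>& read1,
                                   const std::vector<Chain>& read2,
                                   int max_dist) {
    std::vector<Chain> fwd1, rev1, fwd2, rev2;
    for (const Chain& c : read1) (c.is_revcomp ? rev1 : fwd1).push_back(c);
    for (const Chain& c : read2) (c.is_revcomp ? rev2 : fwd2).push_back(c);
    sort_by_position(fwd1);
    sort_by_position(rev1);
    sort_by_position(fwd2);
    sort_by_position(rev2);

    // read1 forward with read2 revcomp.
    std::vector<ChainPair> pairs = sweep_pairs(fwd1, rev2, max_dist);
    // read2 forward with read1 revcomp; swap so the read1 chain stays first.
    for (const ChainPair& p : sweep_pairs(fwd2, rev1, max_dist)) {
        pairs.emplace_back(p.second, p.first);
    }
    return pairs;
}
```

Because a pointer is advanced as soon as a chain is paired or ruled out, some valid combinations are never examined, which is consistent with not returning every valid pair.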

We can guarantee that the best-scoring pair is found and that each chain appears in only one pair, but not that all valid pairs of unique chains are returned.

For scoring pairs, we also introduce a bonus, similar to the one used in paired extension, that favors pairs fitting well in the distribution of known paired mappings.
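As a rough illustration of such a distribution-based term, the sketch below adds the log-density of the observed distance between the two chains under a normal insert-size model. The normal model, the clamp value `min_bonus`, and the function names are assumptions made for this example, not the formula used in this PR or in paired extension.

```cpp
#include <algorithm>
#include <cmath>

// Assumption: distances between paired mappings follow Normal(mu, sigma).
double log_normal_pdf(double x, double mu, double sigma) {
    const double pi = 3.14159265358979323846;
    const double z = (x - mu) / sigma;
    return -0.5 * z * z - std::log(sigma * std::sqrt(2.0 * pi));
}

// Combined score of a pair: both chain scores plus a distribution term.
// min_bonus is a hypothetical floor so that a single outlier distance
// cannot dominate the chain scores.
double pair_score(double score1, double score2, double dist,
                  double mu, double sigma, double min_bonus = -20.0) {
    return score1 + score2 + std::max(min_bonus, log_normal_pdf(dist, mu, sigma));
}
```

With this shape, two candidate pairs with equal chain scores are ranked by how typical their distance is, and the term is largest when `dist` equals `mu`.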

Since get_nams_by_chaining() does not need to return chains sorted by score, we move that sorting logic out of the function.

Later, I plan to apply this approach to paired extension as well, which still has O(N²) complexity in this PR, but only after some evaluation has been done and we agree to use it for pairing chains.

TODO:

Evaluate accuracy and runtime compared to main

Paired mapping reworked:

b4d217f

Instead of an O(N*M) pairing, we do a ~O(N*log(N) + M*log(M)) sort in order to pair in ~O(N+M). Since sorting will be different for paired-end and single-end reads, the sorting logic is extracted out of get_nams_by_chaining().

NicolasBuchin wants to merge 1 commit into main from paired-mapping-update