
Replace the quadratic paired-mapping search with linear-time sweep#565

Open
NicolasBuchin wants to merge 1 commit into main from paired-mapping-update

NicolasBuchin (Collaborator) commented Feb 27, 2026

The previous paired-end mapping logic tried to form pairs by testing combinations of chains from read1 and read2 in order of score, with a hard cap on the number of trials. This gave a worst-case runtime close to O(MAX_PAIRS²) and could not guarantee finding the best pair of chains.
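To illustrate why a capped search can miss the best pair, here is a minimal sketch of a score-ordered trial loop. It is not the code being replaced: the dictionary fields `score` and `pos`, the `is_valid_pair` callback, and the `max_trials` parameter are all hypothetical names chosen for this example.

```python
from itertools import product


def capped_pairing(chains1, chains2, is_valid_pair, max_trials):
    """Try chain combinations in descending-score order, stopping at max_trials.

    Hypothetical sketch: returns (total_score, chain1, chain2) for the best
    valid pair seen before the cap was hit, or None if none was seen.
    """
    by_score = lambda c: -c["score"]
    ordered1 = sorted(chains1, key=by_score)
    ordered2 = sorted(chains2, key=by_score)

    best = None
    trials = 0
    for c1, c2 in product(ordered1, ordered2):
        if trials == max_trials:
            break  # hard cap reached: remaining combinations are never examined
        trials += 1
        if is_valid_pair(c1, c2):
            total = c1["score"] + c2["score"]
            if best is None or total > best[0]:
                best = (total, c1, c2)
    return best


# Toy data: the highest-scoring chains of each read do not pair with each other.
read1 = [{"score": 50, "pos": 100}, {"score": 48, "pos": 9000}]
read2 = [{"score": 40, "pos": 9200}, {"score": 5, "pos": 300}]
close_enough = lambda a, b: abs(a["pos"] - b["pos"]) <= 1000

capped = capped_pairing(read1, read2, close_enough, max_trials=2)
exhaustive = capped_pairing(read1, read2, close_enough, max_trials=4)
```

With a cap of 2 trials, the loop only sees the combinations involving the top read1 chain and settles on a total of 55, while the exhaustive run finds the pair with total 88.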

This PR proposes finding pairs by splitting the chains of each read into forward and revcomp sets, sorting each set by ref_id and then ref_start in O(N*log(N)), and doing a ~O(N+M) two-pointer sweep over each forward/revcomp pair of sets. Credit to @ksahlin for the idea.
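The split, sort, and sweep steps can be sketched roughly as follows. This is a simplified illustration and not the implementation in this PR: the `Chain` fields, the `max_dist` threshold, and the greedy "pair with the first compatible chain" rule are assumptions, and the sketch ignores pair scoring entirely, so it does not reproduce the best-pair guarantee.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chain:
    # Hypothetical minimal chain record for this sketch.
    ref_id: int
    ref_start: int
    is_revcomp: bool
    score: float


def _key(chain):
    return (chain.ref_id, chain.ref_start)


def sweep_pairs(forward, revcomp, max_dist):
    """Pair forward chains with downstream revcomp chains using two pointers."""
    fwd = sorted(forward, key=_key)  # O(N*log(N))
    rev = sorted(revcomp, key=_key)  # O(M*log(M))
    pairs = []
    i = j = 0
    while i < len(fwd) and j < len(rev):  # each step advances i or j: O(N+M)
        f, r = fwd[i], rev[j]
        if _key(r) < _key(f):
            # r lies upstream of f, hence upstream of every later forward chain.
            j += 1
        elif r.ref_id == f.ref_id and r.ref_start - f.ref_start <= max_dist:
            pairs.append((f, r))  # each chain is consumed by at most one pair
            i += 1
            j += 1
        else:
            # r is on a later reference or too far downstream; so are later r's.
            i += 1
    return pairs


def pair_chains(chains1, chains2, max_dist):
    """Split by orientation, then sweep both forward/revcomp combinations."""
    fwd1 = [c for c in chains1 if not c.is_revcomp]
    rev1 = [c for c in chains1 if c.is_revcomp]
    fwd2 = [c for c in chains2 if not c.is_revcomp]
    rev2 = [c for c in chains2 if c.is_revcomp]
    pairs = sweep_pairs(fwd1, rev2, max_dist)
    # Keep the (read1 chain, read2 chain) order in the second combination too.
    pairs += [(r, f) for f, r in sweep_pairs(fwd2, rev1, max_dist)]
    return pairs
```

Because both pointers only move forward, the sweep itself is linear after sorting. A chain that could also have formed a valid pair with an already-consumed partner is simply skipped, which is one way a sweep like this can return fewer pairs than an exhaustive search.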

We can guarantee that the best-scoring pair is found and that each chain is used in only one pair, but not that all valid pairs of unique chains are returned.

For scoring pairs, we also introduce a bonus, similar to the one used in paired extension, that favors pairs fitting the distribution of known paired mappings well.
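One possible shape for such a bonus is sketched below. The exact formula used in this PR is not stated here, so everything in the sketch is an assumption: it models the distance between paired mappings as roughly normal with mean `mu` and standard deviation `sigma`, and adds a bonus that peaks at `weight` when the observed distance equals `mu`.

```python
import math


def pair_score(score1, score2, distance, mu, sigma, weight=1.0):
    """Sum of chain scores plus a bonus for a plausible pair distance.

    Hypothetical sketch: the bonus is an unnormalized Gaussian in the
    z-score of the distance, so it lies in (0, weight] and decays as the
    distance moves away from the mean of previously observed pairs.
    """
    z = (distance - mu) / sigma
    bonus = weight * math.exp(-0.5 * z * z)
    return score1 + score2 + bonus
```

With `mu=300`, `sigma=50`, and `weight=10`, a pair at distance 300 gets the full bonus of 10, while a pair at distance 600 gets almost none, so two pairs with equal chain scores are ranked by how typical their distance is.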

Since get_nams_by_chaining() does not need to return chains sorted by score, we move the sorting logic out of this function.

Later I plan to apply this approach to paired extension as well, which still has O(N²) complexity in this PR, but only after some evaluation has been done and we agree to use it for pairing chains.

TODO:

  • Evaluate accuracy and runtime compared to main

    Instead of an O(N*M) pairing we do a ~O(N*log(N) + M*log(M)) sort to pair in ~O(N+M).
    Since sorting will be different for paired and single ends, the
    sorting logic is extracted out of get_nams_by_chaining().