Determine anchor orientation prior to chaining #574
Itolstoganov wants to merge 20 commits into main
…ould be in the correct orientation)
Very nice. Looks like it could solve problems with inconsistent NAMs (chains) downstream, similar to the non-canonical syncmers idea. This approach still preserves the same syncmers being generated, so there is more consistency in scoring chains between different directions compared to non-canonical seeds, which can differ in abundance and in quantity between strands. About the commit: The description/code should be changed so that "canonical" is replaced with "forward" in most places. All the syncmers are canonical, but some are forward and some reverse w.r.t. the sequence. So the bit is really keeping whether the seed is forward or not.
Depends on how you look at it? My logic was that we only store forward syncmers from the sequence and hashes of their canonical versions, so all syncmers are forward and some of them are canonical. This seems more intuitive to me since the position field of the … This only concerns the syncmers code; in the context of the index, "forward"/"unoriented" is used instead of "canonical".
This stores a bool in `Syncmer` which is true iff the syncmer is canonical. These canonicity bits are then included in the randstrobe hash and used to filter out hits with a different orientation than the reference. The new randstrobe hash layout is `<strobe 1 hash><strobe 1 canonicity bit><strobe 2 hash><strobe 2 canonicity bit>`.
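To make the layout concrete, here is a minimal Python sketch of packing and unpacking such a hash. It is not the code from this PR: the field width `HASH_BITS`, the function names, and the bit-comparison filter in `same_orientation` are all assumptions chosen for illustration.

```python
# Illustrative sketch only; widths and names are hypothetical, not from the PR.
HASH_BITS = 31                      # assumed width of each strobe hash field
HALF_BITS = HASH_BITS + 1           # hash field plus one canonicity bit
HASH_MASK = (1 << HASH_BITS) - 1
HALF_MASK = (1 << HALF_BITS) - 1


def pack_randstrobe(hash1, canonical1, hash2, canonical2):
    """Pack as <strobe 1 hash><strobe 1 bit><strobe 2 hash><strobe 2 bit>."""
    half1 = ((hash1 & HASH_MASK) << 1) | int(canonical1)
    half2 = ((hash2 & HASH_MASK) << 1) | int(canonical2)
    return (half1 << HALF_BITS) | half2


def unpack_randstrobe(packed):
    """Return (hash1, canonical1, hash2, canonical2) from a packed value."""
    half1 = packed >> HALF_BITS
    half2 = packed & HALF_MASK
    return half1 >> 1, bool(half1 & 1), half2 >> 1, bool(half2 & 1)


def same_orientation(query_packed, ref_packed):
    """One possible filter: keep a hit only if both canonicity bits agree."""
    _, q1, _, q2 = unpack_randstrobe(query_packed)
    _, r1, _, r2 = unpack_randstrobe(ref_packed)
    return (q1, q2) == (r1, r2)
```

With this layout, flipping either canonicity bit changes the packed value, so two seeds with the same strobe hashes but different orientation no longer compare equal.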
Accuracy is the same as in main, and the runtime is also mostly the same (slightly faster for sim6).
ends.pdf
The main benefit is the removal of spurious anchors and the resulting small chains.
chain-stats.pdf