A few targeted optimizations to the BAM decoding path that yield 6-7% improvement in performance#1764
Open
A few targeted optimizations to the BAM decoding path that yield 6-7% improvement in performance#1764
Conversation
Replace the per-nibble method call chain in SAMUtils.compressedBasesToBytes with a 512-byte pre-computed lookup table (NIBBLE_PAIR_LOOKUP) that decodes two bases per iteration. Ported from htslib's code2base approach. The previous implementation made 4 method calls per base pair, each involving a try/catch and array lookup. The new version does a single table lookup per compressed byte with no method calls, halving loop iterations. Benchmarks show ~7% speedup on pure BAM reading (5.2M records, 776M bases). Remove dead methods compressedBaseToByte, compressedBaseToByteLow, and compressedBaseToByteHigh which are no longer called. Add tests for odd-length sequences, single base, empty sequence, offset, and round-trip encoding.
BAMRecord construction was performing ~10 dictionary lookups per record (index->name->index round-trips for both ref and mate, then again via a redundant setHeader() call in BAMRecordCodec.decode()). This reduces it to 2 lookups per record (one resolveNameFromIndex each for ref and mate) by: - Adding package-private setReferenceNameAndIndex/setMateReferenceNameAndIndex methods to SAMRecord that set both fields directly - Using these in the BAMRecord constructor instead of setReferenceIndex/ setMateReferenceIndex which round-trip through name resolution - Removing the redundant setHeader(header) call in BAMRecordCodec.decode() since the header is already set in the BAMRecord constructor Profiling showed reference resolution at ~8% of CPU samples during BAM reading. Benchmarking on a 12.3GB BAM (207M records) confirmed a ~7-8% wall-clock improvement (104.2s -> 96.2s).
…t validation. Two targeted optimizations to the BAM tag decoding path: 1. BinaryTagCodec.readNullTerminatedString: replace the double-pass approach (scan forward for null, reset, re-read) with a single-pass scan over the backing byte array, eliminating the intermediate byte[] allocation per string tag. 2. SAMBinaryTagAndValue/SAMBinaryTagAndUnsignedArrayValue: add package-private constructors that skip isAllowedAttributeValue() validation, used from BinaryTagCodec.readTags() where the value types are known to be valid from the BAM type codes. Profiling with eager decode showed tag decoding at 18.2% of CPU. Benchmarking on a 3.9GB BAM (52M records) showed a ~3% improvement in the eager-decode path (40.0s -> 38.9s) with no regression in the lazy path.
tfenne
commented
Mar 31, 2026
| header, referenceID, coordinate, readNameLength, mappingQuality, | ||
| bin, cigarLen, flags, readLen, mateReferenceID, mateCoordinate, insertSize, restOfRecord); | ||
|
|
||
| if (null != header) { |
Member
Author
There was a problem hiding this comment.
The header is passed to the BAMRecord constructor through the factory call 4 lines above - this does nothing other than cause a whole second round of refId/mateRefId resolution.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Fixing some low-hanging fruit that gives about a 6-7% reduction in the time taken to read BAM files (in both lazy and eager modes).
The first commit is pulled over from #1763 and optimized decoding of the 4-bit bases in BAM into full bytes. If/when merged I'll prune it from the other PR.
The second commit eliminates a silly chain of calls where resolving the refId and mateRefId ended up running in circles resovling int->String->int 2-3 times each - this was probably the biggest win(!)
The final commit makes parsing the aux tags slightly more efficient; it's impact will likely scale with the number of String valued tags.