Skip to content

A few targeted optimizations to the BAM decoding path that yield 6-7% improvement in performance#1764

Open
tfenne wants to merge 3 commits intomasterfrom
tf_bam_optimization
Open

A few targeted optimizations to the BAM decoding path that yield 6-7% improvement in performance#1764
tfenne wants to merge 3 commits intomasterfrom
tf_bam_optimization

Conversation

@tfenne
Copy link
Copy Markdown
Member

@tfenne tfenne commented Mar 31, 2026

Description

Fixing some low-hanging fruit that gives about a 6-7% reduction in the time taken to read BAM files (in both lazy and eager modes).

The first commit is pulled over from #1763 and optimized decoding of the 4-bit bases in BAM into full bytes. If/when merged I'll prune it from the other PR.

The second commit eliminates a silly chain of calls where resolving the refId and mateRefId ended up running in circles resovling int->String->int 2-3 times each - this was probably the biggest win(!)

The final commit makes parsing the aux tags slightly more efficient; it's impact will likely scale with the number of String valued tags.

tfenne added 3 commits March 30, 2026 20:03
Replace the per-nibble method call chain in SAMUtils.compressedBasesToBytes
with a 512-byte pre-computed lookup table (NIBBLE_PAIR_LOOKUP) that decodes
two bases per iteration. Ported from htslib's code2base approach.

The previous implementation made 4 method calls per base pair, each involving
a try/catch and array lookup. The new version does a single table lookup per
compressed byte with no method calls, halving loop iterations.

Benchmarks show ~7% speedup on pure BAM reading (5.2M records, 776M bases).

Remove dead methods compressedBaseToByte, compressedBaseToByteLow, and
compressedBaseToByteHigh which are no longer called. Add tests for odd-length
sequences, single base, empty sequence, offset, and round-trip encoding.
BAMRecord construction was performing ~10 dictionary lookups per record
(index->name->index round-trips for both ref and mate, then again via
a redundant setHeader() call in BAMRecordCodec.decode()). This reduces
it to 2 lookups per record (one resolveNameFromIndex each for ref and
mate) by:

- Adding package-private setReferenceNameAndIndex/setMateReferenceNameAndIndex
  methods to SAMRecord that set both fields directly
- Using these in the BAMRecord constructor instead of setReferenceIndex/
  setMateReferenceIndex which round-trip through name resolution
- Removing the redundant setHeader(header) call in BAMRecordCodec.decode()
  since the header is already set in the BAMRecord constructor

Profiling showed reference resolution at ~8% of CPU samples during BAM
reading. Benchmarking on a 12.3GB BAM (207M records) confirmed a ~7-8%
wall-clock improvement (104.2s -> 96.2s).
…t validation.

Two targeted optimizations to the BAM tag decoding path:

1. BinaryTagCodec.readNullTerminatedString: replace the double-pass
   approach (scan forward for null, reset, re-read) with a single-pass
   scan over the backing byte array, eliminating the intermediate byte[]
   allocation per string tag.

2. SAMBinaryTagAndValue/SAMBinaryTagAndUnsignedArrayValue: add
   package-private constructors that skip isAllowedAttributeValue()
   validation, used from BinaryTagCodec.readTags() where the value
   types are known to be valid from the BAM type codes.

Profiling with eager decode showed tag decoding at 18.2% of CPU.
Benchmarking on a 3.9GB BAM (52M records) showed a ~3% improvement
in the eager-decode path (40.0s -> 38.9s) with no regression in the
lazy path.
@tfenne tfenne requested review from nh13 and yfarjoun March 31, 2026 12:01
header, referenceID, coordinate, readNameLength, mappingQuality,
bin, cigarLen, flags, readLen, mateReferenceID, mateCoordinate, insertSize, restOfRecord);

if (null != header) {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The header is passed to the BAMRecord constructor through the factory call 4 lines above - this does nothing other than cause a whole second round of refId/mateRefId resolution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant