A few targeted optimizations to the BAM decoding path that yield 6-7% improvement in performance by tfenne · Pull Request #1764 · samtools/htsjdk

tfenne · 2026-03-31T12:01:49Z

Description

Fixing some low-hanging fruit that gives about a 6-7% reduction in the time taken to read BAM files (in both lazy and eager modes).

The first commit is pulled over from #1763 and optimized decoding of the 4-bit bases in BAM into full bytes. If/when merged I'll prune it from the other PR.

The second commit eliminates a silly chain of calls where resolving the refId and mateRefId ended up running in circles resovling int->String->int 2-3 times each - this was probably the biggest win(!)

The final commit makes parsing the aux tags slightly more efficient; it's impact will likely scale with the number of String valued tags.

Replace the per-nibble method call chain in SAMUtils.compressedBasesToBytes with a 512-byte pre-computed lookup table (NIBBLE_PAIR_LOOKUP) that decodes two bases per iteration. Ported from htslib's code2base approach. The previous implementation made 4 method calls per base pair, each involving a try/catch and array lookup. The new version does a single table lookup per compressed byte with no method calls, halving loop iterations. Benchmarks show ~7% speedup on pure BAM reading (5.2M records, 776M bases). Remove dead methods compressedBaseToByte, compressedBaseToByteLow, and compressedBaseToByteHigh which are no longer called. Add tests for odd-length sequences, single base, empty sequence, offset, and round-trip encoding.

BAMRecord construction was performing ~10 dictionary lookups per record (index->name->index round-trips for both ref and mate, then again via a redundant setHeader() call in BAMRecordCodec.decode()). This reduces it to 2 lookups per record (one resolveNameFromIndex each for ref and mate) by: - Adding package-private setReferenceNameAndIndex/setMateReferenceNameAndIndex methods to SAMRecord that set both fields directly - Using these in the BAMRecord constructor instead of setReferenceIndex/ setMateReferenceIndex which round-trip through name resolution - Removing the redundant setHeader(header) call in BAMRecordCodec.decode() since the header is already set in the BAMRecord constructor Profiling showed reference resolution at ~8% of CPU samples during BAM reading. Benchmarking on a 12.3GB BAM (207M records) confirmed a ~7-8% wall-clock improvement (104.2s -> 96.2s).

…t validation. Two targeted optimizations to the BAM tag decoding path: 1. BinaryTagCodec.readNullTerminatedString: replace the double-pass approach (scan forward for null, reset, re-read) with a single-pass scan over the backing byte array, eliminating the intermediate byte[] allocation per string tag. 2. SAMBinaryTagAndValue/SAMBinaryTagAndUnsignedArrayValue: add package-private constructors that skip isAllowedAttributeValue() validation, used from BinaryTagCodec.readTags() where the value types are known to be valid from the BAM type codes. Profiling with eager decode showed tag decoding at 18.2% of CPU. Benchmarking on a 3.9GB BAM (52M records) showed a ~3% improvement in the eager-decode path (40.0s -> 38.9s) with no regression in the lazy path.

tfenne · 2026-03-31T12:03:22Z

src/main/java/htsjdk/samtools/BAMRecordCodec.java

                header, referenceID, coordinate, readNameLength, mappingQuality,
                bin, cigarLen, flags, readLen, mateReferenceID, mateCoordinate, insertSize, restOfRecord);

-        if (null != header) {


The header is passed to the BAMRecord constructor through the factory call 4 lines above - this does nothing other than cause a whole second round of refId/mateRefId resolution.

tfenne added 3 commits March 30, 2026 20:03

tfenne requested review from nh13 and yfarjoun March 31, 2026 12:01

tfenne commented Mar 31, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

A few targeted optimizations to the BAM decoding path that yield 6-7% improvement in performance#1764

A few targeted optimizations to the BAM decoding path that yield 6-7% improvement in performance#1764
tfenne wants to merge 3 commits intomasterfrom
tf_bam_optimization

tfenne commented Mar 31, 2026

Uh oh!

tfenne Mar 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

tfenne commented Mar 31, 2026

Description

Uh oh!

tfenne Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant