ReferenceSequenceFile is thread-safe for uncompressed FASTA but decidedly non-threadsafe for bgzipped FASTA files #1749
I have some code that makes thousands of repeated calls to ReferenceSequenceFile.getSubsequenceAt(). The code runs multi-threaded, so many threads are definitely calling that method at once.
When working on an uncompressed FASTA file this works just fine. However, when I switch to using a block-gzipped FASTA it blows up nearly instantly with exceptions like:
[2025/10/24 08:48:15 | SearchReference | Error] ArraySeq(List(), htsjdk.samtools.util.RuntimeIOException: java.util.zip.DataFormatException: invalid distance too far back
    at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:161)
    at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:96)
    at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:561)
    at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:543)
    at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:479)
    at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:469)
    at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:207)
    at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:342)
    at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:268)
    at htsjdk.samtools.reference.BlockCompressedIndexedFastaSequenceFile.readFromPosition(BlockCompressedIndexedFastaSequenceFile.java:119)
    at htsjdk.samtools.reference.AbstractIndexedFastaSequenceFile.getSubsequenceAt(AbstractIndexedFastaSequenceFile.java:204)
    at htsjdk.samtools.reference.BlockCompressedIndexedFastaSequenceFile.getSubsequenceAt(BlockCompressedIndexedFastaSequenceFile.java:47)
    ...
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.util.zip.DataFormatException: invalid distance too far back
    at java.base/java.util.zip.Inflater.inflateBytesBytes(Native Method)
    at java.base/java.util.zip.Inflater.inflate(Inflater.java:376)
    at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:145)
    ... 26 more
Dropping the thread pool down to a single thread results in no exceptions being thrown and successful (albeit slow) completion.
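For anyone hitting the same thing, one possible workaround that keeps the thread pool is to give each worker thread its own reader instead of sharing one. The sketch below shows only the general pattern and uses nothing from htsjdk: `StatefulReader`, `PerThread`, and `fetchAll` are hypothetical names made up for illustration, and `StatefulReader` is a toy stand-in for any reader that keeps a cursor or decompression state between calls. It assumes the failure comes from threads interleaving on shared mutable state inside one reader, which the trace suggests but does not prove.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

class PerThreadReaderSketch {

    /** Hypothetical stand-in for a reader with mutable internal state (NOT an htsjdk class). */
    static final class StatefulReader {
        private final byte[] data;
        private int cursor; // mutable state: unsafe if two threads interleave seek and read

        StatefulReader(byte[] data) {
            this.data = data;
        }

        /** Returns the characters in [start, end), 0-based and end-exclusive. */
        String fetch(int start, int end) {
            cursor = start;
            StringBuilder sb = new StringBuilder();
            while (cursor < end) {
                sb.append((char) data[cursor++]);
            }
            return sb.toString();
        }
    }

    /** Hands each calling thread its own lazily created instance of T. */
    static final class PerThread<T> {
        private final ThreadLocal<T> local;

        PerThread(Supplier<T> factory) {
            this.local = ThreadLocal.withInitial(factory);
        }

        T get() {
            return local.get();
        }
    }

    /** Fetches every interval on a fixed pool; no reader is ever shared between threads. */
    static List<String> fetchAll(byte[] data, int[][] intervals, int threads) throws Exception {
        PerThread<StatefulReader> readers = new PerThread<>(() -> new StatefulReader(data));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int[] iv : intervals) {
                Callable<String> task = () -> readers.get().fetch(iv[0], iv[1]);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // collected in submission order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "ACGTACGTTTGA".getBytes(StandardCharsets.US_ASCII);
        int[][] intervals = { {0, 4}, {4, 8}, {8, 12} };
        System.out.println(fetchAll(data, intervals, 3));
    }
}
```

With htsjdk, the supplier would presumably open a separate reader per thread, for example via ReferenceSequenceFileFactory.getReferenceSequenceFile(path), and each of those readers would need closing when the pool shuts down. I have not verified that this avoids the exception above. The alternative of synchronizing every call on a single shared reader should also be safe, but it serializes the reads and so gives up most of the benefit of the thread pool.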