Fast way to get number of aligned reads in indexed CRAM file. #1752

@dariober

With the method below, I can quickly get the total number of aligned reads in an indexed BAM file by querying the index.

However, with CRAM input, it returns 0. Is there a way to quickly get the total number of reads in a CRAM without iterating through each record?

This is with htsjdk 4.3.0.

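  import htsjdk.samtools.SAMSequenceRecord;
  import htsjdk.samtools.SamReader;
  import htsjdk.samtools.SamReaderFactory;
  import htsjdk.samtools.ValidationStringency;
  import java.io.File;
  import java.io.IOException;

  // Sums the per-reference aligned record counts stored in the index metadata.
  public static long getAlignedReadCount(String bam) throws IOException {
    SamReaderFactory srf =
        SamReaderFactory.make().validationStringency(ValidationStringency.SILENT);
    // try-with-resources closes the reader even if the index lookup throws
    try (SamReader samReader = srf.open(new File(bam))) {
      long alnCount = 0;
      for (SAMSequenceRecord x :
          samReader.getFileHeader().getSequenceDictionary().getSequences()) {
        alnCount +=
            samReader.indexing().getIndex().getMetaData(x.getSequenceIndex()).getAlignedRecordCount();
      }
      return alnCount;
    }
  }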
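
A possible explanation for the 0, offered as an assumption rather than a confirmed diagnosis: as far as the CRAM specification describes it, a .crai index is a gzip-compressed, tab-separated text file with six numeric columns per slice (sequence id, alignment start, alignment span, container byte offset, slice byte offset, slice size), and none of them is a record count. A BAI index, by contrast, carries per-reference mapped/unmapped counts in its metadata. The sketch below uses only the Java standard library to dump the raw .crai fields so this can be checked on a real file. `CraiPeek`, `parseLine`, and `readEntries` are hypothetical names for illustration and are not part of htsjdk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper, not part of htsjdk: dumps the raw fields of a .crai index. */
public class CraiPeek {

  /** Splits one decompressed .crai line into its numeric fields. */
  static long[] parseLine(String line) {
    String[] cols = line.trim().split("\t");
    long[] fields = new long[cols.length];
    for (int i = 0; i < cols.length; i++) {
      fields[i] = Long.parseLong(cols[i]);
    }
    return fields;
  }

  /** Reads every non-blank entry of a gzip-compressed .crai file. */
  static List<long[]> readEntries(Path crai) throws IOException {
    List<long[]> entries = new ArrayList<>();
    try (BufferedReader in = new BufferedReader(new InputStreamReader(
        new GZIPInputStream(Files.newInputStream(crai)), StandardCharsets.US_ASCII))) {
      String line;
      while ((line = in.readLine()) != null) {
        if (!line.isBlank()) {
          entries.add(parseLine(line));
        }
      }
    }
    return entries;
  }

  public static void main(String[] args) throws IOException {
    // Assumed column order (from the CRAM spec, not verified against htsjdk):
    // seqId, alignmentStart, alignmentSpan, containerOffset, sliceOffset, sliceSize
    for (long[] e : readEntries(Path.of(args[0]))) {
      System.out.println(Arrays.toString(e));
    }
  }
}
```

Running it as `java CraiPeek.java sample.cram.crai` (Java 11 or later; the file name is a placeholder) prints one array per slice. If every line has only the six offset and span fields, the index has no count for `getAlignedRecordCount()` to report, and the count would have to come from the CRAM containers themselves.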
