Releases: mcwdsi/bam2tensor

v2.3

25 Mar 19:03

What's New

bam2tensor-inspect command

New CLI tool to inspect .methylation.npz output files without writing Python:

$ bam2tensor-inspect sample.methylation.npz
sample.methylation.npz
  Genome:          hg38
  Chromosomes:     24 (chr1, chr2, ... chrX, chrY)
  Reads:           1,423,891
  CpG sites:       28,217,448
  Data points:     12,847,322 (sparsity: 99.97%)
  CpG index CRC32: a1b2c3d4
  bam2tensor:      v2.3
  File size:       14.2 MB

Accepts multiple files and also works on outputs from older versions; metadata fields that such files lack are simply left out of the report.

Embedded provenance metadata in .npz files

Each output file now contains a metadata.json entry inside the ZIP archive with:

  • bam2tensor_version — version that produced the file
  • genome_name — reference genome identifier (e.g., hg38)
  • expected_chromosomes — chromosome list defining the column mapping
  • total_cpg_sites — number of CpG columns
  • cpg_index_crc32 — CRC32 checksum of CpG positions (two files with the same CRC32 have identical column semantics and can be directly stacked/compared)

scipy.sparse.load_npz ignores this entry, so existing code is unaffected. Read metadata via bam2tensor.metadata.read_npz_metadata() or unzip -p file.npz metadata.json.
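
Because an .npz file is a ZIP archive, the metadata.json entry can also be read with Python's standard-library zipfile module. The following is a minimal sketch, not bam2tensor's own reader: the helper names read_metadata and same_columns are hypothetical, and the sketch assumes only the entry name and the cpg_index_crc32 key listed above. Files from older versions, which have no metadata.json, yield None.

```python
import json
import zipfile


def read_metadata(npz_path):
    """Return the parsed metadata.json entry, or None if the archive has none."""
    with zipfile.ZipFile(npz_path) as archive:
        if "metadata.json" not in archive.namelist():
            return None  # e.g. a file produced before v2.3
        with archive.open("metadata.json") as handle:
            return json.load(handle)


def same_columns(path_a, path_b):
    """True only when both files carry a CpG index CRC32 and the values match."""
    meta_a, meta_b = read_metadata(path_a), read_metadata(path_b)
    if meta_a is None or meta_b is None:
        return False  # compatibility cannot be confirmed without metadata
    crc_a = meta_a.get("cpg_index_crc32")
    crc_b = meta_b.get("cpg_index_crc32")
    return crc_a is not None and crc_a == crc_b
```

A check like same_columns could be run before stacking matrices from two samples, so that a mismatch in column semantics is caught early rather than producing silently misaligned columns.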

Improved output format documentation

The README now explicitly documents that column indices are determined by the reference genome's CpG sites and that GenomeMethylationEmbedding is needed to map columns back to genomic coordinates.

v2.2

17 Mar 14:54

Changes

  • No changes

v2.1

16 Mar 03:37

Changes

  • No changes

v2.0

16 Mar 01:26

bam2tensor v2.0 — Production/Stable

Promoted from Beta to Production/Stable.

New Features

  • Bismark aligner support via XM methylation tag (Z/z for methylated/unmethylated CpG)
  • --download-reference flag to auto-download and cache reference genomes (hg38, hg19, mm10, T2T-CHM13)
  • --output-dir flag to write output files to a separate directory
  • Chromosome name mismatch detection with actionable error messages (e.g. UCSC chr1 vs Ensembl 1)
  • Aligner auto-detection displayed in CLI output (Bismark, Biscuit/bwameth, gem3/Blueprint)
  • bwameth and EM-seq documentation and support (same YD tag format as Biscuit)
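
To illustrate the Bismark convention mentioned in the first item, the sketch below reads CpG calls from an XM tag string, where Z marks a methylated CpG and z an unmethylated one. All other characters (non-CpG contexts and '.') are ignored. The function names are hypothetical and this is not bam2tensor's own parsing code.

```python
def count_cpg_calls(xm_tag):
    """Count methylated (Z) and unmethylated (z) CpG calls in a Bismark XM string."""
    methylated = xm_tag.count("Z")
    unmethylated = xm_tag.count("z")
    return methylated, unmethylated


def cpg_call_offsets(xm_tag):
    """Map read offset -> 1 (methylated) or 0 (unmethylated) for each CpG call."""
    return {i: int(c == "Z") for i, c in enumerate(xm_tag) if c in "Zz"}
```

With pysam, the string would typically come from a call such as read.get_tag("XM") on an aligned segment; mapping read offsets to reference coordinates is a separate step not shown here.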

Improvements

  • Structured CLI output with configuration summary, per-BAM progress, and final summary
  • Optimized BAM read filtering using single bitwise flag check
  • Elapsed time formatted as minutes + seconds for long runs
  • Migrated from Poetry to uv for dependency management
  • Test coverage improved to 98% (reference.py: 53%→100%, functions.py: 88%→96%, main.py: 89%→99%)
  • Tests directory excluded from coverage metrics (only production code measured)
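
The single bitwise flag check can be sketched as follows. The flag bit values are those defined in the SAM specification; which flags bam2tensor actually excludes is an assumption here, and keep_read is a hypothetical name.

```python
# SAM flag bits, as defined in the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
QC_FAIL = 0x200
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800

# Assumed exclusion set: combine all unwanted bits into one mask up front.
EXCLUDE_MASK = UNMAPPED | SECONDARY | QC_FAIL | DUPLICATE | SUPPLEMENTARY


def keep_read(flag):
    """Keep a read only if none of the excluded bits are set (one AND, one compare)."""
    return (flag & EXCLUDE_MASK) == 0
```

Compared with testing each property separately, a precomputed mask reduces the per-read filter to one integer operation, which matters when iterating over millions of reads.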

Bug Fixes

  • Fixed a typeguard failure by passing a str rather than a PosixPath to input_bam
  • Fixed read duplication bug where reads spanning multiple CpG clusters could be counted twice

v1.5

07 Jan 17:58

Changes

📦 Dependencies

  • Bump the poetry-dependencies group with 16 updates (#67) @dependabot[bot]
  • Bump the github-action-dependencies group with 4 updates (#66) @dependabot[bot]
  • Bump nox from 2025.10.16 to 2025.11.12 in /.github/workflows in the workflows-dependencies group (#65) @dependabot[bot]

v1.4

04 Dec 18:52

Changes

📦 Dependencies

v1.3

16 Feb 18:28

Changes

  • No changes

v1.2

16 Feb 17:32

Choose a tag to compare

Changes

📦 Dependencies

  • Bump the poetry-dependencies group with 12 updates (#5) @dependabot

v1.1

25 Jan 17:32

Changes

  • No changes

v1.0.1

20 Jan 01:55

Changes

  • No changes