1. Background

Merqury provides a suite of efficient methods for assessing the quality, completeness, and phasing of genome assemblies using a reference-free, k-mer-based approach. Merqury extends the spectra-cn functionality of KAT and introduces novel features such as spectra-asm, spectra-hap, and blob plots; assembly QV and completeness measurements, and the first reference-free approach for measuring assembly phase blocks using parental k-mers. The included Meryl k-mer counter also provides fast and flexible methods for k-mer set manipulation. Compared to traditional assembly metrics, such as N50 contig size, Merqury provides a much broader evaluation of assembly quality and we recommend reporting these metrics along with any new genome assembly.

Source:

Rhie, A., Walenz, B.P., Koren, S. et al. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol 21, 245 (2020). https://doi.org/10.1186/s13059-020-02134-9

repo: https://github.com/marbl/merqury

2. Dependencies

3. Analysis

  • First step: meryl

The inputs to Merqury are generated by meryl, which counts the k-mers in the TellSeq reads. meryl is run once per read file, so the resulting R1 and R2 k-mer databases must then be merged into a single read database.
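The count-then-merge step above can be sketched as follows. This is a minimal illustration, not the exact commands used for this project: the read file names, the merged database name, and `k=20` are placeholders, and the per-file database naming scheme is hypothetical. The `meryl ... count output` and `meryl union-sum output` forms follow the usage shown in the Merqury repository. The script only builds and prints the commands; it does not run them.

```python
from pathlib import Path


def meryl_commands(read_files, k, merged_db):
    """Build one `meryl count` command per read file, then one `union-sum` merge."""
    count_cmds, dbs = [], []
    for reads in read_files:
        # Hypothetical naming scheme: EB31_R1.fastq.gz -> EB31_R1.meryl
        db = Path(reads).name.split(".")[0] + ".meryl"
        count_cmds.append(["meryl", f"k={k}", "count", "output", db, reads])
        dbs.append(db)
    # union-sum adds up the counts of k-mers present in any of the input databases
    merge_cmd = ["meryl", "union-sum", "output", merged_db, *dbs]
    return count_cmds + [merge_cmd]


# Placeholder inputs; replace with the real read files and k-mer size.
for cmd in meryl_commands(["EB31_R1.fastq.gz", "EB31_R2.fastq.gz"], 20, "EB31_reads.meryl"):
    print(" ".join(cmd))
```

Each printed line can be pasted into a shell or passed to `subprocess.run` once the file names match the actual data.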

  • Second step: merqury

The second step runs Merqury to assess the assembly along several dimensions, such as:

  • Copy-number spectrum analysis

  • k-mer completeness spectrum analysis

  • genome continuity

  • genome correctness (track error bases in the assembly)
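As a rough sketch of how this step is invoked, the snippet below assembles a `merqury.sh` call in the form documented in the Merqury repository (read database, assembly FASTA, output prefix). The database and FASTA file names are assumptions inferred from the output names in this README, not confirmed paths. The command is only printed, not executed.

```python
import shlex


def merqury_command(read_db, assembly_fasta, out_prefix):
    """Return the argument list for a single-assembly Merqury run."""
    return ["merqury.sh", read_db, assembly_fasta, out_prefix]


# Hypothetical file names; adjust to the actual meryl database and assembly.
cmd = merqury_command("EB31_reads.meryl", "EB31_Sealer_scaffold.fasta", "EB31_merqury")
print(shlex.join(cmd))
```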

  • Outputs look like this:

EB31_merqury.completeness.stats

EB31_Sealer_scaffold    all     698449439       889173305       78.5504

Column 1: Assembly
Column 2: k-mer set used for measuring completeness; all = read set (this is expanded with hap-mers later)
Column 3: Solid k-mers found in the assembly
Column 4: Total solid k-mers in the read set
Column 5: Completeness (%)
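The completeness value is column 3 divided by column 4, expressed as a percentage. A minimal sketch that parses the example row above and recomputes it (the function name and dictionary keys are illustrative):

```python
def parse_completeness(line):
    """Split one row of a *.completeness.stats file into named fields."""
    assembly, kmer_set, found, total, pct = line.split()
    return {
        "assembly": assembly,
        "kmer_set": kmer_set,
        "found": int(found),        # solid k-mers found in the assembly
        "total": int(total),        # solid k-mers in the read set
        "reported_pct": float(pct),
    }


row = parse_completeness(
    "EB31_Sealer_scaffold    all     698449439       889173305       78.5504"
)
recomputed = 100 * row["found"] / row["total"]
print(f"reported={row['reported_pct']}  recomputed={recomputed:.4f}")
```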

EB31_merqury.qv

EB31_Sealer_scaffold    609853  1326738725      46.385  2.29882e-05

Column 1: Assembly of interest
Column 2: k-mers found only in the assembly
Column 3: k-mers found in both assembly and the read set
Column 4: QV
Column 5: Error rate
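The QV and error rate can be recomputed from columns 2 and 3 using the formula from the Merqury paper: the error rate is 1 - (1 - K_asm_only / K_total)^(1/k), and QV = -10 log10(error rate). The sketch below rests on two assumptions that this README does not state: that column 3 is the denominator K_total, and that k = 20. Both were chosen because they reproduce the reported values for the example row.

```python
import math


def merqury_qv(asm_only, total, k):
    """Return (QV, per-base error rate) from k-mer counts.

    asm_only: k-mers found only in the assembly (column 2)
    total:    k-mer count used as the denominator (assumed to be column 3)
    k:        k-mer size (assumed; not given in this README)
    """
    # Probability that a base is correct, given the fraction of error-free k-mers
    p_correct = (1 - asm_only / total) ** (1 / k)
    error = 1 - p_correct
    return -10 * math.log10(error), error


qv, error = merqury_qv(609853, 1326738725, k=20)
print(f"QV={qv:.3f}  error={error:.5e}")
```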

QV score  Errors        % accuracy
QV10     1 in 10         90%
QV20     1 in 100        99%
QV30     1 in 1000       99.9%
QV40     1 in 10,000     99.99%
QV50     1 in 100,000    99.999%
QV60     1 in 1,000,000  99.9999%
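The table follows from the Phred-scale definition QV = -10 log10(error probability). A short sketch that regenerates the rows (the function name is illustrative):

```python
def qv_to_error(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


for qv in (10, 20, 30, 40, 50, 60):
    err = qv_to_error(qv)
    # One error per 1/err bases; accuracy is the complement of the error probability
    print(f"QV{qv}  1 in {round(1 / err):,}  {100 * (1 - err):.4f}%")
```

By this scale, the QV of 46.385 reported above corresponds to roughly one error every 43,500 bases.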