Merqury provides a suite of efficient methods for assessing the quality, completeness, and phasing of genome assemblies using a reference-free, k-mer-based approach. Merqury extends the spectra-cn functionality of KAT and introduces novel features such as spectra-asm, spectra-hap, and blob plots; assembly QV and completeness measurements, and the first reference-free approach for measuring assembly phase blocks using parental k-mers. The included Meryl k-mer counter also provides fast and flexible methods for k-mer set manipulation. Compared to traditional assembly metrics, such as N50 contig size, Merqury provides a much broader evaluation of assembly quality and we recommend reporting these metrics along with any new genome assembly.
Source:
Rhie, A., Walenz, B.P., Koren, S. et al. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol 21, 245 (2020). https://doi.org/10.1186/s13059-020-02134-9
repo: https://github.com/marbl/merqury
- This step should be run after the assembly is "ready".
- merqury version 1.3
- slurm workload manager: https://slurm.schedmd.com/documentation.html
- First step: meryl
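The inputs to merqury are k-mer databases generated by meryl.
meryl counts the k-mers in the TellSeq reads and records their distribution.
It is run with one command per read file, which produces one database per file.
The databases for the two read files, R1 and R2, must therefore be merged into a single database before running merqury.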
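The count-then-merge sequence described above can be sketched as follows. This is a minimal sketch that only builds the command lines and does not run them. The read file names, the merged database name, and k=20 are assumptions for illustration (the k-mer size actually used is not stated here; Merqury ships a best_k.sh helper for choosing one). The `count` and `union-sum` invocations follow the usage shown in the Merqury README.

```python
from pathlib import Path


def build_meryl_commands(read_files, k, merged_db):
    """Return meryl argv lists: one count per read file, then one merge."""
    commands = []
    per_file_dbs = []
    for reads in read_files:
        # One k-mer database per read file, e.g. EB31_R1.fastq.gz -> EB31_R1.meryl
        db = Path(reads).name.split(".")[0] + ".meryl"
        commands.append(["meryl", f"k={k}", "count", "output", db, reads])
        per_file_dbs.append(db)
    # union-sum keeps every k-mer seen in any input and adds up its counts
    commands.append(["meryl", "union-sum", "output", merged_db, *per_file_dbs])
    return commands


# Hypothetical file names and k-mer size; adjust to the actual data set.
commands = build_meryl_commands(
    ["EB31_R1.fastq.gz", "EB31_R2.fastq.gz"], k=20, merged_db="EB31.meryl"
)
for argv in commands:
    print(" ".join(argv))
```

Each argv list could be passed to `subprocess.run(argv, check=True)` or copied into a Slurm job script.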
- Second step: merqury
The second step is to run merqury to perform the genome assessment along several dimensions such as:
- Copy-number spectrum analysis
- k-mer completeness spectrum analysis
- Genome continuity
- Genome correctness (track error bases in the assembly)
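For a single assembly without parental hap-mers, the merqury call takes three positional arguments: the merged read k-mer database, the assembly FASTA, and an output prefix. The sketch below only builds that command line. The database name and the `.fasta` extension are assumptions, the prefix and assembly names are inferred from the output file names in this document, and `merqury.sh` is assumed to be on the PATH (the Merqury README invokes it as `$MERQURY/merqury.sh`).

```python
def build_merqury_command(read_db, assembly_fasta, out_prefix):
    """Return the argv list for a single-assembly merqury run."""
    # Positional order: read k-mer database, assembly FASTA, output prefix
    return ["merqury.sh", read_db, assembly_fasta, out_prefix]


# Hypothetical input names; the prefix determines names such as <prefix>.qv
argv = build_merqury_command(
    "EB31.meryl", "EB31_Sealer_scaffold.fasta", "EB31_merqury"
)
print(" ".join(argv))
```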
Outputs look like this:
EB31_merqury.completeness.stats
EB31_Sealer_scaffold all 698449439 889173305 78.5504
- Column 1: Assembly
- Column 2: k-mer set used for measuring completeness; all = read set (this gets expanded with hap-mers later)
- Column 3: Solid k-mers in the assembly
- Column 4: Total solid k-mers in the read set
- Column 5: Completeness (%)
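The completeness value is the share of solid read k-mers that are also found in the assembly, that is, column 3 divided by column 4, expressed as a percentage. A minimal sketch that recomputes it from the line above (the function name is hypothetical, and whitespace-separated columns are assumed):

```python
def completeness_percent(stats_line):
    """Recompute completeness (%) from one completeness.stats line."""
    fields = stats_line.split()
    asm_solid = int(fields[2])   # column 3: solid k-mers in the assembly
    read_solid = int(fields[3])  # column 4: total solid k-mers in the read set
    return 100.0 * asm_solid / read_solid


line = "EB31_Sealer_scaffold all 698449439 889173305 78.5504"
recomputed = completeness_percent(line)
reported = float(line.split()[4])  # column 5 as written by merqury
print(f"recomputed={recomputed:.4f} reported={reported:.4f}")
```

The recomputed value agrees with the reported 78.5504 to four decimal places.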
EB31_merqury.qv
EB31_Sealer_scaffold 609853 1326738725 46.385 2.29882e-05
- Column 1: Assembly of interest
- Column 2: k-mers uniquely found only in the assembly
- Column 3: k-mers found in both assembly and the read set
- Column 4: QV
- Column 5: Error rate

QV score   Errors              % accuracy
QV10       1 in 10             90%
QV20       1 in 100            99%
QV30       1 in 1000           99.9%
QV40       1 in 10,000         99.99%
QV50       1 in 100,000        99.999%
QV60       1 in 1,000,000      99.9999%
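The QV and error rate follow from columns 2 and 3 through the formula in the Merqury paper: E = 1 - (1 - K_asm / K_total)^(1/k) and QV = -10 log10(E). The sketch below recomputes both values and the accuracy column of the table. The function names are hypothetical, and k = 20 is an assumption: the k-mer size is not stated here, but this value reproduces the reported error rate.

```python
import math


def kmer_error_rate(asm_only, total, k):
    """Per-base error rate, 1 - (1 - asm_only/total) ** (1/k)."""
    # expm1/log1p evaluate the same expression without losing precision
    # when asm_only/total is very small
    return -math.expm1(math.log1p(-asm_only / total) / k)


def qv_from_error(error_rate):
    """Phred-scaled consensus quality value."""
    return -10.0 * math.log10(error_rate)


def accuracy_percent(qv):
    """Base accuracy (%) implied by a QV, e.g. QV30 -> 99.9."""
    return 100.0 * (1.0 - 10.0 ** (-qv / 10.0))


# Columns 2 and 3 of the .qv line; k = 20 is an assumed k-mer size
error = kmer_error_rate(asm_only=609853, total=1326738725, k=20)
qv = qv_from_error(error)
print(f"error={error:.5e} QV={qv:.3f} accuracy={accuracy_percent(qv):.4f}%")
```

Under that assumption the result matches the reported line: an error rate of about 2.29882e-05 and a QV of about 46.385, which falls between the QV40 and QV50 rows of the table.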