Quast

Latest version: v5.2.0

5.2.0

1. Core alignment changes (might affect all alignment-based metrics!):
- new version of minimap2 (2.24; pre-installed minimap2 in PATH accepted if >=2.19);
- default bandwidth for chaining and base alignment (minimap2's '-r' option) increased
from 85 to 200 (now it is controlled via QUAST's "--local-mis-size" option unless
"--large" is specified, in the latter case the default minimap2 value is used);
- 'asm10' preset is used instead of 'asm5' for "--min-identity" in range [95.0, 99.0).

2. New metrics:
- auN, auNG, auNA, auNGA (areas under the Nx/NGx/NAx/NGAx curves; for more details, see
https://lh3.github.io/2020/04/08/a-new-metric-on-assembly-contiguity or the manual).
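As a rough illustration of the definitions in the post linked above, auN is the sum of squared contig lengths divided by the total assembly length, and auNG replaces that denominator with the reference genome size; auNA and auNGA apply the same formulas to aligned blocks instead of whole contigs. The sketch below covers only the first two, and the function names are hypothetical, not part of QUAST's code.

```python
def au_n(lengths):
    """auN: sum of squared lengths divided by the total length (0.0 if empty)."""
    total = sum(lengths)
    if total == 0:
        return 0.0
    return sum(length * length for length in lengths) / total


def au_ng(lengths, genome_size):
    """auNG: same numerator as auN, normalized by the reference genome size."""
    return sum(length * length for length in lengths) / genome_size


contigs = [100, 300]
print(au_n(contigs))          # 250.0
print(au_ng(contigs, 1000))   # 100.0
```

Unlike N50, these values change smoothly as contig lengths change, which is the motivation given in the linked post.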

3. New options:
- "--local-mis-size" for setting the minimal local misassembly size (default is 200, was 85);
- "--report-all-metrics" for keeping the same content (list of metrics) in the main report
independently of inputs/options.

4. MetaQUAST change:
- preserving explicitly specified reference genomes in the reports (they were previously
hidden if nothing aligned to these references).

5. Major fixes:
- substantial speed up of the BSS algorithm (important for large genomes or MetaQUAST);
- preserving assembly names in reports (hyphens were changed to underscores before);
- "--no-gzip" option removed (it was never used in the pipeline); incorrectly set
.gz extensions also removed (used_snps and gff files are not compressed by QUAST);
- preserving explicitly specified reference genomes in MetaQUAST reports
(they were previously hidden if nothing aligned to these references).

6. Minor fixes (rare crashes or slightly incorrect results in specific cases):
- total number of reads (was incorrectly calculated in the presence of secondary,
supplementary, and duplicate reads);
- proper processing of filepaths with commas;
- checking correctness of existing BED files and overwriting them if corrupted.
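Secondary (flag 0x100) and supplementary (flag 0x800) SAM records describe additional alignments of a read that already has exactly one primary record, so counting every alignment line overestimates the number of reads. A minimal sketch of counting each read record once, assuming plain-text SAM input; how QUAST itself treats duplicate-flagged (0x400) reads is not specified here, and this sketch simply keeps them.

```python
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_reads(sam_lines):
    """Count primary records, skipping headers and additional alignments."""
    total = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # empty line or header line
        flag = int(line.split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # extra alignment of a read that is counted elsewhere
        total += 1
    return total


records = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1",      # primary
    "r1\t256\tchr2",    # secondary
    "r2\t16\tchr1",     # primary, reverse strand
    "r2\t2048\tchr3",   # supplementary
    "r3\t1024\tchr1",   # duplicate-flagged, still counted in this sketch
]
print(count_reads(records))  # 3
```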

7. Updates in embedded tools:
- updated Augustus (BUSCO dependency) to 3.3.3;
- new GeneMark license files.

8. Cosmetic changes in warning/notice/error messages.

5.1.0rc1

1. MetaQUAST changes:
- new option: "--reuse-combined-alignments" for reusing alignments against the
combined_reference in the subsequent runs_per_reference analysis stages;
- new default: "--min-identity" default value set to 90% for both combined_reference
and runs_per_reference stages. Compare with 95% default in regular QUAST;
- improved no-ref mode: download the best (least fragmented, most complete) available
assembly; search references with respect to strain and isolate; speed up downloading;
fix some internet connection issues; do not limit the number of reference fragments
when --ref-list is used.

2. New option:
- "--x-for-Nx" for reporting Nx, Lx, etc. metrics for a specific value of 'x' in
addition to N50, L50, etc. The default value is 90. The previous, non-changeable
default was 75.
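Nx is the contig length at which contigs sorted from longest to shortest first reach x% of the total assembly length, and Lx is the number of contigs needed to get there. A minimal sketch for an arbitrary x, as "--x-for-Nx" allows; the function name is hypothetical, and QUAST's own handling of minimal contig length and edge cases is not reproduced.

```python
def nx_lx(lengths, x=90):
    """Return (Nx, Lx) for contig lengths, or (None, None) if there are none."""
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100.0
    cumulative = 0
    for count, length in enumerate(ordered, start=1):
        cumulative += length
        if cumulative >= threshold:
            return length, count
    return None, None


contigs = [100, 200, 300, 400]
print(nx_lx(contigs, 50))  # (300, 2)
print(nx_lx(contigs, 90))  # (200, 3)
```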

3. New way of calculating old metrics:
- Num mismatches/indels per 100 kbp (now computed with respect to the total number
of aligned bases in the _assembly_ rather than in the _reference_ as before; this may be
important when the Duplication ratio is well above 1);
- Do not report a misassembly/break between the first and last alignment block of
a contig if it covers more than 95% of a cyclic chromosome/plasmid (prokaryotes only).
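Both rules can be sketched as simple arithmetic checks. With the new normalization, an assembly whose aligned length is about twice the aligned reference length (Duplication ratio of about 2) reports roughly half the per-100-kbp rate it would have had under the old reference-based denominator. The function names are hypothetical, and `covered_length` stands in for however QUAST actually measures how much of the chromosome the contig spans.

```python
def per_100kbp(events, aligned_assembly_bases):
    """Mismatches or indels per 100 kbp of aligned *assembly* bases."""
    return events * 100_000 / aligned_assembly_bases


def ignore_cyclic_break(covered_length, chromosome_length, is_circular,
                        threshold=0.95):
    """True if a break between the first and last alignment blocks should not
    be reported: the contig covers more than 95% of a circular sequence."""
    return is_circular and covered_length > threshold * chromosome_length


print(per_100kbp(30, 1_500_000))              # 2.0
print(ignore_cyclic_break(960, 1000, True))   # True
print(ignore_cyclic_break(960, 1000, False))  # False
```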

4. Critical fix:
- Python 3.8+ support (cgi.escape replaced with html.escape)

5. Small fixes (rare crashes or slightly incorrect results in specific cases):
- GC calculation (zero division due to rare side effects);
- duplication ratio (fractional overestimation due to Ns stretches in some contigs);
- HTML report heatmap colors for partial BUSCO genes (red-blue color switch);
- use of provided BAM files (--bam option not working properly);
- postprocessing of mediocre minimap2 alignments (good pairs of alignments with a
stretch of mismatches/indels/Ns in between were skipped due to low average IDY).

6. Updates in embedded tools:
- new version of SILVA database (138.1);
- fixed links to BUSCO databases (v3/odb9);
- new GeneMark license files.

7. Cosmetic changes in warning/notice/error messages, pipeline steps order, etc.

5.0.2

1. Fixed bug with missing genome features reference stats and plot in report.html

2. Fixed bug related to newest versions of joblib (0.10 and higher).

3. Fixed bug with some rare crashes of reads_analyzer module.

4. Tiny fixes in error and warning messages.

5.0.1

1. Using 'asm20' minimap2 preset for references with high divergence from the
assembled organism (provided --min-identity is below 90%). As before, 'asm10' is
used for --min-identity below 95% and 'asm5' for 95% and above.
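The preset choice described here amounts to a threshold function on --min-identity (in percent). A minimal sketch of the 5.0.1 rule only; the helper name is hypothetical.

```python
def minimap2_preset(min_identity):
    """Map QUAST's --min-identity (percent) to a minimap2 preset (5.0.1 rule)."""
    if min_identity < 90.0:
        return "asm20"  # highly divergent reference
    if min_identity < 95.0:
        return "asm10"
    return "asm5"


for identity in (85.0, 92.5, 95.0):
    print(identity, minimap2_preset(identity))
```

Under the 5.2.0 change listed above, 'asm10' also covers the range [95.0, 99.0), so the final 'asm5' branch would only start at 99.0.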

2. Fixed bug in using --split-scaffolds with MetaQUAST.

3. Fixed bug in parsing genome sequences of GeneMark predicted genes.

4. Fixed bug with crash of UpperBound creation when no paired-end reads are provided.

5. FASTA entry names are now taken as the part of the header line (">...") before
the first space. Previously, the entire line was used.
This change
* shortens names of intermediate files (e.g., in k-mer-based metrics calculation);
* simplifies the use of standard annotation files (provided with --features/-g).
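A minimal sketch of the new naming rule, assuming the split happens at the first space character exactly as described (whether tabs are also treated as separators is not stated here); the function name is hypothetical.

```python
def fasta_entry_name(header_line):
    """Return the FASTA entry name: header text after '>' up to the first space."""
    if not header_line.startswith(">"):
        raise ValueError("not a FASTA header line: %r" % header_line)
    return header_line[1:].rstrip("\r\n").split(" ", 1)[0]


print(fasta_entry_name(">NC_000913.3 Escherichia coli str. K-12 substr. MG1655\n"))
# NC_000913.3
```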

6. QUAST now tries to use already installed minimap2, Glimmer, joblib, and simplejson
rather than the distributions bundled with QUAST (important for external QUAST installers).

7. Improved documentation and error/warning/info messages.

8. GeneMark license files are updated.

5.0.0

1. QUAST-LG mode is added ("--large") for evaluating large genomes!
A significant speedup on large genomes is achieved by switching to the fast Minimap2
aligner and by extensive refactoring of post-processing bottlenecks in the QUAST code.
More adequate output results from (1) improved handling of transposable elements
(TEs), which caused many false-positive misassemblies in regular QUAST runs, and (2) the
use of proper thresholds on minimal alignment length, contig length, and extensive
misassembly size.

2. New module: upper bound assembly ("--upper-bound-assembly").
We determine which part of the reference genome could be potentially reconstructed
using a given set of reads. The algorithm takes into account zero-coverage regions
and genomic repeats (identified with Red repeat finder). The constructed assembly
is added to the evaluation to demonstrate the theoretical limits on the assembly
completeness and contiguity quality metrics for the given genome and set of reads.

3. New module: k-mer-based statistics ("--k-mer-stats").
We identify unique k-mers in the reference genome (using KMC tool) and track
their presence and relative location in assemblies. The percentage of the assembled
k-mers is a novel completeness measure and the number of large inconsistencies
(translocations or relocations with > 100 kbp difference in reference and assembly
positions) is a novel correctness measure. By default, k is 101 bp and it can be
specified with "--k-mer-size" option.
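A toy in-memory sketch of the completeness part of this metric. QUAST itself counts k-mers with KMC; treating a k-mer and its reverse complement as the same (canonical) k-mer is an assumption made here so that contigs assembled on the opposite strand still match. The location-consistency (correctness) part is not shown, and all names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmers(seq, k):
    """Yield canonical k-mers of a sequence (none if it is shorter than k)."""
    return (canonical(seq[i:i + k]) for i in range(len(seq) - k + 1))


def assembled_kmer_percent(reference, contigs, k=101):
    """Percent of reference-unique k-mers that occur in at least one contig."""
    counts = Counter(kmers(reference, k))
    unique = {kmer for kmer, n in counts.items() if n == 1}
    if not unique:
        return 0.0
    found = set()
    for contig in contigs:
        found.update(kmer for kmer in kmers(contig, k) if kmer in unique)
    return 100.0 * len(found) / len(unique)


print(assembled_kmer_percent("ACGGTCA", ["ACGG"], k=3))  # 40.0
print(assembled_kmer_percent("ACGGTCA", ["CCGT"], k=3))  # 40.0 (opposite strand)
```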

4. Improved and extended gene prediction/annotation functionality:
- Barrnap for rRNA genes prediction ("--rna-finding") is added;
- BUSCO for finding conserved single-copy orthologs ("--conserved-genes-finding";
Linux only) is added;
- regular predicted genes (using GeneMark or Glimmer) are split into full and partial;
- "--fungus" option is added for more accurate processing of fungus assemblies using
GeneMark-ES and BUSCO;
- "--features" option is added to replace "-G/--genes"; it allows counting all genomic
features from GFF or only a specific feature type (e.g., 'CDS').

5. Icarus updates:
- changes in alignment viewers:
* GC% track is added to the read coverage pane;
* a button for highlighting all assembly misassemblies is added;
* local misassemblies are now unchecked (hidden) by default.
- static Circos plot of alignments ("--circos") is added;
- chromosome names in the main menu are sorted in the human-friendly order now
(e.g., chr1, chr2, ..., chr10 instead of chr1, chr10, chr2, ...).
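The human-friendly ordering is the usual "natural sort": digit runs in a name are compared as numbers rather than character by character. A minimal sketch of such a sort key; QUAST's actual implementation is not described in this changelog.

```python
import re


def natural_key(name):
    """Split a name into text and digit runs; digit runs compare as integers."""
    # With a capturing group, re.split puts digit runs at odd indices.
    return [int(part) if i % 2 else part.lower()
            for i, part in enumerate(re.split(r"(\d+)", name))]


names = ["chr10", "chr2", "chrX", "chr1"]
print(sorted(names, key=natural_key))  # ['chr1', 'chr2', 'chr10', 'chrX']
```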

6. Improved reads support:
- reads are now mapped to all assemblies and various alignment stats are reported;
- single ("--single") and interlaced ("--12") reads are supported;
- multiple read libraries are supported, including both paired-end ("--pe1/2/12")
and mate-pair ("--mp1/2/12") libraries;
- Oxford Nanopore ("--nanopore") and PacBio SMRT ("--pacbio") reads are supported;
- precomputed SAM and BAM files can be provided both for reads mapped against assemblies
("--sam/bam") and reads mapped against the reference genome ("--ref-sam/bam");
- reads stats can still be skipped by using "--no-read-stats" option.

7. Modified processing of undefined nucleotides ('N'):
- reference Ns are excluded from Genome Fraction computation (100% if all ACGT bases
are covered);
- assembly Ns are excluded from "Unaligned" and "partially unaligned length"
computation;
- scaffold gaps are now defined simply as gaps between alignments having at least 10
consecutive Ns (affects "scaffold gap size mis."; previously it was underestimated
due to a strict threshold on the percentage of Ns in the gap sequence).
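The first and third rules can be sketched as follows, assuming covered reference positions are available as a set of 0-based indices (a hypothetical input format chosen for brevity). Treating lowercase bases like uppercase is also an assumption of this sketch, and the function names are hypothetical.

```python
import re

SCAFFOLD_GAP = re.compile(r"[Nn]{10,}")


def genome_fraction(reference, covered):
    """Percent of ACGT reference positions that are covered; Ns are excluded."""
    acgt = [i for i, base in enumerate(reference.upper()) if base in "ACGT"]
    if not acgt:
        return 0.0
    return 100.0 * sum(1 for i in acgt if i in covered) / len(acgt)


def is_scaffold_gap(gap_sequence):
    """A gap between alignments is a scaffold gap if it has >= 10 consecutive Ns."""
    return SCAFFOLD_GAP.search(gap_sequence) is not None


print(genome_fraction("ACGTNNNNAC", {0, 1, 2, 4, 5}))  # 50.0 (covered Ns ignored)
print(is_scaffold_gap("ACGT" + "N" * 10 + "ACGT"))    # True
print(is_scaffold_gap("N" * 9))                       # False
```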

8. MetaQUAST changes:
- trying to download the next best match if a reference genome is not found in NCBI
(no-reference mode only);
- link to the combined reference report is added to the main report HTML;
- sample summary reports (TXT, TEX, etc) are renamed to exclude special characters in
the filenames ('', '%', etc).

9. Changes and new metrics related to scaffold gap size misassemblies:
- local scaffold gap misassemblies are added (local misassemblies caused by incorrect
estimation of scaffold gap sizes);
- contig and scaffold misassemblies are separated in the detailed misassemblies report
(these scaffold misassemblies contain incorrectly estimated scaffold gap sizes
exceeding scaffold-gap-max-size threshold or they are inversions/translocations caused
by incorrect scaffolding).

10. New and renamed options:
- "--scaffolds" is renamed to "--split-scaffolds";
- "--skip-unaligned-mis-contigs" is added to treat significantly unaligned (>50%) contigs
with misassemblies as normal contigs (i.e. count their number of misassemblies in the
misassembly-related metrics).

11. Changes in the list of embedded third-party tools:
- removed: GAGE, gnuplot;
- replaced: MUMmer and E-MEM (new: Minimap2), Manta (new: GRIDSS);
- added: BUSCO, Barrnap, KMC, Red.

12. Fixed several minor bugs.

4.6.3

1. Fixed crash of quast.py --test (introduced in v4.6.2).

2. Fixed crash of BSS in MetaQUAST mode (introduced in v4.6.2).

3. Proper float/integer division in both Python2 and Python3 (may affect
the number of scaffold gap size misassemblies in Python2).
