Gtdbtk

Latest version: v2.4.0

2.1.0

Major changes:

* GTDB-Tk now uses a **divide-and-conquer** approach where the bacterial reference tree is split into multiple **class**-level subtrees. This reduces the memory requirements of GTDB-Tk from **320 GB** of RAM when using the full GTDB R07-RS207 reference tree to approximately **55 GB**. A manuscript describing this approach is in preparation. To continue using the full GTDB reference tree, use the `--full_tree` flag. This is the main change from v2.0.0: the split-tree approach has been changed from order-level to class-level subtrees to resolve specific classification issues (see [383](https://github.com/Ecogenomics/GTDBTk/issues/383)).
* Genomes that cannot be assigned to a domain (e.g. genomes with no bacterial or archaeal markers, or with no genes called by Prodigal) are now reported in `gtdbtk.bac120.summary.tsv` as 'Unclassified'.
* Genomes filtered out during the alignment step are now reported in `gtdbtk.bac120.summary.tsv` or `gtdbtk.ar53.summary.tsv` as 'Unclassified Bacteria' or 'Unclassified Archaea'.
* The `--write_single_copy_genes` flag is now available in the `classify_wf` and `de_novo_wf` workflows.
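Because unassigned and filtered genomes now appear in the summary files instead of being dropped, downstream scripts may want to separate them out. The sketch below assumes the summary TSV has a header row with `user_genome` and `classification` columns and treats any classification that begins with 'Unclassified' as unassigned; `split_unclassified` is a hypothetical helper, not part of GTDB-Tk.

```python
import csv
from pathlib import Path


def split_unclassified(summary_path):
    """Split a GTDB-Tk summary TSV into classified and unclassified genome IDs.

    Hypothetical helper: assumes a tab-separated file whose header includes
    the columns 'user_genome' and 'classification'.
    """
    classified, unclassified = [], []
    with Path(summary_path).open(newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            label = row["classification"].strip()
            # 'Unclassified' (no domain) and 'Unclassified Bacteria/Archaea'
            # (filtered during alignment) both start with this prefix.
            if label.startswith("Unclassified"):
                unclassified.append(row["user_genome"])
            else:
                classified.append(row["user_genome"])
    return classified, unclassified
```

For example, `split_unclassified("gtdbtk.bac120.summary.tsv")` returns two lists of genome IDs that can be reported or re-run separately.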


Features:

* ([392](https://github.com/Ecogenomics/GTDBTk/issues/392)) `--write_single_copy_genes` flag available in workflows.
* ([387](https://github.com/Ecogenomics/GTDBTk/issues/387)) Specific memory requirements are now set in `classify_wf` depending on the classification approach.

**Important**
* This version is not backwards compatible with GTDB package R207 v1.
* This version requires a [new reference package](https://data.gtdb.ecogenomic.org/releases/release207/207.0/auxillary_files/gtdbtk_r207_v2_data.tar.gz).

2.0.0

Major changes:
* GTDB-Tk now uses a **divide-and-conquer** approach where the bacterial reference tree is split into multiple order-level subtrees. This reduces the memory requirements of GTDB-Tk from **320 GB** of RAM when using the full GTDB R07-RS207 reference tree to approximately **35 GB**. A manuscript describing this approach is in preparation. To continue using the full GTDB reference tree, use the `--full_tree` flag.
* Archaeal classification now uses a refined set of 53 archaeal-specific marker genes based on the recent publication by [Dombrowski et al., 2020](https://www.nature.com/articles/s41467-020-17408-w). This set of archaeal marker genes is now used by GTDB for curating the archaeal taxonomy.
* All directories containing intermediate results are **now removed** by default at the end of the `classify_wf` and `de_novo_wf` pipelines. To retain these intermediate files, use the `--keep_intermediates` flag.
* All MSA files produced by the `align` step are now compressed with gzip.
* The classification summary and failed-genomes files are now the only files linked in the root of the `classify_wf` output directory.
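The memory trade-off above can be encoded in a wrapper script that only requests the unsplit tree when the machine has enough RAM. This is a minimal sketch assuming the `classify_wf` options `--genome_dir`, `--out_dir`, `--full_tree` and `--keep_intermediates`; later releases may require further arguments, so check `gtdbtk classify_wf -h` for your version. The 320 GB threshold comes from the notes above, and `build_classify_cmd` is hypothetical.

```python
# Approximate RAM quoted above for the full GTDB R07-RS207 reference tree.
FULL_TREE_RAM_GB = 320


def build_classify_cmd(genome_dir, out_dir, ram_gb, keep_intermediates=False):
    """Build a hypothetical `gtdbtk classify_wf` argument list.

    Adds --full_tree only when enough RAM is available; otherwise the
    default split-tree approach is left in place.
    """
    cmd = [
        "gtdbtk", "classify_wf",
        "--genome_dir", str(genome_dir),
        "--out_dir", str(out_dir),
    ]
    if ram_gb >= FULL_TREE_RAM_GB:
        cmd.append("--full_tree")
    if keep_intermediates:
        # Intermediate directories are deleted by default since v2.0.0.
        cmd.append("--keep_intermediates")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; building the argument list separately keeps the flag logic testable without invoking GTDB-Tk.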


Features:
* New `convert_to_itol` command to convert trees into iTOL format (373)
* Output FASTA files are compressed by default (369)
* Intermediate files are removed by default when using the `classify_wf`/`de_novo_wf` workflows unless `--keep_intermediates` is specified (369)
* Add `--genes` flag for Error (362)
* A warning is displayed if pplacer fails to place a genome (360 / 356)
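Since MSA and other FASTA outputs are now gzip-compressed, scripts that previously read plain `.fasta` files need to decompress them. A minimal standard-library sketch; the file name `gtdbtk.bac120.user_msa.fasta.gz` in the usage note is an assumption about the output layout, and `read_fasta_gz` is a hypothetical helper.

```python
import gzip


def read_fasta_gz(path):
    """Return {sequence_id: sequence} from a gzip-compressed FASTA file."""
    records = {}
    current_id = None
    chunks = []
    with gzip.open(path, "rt") as handle:  # "rt" decompresses and decodes to text
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if current_id is not None:
                    records[current_id] = "".join(chunks)
                # Keep only the first whitespace-delimited token as the ID.
                header = line[1:].split()
                current_id = header[0] if header else ""
                chunks = []
            else:
                chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records
```

For example, `read_fasta_gz("align/gtdbtk.bac120.user_msa.fasta.gz")` would load an aligned MSA into memory, assuming that file exists in your output directory.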

**Important**
* This version is not backwards compatible with GTDB release 202.
* This version requires a [new reference package](https://data.gtdb.ecogenomic.org/releases/release207/207.0/auxillary_files/gtdbtk_r207_data.tar.gz)

1.7.0

* (336) Warn the user if they have provided an incorrectly formatted taxonomy file.
* (348) Gracefully exit the program if no single copy hits could be identified.
* (351) Fixed an issue where GTDB-Tk would crash if spaces were present in the reference data path.
* (354) Added the optional `--tmpdir` argument to set the temporary directory (thanks tr11-sanger).

1.6.0

* (337) Set minimum tqdm version to `4.35.0`
* (335) Fixed typo in output log messages (fplaza)
* Removed the option to recalculate RED values (`--recalculate_red`)

1.5.1

Changelog:
* (327) Disallow spaces in genome names/file paths due to downstream application issues.
* (326) Disallow blank genome names.
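Because genomes with spaces in their names or paths, or with blank names, are rejected, inputs can be checked before starting a long run. A minimal sketch, assuming the genome name is the file name with a single extension (here `.fna`) removed; GTDB-Tk's own name derivation may differ, and `find_invalid_genomes` is hypothetical.

```python
from pathlib import Path


def find_invalid_genomes(paths, extension=".fna"):
    """Return (path, reason) pairs for inputs these rules would reject.

    Hypothetical pre-flight check: the genome name is assumed to be the
    file name with `extension` stripped.
    """
    problems = []
    for raw in paths:
        path = str(raw)
        name = Path(path).name
        if name.endswith(extension):
            name = name[: -len(extension)]
        if " " in path:
            problems.append((path, "contains a space"))
        if not name.strip():
            problems.append((path, "blank genome name"))
    return problems
```

Running it over `Path("genomes").glob("*.fna")` before invoking GTDB-Tk reports every offending file at once rather than failing on the first.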

1.5.0

Changes:
* Updated to use PFAM 33.1 markers.
* Updated to use the GTDB R202 taxonomy (note: this requires an updated reference package; see https://ecogenomics.github.io/GTDBTk/installing/index.html#gtdb-tk-reference-data).

Fixes:
* Automatically dropping a genome led to an error in downstream modules of `classify_wf` (312)
* `--scratch_dir` not working in v1.4.1 (311)
