chewBBACA

Latest version: v3.3.5

3.3.5

- Added a function that checks whether input files passed to the CreateSchema and AlleleCall modules have unique prefixes longer than 30 characters (the prefix includes everything in the basename before the first `.`). If any are found, the process prints the list of input files with a prefix longer than 30 characters and exits.

- Fixed an issue in the AlleleCall module when running in mode 1: the module tried to write the file with the list of invalid CDSs, but that data is not available in mode 1.

- Added more tests and improved test scripts.

- Simplified the help message for all modules.
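The prefix-length check described in the first item of this release can be illustrated with a minimal sketch. This is not chewBBACA's code: the function names, the constant, and the example paths are hypothetical, and only the rule itself (the prefix is everything in the basename before the first `.`, with a limit of 30 characters) comes from the release note.

```python
import os

MAX_PREFIX_LENGTH = 30  # limit stated in the release note


def file_prefix(path):
    """Return everything in the basename before the first '.'."""
    return os.path.basename(path).split('.')[0]


def find_long_prefixes(paths, max_length=MAX_PREFIX_LENGTH):
    """Return the input paths whose prefix exceeds max_length characters."""
    return [p for p in paths if len(file_prefix(p)) > max_length]


# Hypothetical input paths used only for illustration.
inputs = [
    'genomes/GCA_000007265.1_ASM726v1_genomic.fna',
    'genomes/a_very_long_sample_identifier_without_dots.fasta',
]

long_prefixes = find_long_prefixes(inputs)
if long_prefixes:
    print('Input files with a prefix longer than 30 characters:')
    print('\n'.join(long_prefixes))
```

According to the release note, the modules exit after printing the list; the sketch only prints it so that the snippet can be run interactively.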

3.3.4

- Improved BLAST exception capturing.

- CreateSchema and AlleleCall exit if input files include blank spaces in the filename.

- Removed global variable that could lead to issues during multiprocessing.

3.3.3

- Fixed a warning related to the BLASTp `--seqidlist` parameter. For BLAST>=2.10, the TXT file with the sequence IDs is converted to binary format with `blastdb_aliastool`.

- The `Bio.Application` modules are deprecated and might be removed from future Biopython versions. Modified the function that calls MAFFT so that it uses the subprocess module instead of `Bio.Align.Applications.MafftCommandline`. Changed the Biopython version requirement to >=1.79.

- Added a `pyproject.toml` configuration file and simplified the instructions in `setup.py`. The use of `setup.py` as a command line tool is deprecated, and the `pyproject.toml` configuration file allows installing and building packages through the recommended method.

- Updated the Dockerfile to install chewBBACA with `python3 -m pip install .` instead of the deprecated `python setup.py install` command.

- Removed FASTA header integer conversion before running BLASTp. This was done to avoid a warning from BLAST related to sequence header length exceeding 50 characters.

- The seqids and coordinates of the CDSs closest to contig tips are stored in a dictionary during gene prediction to simplify LOTSC and PLOT5/3 determination (in many cases this reduces runtime by ~20%).

- Limited the number of values stored in memory while creating the `results_contigsInfo.tsv` and `results_alleles.tsv` output files to reduce memory usage.

- Data for the missing classes is now added to the FASTA and TSV files per locus, instead of storing the complete per-input data, to reduce memory usage.

- The data for novel alleles is saved to files to reduce memory usage.

- Fixed the in-frame stop codon count values displayed in the reports created by the SchemaEvaluator module.

- The `UniprotFinder` module now exits cleanly if the output directory already exists.

- Improved the information printed to stdout by the CreateSchema and AlleleCall modules, added comments, and changed variable names to better match the data being stored.
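The switch from `Bio.Align.Applications.MafftCommandline` to the subprocess module mentioned in this release can be sketched as follows. This is an illustration under assumptions, not the function used by chewBBACA: the helper names are hypothetical, and the MAFFT options shown (`--auto`, `--quiet`) are one reasonable choice rather than the options the project necessarily passes.

```python
import shutil
import subprocess


def build_mafft_command(input_fasta):
    """Build the argument list for a MAFFT run (MAFFT writes the MSA to stdout)."""
    # '--auto' lets MAFFT pick a strategy; '--quiet' suppresses progress output.
    return ['mafft', '--auto', '--quiet', input_fasta]


def run_mafft(input_fasta, output_fasta):
    """Align input_fasta with MAFFT and write the alignment to output_fasta.

    Returns True if MAFFT ran successfully, False otherwise.
    """
    if shutil.which('mafft') is None:
        # MAFFT is not installed or not on PATH.
        return False
    with open(output_fasta, 'w') as outfile:
        result = subprocess.run(build_mafft_command(input_fasta),
                                stdout=outfile,
                                stderr=subprocess.PIPE,
                                check=False)
    return result.returncode == 0
```

Passing the command as a list avoids shell quoting issues, and redirecting stdout to a file handle means the alignment is never held in memory by the calling process.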

3.3.2

- Changed FASTA file validation to reduce memory usage.

- Removed legacy schema conversion. Users should use the `PrepExternalSchema` module to adapt schemas created with chewBBACA<=2.1.0.

- Added messages about the output files created by the `PrepExternalSchema` module.

3.3.1

- Fixed an issue that caused errors during allele calling in the default mode (4) when all CDSs were classified before representative determination.

- Fixed schema name assignment in the DownloadSchema module.

- Fixed a bug related to gene prediction parallelization when running Pyrodigal in meta mode. Processes were hanging if `multiprocessing.pool.Pool` was used. Using `multiprocessing.pool.ThreadPool` fixes the issue. The solution was described in an [issue](https://github.com/althonos/pyrodigal/issues/46) in Pyrodigal's repository.
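The `Pool` to `ThreadPool` change described in the last item can be illustrated with a small sketch. The `predict_genes` function below is a hypothetical stand-in that only counts `ATG` occurrences; it does not call Pyrodigal, and the sequences are made up for the example.

```python
from multiprocessing.pool import ThreadPool


def predict_genes(sequence):
    """Hypothetical stand-in for a per-input gene prediction call."""
    # Count 'ATG' occurrences as a placeholder for real gene prediction.
    return sequence.count('ATG')


# Made-up sequences used only for illustration.
sequences = ['ATGAAATAG', 'CCCATGATGTAA', 'GGGG']

# ThreadPool exposes the same map/imap interface as Pool but runs the
# workers as threads in the current process instead of separate processes.
with ThreadPool(processes=2) as pool:
    counts = pool.map(predict_genes, sequences)

print(counts)
```

Because `ThreadPool` shares the `Pool` interface, the swap is usually a one-line change. Threads only speed things up when the work being done releases the Python global interpreter lock, so whether this is a good trade depends on the library being called.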

3.3.0

- Added the AlleleCallEvaluator module. This module generates an interactive HTML report for the allele calling results. The report provides summary statistics to evaluate results per sample and per locus (with the possibility to provide a TSV file with loci annotations to include in a table). The report includes components to display a heatmap representing the loci presence-absence matrix, a heatmap representing the distance matrix based on allelic differences, and a Neighbor-Joining tree based on the MSA of the core genome loci.

- Added [pyrodigal](https://github.com/althonos/pyrodigal) for gene prediction. This simplified the processing of the gene prediction results and reduced runtime.

- Fixed an issue where the AlleleCall module would try to create results files for excluded inputs.

- Fixed exception capturing during multiprocessing when using Python>=3.11.

- Fixed PLOT5/3 identification when coding sequences are in the reverse strand.

- Fixed computation of the representative self-scores when performing allele calling for a subset of the loci in a schema (would only compute the self-scores for the subset of loci if the 'self_scores' file had not yet been created).

- Fixed issue related to the classification of single EXC/INF and single/multiple ASM/ALM (would classify some inputs as NIPH instead of EXC/INF).

- Fixed issue related to protein exact match classification when multiple pre-computed PROTEINtable files include the same protein hash.

- Changed the `-i`, `--input-files` parameter in the PrepExternalSchema and UniprotFinder modules to `-g`, `--schema-directory` and added the `--gl`, `--genes-list` parameter to enable adapting or annotating a subset of the loci in the schema.
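The "distance matrix based on allelic differences" mentioned for the AlleleCallEvaluator report can be illustrated with a toy example. This is a sketch of the general idea only: the profiles are invented, the function names are hypothetical, and skipping loci that are missing in either sample is an assumption about missing-data handling, not a description of how chewBBACA computes the matrix.

```python
def allelic_distance(profile_a, profile_b):
    """Count loci where two samples carry different alleles.

    Loci with a missing call (None) in either sample are skipped
    (an assumption made for this sketch).
    """
    return sum(1 for a, b in zip(profile_a, profile_b)
               if a is not None and b is not None and a != b)


def distance_matrix(profiles):
    """Return sorted sample names and the pairwise allelic distance matrix."""
    samples = sorted(profiles)
    matrix = [[allelic_distance(profiles[row], profiles[col])
               for col in samples]
              for row in samples]
    return samples, matrix


# Invented allelic profiles: one allele identifier per locus, None = missing.
profiles = {
    'sample1': [1, 2, 3, 1],
    'sample2': [1, 2, 4, None],
    'sample3': [2, 2, 4, 1],
}

samples, matrix = distance_matrix(profiles)
for name, row in zip(samples, matrix):
    print(name, row)
```

The resulting matrix is symmetric with zeros on the diagonal, which is the shape a heatmap or a distance-based tree method expects as input.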

© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.