TreeSAPP version 0.11.0 changes how users store and interact with reference package feature annotations.
These feature annotations are clade-specific labels that indicate some extra-taxonomic features that are characteristic of sequences in the reference package.
For example, in the particulate methane monooxygenase and ammonia monooxygenase subunit A reference package, XmoA,
the feature annotations indicate which paralog is represented by a clade (PmoA, AmoA, EmoA, etc.).
As another example, the methyl coenzyme M reductase subunit A (McrA) reference package contains feature annotations for
each pathway of methanogenesis that is used by the different clades.
We recommend updating to this version and updating any reference packages you have created.
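
To make the idea of clade-specific labels concrete, here is a minimal sketch of how such annotations could be represented and queried. The nested-dictionary layout, the feature name "Paralog", the leaf names, and the helper `label_for_leaf` are all hypothetical and are not TreeSAPP's internal format; only the labels PmoA, AmoA and EmoA come from the XmoA example above.

```python
# Hypothetical layout: feature name -> {label -> set of leaf names in that clade}.
# This is an illustration only, not how TreeSAPP stores 'feature_annotations'.
feature_annotations = {
    "Paralog": {
        "PmoA": {"leaf_1", "leaf_2"},
        "AmoA": {"leaf_3"},
        "EmoA": {"leaf_4"},
    }
}


def label_for_leaf(annotations, feature, leaf):
    """Return the label whose clade contains `leaf`, or None if unannotated."""
    for label, members in annotations.get(feature, {}).items():
        if leaf in members:
            return label
    return None


print(label_for_leaf(feature_annotations, "Paralog", "leaf_3"))  # AmoA
print(label_for_leaf(feature_annotations, "Paralog", "leaf_9"))  # None
```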
Added
- A new attribute called 'feature_annotations' has been introduced to reference packages.
It can store what was previously saved to iTOL-compatible annotation files by `treesapp colour`.
- `treesapp package edit` accepts a taxonomy-phenotype mapping file to populate the feature_annotations attribute.
See [Wiki](https://github.com/hallamlab/TreeSAPP/wiki/Reference-package-operations) for details.
- `treesapp update` will automatically propagate feature annotations from the original reference package by mapping
the reference sequences through their unique descriptions (organism name and accession).
- `treesapp package view tree` will print a Newick tree with each leaf node's accession and description.
- `treesapp abundance` creates a simple_bar.txt file for each sample analyzed.
- Ability to automatically detect the sequence type based on the input provided.
- PQuery classification data is stored in each reference package in the 'training_df' attribute as a pandas.DataFrame.
- Improved query sequence filtering by phylogenetic placement information in `treesapp update`.
- Now able to update a reference package's 'lineage_ids' attribute with `treesapp package edit`.
- `treesapp create` is able to accept multiple fasta files through `--fastx_input` and concatenate them into a single
file used to build the reference package.
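
The annotation propagation described for `treesapp update` can be pictured as a join on sequence descriptions. The following is a minimal sketch of that idea, not TreeSAPP's implementation: the function `propagate_annotations`, the dictionary shapes, and every sequence ID, organism name and accession below are placeholders invented for illustration.

```python
def propagate_annotations(old_descriptions, old_labels, new_descriptions):
    """Carry labels from old to new reference sequences via shared descriptions.

    old_descriptions: {old_id: description}
    old_labels:       {old_id: label}
    new_descriptions: {new_id: description}
    Returns {new_id: label} for every new sequence whose description matched.
    """
    # Index labels by description; descriptions are assumed to be unique.
    label_by_description = {
        old_descriptions[seq_id]: label
        for seq_id, label in old_labels.items()
        if seq_id in old_descriptions
    }
    return {
        new_id: label_by_description[desc]
        for new_id, desc in new_descriptions.items()
        if desc in label_by_description
    }


# Placeholder data: IDs may be reassigned during an update, descriptions persist.
old_descriptions = {
    "1_XmoA": "Organism A | ACC00001.1",
    "2_XmoA": "Organism B | ACC00002.1",
}
old_labels = {"1_XmoA": "PmoA", "2_XmoA": "AmoA"}
new_descriptions = {
    "1_XmoA": "Organism B | ACC00002.1",
    "2_XmoA": "Organism C | ACC00003.1",  # new sequence, nothing to inherit
    "3_XmoA": "Organism A | ACC00001.1",
}

print(propagate_annotations(old_descriptions, old_labels, new_descriptions))
# {'1_XmoA': 'AmoA', '3_XmoA': 'PmoA'}
```

Keying on the description rather than the sequence ID is what lets a label survive when IDs change between versions of a package; sequences that are new in the update simply stay unannotated.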
Fixed
- Segmentation fault from Prodigal is no longer possible as `treesapp assign` verifies input presence earlier.
- `treesapp purity` bug where the reference package path was not correctly passed to `treesapp assign` if in the same directory.
- Calculation of tree coverage in `treesapp purity`.
Changed
- Renamed the classification table made by `treesapp assign` (and used by subcommands like `layer`) to 'classifications.tsv'.
- The reference package attribute 'refpkg_code' is automatically set and
does not need to be changed as it is guaranteed to be unique.
- The reference package disband path has been changed to just the reference package code.
- `treesapp colour` accesses and uses the 'feature_annotations' to write iTOL-compatible annotation files
(i.e. colour_strip.txt and colours_styles.txt). It no longer accepts taxonomy-phenotype tables.
- `treesapp layer` uses the 'feature_annotations' attribute in reference packages to annotate classified sequences.
- The versioned sequence accessions (or first split for unformatted sequence headers) are used in the
ReferencePackage lineage_ids attribute. This ensures unique sequence IDs and helps with iterative updates.
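
The last item can be illustrated with a short sketch of deriving a sequence ID from a FASTA header: prefer a versioned accession if one is present, otherwise fall back to the first whitespace-delimited token. The regular expression, the function name `sequence_id_from_header`, and the example headers are assumptions made for illustration; TreeSAPP's own header parsing may differ.

```python
import re

# Assumed shape of a versioned accession: word characters, a dot, then digits
# (for example "WP_012345678.1"). Real accession formats are more varied.
VERSIONED_ACCESSION = re.compile(r"\b([A-Za-z0-9_]+\.\d+)\b")


def sequence_id_from_header(header):
    """Return a versioned accession if found, else the header's first token."""
    header = header.lstrip(">").strip()
    match = VERSIONED_ACCESSION.search(header)
    if match:
        return match.group(1)
    # Unformatted header: use the first split, or an empty string if blank.
    parts = header.split()
    return parts[0] if parts else ""


print(sequence_id_from_header(">WP_012345678.1 hypothetical protein [Organism A]"))
# WP_012345678.1
print(sequence_id_from_header(">seq42 unformatted header"))
# seq42
```

Keeping the version suffix distinguishes revisions of the same record, which is one way IDs can stay unique across iterative updates.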