Kb-python

Latest version: v0.28.2

0.25.0

`ref`
- Progress bar is now displayed when downloading pre-packaged reference files.
- Added checks that give more informative output for common errors, including: 1) when FASTA and GTF chromosomes do not match, 2) when a GTF entry is not parsable, and 3) when either the `transcript` or the `exon` entry for a transcript is missing in the GTF (both are required).
- Added `-k` option to override the default (or calculated optimal) k-mer length for the `kallisto` index.
- Added functionality to generate a feature barcode reference for use with the KITE feature-barcoding workflow. To use this option, supply `--workflow kite` and a feature-barcode to cell-barcode mapping.
- Added `-n` option to split indices into `n` parts. This reduces the maximum memory used at any given time, which is useful in memory-limited environments. When the `-n` option is used, the `-i` argument is used as the prefix for the `n` generated indices. Each of these indices is suffixed with `.i`, where `i` is the index number, starting from `i=0`. When `-n` is used, the built indices must be passed as a comma-delimited list to `kb count` (**NOTE: this feature is EXPERIMENTAL.** See `count` for more details). When `-n` is used with `--workflow lamanno` or `--workflow nucleus`, only the intron FASTA is split into `n-1` parts, which are then each indexed separately. The cDNA FASTA is indexed in its entirety and is never split.
- Added functionality to build a single index using multiple references. Useful for mixed species experiments. The `fasta` argument should be a comma-delimited list of genome FASTAs, and the `gtf` argument should be a comma-delimited list of GTFs, corresponding in position to each genome FASTA.
- Added `--tmp` option to manually specify the temporary directory. Otherwise, behavior is identical to the previous version (a `tmp` directory at the location where `kb` is executed).
- Added support for IUPAC nucleotide codes. Note that `kallisto` replaces non-ACGUT nucleotides with pseudorandom ones. Thanks to Maarten-vd-Sande.
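The naming rule for split indices described above can be illustrated with a small Python sketch. It is not part of `kb`: the helper name `split_index_paths` and the prefix `index.idx` are hypothetical, and the code only reproduces the rule stated in these notes (suffix `.i`, starting at `i=0`, joined with commas for `kb count`).

```python
def split_index_paths(prefix, n):
    """Return the n index paths implied by `-i prefix` combined with `-n n`.

    Hypothetical helper: it follows the naming rule stated in the release
    notes (suffix `.i`, starting at i=0), not kb's actual implementation.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [f"{prefix}.{i}" for i in range(n)]


paths = split_index_paths("index.idx", 3)

# Comma-delimited form that the notes say must be passed to `kb count -i`.
index_argument = ",".join(paths)
print(index_argument)  # index.idx.0,index.idx.1,index.idx.2
```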

`count`
- Added support for KITE feature-barcoding workflow. The `bustools` binary was updated to support this feature.
- **DEPRECATION**: The `--lamanno` and `--nucleus` flags will be deprecated in the next release. These have been replaced with `--workflow lamanno` and `--workflow nucleus`.
- All BUS files that are inputs or outputs are validated before and after running `kallisto` or `bustools`. A BUS file is considered valid if `bustools` reads it without error and it contains a positive number of BUS records. This should prevent `bustools` from trying to sort empty BUS files and crashing (31).
- Added functionality to generate TCC matrices with the `--tcc` flag.
- Added `--mm` flag to include reads that pseudoalign to multiple genes.
- When running in verbose mode (`--verbose`), commands are no longer printed with the full path to the `bustools` and `kallisto` binaries. These paths are printed once at the start of the program.
- Added `--dry-run` flag, which prints the entire workflow to standard output as shell commands, without actually running them.
- **EXPERIMENTAL**: Added support for multiple indices by passing a comma-delimited list of indices to `-i`. `kb` will align the reads to each of these indices and merge the BUS files with `bustools mash` and `bustools merge`. This feature is currently EXPERIMENTAL, and there are known issues that cause the loss of reads. This feature will be fully supported in a future release. In the meantime, use at your own risk!
- Added `--tmp` option to manually specify temporary directory. The default behavior has also changed: the default `tmp` directory is created IN THE OUTPUT FOLDER (specified by `-o`). Previously, the `tmp` directory was created where `kb` was run, which was causing issues when running multiple instances of `kb` from the same location. Thanks to Munfred and kokitsuyuzaki for the suggestion.
- `kb` now outputs a `kb_info.json` which includes useful run information, such as the commands run and their runtimes.
- Added functionality to generate a brief standalone HTML report that includes basic statistics (`run_info.json`, `inspect.json`) and quality-control plots (knee plot, elbow plot, PCA, genes detected). This feature is available with the `--report` flag. Using this flag on velocity matrices may cause `kb` to crash due to high memory usage, and a corresponding warning is printed at the start. Plots for TCC matrices are not supported.
- When the matrix is converted to H5AD or Loom format (using the `--h5ad` or `--loom` options), the gene/feature names are included as a column in the `var` of the anndata. Related to 52.
- Added a `--cellranger` option, which converts the raw gene matrices to cellranger-compatible format in a separate `cellranger` directory for the `standard` workflow (and `cellranger_spliced` and `cellranger_unspliced` for the `velocity` and `nucleus` workflows). Note that cellranger outputs matrices with genes as rows and cells (barcodes) as columns.
- Added `--mm` flag to include BUS records that pseudoalign to multiple genes, via the `--multimapping` flag in `bustools count` (57).
- `None` can be provided as the whitelist, which will force `kb` to use the `bustools whitelist` command, even if there exists a pre-packaged whitelist.
- Added support for Smart-seq reads with `-x smartseq`. FASTQs are paired by first sorting the list of FASTQ paths in lexicographical order and then taking every two as a pair. For instance, if `1.fastq 3.fastq 2.fastq 4.fastq` is provided, `1.fastq` and `2.fastq` will be a pair, and `3.fastq` and `4.fastq` will be another pair. The FASTQ argument now supports glob expressions to make it easier to provide a long list of FASTQs.
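The Smart-seq pairing rule above (sort lexicographically, then take every two paths as a pair) can be sketched in Python as follows. This is an illustration of the stated rule only; the function name `pair_fastqs` is hypothetical, and the error raised for an odd number of files is an assumption, not documented `kb` behavior.

```python
def pair_fastqs(paths):
    """Sort FASTQ paths lexicographically and take every two as a pair.

    Hypothetical helper mirroring the rule in the release notes. Raising on
    an odd number of paths is an assumption made for this sketch.
    """
    ordered = sorted(paths)
    if len(ordered) % 2 != 0:
        raise ValueError("an even number of FASTQ paths is required")
    return [(ordered[i], ordered[i + 1]) for i in range(0, len(ordered), 2)]


pairs = pair_fastqs(["1.fastq", "3.fastq", "2.fastq", "4.fastq"])
print(pairs)  # [('1.fastq', '2.fastq'), ('3.fastq', '4.fastq')]
```

Because the ordering is lexicographical rather than numeric, a name such as `10.fastq` sorts before `2.fastq`. Zero-padded file names (for example `02.fastq`) avoid surprising pairings.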

0.24.4

`--info`
- Fix typo with `indropsv3`

`ref`
- If any input (FASTA or GTF) files are provided as gzip files, they are uncompressed to the temporary directory, instead of being streamed directly. This is because `ref` relies on being able to access arbitrary locations of the files quickly. Working with decompressed files results in a considerable speedup.

`count`
- For `--lamanno`: spliced and unspliced BUS files no longer contain the `.s` suffix. This was done to make the output consistent with the normal (non-`--lamanno`) command.
- Implemented `--filter` with `--lamanno`
- Added support for single-nucleus RNA-seq with `--nuclei`. The only difference between `--nuclei` and `--lamanno` is how the spliced and unspliced matrices are combined. Specifically, `--nuclei` sums the matrices. Using `--nuclei` with neither `--loom` nor `--h5ad` results in behavior identical to `--lamanno`.
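As a rough illustration of what "sums the matrices" means for `--nuclei`, the sketch below adds a spliced and an unspliced count matrix element by element. It is a simplified model, not `kb`'s implementation: the function name `sum_matrices` and the toy values are made up, and it assumes both matrices already share the same barcodes (rows) and genes (columns) in the same order.

```python
def sum_matrices(spliced, unspliced):
    """Element-wise sum of two equally shaped count matrices (lists of rows).

    Assumption for this sketch: rows and columns of both matrices refer to
    the same barcodes and genes, in the same order.
    """
    same_rows = len(spliced) == len(unspliced)
    if not same_rows or any(len(a) != len(b) for a, b in zip(spliced, unspliced)):
        raise ValueError("matrices must have the same shape")
    return [
        [s + u for s, u in zip(row_s, row_u)]
        for row_s, row_u in zip(spliced, unspliced)
    ]


# Toy data: 2 barcodes x 3 genes.
spliced = [[1, 0, 2], [0, 3, 0]]
unspliced = [[0, 1, 1], [2, 0, 0]]
total = sum_matrices(spliced, unspliced)
print(total)  # [[1, 1, 3], [2, 3, 0]]
```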

0.24.3

`kallisto`
- Update to `0.46.1`.

`--info`
- Updated information on indrop versions

0.24.2

`count`
- Fixed a bug where `--filter` produced the same matrix as the unfiltered one.

0.24.1

`ref`
- `kb` now provides a pre-built human index for RNA velocity (`linnarsson`)
- The intronic FASTA generated with the `--lamanno` option now includes 30-base flanking regions.

`count`
- Unfiltered count matrices will always be placed in the `counts_unfiltered` folder.
- If the `--filter` option is specified, the filtered count matrices will be placed in the `counts_filtered` folder.
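The folder layout described above can be summarized in a short Python sketch. The helper `count_output_dirs` is hypothetical, and placing both folders directly under the output directory is an assumption of this example; only the folder names `counts_unfiltered` and `counts_filtered` come from the notes.

```python
import os


def count_output_dirs(out_dir, filter_requested=False):
    """Return the count-matrix folders expected for a run.

    Hypothetical helper. Assumption: the folders sit directly inside
    `out_dir`; the release notes only state the folder names.
    """
    # Unfiltered matrices are always written.
    dirs = [os.path.join(out_dir, "counts_unfiltered")]
    # Filtered matrices are written only when filtering was requested.
    if filter_requested:
        dirs.append(os.path.join(out_dir, "counts_filtered"))
    return dirs


print(count_output_dirs("out"))
print(count_output_dirs("out", filter_requested=True))
```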

0.24.0

Official release.
