Cnvkit

Latest version: v0.9.11



0.9.10

This long-awaited release includes major plotting enhancements in the `heatmap`, `scatter`, and `diagram` commands, as well as a new `export gistic` command, thanks to joint work by tetedange13 and tskir (see below).

There are also significant infrastructure improvements including bug fixes, modernized packaging, and build/test automation.

New features
------------

`diagram`:

- New options `--no-gene-labels` to not display gene labels on the plot, and `-c` / `--chromosome` to plot a single chromosome (628, 629, 634; thanks tetedange13)

`heatmap`:

New CLI options (35, 625, 632, 652; thanks tetedange13 and tskir):

- `--vertical`: Transpose the plot, displaying the genome axis vertically instead of horizontally
- `--delimit-samples`: Add a delimiting line between each sample row (or column, with `--vertical`)
- `--title`: Set the plot title

`scatter`:

- New option `--fig-size`: Set the output image dimensions (600, 641; thanks tetedange13 and tskir)
- Show triangles at the bottom of the plot to indicate where segments are hidden below the plotted region by automatic pruning at 'ymin=-5'. Also log a warning when this happens. (385, 643, 645; thanks tetedange13, tskir, and micknudsen)

`export gistic`:

- New export command to generate an unsegmented "markers" file for use with GISTIC. GISTIC also takes a second input file with corresponding segments in SEG format, which CNVkit can generate with `export seg`. (622, 623, 776; thanks tetedange13, tskir, BioComSoftware)

API and CLI changes
-------------------

- Running `cnvkit.py` without any arguments will now display the full help text instead of an error message.
- Supporting scripts (aside from `cnvkit.py`) are no longer installed automatically. They are still available in the source tree.

Documentation
-------------

- Clarified `bintest` usage, provided an example, and explained outputs. (646; thanks tetedange13 and tskir)

Bugfixes
--------

- Fixed several errors and warnings due to outdated usage of dependencies, e.g. pandas, pysam.
- Fixed the Dockerfile and Docker image to install R packages properly for CNVkit to use internally. (765; thanks 28rietd)
- Made the Makefile example/test workflow more portable across environments. (661, 666, 695, 699; thanks tetedange13)
- `batch`: Apply --drop-low-coverage option in the segmetrics step. (694)
- `bintest`: Include 'probes' column in .cns output so that it is valid .cns (closes 693)
- `fix`: Condense the error message when coordinate set contains duplicate values. (637, 638; thanks tskir)
- `fix`: Choose a smoothing window fraction based on the data size to help correct biases better at the extremes of the GC range, where previously some residual GC bias could still be present after correction. (379)
- BED inputs: Handle UCSC BED 'browser' header line, as used in Agilent BED files with a 2-line header. (closes 696, 618)
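
As a rough illustration of the BED header handling in the last item, the sketch below skips UCSC-style `browser` and `track` lines before parsing records. The function name `iter_bed_records` is hypothetical, and this is a minimal stand-in built on the standard library, not CNVkit's actual parser.

```python
def iter_bed_records(lines):
    """Yield (chrom, start, end, extra_fields) from BED text lines.

    Hypothetical helper: header lines starting with 'browser' or 'track',
    comment lines, and blank lines are skipped.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("browser", "track", "#")):
            continue
        chrom, start, end, *extra = line.split("\t")
        yield chrom, int(start), int(end), extra


example = [
    "browser position chr1:1-1000\n",
    'track name="baits"\n',
    "chr1\t100\t200\tGENE1\n",
    "chr2\t5\t50\n",
]
records = list(iter_bed_records(example))
```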

Internal
--------

- Modernized the packaging configuration with pyproject.toml, leaving a stub setup.py for legacy setuptools compatibility. (790)
- Set up automated testing through GitHub Actions (GHA) to verify Python versions 3.7 through 3.10 using pytest and tox. The latter makes local testing with multiple Python versions more reliable, too. (792, 793, 794)
- Updated minimum dependency versions to roughly match Ubuntu 22.04 LTS packages; these are used in CI, too.
- Applied black and pylint to reformat the codebase consistently and replace deprecated calls to libraries. (795)
- Remove joblib pinning (589, 770; thanks DavidCain and risicle)
- Remove networkx pinning (606, 771; thanks DavidCain)
- Make the extreme-GC filters more easily configurable via `params.py` (738, 752, 753, 764; thanks tetedange13 and tsivaarumugam)

0.9.9

This release contains a new script and, more importantly, a volley of bug fixes by tskir, a new CNVkit collaborator.

New script

`genome_instability_index.py`

- For each given sample (.cnr or .cns, ideally .call.cns), this script reports two values, the number of non-neutral segments and the fraction of the total sequencing-accessible genome that they cover. Together, these values have been described as the Genome Instability Index (G2I) by [Bonnet et al. (2012)](https://doi.org/10.1186/1755-8794-5-54). These numbers are not difficult to calculate directly from .cns files, but they are frequently requested, so here you go.
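
A minimal sketch of the two reported values, assuming each segment is given as a `(start, end, copy_number)` tuple and that a segment counts as non-neutral when its copy number differs from 2. The function name, input layout, and neutrality rule are assumptions for illustration; the script itself may define them differently, for example on sex chromosomes.

```python
def genome_instability_index(segments, accessible_bp, neutral_cn=2):
    """Return (number of non-neutral segments, fraction of genome covered).

    segments: iterable of (start, end, copy_number) tuples (assumed layout).
    accessible_bp: total size of the sequencing-accessible genome, in bases.
    """
    altered = [(start, end) for start, end, cn in segments if cn != neutral_cn]
    covered_bp = sum(end - start for start, end in altered)
    return len(altered), covered_bp / accessible_bp


segs = [(0, 100, 2), (100, 150, 1), (150, 400, 3)]
count, fraction = genome_instability_index(segs, accessible_bp=1000)
```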

Bug fixes by tskir

Installation:
- Set NetworkX minimum version to work with pomegranate on Python 3.9. (614, 606; thanks auberginekenobi)

`genemetrics`, `diagram`, `scatter`:

- Fix an error in iterating over chromosomes during gene-wise operations or gene selection. (580, 573, 576, 579; thanks diushiguzhi, eriktoo, hrkemp, drmrgd, and HYan-lei)

`access`:

- Fix an error when all chromosomes listed in the exclusion BED file appear only once. (581, 574; thanks dajana17)

`autobin`:

- Allow specifying explicit output filenames via -o/--output. If this option is not used, the behavior is the same as before. Some pipeline frameworks such as Snakemake require output filenames to be explicit in wrapped commands. (608, 607; thanks enes-ak)
- Fix median-size file selection. (613, 611; thanks michaelsykes)

`coverage`:

- Fix a potential crash with the -c option; generally make the -c option's results more stable. This changes the results you'd get with `coverage -c` compared to previous CNVkit versions, but in any case -c isn't recommended for production use, only for algorithm exploration. (598, 593; thanks joys8998)

`genemetrics`:

- Rename column `n_bins` to `probes` in output, for compatibility with 'call' and 'export' commands. (586, 585; thanks eriktoo)

`scatter`:

- Avoid losing short segments in rasterized PNG output, depending on DPI settings. (615, 604; thanks jimmy200340)
- Allow NCBI-style chromosome names that contain a ".", e.g. "NC_039902.1". (603, 602; thanks amora197)
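
To show why a "." in the chromosome name need not interfere with region parsing, here is a small sketch that splits a `chromosome:start-end` string on its last colon. The helper `parse_region` is hypothetical and only approximates the kind of region string `scatter` accepts; it is not CNVkit's own parsing code.

```python
def parse_region(text):
    """Split 'chrom:start-end' into (chrom, start, end).

    Hypothetical helper. The chromosome name is everything before the last
    colon, so names such as 'NC_039902.1' pass through unchanged. Missing
    coordinates are returned as None.
    """
    chrom, sep, coords = text.rpartition(":")
    if not sep:
        # No colon at all: the whole string is the chromosome name.
        return text, None, None
    start, _, end = coords.partition("-")
    return chrom, int(start) if start else None, int(end) if end else None
```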

`segment`:

- Fix an IndexError during smoothing when the signal is shorter than a window, e.g. on chrY where the chromosome contains few bins. (590, 587; thanks tetedange13)
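
The failure mode is easy to reproduce with any windowed smoother that assumes the signal is at least one window long. The sketch below is a plain moving average, not the smoother CNVkit uses, and simply clamps the window to the signal length; the function name and the clamping strategy are assumptions for illustration.

```python
def moving_average(values, width):
    """Centered moving average that tolerates signals shorter than `width`.

    Illustrative only: the window is clamped to the signal length, and
    windows are truncated at both ends rather than padded.
    """
    n = len(values)
    if n == 0:
        return []
    width = max(1, min(width, n))  # never use a window longer than the signal
    half = width // 2
    smoothed = []
    for i in range(n):
        window = values[max(0, i - half):min(n, i + half + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed
```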

Improvements from other contributors

- scripts/guess_baits.py: Fix a copy-paste error on script launch. (588; thanks sssimonyang)
- Documentation: Link to the Debian package alongside other packages. (562; thanks mr-c)

0.9.8

Continuing a focus on stability and compatibility with other software:

* Support for reading CRAM files with an optional user-provided local FASTA
file for the reference genome sequence. (555; thanks johnegarza)
* Call Rscript subprocess with safer flags for the R environment. Previously,
`--vanilla` ignored R environments with the library path in a non-default
location specified in the user's .Rprofile. Now, `--no-restore` and
`--no-environ` ensure a clean environment but still respect the user's
.Rprofile settings beyond that. (491; thanks pablo-gar)
* Compatibility with the latest release of pandas. (502, 523)
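
The Rscript flag change can be pictured as follows. This is a sketch only: the helper name `rscript_command` and the script path are hypothetical, and CNVkit's internal call may pass additional arguments.

```python
def rscript_command(script_path, rscript="Rscript"):
    """Build an Rscript command line using the flags named in this release.

    --no-restore skips restoring a saved workspace and --no-environ skips
    reading Renviron files, while the user's .Rprofile is still read
    (unlike with --vanilla).
    """
    return [rscript, "--no-restore", "--no-environ", script_path]


cmd = rscript_command("segment.R")  # "segment.R" is a placeholder path
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`.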

This release also fixes some regressions reported since the release of CNVkit
0.9.7 (which introduced a number of new performance optimizations).

* `scatter`: A bug when plotting a region of a chromosome. (536, 457; thanks tskir)
* `scatter`: An IndexError when plotting entire chromosomes, e.g. chr7. (541,
461, 535; thanks tskir)
* `fix`: A bug that occurred after automatic bias corrections, introducing
NaN-valued rows in place of rejected bins, leading to a downstream crash in
CBS segmentation. (551, 436, 547; thanks johnegarza)

0.9.7

Stable release with only minor changes from the previous beta release 0.9.7.b1.

New contributions:

- Cram support: Look for and use .cram + .crai alignment and index file pairs, in addition to .bam + .bai. (495, 434; thanks sridhar0605)
- Update Docker file to use Python 3 apt packages and pip3 (493; thanks keiranmraine)
- Documentation fix (496; thanks rollf)

0.9.7.b1

This release contains several major enhancements particularly relevant to germline analysis. If used in production pipelines, further evaluation and benchmarking would be wise. Highlights:

**Control sample clustering**: To make better use of larger reference sample pools, `reference --cluster` will correlate the given normal samples' bin-wise coverage depths to extract clusters to be used as reference profiles. The reference .cnn file produced this way will then contain the `log2` and `spread` summary statistics for each cluster, in addition to the global summary stats. Given this "clustered reference" profile, `fix --cluster` will then correlate each test sample to each clustered `log2` profile in the reference to choose the most relevant control pool for normalization. The `batch` option `--cluster` will perform both these steps. Nod to Gambin lab and the authors of ExomeDepth, CoNVaDING, CLAMMS, and others for inspiration. (308)

Calculation of bin weights has changed. **This will change your segmentation results**, hopefully for the better. Details below. (429)

The `batch` pipeline now performs some **segmentation post-processing** automatically: calculating and filtering segmentation calls by 50% confidence intervals of the segment mean log2 ratios, in order to reduce false positives, followed by separate bin-level testing to detect small (e.g. exon-size) CNVs that were not caught by segmentation. The bin- and segment-level results are returned as separate .cns files; deciding whether and how to combine or use these results together is left as an exercise for the user.
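
One way to picture the confidence-interval filter: a segment is kept as a candidate CNV only if its interval excludes zero. The sketch below assumes segments are dicts with hypothetical keys `log2`, `ci_lo`, and `ci_hi`; the real filter operates on .cns columns and may merge neutral segments rather than drop them.

```python
def ci_excludes_zero(ci_lo, ci_hi):
    """True if the confidence interval lies entirely above or below zero."""
    return ci_lo > 0 or ci_hi < 0


def filter_by_ci(segments):
    """Keep segments whose log2 confidence interval excludes zero.

    Illustrative only; the key names are assumptions, not .cns column names.
    """
    return [seg for seg in segments if ci_excludes_zero(seg["ci_lo"], seg["ci_hi"])]


segments = [
    {"log2": 0.40, "ci_lo": 0.20, "ci_hi": 0.60},    # gain, interval above zero
    {"log2": 0.05, "ci_lo": -0.10, "ci_hi": 0.20},   # interval spans zero
    {"log2": -0.80, "ci_lo": -1.00, "ci_hi": -0.60}, # loss, interval below zero
]
kept = filter_by_ci(segments)
```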

We've **dropped Python 2.7 support**. Python version 3.5 or later is now required.

This is a beta release. Please let me know how it works for you via the Issues page. If this release contains any issues that are blocking your work, try installing one of the previous stable versions 0.9.6 or 0.9.5::

    conda install cnvkit=0.9.6

Dependencies
------------

- Remove all Python 2.7 compatibility shims.
- Raise minimum pandas version from 0.20.1 to 0.23.3.
- Add scikit-learn (dependency of pomegranate, for HMM segmentation). Remove the older hmmlearn implementation.

Commands
--------

`batch`:

- Post-process segments with `segmetrics` (50% CI), `call` (filter by CI, but don't call integer copy number), and `bintest`.
- Return `bintest` result as a separate, independent .cns output.
- Add option '--segment-method', equivalent to `segment -m`.
- Rename option '--method' to '--seq-method' (but '--method' still accepted for now).
- Add option `--cluster`, passed to `reference` and `fix` if given. (308)

`bintest`:

- New command superseding `cnv_ztest.py` script.
- Report p-value as a column `p_bintest` (previously `ztest`) in the .cns output.
- Fix probabilities for positive log2 values, i.e. gains, which previously always had p-value = 1.0. (429)

`fix`:

- Change calculation of bin weights to be more consistent with `1-var` meaning, with more emphasis on reference spread. It is now simpler, more consistent with `import-rna`, and particularly improves the accuracy of `bintest`. (429)
- Squeeze the range of reference-free weights.
- Drop bins with GC content outside [0.3, 0.7]. The CLAMMS paper shows these bins carry no useful signal.
- With `--cluster` and a clustered reference input, calculate the test sample's Pearson correlation versus each cluster's log2, and take the best one for normalization.
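
The cluster selection step in the last item can be sketched in plain Python. The code below assumes the test sample and each cluster profile are equal-length lists of per-bin log2 values, and the names `pearson` and `best_cluster` are hypothetical; CNVkit's implementation works on its own array types and may handle missing or constant values differently.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def best_cluster(sample_log2, cluster_log2):
    """Return the name of the cluster profile most correlated with the sample."""
    return max(cluster_log2, key=lambda name: pearson(sample_log2, cluster_log2[name]))


sample = [0.1, 0.5, -0.3, 0.2]
clusters = {
    "cluster_a": [0.2, 1.0, -0.6, 0.4],    # same shape as the sample
    "cluster_b": [-0.1, -0.5, 0.3, -0.2],  # inverted shape
}
chosen = best_cluster(sample, clusters)
```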

`reference`:

- With `--cluster`, do k-means clustering of the sample bin-level read depth correlation matrix, per [Kusmirek et al. 2018](https://doi.org/10.1101/478313). Parameter k defaults to the cube root of number of samples. Only clusters of at least 4 samples are kept for emitting summary statistics in the reference profile.
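
The two numeric rules above (default k and the minimum cluster size) can be sketched as follows. Rounding the cube root to the nearest integer is an assumption, since the note only says "cube root", and both function names are hypothetical. The k-means step itself is omitted.

```python
from collections import Counter


def default_k(n_samples):
    """Default number of clusters: cube root of the sample count.

    Rounding to the nearest integer (with a floor of 1) is an assumption.
    """
    return max(1, round(n_samples ** (1 / 3)))


def clusters_to_keep(labels, min_size=4):
    """Return the cluster labels that have at least `min_size` members."""
    counts = Counter(labels)
    return {label for label, size in counts.items() if size >= min_size}


labels = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2]  # example k-means assignments
kept = clusters_to_keep(labels)
```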

`segment`:

- hmm: Fix pomegranate-based implementation. Use iterative Savitzky-Golay smoothing with a narrow bandwidth.
- Use HMM for post-TCN segmentation on VCF allele freqs
- Add parameter for smoothing before CBS (thanks EwaMarek)

`segmetrics`:

- Add 'ttest' option for 1-sample t-test p-value.
- Implement & expose --smooth-bootstrap option. For smoothing, KDE bandwidth is based on each bin's weight as a proxy for the SD of its log2 ratio values. To reduce the risk of over-smoothing on larger sample sizes, we use a loose interpretation of Silverman's Rule to reduce the bandwidth as the number of bins k in a segment increases (proportional to k^(-1/4)).
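
A smoothed bootstrap along these lines might look like the sketch below: resample bins with replacement, then jitter each drawn value with Gaussian noise whose width shrinks as k^(-1/4). How CNVkit converts a bin weight into an SD is not stated here, so the per-bin SDs are passed in directly; that input, and both function names, are assumptions.

```python
import random


def bandwidth_scale(k):
    """Shrink factor for the kernel bandwidth given k bins in a segment."""
    return k ** -0.25


def smoothed_bootstrap(values, sds, rng):
    """Draw one smoothed bootstrap resample of a segment's bin values.

    values: per-bin log2 ratios.
    sds: per-bin standard deviations (assumed to be derived from bin weights).
    rng: a random.Random instance, so results can be made reproducible.
    """
    k = len(values)
    scale = bandwidth_scale(k)
    picks = [rng.randrange(k) for _ in range(k)]  # resample with replacement
    return [rng.gauss(values[i], sds[i] * scale) for i in picks]


resample = smoothed_bootstrap([0.1, 0.2, 0.15, 0.3], [0.05] * 4, random.Random(0))
```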

API
---

- `do_heatmap`: Add 'ax' parameter (thanks fbrundu)
- `CNA.residuals()`: speed; keep index intact in returned pd.Series
- smoothing: Linearly roll-off weights in mirrored wings. Affects CNA.smoothed() / savgol, but not rolling median bias correction.
- Rename `CNA.smoothed()` to `CNA.smooth_log2()`, since it returns the smoothed log2 values, not a new/altered CNA.

Bug fixes
---------

- `batch`: Fix argparse formatting issue (466)
- `import-rna`: Fix a regression in reading 2-column per-gene counts (`-f counts`).
- `reference`: Fix sex inference/usage when creating haploid-x reference (459; thanks duartemolha)
- `scatter`: Use a safe matplotlib backend on OS X to avoid crash
- VariantArray: Fix/streamline indexing of variants by bin/segment

0.9.6

Essential maintenance and bug fixes, for the most part. Some key dependencies have changed, though this should be generally painless for you, and one or two regressions introduced by recent optimizations have been fixed.

This will be the last CNVkit version to run on Python 2.7. The next major release of pandas (0.25.0) will remove support for Python 2.7, and once that happens it will become increasingly difficult to install future versions of CNVkit on Python 2.7 -- so we're not going to try.

The segmentation method `flasso` depends on the R package `cghFLasso`, which is unmaintained and has been removed from CRAN. For now, `segment -m flasso` is still supported if you already have `cghFLasso` installed. But given the above, `flasso` will be removed from the next CNVkit version in favor of the HMM-based methods.

Dependencies
------------

- Raised minimum pandas version from 0.18.1 to 0.20.1, and support up to 0.24.2, resolving some warnings and an error in pandas 0.22+. (413; thanks chapmanb)
- The soft dependency on `hmmlearn` is replaced with an explicit dependency on `pomegranate` for the HMM-based segmentation methods. This dependency will now be pulled in automatically when installing via `pip` or `conda`.
- The R package `cghFLasso` has been removed from CRAN, and therefore is no longer a dependency of CNVkit and will not be installed automatically through the standard `conda` installation method. (419)

Commands
--------

`antitarget`:

- Be more specific in removing noncanonical chromosomes (e.g. alternate contigs, mitochondria) from the binned regions. This avoids skipping chromosomes of interest in some non-human genomes with non-numeric contig names, like yeast. (388; credit for regexes to brentp)

`coverage`:

- With `--count-reads`, use query aligned length to handle soft-clipped reads properly. Now the results with and without this option should be similar. (411; thanks desnar)
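
To illustrate why soft clipping matters here, the sketch below computes the aligned portion of a read's length from its CIGAR string, counting only the operations that consume query bases inside the alignment (M, I, =, X) and ignoring soft clips (S). The helper name is hypothetical; CNVkit presumably obtains this value from its alignment library rather than parsing CIGAR strings itself.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
ALIGNED_QUERY_OPS = frozenset("MI=X")  # consume query bases, excluding clips


def query_aligned_length(cigar):
    """Length of the read that is actually aligned, excluding soft clips."""
    return sum(
        int(length)
        for length, op in CIGAR_OP.findall(cigar)
        if op in ALIGNED_QUERY_OPS
    )
```

For a 100-base read with CIGAR `5S90M5S`, the full query length is 100 but only 90 bases are aligned.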

`segment`:

- For `-m flasso`, partition array by chromosome to avoid edge effects. (409, 412; thanks giladmishne)
- Removed the deprecated option `--rlibpath`; use `--rscript-path` instead.
- HMM implementations have changed, and results may be different now. Note that the HMM methods are still provisional. A stable, supported version of these methods will be provided in the next CNVkit release.

Python API
----------

- `do_scatter` now returns a figure (408; thanks jeremy9959)

Bug fixes
---------

- `scatter`: Whole chromosomes can once again be specified with `-c`. (In the previous release, a chromosome without coordinates would cause an IndexError.) (393)
- `import-rna`: Option --max-log2 can now be specified by users. (Previously, only the default value of +3.0 worked.)
- VCF I/O (`skgenome.tabio`): Support GATK 4's VCF files that contain records with empty ALT alleles, substituting zero if ALT AD is missing. (391; thanks chapmanb)
- Due to a certain versioning-dependent interaction between numpy, pandas, cython, and conda (details [here](https://github.com/numpy/numpy/pull/432)), CNVkit may have printed spurious RuntimeWarning messages which could be safely ignored. The current release attempts to silence these messages if they occur. (#390).
