Cnvkit

Latest version: v0.9.11

0.7.11

New dependency on [pyfaidx](https://github.com/mdshw5/pyfaidx), a Python library for handling samtools-style FASTA indexes (.fai).

`export vcf`:
- Add CNVkit version and current date (i.e. local calendar date that the
"cnvkit.py export vcf" command was run) to the VCF header.

`export theta`:
- Given a VCF of SNVs called jointly in paired tumor and normal samples,
extract SNP allele counts to THetA2's custom input format
("snp_formatted.txt"). The two additional files CNVkit generates this way can
be used with THetA2's "--TUMOR_SNP" and "--NORMAL_SNP" options to improve
estimates of tumor purity and clonality.
- Use CNVkit's segment weights and probe counts to estimate normal-sample read
counts for each segment if no copy number reference profile (.cnn) or paired
normal sample (.cnr) is given.
The command's second argument is now optional and deprecated in favor of the
`-r`/`--reference` option, which does the same thing.

`import-theta`:
- Save integer copy number in the "cn" column of the output file(s) (CNVkit's
.cns format).

`call`, `export nexus-ogt`:
- When reading structural variants from a VCF file, interpret the END tag as the
variant end position, not the length, per the VCF 4.2 specification.
This bug could cause the b-allele frequencies calculated in `call` and `export
nexus-ogt` to be erroneously repeated across many consecutive bins.
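
The END fix above can be illustrated with a minimal sketch. The function name and the dict-based INFO representation are hypothetical, not CNVkit's internals; the only fact relied on is that the VCF specification defines END as the variant's end position.

```python
def sv_interval(pos, info):
    """Return a 0-based, half-open (start, end) interval for one VCF record.

    pos  -- the record's 1-based POS value
    info -- dict of INFO fields (hypothetical representation)
    """
    start = pos - 1
    if "END" in info:
        # END is a 1-based, inclusive end position, NOT a length.
        end = int(info["END"])
    else:
        # No END tag: treat the record as covering a single base.
        end = pos
    return start, end


def sv_interval_buggy(pos, info):
    """The misreading described above: END treated as a length."""
    start = pos - 1
    return start, start + int(info["END"])
```

With `POS=1000` and `END=5000`, the buggy reading produces an interval that extends thousands of bases too far, which is how the same SNVs could end up assigned to many consecutive bins.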

`scatter`:
- When loading CNVkit files (in any command), identify and drop rows with "NaN"
log2 values. (CNVkit never emits these, but they could happen if a user
generates .cnr files from Illumina CGH array data files using a custom
script.) The other columns (spread, gc, rmask) can be NaN without a problem, but
plotting with `scatter` would crash when adjusting the y-axis based on NaN
log2 values. (95)
- Detect & warn if the input .cnr/.cns/.vcf is not sorted by genomic coordinates.
This could happen with an input VCF or a manually constructed .cnr/.cns file
(not generated by CNVkit). Previously, the resulting error message was cryptic,
because some bins/segments/SNVs were selected successfully but plotting
crashed when laying out the x-axis coordinates.
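
Both checks above can be sketched in a few lines of plain Python. This is an illustration only: the row representation (dicts with `chromosome`, `start`, and `log2` keys) and the function names are assumptions, not CNVkit's actual loader.

```python
import math


def drop_nan_log2(rows):
    """Keep only rows whose log2 value is a real number."""
    return [r for r in rows if not math.isnan(r["log2"])]


def is_sorted_by_coordinate(rows):
    """True if each chromosome's rows are contiguous and its start
    positions never decrease."""
    seen = set()
    prev_chrom, prev_start = None, None
    for r in rows:
        chrom, start = r["chromosome"], r["start"]
        if chrom != prev_chrom:
            if chrom in seen:
                return False  # chromosome reappears after another one
            seen.add(chrom)
        elif start < prev_start:
            return False      # start position went backwards
        prev_chrom, prev_start = chrom, start
    return True


rows = [
    {"chromosome": "chr1", "start": 100, "log2": 0.1},
    {"chromosome": "chr1", "start": 50, "log2": float("nan")},
    {"chromosome": "chr2", "start": 10, "log2": -0.3},
]
if not is_sorted_by_coordinate(rows):
    print("warning: input is not sorted by genomic coordinates")
kept = drop_nan_log2(rows)
```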

Internals & packaging:
- Use the pyfaidx library to extract sequences from a genome FASTA file (used in
the `reference` command), replacing some custom code in cnvlib. (73; thanks
mdshw5)
- Documentation updates.

0.7.10

`diagram`:
- Label genes even when given only segments (.cns). Plotting segments alone, without bin-level copy ratios (.cnr), can be convenient to produce an uncluttered PDF with a smaller file size while retaining most of the important CNV information. (94)

`scatter`:
- For calculating and plotting SNV b-allele frequencies, select the sample of interest from the given VCF based on the .cnr/.cns base filename, unless specified with `--sample-id`.

`export nexus-ogt`:
- Use normal-sample BAFs if normal-sample .cnr given. Previously, it would load tumor BAFs (taking the first tumor sample from the PEDIGREE tag) even if the properly-named .cnr file was for the normal sample in the VCF.
- Add --sample-id option to select VCF sample. Useful in case .cnr filename base doesn't match the sample IDs in the VCF header.
- Add filtering options --min-weight, --min-variant-depth.
- The `--min-variant-depth` option works the same as in `scatter -v`, filtering SNVs by coverage depth (INFO field DP, usually) for the b-allele frequency calculation.
- The `--min-weight` option allows the user to discard low-weight bins, since Nexus Copy Number doesn't use CNVkit's weights for its own segmentation and could be misled by the noisier log2 ratios in less-reliable bins. For choosing the cutoff value, 0.5 is suitable in our experience, but check the distribution of weights in your own data first.
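
One way to inspect the weight distribution before choosing a `--min-weight` cutoff is sketched below using only the standard library. It assumes the .cnr file is a tab-separated table with a header row that includes `log2` and `weight` columns. The helper names are hypothetical, and whether bins exactly at the cutoff are kept is an assumption of this sketch.

```python
import csv
import io
import statistics


def read_cnr(handle):
    """Parse a tab-separated .cnr table into dicts with numeric log2/weight."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [dict(row, log2=float(row["log2"]), weight=float(row["weight"]))
            for row in reader]


def weight_quartiles(bins):
    """Return the three quartile cut points of the bin weights."""
    return statistics.quantiles([b["weight"] for b in bins], n=4)


def filter_min_weight(bins, min_weight=0.5):
    # Assumption: bins exactly at the cutoff are kept.
    return [b for b in bins if b["weight"] >= min_weight]


# Tiny made-up table standing in for a real .cnr file.
example = io.StringIO(
    "chromosome\tstart\tend\tgene\tlog2\tdepth\tweight\n"
    "chr1\t100\t200\tA\t0.10\t50\t0.2\n"
    "chr1\t200\t300\tB\t-0.20\t80\t0.6\n"
    "chr1\t300\t400\tC\t0.05\t90\t0.9\n"
    "chr1\t400\t500\tD\t0.40\t20\t0.4\n"
)
bins = read_cnr(example)
print(weight_quartiles(bins))
print(len(filter_min_weight(bins, 0.5)), "of", len(bins), "bins kept")
```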

`export vcf`:
- Add custom VCF "FORMAT" fields: FOLD_CHANGE, FOLD_CHANGE_LOG2, PROBES. (91; thanks pcingola)

`segment`:
- The "flasso" method now works again; it was broken for a few releases. (88; thanks pcingola)

Packaging & internal:
- Add GRCh37 "access" BED file for users' convenience. The `access` command will also now raise an error if the chromosome names don't match between the "access" and "target" BED files.
- Work with the latest version of pysam (0.9). (86)
- Silence some superfluous warnings from the latest version of pandas (0.18).
- Documentation updates, including more details on the `call` command.

0.7.9

Bug fixes, most importantly to work around an API change in pysam.

Installation:
- Require pysam version earlier than 0.9 (86)

`fix`, `reference`:
- If the majority of target bins have no or very low coverage, warn the user
about this, skip bias corrections, and mask out the low-coverage target bins
during centering to ensure the output is still vaguely usable and sane.
This issue could occur because the wrong target BED was used initially, or
maybe hybridization failed in library prep.

`reference`:
- Ensure the output table's columns are ordered correctly. In some cases it was
possible for the output table's columns to be ordered differently, which still
works in CNVkit, but is weird.

`call`, `rescale`, `export`:
- Check specified gender more sensibly; on failure, default to female.
Specifically, use case-insensitive string comparison to test whether the given
argument means "male". Treating chrX as having neutral ploidy is probably a
less surprising fallback, especially if the "-y" flag is forgotten elsewhere
in the pipeline.
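
A minimal sketch of the more forgiving gender check described above. The function name and the accepted spellings are assumptions for illustration; the notes only state that the comparison is case-insensitive and that anything unrecognized falls back to female.

```python
def is_male_reference(gender):
    """Return True only if the argument clearly means "male".

    Any other value, including None or a typo, falls back to female.
    """
    if gender is None:
        return False
    # Assumed spellings; the real set accepted by CNVkit may differ.
    return str(gender).strip().lower() in ("m", "male")
```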

0.7.8

New features in the `call` command make it more amenable to analyzing tumor heterogeneity, and also make the `rescale` command redundant. Documentation is updated with more methodological background info.

`call`:
- Put absolute copy number in a new "cn" column. When rescaling log2 ratios for purity, do not round to integer absolute copy number values. (83)
- New `-v`/`--vcf` option: Calculate b-allele frequency (BAF) average for each segment and output as a new column "baf". Rescale BAFs if `--purity` is specified. Then, using BAF and total copy number (CN, the "cn" column), assign major and minor allele copy number to each segment and output as new columns "cn1" and "cn2". These values can indicate allelic imbalance, including loss of heterozygosity (LOH). (84)
- New `--center` option that works the same as in `rescale`.
- New method `-m none` to perform any specified transformations (rescaling, re-centering, adding b-allele frequencies), but do not call integer copy numbers.
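
To make the "cn1"/"cn2" idea concrete, here is one plausible way to split a segment's total copy number into major and minor allele copies from its average BAF. This is a sketch under stated assumptions, not necessarily the formula `call` uses: the BAF is folded to 0.5 or above, and the major allele count is the nearest integer to BAF times total copy number.

```python
def allele_copy_numbers(cn, baf):
    """Split total copy number into (major, minor) allele copies.

    cn  -- integer total copy number of the segment (the "cn" column)
    baf -- average b-allele frequency of the segment, or None if no SNVs
    """
    if baf is None:
        return None, None
    major_fraction = max(baf, 1.0 - baf)   # fold BAF to >= 0.5
    k = int(round(major_fraction * cn))
    cn1 = max(k, cn - k)                   # the major allele gets the larger share
    cn2 = cn - cn1
    return cn1, cn2
```

A segment with `cn=2` and BAF near 0.5 comes out as one copy of each allele, while `cn=2` with BAF near 1 comes out as `(2, 0)`, the copy-neutral LOH pattern mentioned above.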

`rescale`:
- Deprecated in favor of `call` with the `-m none` option, which does the same thing.
- If recentering is specified with `--center`, do it before, not after, rescaling log2 values for tumor sample purity.
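
The ordering matters because purity rescaling is nonlinear in log2 space. The sketch below uses the standard two-component mixture model for a diploid autosomal region (observed ratio = purity × tumor ratio + (1 − purity) × 1). The function name, the `center` and `min_ratio` parameters, and the diploid assumption are illustrative, not CNVkit's exact implementation.

```python
import math


def rescale_log2_for_purity(log2_ratio, purity, center=0.0, min_ratio=1e-3):
    """Estimate the pure-tumor log2 ratio from an observed log2 ratio.

    Assumes a diploid autosomal region, where normal cells contribute a
    copy ratio of 1.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    # Step 1: re-center first, so the neutral level sits at log2 = 0.
    ratio = 2.0 ** (log2_ratio - center)
    # Step 2: remove the normal-cell contribution.
    tumor_ratio = (ratio - (1.0 - purity)) / purity
    # Guard against zero or negative ratios from noisy, deep losses.
    return math.log2(max(tumor_ratio, min_ratio))
```

For example, a single-copy loss in a 50% pure sample is observed at a ratio of 0.75; rescaling recovers the pure-tumor ratio of 0.5.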

`export bed`, `export vcf`:
- Take absolute copy number from "cn" column if present (83)

`antitarget`:
- Whitelist chromosomes X and Y along with integer chromosome names for inclusion as canonical mammalian chromosomes. Keep the fallback to "short" chromosome names if no such canonical chromosome names are detected. (37)

`reference`:
- Expose bias corrections (GC, RepeatMasker, targeting density) as command-line options `--no-gc`, `--no-rmask`, and `--no-edge`, similar to the `fix` command. (80)

Internal:
- VariantArray.read_vcf: somatic mask was the opposite of what it should have been, i.e. skip_somatic was skipping germline and retaining only somatic SNVs.

0.7.7

Small improvements, bugfixes, and documentation updates.

`fix`:
- Removed the hard filter on RepeatMasker fraction of antitarget bins. This filter doesn't appear to improve calling on current benchmarks.
- Drop bins that have very high coverage in the reference, in addition to the low-coverage bins already dropped (normalized log2 values outside +/- 5).
- Ignore very-low-coverage bins when recentering (by default). For good-quality samples this doesn't make much difference, but it's safer and seems to improve the centering slightly on lower-quality samples.
- Ensure antitarget bin weights are not set to 0 if the majority of target bins have no coverage -- this would cause segmentation to fail. (82)
- Don't crash if antitargets are empty (to support WGS and targeted amplicon capture), fixing a regression.

`antitarget`:
- Keep untargeted contigs that appear to be "canonical" chromosomes. Prefer chromosomes with numeric names (autosomes in most mammalian reference genomes); but if none of the targeted chromosomes have numeric names, then fall back to chromosomes with names no longer than the longest-named targeted chromosome. (37)
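
The selection rule above can be sketched as follows. The function name is hypothetical, and treating a leading "chr" prefix as optional when testing for numeric names is an assumption of this sketch.

```python
def canonical_untargeted(targeted, all_chroms):
    """Pick untargeted contigs that look like canonical chromosomes.

    targeted   -- non-empty list of chromosome names that have target bins
    all_chroms -- every contig name in the reference genome
    """
    def is_numeric(name):
        # Assumption: "chr7" and "7" both count as numeric names.
        core = name[3:] if name.startswith("chr") else name
        return core.isdigit()

    targeted_set = set(targeted)
    untargeted = [c for c in all_chroms if c not in targeted_set]
    if any(is_numeric(c) for c in targeted):
        # Preferred rule: keep only numerically named chromosomes.
        return [c for c in untargeted if is_numeric(c)]
    # Fallback: keep names no longer than the longest targeted name.
    max_len = max(len(c) for c in targeted)
    return [c for c in untargeted if len(c) <= max_len]
```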

`batch`:
- Disallow input BAMs with duplicate base filenames (81). Now it will trigger an error instead of overwriting some output files.
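
A minimal sketch of such a guard, assuming output file names are derived from each BAM's base filename (the function name is hypothetical):

```python
import collections
import os


def check_unique_basenames(bam_paths):
    """Raise ValueError if two input paths share the same base filename."""
    counts = collections.Counter(os.path.basename(p) for p in bam_paths)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(
            "Duplicate sample filenames would overwrite each other's "
            "output: " + ", ".join(duplicates))
```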

`segment`:
- The `--drop-outliers` option now masks outliers according to multiples (default 10x) of the 95th percentile, not the 90th percentile. This performs better in benchmarking.
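
As a rough illustration of a percentile-multiple cutoff: the sketch below masks any value whose absolute deviation exceeds a multiple of the 95th percentile of all absolute deviations. How CNVkit computes the deviations (for example, relative to a smoothed trend line) is not stated here, so the input is simply assumed to be a list of per-bin deviations, and the nearest-rank percentile is a simplification.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct is an integer 1..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def outlier_mask(deviations, factor=10, pct=95):
    """Return a list of booleans, True where a bin counts as an outlier."""
    abs_dev = [abs(d) for d in deviations]
    cutoff = factor * percentile(abs_dev, pct)
    return [d > cutoff for d in abs_dev]
```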

Plots `scatter`, `heatmap`:
- With the "-c/--chromosome" option, handle unbounded ranges (e.g. "chr1:100-" or "chr5:-100000") treating the missing start/end of the range as the start/end of the specified chromosome.
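
Parsing such a range might look like the sketch below. The function name is hypothetical, and using 0 as the chromosome start is an assumption about the coordinate convention.

```python
def parse_range(text, chrom_length):
    """Parse "chr1:100-200", "chr1:100-", "chr5:-100000", or "chr1".

    A missing start becomes 0 and a missing end becomes chrom_length.
    """
    chrom, _, span = text.partition(":")
    if not span:
        return chrom, 0, chrom_length
    start_txt, _, end_txt = span.partition("-")
    start = int(start_txt) if start_txt else 0
    end = int(end_txt) if end_txt else chrom_length
    return chrom, start, end
```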

`heatmap`:
- A more efficient implementation. Now, plotting a heatmap of .cnr is feasible, and behavior is a bit more consistent (e.g. placement of rectangles is more accurate; plotting a selection where only some samples have data will still show all samples).
- Don't crash if selection overlaps no segments, e.g. if the selection is a centromeric or telomeric region. Previously it would crash with an obscure error.

Misc. bugfixes:
- `batch`: log parallel processes correctly for "-p 0"
- `import-theta`: fix crash; namedtuples are immutable (77)
- `metrics`: require --segments (closes 79)
- `rescale`: fix crash if --purity is not specified
- VariantArray: Fix VCF parsing if filters are not used.

0.7.6

Minor bugfixes and improvements.

`scatter`:
- Tweaked plot colors for better visibility and accessibility: points are slightly darker, and segments are now a deep gold color instead of red.

`fix`:
- Downweight targets or antitargets proportionally to their relative variability of bin log2 values; i.e. if targets are twice as variable (by interquartile range of bin log2 values) as antitargets, divide all target bin weights by 2. This happens after all bias corrections and reference normalization, and appears to improve the final segmentation results.
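
The downweighting rule can be sketched with the standard library alone. The function names are hypothetical, and the quartile method of `statistics.quantiles` may differ from whatever CNVkit uses internally.

```python
import statistics


def iqr(values):
    """Interquartile range of a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def downweight_noisier(target_log2, antitarget_log2,
                       target_weights, antitarget_weights):
    """Scale down the weights of whichever bin type is more variable."""
    t_iqr, a_iqr = iqr(target_log2), iqr(antitarget_log2)
    if t_iqr > a_iqr:
        # Targets twice as variable -> target weights divided by 2.
        target_weights = [w * a_iqr / t_iqr for w in target_weights]
    elif a_iqr > t_iqr:
        antitarget_weights = [w * t_iqr / a_iqr for w in antitarget_weights]
    return target_weights, antitarget_weights
```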

`antitarget`:
- Don't emit antitargets for untargeted chromosomes with long names, e.g. "chr6_apd_hap1" -- these are presumably alternative/unassigned contigs, not real canonical chromosomes that deserve to be included for CNV calling. But do continue to keep untargeted chromosomes with names up to the length of the longest-named targeted chromosome. (Improves on 37)
- Indicate default `--min-size` in the help message.

`batch`:
- Log the number of parallel processes correctly when "-p 0" is used to automatically detect the number of CPUs -- previously, this option would print on the console that samples were being run in serial, but then launch multiple parallel processes.

`segment`:
- Change the `--drop-outliers` default value from 5 to 10, based on performance in benchmarking.

Internally:
- Fixed detection of autosomes to be used for re-centering bin log2 values and detecting gender.
- Fixed parsing the GATK/Picard "interval list" file format - strand and name were swapped.
