Bcbio-nextgen

Latest version: v1.1.5

0.8.5

- No longer keep INFO fields when splitting with `vcfallelicprimitives` in
FreeBayes, Platypus and Scalpel calling, to avoid introducing problematic
fields for multi-allelic MNPs.
- Fix a batching problem when using `coverage` with multiple shared batches,
such as a global normal in cancer calling. Thanks to Luca Beltrame.
- Use the `mincores` specification in ipython-cluster-helper to combine single
core jobs into a single submission job, for better memory sharing on
resource-constrained systems.
- Move disambiguation split work inside the parallel framework so download and
preparation occur on worker nodes or inside Docker containers. Enables
on-demand download of disambiguation genomes.
- Ensure population databases are created when some inputs do not have variant
calls.
- Switch from prettyplotlib to seaborn as the matplotlib wrapper.
- Fixes for ensemble structural variant calling on single samples.
- Fixes for mixing joint and pooled calling in a single configuration file.
- Support for qSNP for tumor-normal calling.
- Add eXpress to RNA-seq pipeline.
- Add transcriptome-only mapping with STAR, bowtie2 or bwa.
- Change logging time stamps to be UTC and set explicitly as ISO 8601 compliant
output. Improves benchmarking analysis and comparability across runs.
- Add support for RNA-seq variant calling with HaplotypeCaller.
- Fix parallelization of DEXSeq.
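
The switch to UTC, ISO 8601 time stamps can be illustrated with Python's standard `logging` module. This is a minimal sketch, not bcbio's own logging setup; the class name `UTCISOFormatter`, the helper `make_logger` and the message layout are assumptions made for illustration.

```python
import logging
import time


class UTCISOFormatter(logging.Formatter):
    """Render record times as ISO 8601 strings in UTC, e.g. 2014-12-01T15:04:05Z."""

    # Convert record.created with UTC instead of the default local time
    converter = time.gmtime

    def formatTime(self, record, datefmt=None):
        # Explicit ISO 8601 layout; the trailing 'Z' marks the time as UTC
        return time.strftime("%Y-%m-%dT%H:%M:%SZ", self.converter(record.created))


def make_logger(name="pipeline"):
    """Hypothetical helper: a logger whose lines start with a UTC time stamp."""
    handler = logging.StreamHandler()
    handler.setFormatter(UTCISOFormatter("[%(asctime)s] %(message)s"))
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Because every run writes times in the same zone and format, log lines from different machines can be compared or sorted directly, which is the benchmarking benefit described above.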

0.8.4

- Improvements in VarDict calling on somatic samples.
- Fix compatibility issue with bedtools 2.22.0 when calculating genome coverage.
- Fix joint calling upload to avoid redundant inclusion of full VCF file in
individual sample directories.
- Fixes for inclusion of GATK jars inside Docker containers when running
distributed jobs.
- Enable generation of STAR indexes on demand to handle running STAR on AWS
instances.
- Re-organize code to prepare samples and reference genomes so it runs inside
distributed processing components. This isolates processing inside Docker
containers on AWS and also enables complex operations like preparing reference
genomes on demand.

0.8.3

- Improve tumor/normal calling with FreeBayes, MuTect, VarDict and VarScan by
validating against DREAM synthetic 3 data.
- Validate ensemble based calling for somatic analysis using multiple callers.
- Improve ability to run on Amazon AWS, including updated handling of files
originally stored in S3 and encrypted transfer to S3 on completion.
- Avoid race conditions during `bedprep` work on samples with shared input BED
files. These are now processed sequentially on a single machine to avoid
conflicts. Thanks to Justin Johnson.
- Add data checks and improved flexibility when specifying
joint callers. Thanks to Luca Beltrame.
- Default to a reduced number of split regions (`nomap_split_targets` defaults
to 200 instead of 2000) to avoid controller memory issues with large sample
sizes.
- Avoid re-calculating depth metrics when running post variant calling
annotation with GATK to provide accurate metrics on high depth samples.
Thanks to Miika Ahdesmaki.
- Consistently keep annotations and genotype information for split MNPs from
vcfallelicprimitives. Thanks to Pär Larsson.
- Enable VQSR for large batches of exome samples (50 or more together) to
coincide with joint calling availability for large populations.
- Support retrieval of GATK and MuTect jars from S3 to enable integration
with bcbio inside Docker.
- Bump pybedtools version to avoid potential open file handle issues. Thanks to
Ryan Dale.
- Move to a bgzipped and indexed human_ancestor.fa for LOFTEE to support access
with new samtools versions that no longer use razip.
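
Splitting MNPs while keeping annotations and genotypes, as `vcfallelicprimitives` does for the 0.8.3 change, can be illustrated with a toy Python sketch. It is not vcflib's implementation: the `Variant` record and `split_mnp` function are hypothetical, and the sketch only handles equal-length, single-ALT MNPs. Copying INFO verbatim is also why 0.8.5 later stops keeping INFO fields for FreeBayes, Platypus and Scalpel calls, since copied fields can be problematic for multi-allelic MNPs.

```python
from typing import Dict, List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal VCF record."""
    chrom: str
    pos: int               # 1-based position, as in VCF
    ref: str
    alt: str
    info: Dict[str, str]
    genotype: str          # e.g. "0/1"


def split_mnp(var: Variant) -> List[Variant]:
    """Split an equal-length MNP into SNPs, copying INFO and genotype to each SNP."""
    if len(var.ref) != len(var.alt) or len(var.ref) < 2:
        return [var]  # not an MNP this toy version handles; leave unchanged
    snps = []
    for offset, (ref_base, alt_base) in enumerate(zip(var.ref, var.alt)):
        if ref_base == alt_base:
            continue  # matching bases are not variants
        snps.append(var._replace(pos=var.pos + offset, ref=ref_base,
                                 alt=alt_base, info=dict(var.info)))
    return snps
```

For example, `ACG` to `GCT` at position 100 becomes SNPs at 100 (`A>G`) and 102 (`G>T`), each carrying the original genotype and INFO values.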

0.8.2

- Fix bug in creating shared regions for analysis when using a single sample in
multiple batches: for instance, when using a single normal sample for multiple
tumors. Thanks to Miika Ahdesmaki.
- Unify approach to creating temporary directories. Allows specification of a
global temporary directory in `resources: tmp:` used for all
transactions. This enables full use of local temporary space during
processing, with results transferred to the shared filesystem on completion.
- Fix issues with concatenating files that fail to work with GATK's
CatVariants. Fall back to bcftools concat which correctly handles problem
headers and overlapping segments.
- Enable flexible specification of `indelcaller` for `variantcaller` targets
that do not have integrated indel methods. Thanks to Miika Ahdesmaki.
- Move to samtools 1.0 release. Update samtools variant calling to support new
multiallelic approach.
- Improve Platypus integration: correctly pass multiple BAM files, make use of
assembler, split MNPs, and correctly restrict to variant regions.
- Be more aggressive with system memory usage to make better use of available
resources. The aim is to take advantage of Java memory fixes for problems that
previously forced us to be conservative.
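
The temporary directory change follows a transaction pattern: work happens in local scratch space and only finished outputs are moved to the shared filesystem. A minimal Python sketch of that pattern; `tmp_transaction` is a hypothetical helper, not bcbio's own transaction code, and `tmp_root` stands in for the directory configured under `resources: tmp:`.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def tmp_transaction(final_path, tmp_root=None):
    """Yield a scratch path under tmp_root; move it to final_path only on success.

    tmp_root is an assumed stand-in for a global temporary directory; None
    falls back to the system default temporary location.
    """
    tmp_dir = tempfile.mkdtemp(dir=tmp_root)
    tmp_path = os.path.join(tmp_dir, os.path.basename(final_path))
    try:
        yield tmp_path
        # Reached only if the with-body raised nothing: publish the result
        os.makedirs(os.path.dirname(os.path.abspath(final_path)), exist_ok=True)
        shutil.move(tmp_path, final_path)
    finally:
        # Clean up scratch space whether the work succeeded or failed
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

A failed step leaves nothing half-written on the shared filesystem, and all intermediate I/O hits local disk.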

0.8.1

- Support joint recalling with GATK HaplotypeCaller, FreeBayes and Platypus. The
`jointcaller` configuration variable enables concurrent calling in large
populations by calling samples independently, then combining them into a final
callset with no-call or reference calls filled in at any position called in an
individual sample.
- Add the qsignature tool to standard and variant analyses, which helps identify
sample swaps. Add the `mixup_check` configuration variable to enable it.
- Fix issue with merging GATK produced VCF files with vcfcat by swapping to
GATK's CatVariants. Thanks to Matt De Both.
- Initial support for ensemble calling on cancer tumor/normal calling. Now
available for initial validation work. Thanks to Miika Ahdesmaki.
- Enable structural variant analyses on shared batches (two tumors with same
normal). Thanks to Miika Ahdesmaki.
- Avoid Java out of memory errors for large numbers of running processes by
avoiding Parallel GC collection. Thanks to Justin Johnson and Miika Ahdesmaki.
- Enable streaming S3 input to RNA-seq and variant processing. BAM and fastq
inputs can stream directly into alignment and trimming steps.
- Speed improvements when re-running analyses with large numbers of samples or
regions.
- Improved cluster cleanup by providing better error handling and removal of
controllers and engines in additional failure cases.
- Support variant calling for organisms without dbSNP files. Thanks to Mark Rose.
- Support the SNAP aligner, which provides improved speed on systems with
large amounts of memory (64 GB for human genome alignment).
- Support the Platypus haplotype based variant caller for germline samples with
both batched and joint calling.
- Fix GATK version detection when `_JAVA_OPTIONS` specified. Thanks to Miika
Ahdesmaki.
- Use msgpack for IPython serialization instead of a homemade json/zlib
approach, reducing message sizes and IPython controller memory.
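
The combining step behind `jointcaller` (independent per-sample calls merged into one callset covering every called position) can be illustrated with a toy Python sketch. `square_off` and its dictionary layout are hypothetical; the sketch only fills gaps with a VCF-style no-call, whereas real joint recalling revisits the data at each position to decide between a reference call and a true no-call.

```python
from typing import Dict, Set, Tuple

Site = Tuple[str, int]  # (chromosome, position)
NO_CALL = "./."         # VCF-style missing diploid genotype


def square_off(per_sample: Dict[str, Dict[Site, str]]) -> Dict[Site, Dict[str, str]]:
    """Merge per-sample {site: genotype} calls into {site: {sample: genotype}}.

    Every site called in any sample appears in the output, and samples without
    a call there get NO_CALL as a placeholder for the recalling step.
    """
    all_sites: Set[Site] = set()
    for calls in per_sample.values():
        all_sites.update(calls)
    combined: Dict[Site, Dict[str, str]] = {}
    for site in sorted(all_sites):
        combined[site] = {sample: calls.get(site, NO_CALL)
                          for sample, calls in per_sample.items()}
    return combined
```

Calling each sample independently keeps per-sample work parallel; only this final merge needs to see all samples at once.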

0.8.0

- Change defaults for installation: do not use sudo by default and require the
`--sudo` flag for installing system packages. No longer include default
genomes or aligners, enabling more minimal installations. Users install
genomes by explicitly enumerating them on the command line.
- Add support for the Ensembl Variant Effect Predictor (VEP). Enables annotation
of variants with dbNSFP and LOFTEE. Thanks to Daniel MacArthur for the VEP
suggestion.
- Support CADD annotations through the new GEMINI database creation.
- Rework parallelization during variant calling to enable additional multicore
parallelization for effects prediction with VEP and backfilling/squaring off
with bcbio-variation-recall.
- Rework calculation of callable regions to use bedtools/pybedtools thanks to
groupby tricks from Aaron Quinlan. Improves speed and memory usage for
coverage calculations. Use local temporary directories for
pybedtools to avoid filling global temporary space.
- Improve parallel region generation to avoid large numbers of segments on
organisms with many chromosomes.
- Initial support for tumor normal calling with VarDict. Thanks to
Miika Ahdesmaki and Zhongwu Lai.
- Provide optional support for compressing messages on large IPython jobs to
reduce memory usage. Enable by adding `compress_msg` to the `algorithm` section
of `bcbio_system.yaml`. There will be additional testing in future releases
before making this the default, and it may be replaced by new methods like
transit (https://github.com/cognitect/transit-python).
- Add de-duplication support back for pre-aligned input files. Thanks to
Severine Catreux.
- Generalize SGE support to handle additional system setups. Thanks to Karl Gutwin.
- Add reference-guided transcriptome assembly with Cufflinks, along with
functions to classify novel transcripts as protein coding or not and to clean
low-quality transcripts from the Cufflinks assembly.
- Developer: provide datadict.py with encapsulation functions for looking up and
setting items in the data dictionary.
- Unit tests fixed. Unit test data moved to external repository:
https://github.com/roryk/bcbio-nextgen-test-data
- Add exon-level counting with DEXSeq.
- Bugfix: Fix for Tophat setting the PI flag as inner-distance-size and not insert size.
- Added kraken support for contamination detection (lpatano):
http://ccb.jhu.edu/software/kraken/
- Isoform-level FPKM combined output file generated (klrl262)
- Use a shared conda repository for tricky-to-install Python packages:
https://github.com/chapmanb/bcbio-conda
- Added initial chanjo integration for coverage calculation (kern3020):
https://github.com/robinandeer/chanjo
- Initial support for automated evaluation of structural variant calling.
- Bugfix: set library-type properly for Cufflinks runs.
- Added `genome_setup.py`, a script to prepare your own genome and RNA-seq files.
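
The reworked callable-region calculation amounts to collapsing per-base coverage into contiguous runs that share a status, which is conceptually what grouping coverage output with `bedtools groupby` achieves. A pure-Python analogue using `itertools.groupby`; the minimum depth of 4 and the `NO_COVERAGE`/`LOW_COVERAGE`/`CALLABLE` labels are assumptions for illustration, not bcbio's defaults.

```python
from itertools import groupby
from typing import List, Tuple


def callable_regions(chrom: str, depths: List[int],
                     min_depth: int = 4) -> List[Tuple[str, int, int, str]]:
    """Collapse per-base depths into BED-style (chrom, start, end, label) runs.

    depths[i] is the read depth at 0-based position i; intervals are half-open,
    as in BED. min_depth is an assumed threshold for calling a base callable.
    """
    def label(depth: int) -> str:
        if depth == 0:
            return "NO_COVERAGE"
        return "CALLABLE" if depth >= min_depth else "LOW_COVERAGE"

    regions = []
    start = 0
    # groupby yields one group per run of consecutive positions with the same label
    for status, run in groupby(depths, key=label):
        length = sum(1 for _ in run)
        regions.append((chrom, start, start + length, status))
        start += length
    return regions
```

Emitting one interval per run, rather than one record per base, is what keeps memory and output size proportional to the number of coverage changes instead of the genome length.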
