Bcbio-nextgen

Latest version: v1.1.5

1.0.3

- Allow installs to pull a specific git hash or tag revision of bcbio codebase.
- Fix FreeBayes somatic and multi-sample calling order to be consistent between
chromosome region runs. Thanks to Ho Danliang.
- Fix structural variant output upload for complex batching cases. Correctly
handle shared normals and other multi-batch by naming outputs using batches.
Thanks to Sven-Eric Schelhorn.
- Move to samtools/bcftools/htslib 1.4. Provides parallel bgzip, removing the need
for pbgzip, and improves concatenation speed for region-split VCF files.
- Improve Lumpy prioritization speeds by adjusting location of breakend
genotyping.
- UMI consensus: reduce runtimes to ~2/3 of previous by avoiding unnecessary
compression and file IO.
- UMI consensus: pass along metrics about consensus read generation as BAM tags
in the final file (cD = depth, cE = error rate).
- Support DNApi for de novo adapter detection in the small RNA pipeline.
- Several updates to the VarScan support: honor options specified in the
resource config section; honor min_allele_frac option and set --strand-filter
flag in the single-sample case; general cleanups. Thanks to Christian Brueffer.
- Update validation plots to support matplotlib 2.0.
- Enable mixed list/string inputs to germline calling. Thanks to Luca Beltrame.
- Fix qsignature outfile parsing. Thanks to Oliver Hofmann.
- Allow structural variant validations with VCF truth sets. Enables more
flexible comparisons without size and event binning.
- Provide seq2c VCF output and enable validation of calls.
- Allow specification of seq2c options through resources. Thanks to Sally Luke
and Marisa Cunha.
- Avoid using R_LIBS settings for R runs to limit incompatibilities with
externally installed R packages.
- Convert relative paths to absolute paths for files in algorithm list inputs.
Thanks to Matthias De Smet.
- Switch to Salmon from Sailfish as default alignment-free RNA-seq
quantification algorithm.
- Add `sailfish` as a valid option for `expression_caller`.
- Fix chimeric alignment output option for STAR.
- Remove deprecated tidy counts for Sailfish/Salmon.
- Allow more possible empty/skip inputs in `variantcaller` and `svcaller`: None,
null and empty lists
- Make DEXSeq an opt-in expression caller, no longer run by default.
- Speed up combination of counts/RPKM/FPKM/TPM of samples into a single table by
10x.
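
The `variantcaller`/`svcaller` entries above accept a single string, a list, or an empty/skip value (None, null, empty list). A minimal Python sketch of that normalization follows; the helper name `normalize_callers` and the case-insensitive string matching are assumptions for illustration, not bcbio's own implementation:

```python
from typing import Iterable, List, Optional, Union

# Skip markers from the changelog entry: None, null and empty lists.
# Matching the strings "none"/"null" case-insensitively is an assumption
# for quoted YAML values; an unquoted YAML null is already parsed as None.
_SKIP_STRINGS = {"", "none", "null"}


def normalize_callers(
    value: Optional[Union[str, Iterable[Optional[str]]]]
) -> List[str]:
    """Return caller names as a list; an empty list means "skip this step"."""
    if value is None:
        return []
    if isinstance(value, str):
        value = [value]  # a bare string names a single caller
    return [
        caller for caller in value
        if caller is not None and caller.strip().lower() not in _SKIP_STRINGS
    ]


print(normalize_callers("freebayes"))                # ['freebayes']
print(normalize_callers(["vardict", None, "null"]))  # ['vardict']
print(normalize_callers([]))                         # []
```

Returning an empty list for every skip form lets downstream code use a single `if callers:` check instead of testing each spelling separately.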

1.0.2

- Fix FreeBayes paired somatic calling by generalizing support for finding
non-ordered tumor/normal placement in VCF.
- Re-add checks for pre-bgzipped fastq inputs to alignment preparation thanks to
a fix for grabix to handle Illumina bgzip outputs.
- Provide DNA damage annotation for low frequency sequencing errors in somatic
samples. Use `tools_on: [damage_filter]`.
- Add viral detection for DNA-seq cancer samples during variant calling. Uses
virus sequences from the TCGA GDC distribution and provides simple counts of
unmapped reads against viral sequences in the MultiQC report.
- Improve lumpy structural variant runs from pre-aligned BAM files, using
extract_sv_reads to avoid the need to re-sort input files. Thanks to Neill Gibson.
- Move VCF files from SV prioritization to final upload directory. Thanks to
Miika Ahdesmaki.
- Provide whole genome coverage plots with goleft indexcov. Thanks to Brent Pedersen.
- Speed up post-alignment callability calculations by using default parameters
to goleft depth. Thanks to Brent Pedersen.
- Allow custom vcfanno configuration files for variant annotation and
GEMINI database creation, using `vcfanno` configuration parameter. Optionally
allows use of `vcfanno` without GEMINI database creation.
- Always use specified cores for analysis re-runs in local multicore mode.
Avoids confusing core behavior with checkpoints on re-starts of analysis in
a previous work directory.
- Upload of pipeline results to iRODS. Thanks to Matthias De Smet.
- Add duplicate removal to post-FreeBayes processing. Thanks to Neill Gibson.
- Support latest svtyper (0.1.1) for lumpy to provide speed improvements. Will
default to 0.1.1 at next release.
(use `bcbio_conda install -c bioconda svtyper=0.1.1` to test in development)
- Work towards supporting a Python 3 compatible bcbio codebase. Thanks to
Michael Crusoe.
- Reduce VarDict maximum BED region sizes for better memory usage. Thanks to
Nikolai Karulin, Oliver Hofmann, Miika Ahdesmaki and Zhongwu Lai.
- Require `tools_on: [lumpy_usecnv]` to pre-run CNVkit as input to Lumpy, allowing
Lumpy and CNVkit to run in parallel otherwise.
- Avoid issues with CNVkit bin size estimates for a normal sample associated with
multiple tumors. Thanks to Ho Danliang.
- Fix double uploading of fast RNA-seq quantification.
- Output single-cell RNA-seq counts in annotated MatrixMarket format.
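
Annotated MatrixMarket output pairs a sparse coordinate matrix with separate row (gene) and column (cell) name lists. A minimal stdlib-only sketch of the matrix part, assuming counts arrive as a hypothetical `{(gene, cell): count}` dict; the function name and input shape are illustrative, not bcbio's code:

```python
from typing import Dict, List, Tuple


def to_matrix_market(counts: Dict[Tuple[str, str], int],
                     genes: List[str], cells: List[str]) -> str:
    """Serialize gene x cell counts in MatrixMarket coordinate format.

    Zero counts are dropped and indices are 1-based, as the format requires.
    Row/column names are not stored in this file; write `genes` and `cells`
    to companion files (one name per line, in index order) to annotate it.
    """
    row = {gene: i for i, gene in enumerate(genes, start=1)}
    col = {cell: j for j, cell in enumerate(cells, start=1)}
    entries = sorted(
        (row[gene], col[cell], n) for (gene, cell), n in counts.items() if n != 0
    )
    lines = ["%%MatrixMarket matrix coordinate integer general",
             f"{len(genes)} {len(cells)} {len(entries)}"]
    lines.extend(f"{i} {j} {n}" for i, j, n in entries)
    return "\n".join(lines) + "\n"


counts = {("GeneA", "cell1"): 3, ("GeneB", "cell2"): 1, ("GeneA", "cell2"): 0}
print(to_matrix_market(counts, ["GeneA", "GeneB"], ["cell1", "cell2"]), end="")
# %%MatrixMarket matrix coordinate integer general
# 2 2 2
# 1 1 3
# 2 2 1
```

Storing only non-zero entries is what makes the format practical for single-cell data, where most gene/cell combinations have no reads.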

1.0.1

- Fix bug in 1.0.0 release with parallel calculations on whole genome samples.
The release version only parallelizes by chromosome instead of callable
regions, resulting in less parallelism. Thanks to Sven-Eric Schelhorn and
Neill Gibson.
- Generalize use of working directories to support runs on S3 mounted
filesystems. Ensures all work takes place inside transactional directories.
Thanks to Tetiana Khotiainsteva and Sven-Eric Schelhorn.
- Provide separate germline calling for somatic tumor/normal pairs. Supplements
somatic calls with standard germline calls on normal samples, including
ensemble and SV calling.
- Support creating GEMINI databases with new generic mechanism using vcfanno/vcf2db.
This allows creation of GEMINI output for any organism. Adds support for hg38
with annotations from dbSNP, Clinvar, ExAC and ESP.
- Support FreeBayes 1.1.0 for improved memory usage and 3-4x speedup.
Will default to 1.1.0 at next release. Validation work:
https://github.com/bcbio/bcbio.github.io/blob/master/_posts/2016-11-21-giab-hg38-freebayes.md
- Rework quality control for speed and output directory consistency. Avoid
duplicating calculations and put all output in the qc directory to make re-runs
easier. Thanks to Vlad Saveliev.
- Fixes for Seq2C concurrency problems when preparing BED files. Thanks to Vlad
Saveliev.
- Update WHAM structural variant caller to support the latest release.
- Update delly structural variant caller to support the latest release.
- Improve dbSNP annotation speeds for adding rs IDs to VarDict output.
Thanks to Ben Liesfeld.
- Support for VEP 87 with additional plugins and generalization of fields.
Thanks to Matthias De Smet.
- Deprecate `clinical_reporting` parameter and introduce new
`effects_transcripts` parameter that enables more control over variant effects
prediction. Enable HGVS by default for human projects, separately from
transcript selection.
- For lumpy runs that use samblaster, use samtools sort instead of sambamba
sort. Avoids segfault issues with samblaster. Thanks to Oliver Hofmann.
- Pre-install capture region BED files and enable shorthand specification in
sample configuration.
- Use vt normalize as part of GEMINI decomposition to clean up complex
multiallelic variants. Thanks to Sergey Naumenko.
- Testing suite cleanup. Move to py.test and separate integration and unit
tests. Thanks to Tetiana Khotiainsteva.
- Fix issue with cutadapt hanging on gzipped input. Thanks to Stephen Turner.
- Update cutadapt to use single-pass trimming for paired-end files, improving
performance and reducing disk IO.
- Add support for cellular barcode error correction with single-cell RNA-seq
via the `cellular_barcode_correction` parameter. This corrects barcodes within
the set edit distance (default: 1).
- Add support for sample-based demultiplexing of single-cell RNA-seq runs.
- Move single-cell RNA-seq results to the upload directory.
- Make positional UMI default to off for single-cell RNA-seq.
- Add support for the Klein lab v3 version of the inDrop protocol.
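
Barcode correction with an edit-distance threshold (the `cellular_barcode_correction` entry above, default 1) can be sketched as follows. This assumes a known whitelist of valid barcodes and that ambiguous matches (several whitelist entries at the same closest distance) are discarded; both are assumptions, and the function names are hypothetical, not the implementation bcbio uses:

```python
from typing import Collection, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def correct_barcode(observed: str, whitelist: Collection[str],
                    max_distance: int = 1) -> Optional[str]:
    """Map a barcode to the unique whitelist entry within max_distance.

    Returns None when no entry is close enough, or when the closest
    distance is shared by several entries (ambiguous correction).
    """
    if observed in whitelist:
        return observed
    best: Optional[str] = None
    best_dist = max_distance + 1
    tied = False
    for candidate in whitelist:
        dist = edit_distance(observed, candidate)
        if dist < best_dist:
            best, best_dist, tied = candidate, dist, False
        elif dist == best_dist:
            tied = True
    return None if tied else best


whitelist = {"AACC", "GGTT"}
print(correct_barcode("AACG", whitelist))  # AACC
print(correct_barcode("TTTT", whitelist))  # None (GGTT is 2 edits away)
```

Comparing against every whitelist entry is quadratic in practice; real tools usually precompute neighbours, but the brute-force loop keeps the matching rule easy to read.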

1.0.0

- Default to no calling if `variantcaller` not specified, instead of old GATK
UnifiedGenotyper default.
- Use samtools depth instead of bedtools genomecov for depth calculations, and
calculate high depth regions during initial depth calculations.
Improves speed by more than 6x. Thanks to Brent Pedersen.
- Adjust de-duplication strategy to use bamsormadup from biobambam2 for most
cases and samblaster when split and discordant reads needed for SV calling
with lumpy.
- Fix handling of fresh installs that include only GATK 3.6. Correctly handles
versioning from bioconda and the lack of a specifically defined jar directory.
- Unset JAVA_HOME when running gatk-framework and GATK > 3.6, forcing
use of bcbio installed Java 8. Thanks to Brad Wubbenhorst.
- Fix bug when running realignment without recalibration in GATK 3.6. Thanks to Pär Larsson.
- Retrieve GSM FASTQ samples from the GEO server using the bcbio_prepare_samples.py script.
- Add seqcluster stats to the QC folder.
- Allow manual specification of total memory and core usage for machines in
`resources`. Thanks to Juan Caballero.
- Allow PED based gender specifications (1=male, 2=female). Thanks to Brent
Pedersen.
- Annotate validation variants with genome context from GA4GH and other sources
for interpreting true/false positives/negatives.
- Limit GATK cores used for GenotypeGVCFs to avoid excessive memory usage.
- VQSR: allow forcing GATK to try VQSR with tools_on. Generate VQSR plots.
Thanks to Zhengqiu Cai.
- Support ATAC-seq for chipseq pipeline.
- Remove duplicates after alignment for chipseq pipeline.
- Support for bzip2 input files during variant calling. Thanks to Paulo Silva.
- Allow non-positional UMIs for Rapmap-quantified single-cell RNA-seq.
- Re-enable save_diskspace option to reduce disk usage during alignment
preparation and split alignments.
- Offload fixing the unmapped Tophat file to tophat-recondition. Thanks to Christian Brueffer.
- Add support for vcfanno (https://github.com/brentp/vcfanno) to annotate VCFs with other
VCFs/BED files.
- Mark possible RNA-edits for GRCh37/hg19 using RADAR coordinates. Thanks to
Sergey Naumenko for the suggestion.
- Add `local_controller` option to run the controller alongside the main bcbio
process. Thanks to Brent Pedersen and Sven-Eric Schelhorn.
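
PED files record sex in the fifth column, using the codes noted above (1=male, 2=female, anything else unknown). A minimal sketch of reading sample genders from PED text, assuming standard whitespace-separated six-column lines with `#` comment lines; the helper name is hypothetical and bcbio's own parsing may differ:

```python
from typing import Dict

# PED sex codes from the changelog entry; any other code maps to unknown.
_PED_SEX = {"1": "male", "2": "female"}


def genders_from_ped(ped_text: str) -> Dict[str, str]:
    """Map individual IDs (column 2) to genders from the sex column (5)."""
    genders: Dict[str, str] = {}
    for line in ped_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        genders[fields[1]] = _PED_SEX.get(fields[4], "unknown")
    return genders


ped = """#family sample father mother sex phenotype
fam1 NA12878 0 0 2 1
fam1 NA12891 0 0 1 1
fam1 sample3 0 0 0 1
"""
print(genders_from_ped(ped))
# {'NA12878': 'female', 'NA12891': 'male', 'sample3': 'unknown'}
```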

0.9.9

- Change defaults for recalibration and realignment to False. These have been
the recommended settings (http://bit.ly/bcbio-minimal) and no realignment now
matches Broad recommendations.
- Use conda installed Java instead of requiring external installation
for most tools.
- Support GATK 3.6 with Java 8 installed as part of anaconda. Older GATK
versions for calling and recalibration/realignment require external Java 7.
- Reorganize variant stats using bcftools and clean up GEMINI queries to get
per-sample metrics.
- Quality control back end revamped to support better parallelization
and pluggability of new QC metrics.
- Support CNV calling with Seq2C for exome, targeted or amplicon experiments.
Thanks to Vlad Saveliev.
- Add `fixrg` target to `bam_clean` to accept BAM inputs with correct
sorting and reads that only need an updated read group.
- More robust file transactions across network filesystems, avoiding failures
from partially transferred files. Thanks to Sven-Eric Schelhorn.
- Improved checking of BAM files during merge steps. Thanks to Sven-Eric Schelhorn.
- Add SAMPLE and PEDIGREE tags to tumor/normal VCF outputs to enable
easier post-analysis parsing of results.
- Add single point for annotation following variant calling to improve
pluggability of new annotation types.
- Add support for running germline and somatic calling with Sentieon
callers (https://peerj.com/preprints/1672/). Requires license from
Sentieon.
- Fix fusion calling using Tophat2. Thanks to csardas for raising the issue.
- Add support for kallisto quantification of single-cell RNA-seq data.
- Add `transcriptome_fasta` option to single-cell RNA-seq. This allows
the user to provide a transcriptome FASTA file to quantitate against rather
than use the `bcbio` provided annotation.
- Fix naming of vardict RNA-seq variant calls. Thanks to csardas.
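
The SAMPLE and PEDIGREE tags for tumor/normal VCFs are header meta-information lines placed before the `#CHROM` line. A minimal sketch, assuming the VCF 4.x `##PEDIGREE=<Derived=...,Original=...>` form (tumor derived from normal) and `Genomes=Tumor`/`Genomes=Germline` values on the SAMPLE lines; those values and the helper name are assumptions, and the exact fields bcbio writes may differ:

```python
from typing import List


def add_tumor_normal_header(header: List[str], tumor: str,
                            normal: str) -> List[str]:
    """Insert SAMPLE/PEDIGREE meta lines just before the #CHROM line.

    If no #CHROM line is present, the header is returned unchanged.
    """
    extra = [
        f"##SAMPLE=<ID={tumor},Genomes=Tumor>",      # assumed Genomes value
        f"##SAMPLE=<ID={normal},Genomes=Germline>",  # assumed Genomes value
        f"##PEDIGREE=<Derived={tumor},Original={normal}>",
    ]
    out: List[str] = []
    for line in header:
        if line.startswith("#CHROM"):
            out.extend(extra)
        out.append(line)
    return out


header = ["##fileformat=VCFv4.2",
          "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tnormal\ttumor"]
for line in add_tumor_normal_header(header, "tumor", "normal"):
    print(line)
```

Recording the tumor/normal relationship in the header lets downstream tools identify each sample's role without relying on column order.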

0.9.8

- Correctly install all datatargets on new installation. Previously we'd
skipped installing default additional data unless specified.
- Use yamllint to find syntax errors in the YAML file that the pyyaml package
ignores but that can affect the analysis.
- Improve choosing split regions for batch analysis to use the unionized
intersection of non-callable regions. This enables better use of batches
with different callable regions. Thanks to Neill Gibson.
- Fix HLA typing issues and handle HLA typing on split alignments.
Thanks to Miika Ahdesmaki.
- Set `align_split_size` automatically based on input file sizes, trying to
provide reasonable splits and avoid too many splits for large files.
- Fix high depth identification for whole genome runs, correctly calculating
it when also inferring coverage estimations. Thanks to Neill Gibson.
- Do not remove duplicates for GATK variant calling when mark_duplicates
is False or running amplicon sequencing.
- Fix installation of mutect jar via toolplus when mutect not previously
present in configuration.
- Platypus: revert filtering back to defaults after additional cross-validation:
http://i.imgur.com/szSo5M6.png
- Enable gVCF output with tools_on: [gvcf] for users who need gVCF output
for downstream analyses.
- Avoid downscaling memory when recalibrating/realigning with GATK, since we
should no longer need to work around Java issues. Thanks to Luca Beltrame.
- Do not use samblaster on genomes with greater than 32768 contigs, the
samblaster maximum. Thanks to morten (mattingsdal).
- Move to samtools for output CRAM support, using bamUtils for 8-bin compression
of read quality scores.
- Remove `merge_bamprep` option and always merge realigned BAM files if run.
- Correctly clean up additional problem characters in sample descriptions that
can confuse shell commands.
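
The samblaster guard above depends only on how many contigs the reference has. A minimal sketch, assuming the count comes from the reference's samtools `.fai` index (one line per contig); the helper names are hypothetical and this is not bcbio's code:

```python
SAMBLASTER_MAX_CONTIGS = 32768  # samblaster maximum cited in the changelog


def count_contigs(fai_text: str) -> int:
    """Count contigs in a samtools .fai index (one non-empty line each)."""
    return sum(1 for line in fai_text.splitlines() if line.strip())


def can_use_samblaster(fai_text: str) -> bool:
    """True if the reference is within samblaster's contig limit."""
    return count_contigs(fai_text) <= SAMBLASTER_MAX_CONTIGS


# Toy .fai content: name, length, offset, linebases, linewidth.
fai = "".join(f"contig{i}\t1000\t0\t60\t61\n" for i in range(3))
print(can_use_samblaster(fai))  # True
```

On a real genome, the text would be read from the index file next to the FASTA (for example `genome.fa.fai`, a hypothetical path) before calling `can_use_samblaster`.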
