Bcbio-nextgen

Latest version: v1.1.5

1.0.9

- Use smoove for lumpy variant calling and genotyping, replacing custom lumpyexpress
implementation: [validation](https://github.com/bcbio/bcbio_validations/tree/master/NA24385_svsmoove-validation)
- Generalize exclusion of regions during variant calling with new
`exclude_regions` target. Includes previously available LCR and high depth
regions, in addition to removal of polyX and alternative contigs.
- Normalize allele frequency calculation and filtering for Strelka2 and MuTect2.
Thanks to Vlad Saveliev.
- CNVkit: enable specification of pre-built reference background cnn with
`background: cnv_reference`.
- CNVkit: handle projects with mixed CNVkit and non-CNVkit usage. Thanks to Luca
Beltrame.
- Improved Atropos trimming: better use of multicore parallelization in variant
and RNA-seq pipelines.
- Add support for polyG and polyX trimming to variant calling for NovaSeq 3' end
cleanup and generally avoiding low complexity reads.
- Structural variant: use SURVIVOR for validation comparisons.
- RNA-seq variant calling: use multiple cores for VarDict.
- Support miRge2.0 for alternative small RNA annotation. Users should
install the tool manually until compatible with bioconda.
- Add bamCoverage to chip-seq pipeline to calculate bigwig files.
- GATK4: Correctly use GATK4 GatherVcfs when `tools_off: [gatk4]` is specified for
variant calling. Thanks to Luca Beltrame.
- variant: Default to `mark_duplicates: false` if alignment turned off
(`aligner: false`).
- variant: Fix race condition when preparing BED files for coverage and
sv_regions. Thanks to Tristan Lubinski.
- Fix `noalt_calling` to correctly avoid parallelizing on non-standard
chromosomes without a variant regions file.
- Fix broken `kraken` command. Thanks to choosehappy.
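
The polyG trimming item above can be illustrated with a minimal sketch. This is not the Atropos or bcbio implementation; the function name `trim_poly_tail` and the `min_run` threshold are assumptions chosen for illustration only.

```python
def trim_poly_tail(seq, qual, base="G", min_run=8):
    """Remove a trailing run of `base` from a read if it is at least `min_run` long.

    `min_run` is an illustrative threshold, not a bcbio or Atropos default.
    """
    # Length of the homopolymer run at the 3' end of the read.
    run = len(seq) - len(seq.rstrip(base))
    if run >= min_run:
        keep = len(seq) - run
        # Trim the quality string to the same length as the sequence.
        return seq[:keep], qual[:keep]
    return seq, qual


read = "ACGTTAGCCA" + "G" * 12
quals = "I" * len(read)
trimmed_seq, trimmed_qual = trim_poly_tail(read, quals)
print(trimmed_seq)  # ACGTTAGCCA
```

Short trailing runs are left alone so that reads that genuinely end in a few G bases are not truncated.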

1.0.8

- GATK4 is the new default GATK release used in bcbio when running HaplotypeCaller or
Base Quality Score Recalibration. Use `tools_off: [gatk4]` to use older GATK
3.x versions.
- GATK4: Support 4.0 release with changed command line parameters. Re-enable
multicore calling for CWL runs.
- GATK4: remove older GATK3 based gatk-framework in favor of equivalent GATK4
commands.
- install: move to using recent IPython parallel to avoid dependency issues
- install: fix resolution issues due to conda 4.4.x (old ipython-cluster-helper,
missing libquadmath.so with numpy due to libgcc update)
- RNA-seq variant calling: improve joint calling and parallelization with move
to use GATK4 HaplotypeCaller.
- QC: improve read counting speed by moving to hts-nim-tools, replacing custom
samtools view counting
- Add `noalt_calling` to avoid variant calling on non-standard chromosomes.
Thanks to Vlad Saveliev and Oliver Hofmann.
- variant alignment: improve core allocations for non-split alignments to avoid
memory issues on 4Gb/core machines with whole genome samples.
- Add total number of reads and adapters found to the metrics in the small RNA-seq pipeline.
- Add mirtop to the tools used in small RNA-seq pipeline for miRNA annotation.
- delly: Support 0.7.8 release which calls all variant types together.
- gVCF: fix basic filtering for GATK and sentieon when running without joint
calling. Thanks to Tom Morris.
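
As a rough illustration of what restricting calling to standard chromosomes involves, the sketch below filters a list of contig names. The definition of "standard" used here (human chr1-22, X and Y, with an optional `chr` prefix) is an assumption, and `standard_contigs` is a hypothetical helper, not the logic behind `noalt_calling`.

```python
import re

# Assumption: "standard" means autosomes 1-22 plus X and Y, with an optional
# "chr" prefix. bcbio's own definition may differ (for example, for MT).
STANDARD_CONTIG = re.compile(r"^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y)$")


def standard_contigs(contigs):
    """Return only the contig names that look like standard human chromosomes."""
    return [c for c in contigs if STANDARD_CONTIG.match(c)]


names = ["chr1", "chr22", "chrX", "chr1_KI270706v1_random",
         "chrUn_GL000195v1", "HLA-A*01:01:01:01"]
print(standard_contigs(names))  # ['chr1', 'chr22', 'chrX']
```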

1.0.7

- Automatically include bcbio anaconda PATH when running tools. Also allow
custom BCBIOPATH specification to help with modules integration. Thanks to
Gabriel Berriz.
- vcfanno: only correct VCF headers to use Number=1 when decomposition takes
place. Avoids incorrect headers for non-decomposed inputs.
- ensemble: normalize and decompose variants prior to incorporating into
ensemble calls, handling MNPs called differently across callers. Thanks to
Vlad Saveliev.
- Avoid bgzipping and grabix indexing fastq inputs when not doing alignment
splitting to save processing.
- Initial support for minimap2 aligner in variant calling workflows. Still needs
validation and benchmarking in comparison to bwa.
- Standardize dbSNP annotation to use vcfanno for all variant callers. Remove
GATK custom annotations for non-GATK callers, which are not present in GATK4.
- CNVkit: drop low coverage contaminating regions in tumor calls. Thanks to
Eric Talevich.
- Expand `remove_extracontigs` for `bam_clean` to more consistently handle
compatible pre-aligned BAMs with different extra contigs in reference genome.
- Fix problem collapsing samples for QC when using RNA-seq variant calling with
gatk-haplotype. Thanks to Neill Gibson.
- Integrate ericscript RNA-seq fusion caller. Thanks to Tetiana Khotiainsteva
and Vang Le.
- Remove read backed phasing (`phasing: gatk`) for GATK runs in favor of
HaplotypeCaller internal phasing.
- disambiguation: ensure BAM index present for non-split alignments
- Use only end of reads to detect 3' adapters in small RNA-seq pipeline.
- Fix BCBIO_JAVA_HOME to correctly pass custom Java to GATK and Picard runs.
- ChIP-seq: add generation of greylist regions defined as regions of high
depth in the input file on a per sample pair basis.
- RNA-seq: STAR now outputs a MAPQ of 60 for uniquely mapped reads instead of
255.
- RNA-seq: Ensure BAM files fed into Cufflinks have 255 as the uniquely mapped
MAPQ instead of 60 as output by hisat2/STAR/etc.
- RNA-seq: omit duplicate files from stringtie assembly merging. Thanks to
mmoisse for the bug report.
- Add support for peddy (https://github.com/brentp/peddy) for PED file
correspondence/ancestry checking.
- ChIP-seq: pass through encode filtered BAMs to upload directory.
- seq2c: pass through mapping_reads.txt file to directory.
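
The two MAPQ items above (60 from STAR, 255 for Cufflinks) amount to rewriting one column of each alignment record. A minimal sketch on SAM text lines, assuming only that MAPQ is the fifth tab-separated field as in the SAM format; `remap_unique_mapq` is a hypothetical helper and not how bcbio performs the conversion.

```python
def remap_unique_mapq(sam_line, old=60, new=255):
    """Rewrite the MAPQ column (field 5) of a SAM record from `old` to `new`.

    The returned line has no trailing newline.
    """
    if sam_line.startswith("@"):  # header lines pass through unchanged
        return sam_line
    fields = sam_line.rstrip("\n").split("\t")
    if fields[4] == str(old):
        fields[4] = str(new)
    return "\t".join(fields)


record = "read1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t" + "A" * 50 + "\t" + "I" * 50
print(remap_unique_mapq(record).split("\t")[4])  # 255
```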

1.0.6

- Use mosdepth for callability calculations, replacing goleft depth. Centralize
coverage and QC depth calculations around single mosdepth runs.
- Improve representation of germline and somatic calls in MultiQC report and
output directory, avoiding confusing "-germline" extension. Thanks
to Vlad Saveliev.
- Structural variants: return combined tumor/normal calls instead of single
sample tumor for somatic calls in delly, lumpy, manta, and WHAM.
- VarDict: remove `-v 50` as a required option for deep targeted panels (>5000x
average coverage). If needed, add it back through `var2vcf` resource options.
- Templating: avoid automatically setting flowcell date to maintain consistency
between runs.
- Add `fusion_caller` as an optional algorithm field to turn on/off fusion
callers. Currently supports oncofuse and pizzly.
- RNA-seq: more appropriate kmer size estimation for reads < 60 bp during
Salmon/Rapmap/Sailfish index creation.
- RNA-seq variant calling: require gatk-haplotype instead of gatk as the caller.
- RNA-seq variant calling: support GATK4
- UMIs: move fgbio consensus calling to use filtering, adding `--max-reads` for
high depth regions and swapping `--min-consensus-base-quality` for `--min-base-quality`
- Correctly re-bgzip fastq inputs even if not using `align_split_size`.
- Fix bug when running with `lumpy_usecnv` that resulted in skipping CNVkit.
- GATK gVCF joint calling: avoid running through bcftools for header fixes,
using Picard instead. Avoids integer/double conversion incompatibilities.
- CWL: run variant calling with multiple cores, reducing total jobs and enabling
callers that support multicore.
- CWL: support structural variant calling as part of variant pipelines.
- Add pizzly (http://www.biorxiv.org/content/early/2017/07/20/166322)
as a fusion caller when fusion mode is enabled.
- VEP: output an effect call per allele for multiallelic positions.
- Define separators for paired fastq files during bcbio_prepare_samples.py
- RNA-seq single-cell/DGE: add `transcriptome_gtf` as an option which will
collapse single-cell/DGE counts down to the gene level. This is recommended
for single-cell and DGE experiments.
- ChIP-seq: preliminary support for bwa alignment. Compared to bowtie2 on a test
dataset, bwa yields a superset of the bowtie2 peaks, with 95% of the common
peaks within 50 bases of each other. The bwa alignments call about 50% more
peaks, though, so use with care.
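
The `transcriptome_gtf` item above describes collapsing counts down to the gene level. A minimal sketch of that idea, assuming a transcript-to-gene mapping has already been extracted from the GTF; the function name and all identifiers are hypothetical, and this is not bcbio's implementation.

```python
from collections import defaultdict


def collapse_to_genes(transcript_counts, tx2gene):
    """Sum per-transcript counts into per-gene counts.

    Transcripts missing from `tx2gene` are kept under their own identifier.
    """
    gene_counts = defaultdict(int)
    for transcript, count in transcript_counts.items():
        gene_counts[tx2gene.get(transcript, transcript)] += count
    return dict(gene_counts)


# Hypothetical identifiers for illustration only.
counts = {"ENST0001": 5, "ENST0002": 3, "ENST0003": 7}
tx2gene = {"ENST0001": "GENE_A", "ENST0002": "GENE_A", "ENST0003": "GENE_B"}
print(collapse_to_genes(counts, tx2gene))  # {'GENE_A': 8, 'GENE_B': 7}
```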

1.0.5

- Add optional downsampling of whole genome BAM files to a high maximum coverage
(200 times the average coverage) to avoid slow runtimes in collapsed repeats
and poly-ATGC regions. Downsampling happens in parallel with post alignment
sorting. Currently turned off by default pending runtime improvements.
Configure using `maxcov_downsample`.
- Separate post alignment recalibration and realignment. Recalibration now
occurs multicore to support GATK4 implementation. We generally recommend
skipping realignment.
- Provide multicore read trimming and streaming bgzip fastq output with atropos,
replacing cutadapt as the default trimmer.
- hg38 runs do not run bwakit's bwa-postproc.js cleanup scripts unless HLA
calling needed. Avoids slowdowns using this postprocessing script when running
bwa with multiple cores.
- Tumor-only prioritization uses vcfanno output instead of GEMINI,
allowing use without needing to build a full GEMINI database.
- Use samtools multicore indexing, replacing sambamba multicore index.
- Replace components of pipeline using single core sambamba view -c with
parallel samtools equivalents.
- Replace sambamba depth coverage calculations with mosdepth to improve
speed and parallelization.
- Multicore base quality score recalibration with GATK4 and Sentieon.
- GATK4: add support for gVCF based joint calling.
- GATK4: fix option usage for gVCF creation with HaplotypeCaller
- Allow overriding Java used in bcbio with `BCBIO_JAVA_HOME`
- Do not split individual sample VCFs during pooled batch calling. This
previously happened only for small batches with fewer than 5 samples; now we
avoid it entirely and let users do downstream sample extraction.
- Update OptiType HLA calling to use multicore CBC solver, also avoiding GLPK issues.
- Additional approach to retrieving cluster IP addresses for IPython and
logging, using the fully qualified domain name.
- Add `archive: [cram-lossless]` to do CRAM archiving of outputs without quality
score compression. Thanks to Alison Meynert.
- Add `tools_off: [lumpy-genotype]` option to skip Lumpy genotyping.
- CWL/WDL: use single file tarballs for complex collections of files like
aligner, RTG and snpEff indices.
- GC bias correction is now the default for Salmon read-based quantification.
See https://github.com/salmonteam/SalmonBlogResponse/blob/master/SalmonBlogResponse.md for the reasoning behind this change.
- Add kallisto support for non single-cell RNA-seq experiments.
- Salmon can now be run alongside other RNA-seq quantifiers.
- Cufflinks and Stringtie can be run alongside each other as RNA-seq
quantifiers.
- Check BED input files for coordinates off the ends of contigs.
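
The BED coordinate check in the last item can be sketched as follows. This assumes contig lengths are already available as a dictionary (for example from a reference index) and uses the BED convention of 0-based, half-open intervals; `bed_regions_off_contigs` is a hypothetical helper, not bcbio's own check.

```python
def bed_regions_off_contigs(bed_lines, contig_lengths):
    """Return (contig, start, end) for BED intervals that are malformed,
    on an unknown contig, or extend past the end of their contig.

    BED is 0-based and half-open, so a valid interval satisfies
    0 <= start < end <= contig length.
    """
    problems = []
    for line in bed_lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        contig, start, end = line.rstrip("\n").split("\t")[:3]
        start, end = int(start), int(end)
        length = contig_lengths.get(contig)
        if length is None or start < 0 or end > length or start >= end:
            problems.append((contig, start, end))
    return problems


lengths = {"chr1": 1000}
bed = ["chr1\t0\t100\n", "chr1\t900\t1001\n", "chr2\t0\t10\n"]
print(bed_regions_off_contigs(bed, lengths))
# [('chr1', 900, 1001), ('chr2', 0, 10)]
```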

1.0.4

- Initial support for GATK4 variant calling with HaplotypeCaller and MuTect2.
Requires `tools_on: [gatk4]`. Validation: https://github.com/bcbio/bcbio_validations/tree/master/gatk4
- Enable adapter trimming for variant calling pipeline.
- Provide `trim_ends` command to quickly do defined end trimming as part of
variant calling fastq preparation.
- Support duplex UMIs, present as embedded barcodes on read 1 and read 2.
- Sort region based analyses like variant calling by interval size. Ensures
longest intervals run first avoiding delay at end of sample processing.
- Ensure FreeBayes dbSNP and GATK annotations passed into final file. Thanks
to semal.
- Use new Ensembl vep (variant effect predictor) with updated annotations.
Thanks to Matthias De Smet.
- Accept files from HTTP/FTP as input
- CWL: use json input files for passing inputs instead of flattened command
line arguments. Improves compatibility with multiple runners.
- Allow subsetting a pre-aligned BAM to only standard chromosomes, removing
contigs other than chr1-22, X and Y for human. This allows runs of pre-aligned
data with different extra chromosomes than the bcbio reference builds. Thanks
to Oliver Hofmann.
- Improved support for pre-aligned BAMs by using contigs in BAM file for
coverage calculations.
- Avoid grabix race conditions with multiple identical input files. Thanks to
Andrey Tovchigrechko.
- Remove usage of lxml for qsignature and qualimap to avoid icu library errors.
- CNVkit: merge adjacent calls with identical copy numbers
- Add support for triple-barcoded cellular barcodes.
- Add support for Illumina's SureCell single-cell RNA-seq.
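
The interval-size ordering described above (longest regions first, so that a slow region is not left running alone at the end) can be sketched in a few lines. The `(contig, start, end)` tuple layout and the function name are assumptions for illustration, not bcbio's internal representation.

```python
def sort_regions_longest_first(regions):
    """Order (contig, start, end) regions by decreasing length."""
    return sorted(regions, key=lambda r: r[2] - r[1], reverse=True)


regions = [("chr1", 0, 1000), ("chr2", 0, 50000), ("chr3", 100, 600)]
print(sort_regions_longest_first(regions))
# [('chr2', 0, 50000), ('chr1', 0, 1000), ('chr3', 100, 600)]
```

Because Python's `sorted` is stable, regions of equal length keep their original relative order.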
