Bcbio-nextgen

Latest version: v1.1.5

0.9.7

- Use MultiQC (github.com/ewels/MultiQC) as main package to process all
QC metrics.
- New install procedure for data: `--datatarget` allows installation of sub-sets
of supplemental data for smaller installs for small RNA only analysis. Also
provides a consistent framework for installing larger data types.
- VEP data is no longer installed by default; it requires `--datatarget vep`.
- During install, `--toolplus` only used for third party tools like GATK and
MuTect and not data installation, which moved to `--datatarget`
- Provide `data_versions.csv` in output folder that has versions of reference
data used in the analysis.
- Use sample description for BAM read group IDs, instead of lane index. This
allows remixing of samples after processing without potential collisions. Thanks
to Neill Gibson.
- Use sample description for file names instead of lane/flowcell information.
Makes re-runs more stable when using template and files easier to interpret.
Back compatible with re-runs of old work directories.
- Finalize support for MuTect2 with validation against the DREAM synthetic 4
dataset (http://imgur.com/CLqJlNF). Thanks to Alessandro (apastore).
- Do not bgzip inputs when they are already gzipped and do not require
parallelization or format conversion. Thanks to Miika Ahdesmaki.
- Use new snpEff annotations (ANN) instead of older approach (EFF). The
new annotations are more interoperable and supported by GEMINI.
- Lazy import of matplotlib libraries to avoid slow startup times.
- Only apply ploidyfix to all-female batches to remove Y chromosome. Avoids
confusion from files produced in other cases without any changes.
- Improvement to bcbio CWL integration: support parallel alignment and variant
calling.
- Support for Salmon and RapMap added.
- FastRNA-seq pipeline implemented that does nothing but run Salmon with no QC.
- Singlecell RNA-seq pipeline implemented that uses https://github.com/vals/umis
to handle the UMI and cellular barcode, aligns with RapMap and quantitates
by counting, scaling ambiguous reads by the number of transcripts they could have
come from.
- Migrate bowtie and bowtie2 to handle split input alignments, bgzipped inputs,
and produce sorted, de-duplicated BAM files. This allows use in additional
standard pipelines. Thanks to Luca Beltrame.
- Switch final upload directories for salmon and sailfish results to be of the
form samplename/salmon instead of samplename/salmon/samplename.
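
The ambiguous-read scaling mentioned for the single-cell pipeline can be illustrated with a minimal sketch. It assumes each read arrives as a list of the transcripts it is compatible with, and splits one count evenly across them. The function name `count_scaled` and the input layout are hypothetical and are not taken from bcbio or umis.

```python
from collections import defaultdict


def count_scaled(read_hits):
    """Count reads per transcript, splitting ambiguous reads evenly.

    read_hits: iterable of lists, one list of transcript IDs per read
    (hypothetical input layout chosen for this illustration).
    """
    counts = defaultdict(float)
    for transcripts in read_hits:
        unique = set(transcripts)
        if not unique:
            # Read did not map to any transcript; it contributes nothing.
            continue
        # A read compatible with n transcripts adds 1/n to each of them.
        weight = 1.0 / len(unique)
        for tx in unique:
            counts[tx] += weight
    return dict(counts)


# Four reads: one unique, two ambiguous, one unmapped.
hits = [["tx1"], ["tx1", "tx2"], ["tx2", "tx3", "tx1"], []]
counts = count_scaled(hits)
```

With this scheme the total count equals the number of mapped reads, so ambiguous reads are neither dropped nor counted more than once.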

0.9.6

- Installation uses conda packages from bioconda for Python dependencies and
third party tools.
- Add macs2 to chipseq pipeline.
- Add germline output files for somatic calling pipelines. The standard variant
calls identify somatic mutations different from a normal, while the
germline has pre-existing mutations which might contribute to cancer
development.
- Use parallel bgzip for preparation of input fastq files for parallelization
and alignment. Thanks to Guillermo Carrasco.
- Avoid extracting individual sample calls from pooled variant call runs for
samples with more than 5 individuals in a batch. Avoids slow extraction run
times. Thanks to Neill Gibson.
- Add explicit check for BED file mismatches with reference genome.
- During validation, report truth counts relative to initial truth set
representation and pick best metric for plotting ROC scores.
- Remove `--sudo` flag from installer. bcbio requires install into a directory
structure with user permissions.
- Add ability to tweak fastq preparation for alignment splitting so we can
explore alternative approaches to bgzip and grabix index.
- Re-enable `stringtie` as an expression caller.
- Allow `stringtie` as a transcript assembler.
- Replace the `assemble_transcriptome` option with `transcript_assembler`, which
accepts a list of assemblers to run. The output of all the assemblers is
merged at the end with Cuffmerge.
- Move Picard to use conda installed `picard` single executable instead of
custom installed java directory of jars.
- Add library type option to Cufflinks assembly. Thanks to Konstantin (dezzan).
- Tag variants decomposed with vcfallelicprimitives. Thanks to Neill Gibson.
- Fix Platypus problem where we weren't correctly specifying BED regions since
the latest update skips over files not ending with ".txt" or ".bed".
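
As an illustration of the explicit BED/reference mismatch check listed above, here is a minimal sketch. It assumes the reference is available as a mapping of contig name to length (for example, parsed from a `.fai` index) and that BED lines are tab-delimited and well formed. The function `find_bed_mismatches` is hypothetical and does not reproduce bcbio's actual check.

```python
def find_bed_mismatches(bed_lines, contig_lengths):
    """Return (line_number, message) pairs for BED records that do not fit the reference.

    bed_lines: iterable of tab-delimited BED lines.
    contig_lengths: dict of contig name -> length (assumed input layout).
    """
    problems = []
    for line_num, line in enumerate(bed_lines, start=1):
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        parts = line.split("\t")
        chrom, end = parts[0], int(parts[2])
        if chrom not in contig_lengths:
            problems.append((line_num, "unknown contig %s" % chrom))
        elif end > contig_lengths[chrom]:
            problems.append(
                (line_num, "end %d beyond %s length %d" % (end, chrom, contig_lengths[chrom]))
            )
    return problems


reference = {"chr1": 1000, "chr2": 500}
bed = ["chr1\t0\t100", "1\t0\t100", "chr2\t400\t600"]
problems = find_bed_mismatches(bed, reference)
```

The second record uses a contig name without the `chr` prefix and the third extends past the end of `chr2`, which are the two kinds of mismatch this sketch reports.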

0.9.5

- Add miRDeep2 to small RNA-seq analysis and quantify the novel miRNAs for
all samples.
- Enable calling of HLA alleles with human build 38 (hg38). Turn on with the
`hlacaller` option.
- Structural variant prioritization with BED files of known biologically
important regions. Extracts SV calls in these regions and produces a tab
delimited high level summary. Use the `svprioritize` option to enable.
- Add tRNA count and figures by tdrmapper for srna-seq pipeline.
- Avoid running callability checks on smaller chromosomes less than 1 million
basepairs. Saves computation and disk IO on alt and support regions we don't
split on.
- Enable nested batch specifications, allowing samples in partially overlapping
batches.
- Speed improvements for Lumpy genotyping. Move to latest svtyper and avoid
genotyping breakends.
- Allow use of VEP annotations on non-human analyses.
- Filter VarDict calls with poor mapping quality support (-Q 10) which
trigger low frequency false positives.
- Remove ENCODE blacklist regions when calling with VarDict and FreeBayes on
whole genomes. Avoids long run times due to collapsed repeats near centromeres.
- Update VarScan to 2.4.0 and rework support to allow piping between mpileup
and VarScan to avoid filesystem IO.
- Annotate ensemble calls with information about supporting callers. Thanks to
Pär Larsson and Son Pham.
- Move eXpress to expression_caller instead of being run by default.
- rRNA calculation uses the count file instead of using counts from GATK.
- Merge STAR fusion calls back into the BAM file. Thanks to Miika Ahdesmaki.
- Added preliminary support for the hisat2 aligner.
- Swapped STAR indexing to use on the fly splice junction indexing.
- Slightly increased default DEXseq memory requirements in bcbio_system.yaml.
- Add support for RNA-seq for hg38 and hg38-noalt.
- Make Sailfish the default for non-count based expression estimation.
Produces isoform-level (combined.isoform.sf.tpm) and gene-level
(combined.gene.sf.tpm) TPM expression estimation.
- Move Cufflinks to be off by default for expression estimation (turn on via
expression_callers if needed).
- Add STAR fusion gene parameters suggested by felixschlesinger.
- Add disambiguation to Sailfish by creating a master FASTA file of all
transcripts from all organisms, quantitating each and separating out the
organism-specific transcripts after.
- Add VarDict support for RNA-seq variant calling. Thanks to Miika Ahdesmaki and
Sven-Eric Schelhorn.
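
The size cutoff for callability checks noted above can be sketched as a simple filter. This assumes contig lengths are available as a name-to-length mapping; the helper `contigs_to_check` and the constant name are hypothetical, and the lengths below are illustrative values only.

```python
MIN_CONTIG_SIZE = 1_000_000  # 1 million basepairs, the cutoff named in the notes


def contigs_to_check(contig_lengths, min_size=MIN_CONTIG_SIZE):
    """Return contig names large enough to be worth a callability check."""
    return [name for name, length in contig_lengths.items() if length >= min_size]


# Illustrative lengths: two primary chromosomes plus two small contigs.
lengths = {
    "chr1": 248_956_422,
    "chrM": 16_569,
    "chr1_KI270706v1_random": 175_055,
    "chr21": 46_709_983,
}
selected = contigs_to_check(lengths)
```

Only `chr1` and `chr21` pass the filter here; the mitochondrial and unplaced contigs fall below the cutoff and are skipped.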

0.9.4

- Ensure genome data sort order is identical to BED files when annotating
structural variant calls. Thanks to Miika Ahdesmaki.
- Improve low frequency calling for VarDict using validation against DREAM
synthetic dataset 4.
- Install truth sets for germline and cancer calling automatically as part of
bcbio and make it easy to include them in the configuration files for
validation.
- Avoid need to set LD_LIBRARY_PATH and PERL5LIB on installations.
- Update Scalpel to latest version (0.5.1) and improve sensitivity for low
frequency indels: http://imgur.com/a/7Dzd3
- Drop `coverage_depth_max` for downsampling, which no longer works in GATK 3.4.
The option wasn't supported by other callers so was more confusing than useful.
- Fix missing BAM index when running with `align: false`. Thanks to Stephan
Pabinger and Severine Catreux.
- Annotate structural variant files with snpEff. Initial steps towards
summarized structural variant reporting.
- Add ability to specify platform unit (PU) and library (LB) in BAM header.
Thanks to Brad Wubbenhorst.
- Update gatk-framework to 3.4-46 to avoid errors dealing with new gVCF output.
- Set java.io.tmpdir to avoid filling up global temporary space with snpEff.
Thanks to Oliver Hofmann.
- Speed up transcriptome-only processing. Thanks to Sven-Eric Schelhorn.
- Add bamtools output to RNA-seq quality metrics. Thanks to Sven-Eric Schelhorn.
- Expand input quality format detection to detect full range of possible Sanger values.
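
For the quality format detection item above, a rough sketch of the idea follows. It assumes the standard FASTQ encodings: Sanger qualities use Phred+33 and span ASCII 33 to 126, while the older Phred+64 and Solexa+64 encodings never use characters below ASCII 59. The function `looks_like_sanger` is hypothetical and is not bcbio's detection code.

```python
SANGER_MIN, SANGER_MAX = 33, 126  # Phred+33 covering qualities 0..93
LOWEST_OFFSET64_CHAR = 59         # Solexa+64 starts at quality -5


def looks_like_sanger(quality_strings):
    """Return True when the quality characters can only be Sanger (Phred+33).

    Characters anywhere in 33..126 are accepted as valid Sanger values;
    seeing at least one character below 59 rules out the offset-64 encodings.
    """
    seen = [ord(ch) for qual in quality_strings for ch in qual]
    if not seen:
        return False
    if min(seen) < SANGER_MIN or max(seen) > SANGER_MAX:
        return False
    return min(seen) < LOWEST_OFFSET64_CHAR


sanger_like = looks_like_sanger(["II5!~"])
offset64_like = looks_like_sanger(["hhhh"])
```

The point of accepting the full 33 to 126 range is that high characters alone do not disqualify a file from being Sanger encoded; only the absence of low characters leaves the format ambiguous.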

0.9.3

- Fix bug when using tumors with multiple normals and no CNV calling. Additional
tumor sample would get lost due to lack of early (CNV-based) calling. Thanks
to Miika Ahdesmaki.
- Include R and Rscript in the installation with conda packages and use for
installing and running R-based tools. Avoids issues with alternative R
versions and need for a separate installation.
- Fix bug when using CNVkit on disambiguated inputs. Thanks to Miika Ahdesmaki.
- Re-work structural variant infrastructure to provide plug-in parallel ensemble calling,
removing the previous overlap-based ensemble calls. Currently supports MetaSV for
ensemble calls. Also re-works validation to not rely on ensemble-overlap calls.
- Default to using Real Time Genomics vcfeval (https://github.com/RealTimeGenomics/rtg-tools)
for validation instead of bcbio.variation. Improves speed and resolution of
closely spaced variants. The old functionality is still available with
`validate_method: bcbio.variation`.
- Correctly apply BQSR when using recalibration with PrintReads by using GATK
full instead of the open source GATK framework which silently ignores BQSR
option. Thanks to Severine Catreux.
- Require larger blocks (250bp, moved from 100bp) to find regions for splitting analysis
to avoid too tight splitting around small homozygous deletions.
- Adjust mapping quality (MQ) filter for GATK SNP hard filters to improve sensitivity
http://imgur.com/a/oHRVB
- Ensure memory specification passed to sambamba and samtools sort during
disambiguation and RNA-seq. Thanks to Sven-Eric Schelhorn.
- Fix compatibility with bedtools groupby in v2.25.0, which needs short
parameters instead of long parameter names.
- Allow turning off variant quality score recalibration with `tools_off: [vqsr]`.
- Generalize group size for batching gVCFs prior to joint calling with
`joint_group_size`. Thanks to Severine Catreux.
- Support GEMINI 0.17.0, which does not have a --no-bcolz option since that is
the default.
- Remove test_run parameter since it was poorly supported and not used much.
- Fix issue with featureCounts sorting not working in parallel by pre-sorting
and filtering the BAM file.
- Unified stock coverage and experimental coverage reporting.
- Deprecated `report` and `coverage_experimental` as algorithm keys.
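
The `joint_group_size` option above controls how many gVCFs are batched together before joint calling. A minimal sketch of that kind of grouping is shown here, assuming a plain list of file names; the helper `group_gvcfs` and the file names are hypothetical and do not come from bcbio.

```python
def group_gvcfs(gvcf_files, joint_group_size):
    """Split a list of gVCF paths into consecutive groups of at most joint_group_size."""
    if joint_group_size < 1:
        raise ValueError("joint_group_size must be at least 1")
    return [
        gvcf_files[i:i + joint_group_size]
        for i in range(0, len(gvcf_files), joint_group_size)
    ]


# Seven hypothetical per-sample gVCFs grouped three at a time.
files = ["s%d.g.vcf.gz" % i for i in range(1, 8)]
groups = group_gvcfs(files, 3)
```

Seven inputs with a group size of three produce groups of three, three, and one, and the final group simply holds whatever is left over.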

0.9.2

- Support IPython 4.0 with ipyparallel
- Fix bug in writing BAM and VCF indexes to final directory. Correctly add
indexes as bam.bai and vcf.gz.tbi.
- Fix bug in queryname sorting on split files for feeding into disambiguation.
Ensure proper sorting with explicit sambamba sort. Thanks to Sven-Eric
Schelhorn.
- Ensure extra FreeBayes alleles get removed prior to vcfallelicprimitives,
avoiding leaving incorrect genotype allele fields. Thanks to Michael
Schroeder.
- Split CNVkit processing into individual components, enabling better
parallelization and control over parameters.
- Genotype Lumpy structural variant calls with SVtyper.
- Initial support for small RNA pipeline. Thanks to Lorena Pantano.
- Support for MetaSV to prepare combined structural variant calls.
- Add smallRNA-seq pipeline
- Test automatic report for variant calling and standard pipeline.
- Allow Cufflinks to be turned off via tools_off.
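
The index naming fix above (`bam.bai` and `vcf.gz.tbi`) amounts to appending the index suffix to the full file name. A minimal sketch of that convention is shown below; the helper `index_name` and the example paths are hypothetical.

```python
def index_name(path):
    """Return the index file name expected alongside a BAM or bgzipped VCF."""
    if path.endswith(".bam"):
        return path + ".bai"   # sample.bam -> sample.bam.bai
    if path.endswith(".vcf.gz"):
        return path + ".tbi"   # sample.vcf.gz -> sample.vcf.gz.tbi
    raise ValueError("no index convention known for %s" % path)


bam_index = index_name("final/sample1/sample1-ready.bam")
vcf_index = index_name("final/batch1/batch1-freebayes.vcf.gz")
```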
