Snp-pipeline

Latest version: v2.2.1

0.4.1
~~~~~~~~~~~~~~~~~~

**Bug fixes:**

* Fixed a Python 2.6 incompatibility with the new consensus caller.

**Other Changes:**

* Added Tox support for automatically testing installation and execution with multiple Python versions.

0.4.0
~~~~~~~~~~~~~~~~~~

**Bug fixes:**

* When run on Grid Engine with the default settings, bowtie2 consumed all available CPU cores
per node even though the job was scheduled with Grid Engine to use only 8 cores. On a lightly loaded
cluster, this bug made the pipeline run faster, but when the cluster was full or nearly full, it caused
contention for CPU resources and made jobs run more slowly. Changed bowtie2 to use only 8 CPU cores
by default.
* The consensus snp caller miscounted the number of reference bases when the pileup record
contained the ^ symbol marking the start of a read segment followed by a dot or comma. In this
situation, the character after ^ encodes the mapping quality of the read, so the dot or comma
should not be counted as a reference base.
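
The counting rule in the second fix above can be illustrated with a minimal sketch. It is not the pipeline's actual code: the function name ``count_reference_bases`` is hypothetical, and the sketch assumes the standard samtools pileup read-bases encoding, where ``^`` is followed by one mapping-quality character and indels are written as ``+<len><seq>`` or ``-<len><seq>``.

```python
def count_reference_bases(read_bases):
    """Count reference matches ('.' and ',') in a pileup read-bases string.

    Hypothetical helper for illustration only.
    """
    count = 0
    i = 0
    n = len(read_bases)
    while i < n:
        ch = read_bases[i]
        if ch == "^":
            # '^' marks the start of a read segment. The next character is
            # a mapping quality, not a base, so skip both characters.
            i += 2
            continue
        if ch in "+-":
            # Indel: skip the length digits and the inserted/deleted bases.
            j = i + 1
            while j < n and read_bases[j].isdigit():
                j += 1
            indel_len = int(read_bases[i + 1:j]) if j > i + 1 else 0
            i = j + indel_len
            continue
        if ch in ".,":
            count += 1
        i += 1
    return count
```

With this rule, the string ``..,^.,`` contains four reference bases rather than five, because the dot after ``^`` is a mapping-quality character.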


**Other Changes:**

* Added support for the Smalt aligner. You can choose either bowtie2 or smalt in the configuration file.
A new parameter in the configuration file, ``SnpPipeline_Aligner``, selects the aligner to use.
Two additional configuration parameters, ``SmaltIndex_ExtraParams`` and ``SmaltAlign_ExtraParams``,
can be configured with any Smalt command line options. See :ref:`tool-selection-label`. The
default aligner is still bowtie2.
* Split the create_snp_matrix.py script into two pieces. The new script, call_consensus.py, is a redesigned
consensus caller which is run in parallel to call snps for multiple samples concurrently. The
create_snp_matrix.py script simply merges the consensus calls for all samples into a multi-fasta file.
* The new consensus caller has the following adjustable parameters.
  See the :ref:`cmd-ref-call-consensus` command reference.

  * ``minBaseQual`` : Minimum base quality score to count a read.
  * ``minConsFreq`` : Minimum consensus frequency.
  * ``minConsStrdDpth`` : Minimum consensus-supporting strand depth.
  * ``minConsStrdBias`` : Strand bias.

* Added the capability to generate VCF files. By default, a file named consensus.vcf is generated
by the consensus caller for each sample, and the merged multi-sample VCF file is called snpma.vcf.
This capability introduces a new dependency on bgzip, tabix, and bcftools. You can disable VCF file
generation by removing the ``--vcfFileName`` option in the configuration file. Also, be aware the
contents of the VCF files may change in future versions of the SNP Pipeline.
* Added configuration parameters ``Torque_StripJobArraySuffix`` and ``GridEngine_StripJobArraySuffix`` to
improve compatibility with some HPC environments where array job id suffix stripping is
incompatible with qsub.
* Renamed the configuration parameter ``PEname`` to ``GridEngine_PEname``.

0.3.4
~~~~~~~~~~~~~~~~~~

**Bug fixes:**

* The referenceSNP.fasta file was missing newlines between sequences when the reference fasta file
contained multiple sequences. In addition, each sequence was written as a single long string of
characters. Changed to emit a valid fasta file. Updated the expected result files for the
datasets included with the distribution accordingly.
* Changed the run_snp_pipeline.sh script to allow blank lines in the file of sample directories
when called with the -S option.
* Changed the run_snp_pipeline.sh script to allow trailing slashes in the file of sample directories
when called with the -S option.
* Do not print system environment information when the user only requests command line help.
* Fixed the broken pypi downloads per month badge on the readme page.
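
The tolerant handling of the sample directory file described above (blank lines, trailing slashes) can be sketched as follows. The real logic lives in the run_snp_pipeline.sh shell script; this Python version is only an illustration, and the function name ``read_sample_dirs`` is hypothetical.

```python
def read_sample_dirs(path):
    """Read a file of sample directories, one per line.

    Hypothetical helper for illustration only. Blank lines are ignored,
    trailing slashes are removed, and a missing final newline does not
    cause the last entry to be dropped.
    """
    sample_dirs = []
    with open(path) as handle:
        for line in handle:
            entry = line.strip()
            if not entry:
                continue  # allow blank lines
            # Drop trailing slashes, but keep a bare "/" intact.
            sample_dirs.append(entry.rstrip("/") or "/")
    return sample_dirs
```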

**Other Changes:**

* Changed the default configuration file to specify the ``-X 1000`` option to the bowtie2 aligner. This
parameter is the maximum inter-mate distance (as measured from the furthest extremes of the mates)
for valid concordant paired-end alignments. Previously this value was not explicitly set and
defaulted to 500. As a result of this change, the generated SAM files may have a different number
of mapped reads, the pileup files may have different depth, and the number of snps called may change.
* We now recommend using VarScan version 2.3.9 or later. We discovered that VarScan v2.3.6 was occasionally
omitting the header section of the generated VCF files. This, in turn, caused the SNP Pipeline
to miss the first snp in the VCF file. This is not a SNP Pipeline code change, only a
documentation and procedural change.
* Updated the result files in the included data sets with the results obtained using VarScan v2.3.9
and the bowtie2 ``-X 1000`` option.
* Log the Java classpath to help determine which version of VarScan is executed.
* Changed the python unit tests to execute the non-python processes in a temporary directory instead
of assuming the processes were already run in the test directory.

0.3.3
~~~~~~~~~~~~~~~~~~

**Bug fixes:**

* Improved HPC qsub submission speed throttling to avoid errors with the HPC job scheduler when
submitting large and small jobs. The delays between HPC array job submissions are now adjusted dynamically so
small datasets have small delays and large datasets have large delays between qsub submissions.
* Process the sample directories in order by size, largest first, considering only the size of fastq
files and ignoring all other files. Previously non-fastq files were affecting the processing order.
* Fixed divide-by-zero error in create_snp_matrix when no snps are detected.
* Don't skip the last sample when run_snp_pipeline is started with the -S option and the file of
sample directories is not terminated with a newline.
* Gracefully exit run_snp_pipeline with error messages when run with -S option and any of the sample
directories in the sample directory file is missing, empty, or does not contain fastq files.
* Gracefully exit run_snp_pipeline with an error message when run with -s option and the samples directory
is empty or contains no subdirectories with fastq files.
* Fixed the Sun Grid Engine "undefined" task id reported in non-array job log files.
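
The sample ordering fix above (largest first, counting only fastq files) can be sketched in a few lines. This is an illustration rather than the pipeline's implementation: the helper names and the list of recognized fastq suffixes are assumptions.

```python
import os

# Assumed set of fastq file suffixes; the pipeline's actual list may differ.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def fastq_size(sample_dir):
    """Return the total size in bytes of the fastq files in sample_dir."""
    total = 0
    for name in os.listdir(sample_dir):
        path = os.path.join(sample_dir, name)
        if os.path.isfile(path) and name.lower().endswith(FASTQ_SUFFIXES):
            total += os.path.getsize(path)
    return total


def order_sample_dirs(sample_dirs):
    """Sort sample directories by fastq size, largest first.

    Non-fastq files are ignored, so they cannot affect the order.
    """
    return sorted(sample_dirs, key=fastq_size, reverse=True)
```

Processing the largest samples first is a common way to keep long-running jobs from starting last and extending the total run time.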

**Other Changes:**

* Sample Metrics. The pipeline generates a table of sample metrics capturing various alignment, coverage, and snp statistics per sample.
See :ref:`metrics-usage-label`.
* Explicitly expose the ``minConsFreq`` parameter in the supplied default configuration file to make it easier to adjust.
* Updated the FAQ with instructions to install an older version.

0.3.2
~~~~~~~~~~~~~~~~~~

**Bug fixes:**

* Fixed (again) a Python 2.6 incompatibility with formatting syntax when printing the available RAM.
This affected the shell scripts (prepReference.sh, alignSampleToReference.sh, prepSamples.sh).
* Improved installation in a Python 2.6 environment. Added several Python packages to the automatic
setup script.

**Other Changes:**

* Added support for the Grid Engine job queue manager. See :ref:`hpc-usage-label`.
* Added a configurable parameter, ``minConsFreq``, to the create_snp_matrix.py script. This parameter specifies
the minimum fraction of reads that must agree at a position to make a consensus call. Prior to version
0.3.2, the snp pipeline required a majority (more than half) of the reads to agree to make
a snp call. In version 0.3.2, the default behavior requires at least 60% of the reads to
agree to make a consensus call.
* Changed the included snp matrix files for the agona and listeria data sets to match the new results
obtained by setting minConsFreq=0.6. The lambda virus results were not impacted by this change.
* Revised the Installation instructions with more detailed step-by-step procedures.
* Added a Dockerfile for automated docker builds. This feature is still experimental.
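
The effect of the ``minConsFreq`` threshold described above can be shown with a small sketch. It is not the code in create_snp_matrix.py: the function name ``call_consensus_base`` is hypothetical, and returning ``-`` when no consensus is reached is an assumption made for the example.

```python
from collections import Counter


def call_consensus_base(bases, min_cons_freq=0.6):
    """Return the consensus base at one position, or '-' if there is none.

    Hypothetical helper for illustration only. `bases` is the sequence of
    base calls from the reads covering the position.
    """
    if not bases:
        return "-"
    base, support = Counter(bases).most_common(1)[0]
    # The most common base wins only if enough of the reads agree.
    if support / float(len(bases)) >= min_cons_freq:
        return base
    return "-"
```

With 10 reads, 6 agreeing reads meet the default threshold of 0.6, while 5 agreeing reads do not. Under a simple-majority rule, 6 of 10 would also pass, but so would 51 of 100, which the 0.6 threshold rejects.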

0.3.1
~~~~~~~~~~~~~~~~~~

**Bug fixes:**

* Fixed a Python 2.6 incompatibility with formatting syntax when printing the available RAM.
Also added the Python version to the log files.
