Goby

Latest version: v2.0

Page 2 of 9

2.3.6

- Improve performance of realignment around indels when processing RNA-Seq reads. Previous versions of Goby had
scalability issues and kept data around from previous chromosomes. This was OK when processing DNA-Seq inside GobyWeb,
which splits data into genomic slices, but not when trying to process one or more RNA-Seq alignment files.
Performance has also been dramatically improved by fixing a bug in indel equality.

2.3.5

- Add a mode to infer the sex of samples from data (tested on exome data). Useful as a quality control to check that the
data you receive are consistent with what is known about the samples. See --mode infer-sex. Works
faster on sorted alignments, where the index is used to jump quickly to the human sex chromosome.
- Prevent AbstractAlignmentToCompactMode from printing more than 10 warnings if quality scores are not available in
an alignment.
- suggest-position-slices: fix a bug that caused some slices to overlap. Found with a job with hundreds of
alignments, so not common.
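
The warning cap described above for AbstractAlignmentToCompactMode can be pictured with a small sketch. This is not Goby's code: the class name LimitedWarnings, its constructor, and the warn method are hypothetical names chosen for illustration, and the only behavior taken from the notes is "stop after 10 warnings".

```java
import java.util.function.Consumer;

/** Hypothetical helper (not part of Goby): forwards at most maxWarnings messages to a sink. */
final class LimitedWarnings {
    private final int maxWarnings;
    private final Consumer<String> sink;
    private int emitted = 0;

    LimitedWarnings(int maxWarnings, Consumer<String> sink) {
        if (maxWarnings < 0) {
            throw new IllegalArgumentException("maxWarnings must be >= 0");
        }
        this.maxWarnings = maxWarnings;
        this.sink = sink;
    }

    /** Forwards the message unless the cap has been reached; returns true if it was forwarded. */
    boolean warn(String message) {
        if (emitted >= maxWarnings) {
            return false; // cap reached: drop the message silently
        }
        emitted++;
        sink.accept(message);
        return true;
    }
}
```

A caller would create one instance per kind of warning, for example `new LimitedWarnings(10, System.err::println)`, and call `warn(...)` wherever the condition is detected.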

2.3.4.1

- Add an option to the fasta-to-compact mode that will convert a set of files and concatenate the result
to a single compact-reads file (see new --concat option).
- Add a mode to test that the connection from Goby to R is working (requires JRI and R built
with shared library support). The mode is called test-r-connection (tcr).
- Restore STRICT_SOMATIC filter.
- Close files opened when loading Goby Alignment header and index files. This fixes a "too many files" error
that could occur when loading hundreds of alignments simultaneously.
- Allow lenient import mode for TSV files. This makes it possible to convert TSV files to lucene.index when
they were created with Goby in the past with a \t character as the last character of the column line.
- Fix a bug that caused some slices to occur within annotations, despite the --annotation option being given
on the command line. The problem was that the chromosome index was not obtained from the genome and was always
set to zero.

2.3.4

- Optimize the speed of genotyping when some sites have very high coverage (>500M bases).
Now sub-sampling to keep a random set of 10,000 bases for such sites. Expose the default
sub-sample size with a dynamic option called sub-sample-size in IterateSortedAlignmentsListImpl.
(-x IterateSortedAlignmentsListImpl:sub-sample-size <int>)
- LastToCompact mode now supports the import of paired end alignments produced by Last's last-pair-probs.sh.
- LastToCompact mode now supports the import of quality scores (lastal must be run with -Q1, since the
import assumes Phred quality scores on the q lines).
- Add two methods to AlignmentReader to determine the minimum and maximum genomic locations represented
in the reader. This is useful when suggesting slices to split a set of alignments. This commit includes
a fix for possible null start or end positions in slices generated with suggest-position-slices.
- Fix a problem with run-in-parallel where some threads would never finish when they do not detect
the keyword. Now indicate that the thread finished so that others can start when the processing
completes.
- reads-file-stats: remove any path from basename in the output.
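
The sub-sampling item at the top of this list (keep a random set of 10,000 bases at very high coverage sites) can be illustrated with a short sketch. The notes do not say which sampling algorithm Goby uses; reservoir sampling is assumed here because it keeps a uniform random subset in a single pass. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch (not Goby's code): keep a uniform random subset of bounded size. */
final class SubSampleSketch {
    /**
     * Returns at most sampleSize items drawn uniformly at random from items,
     * using single-pass reservoir sampling. Inputs no larger than sampleSize are returned whole.
     */
    static <T> List<T> subSample(Iterable<T> items, int sampleSize, Random random) {
        if (sampleSize < 0) {
            throw new IllegalArgumentException("sampleSize must be >= 0");
        }
        List<T> reservoir = new ArrayList<>(sampleSize);
        long seen = 0;
        for (T item : items) {
            seen++;
            if (reservoir.size() < sampleSize) {
                // Fill phase: the first sampleSize items are always kept.
                reservoir.add(item);
            } else {
                // Replacement phase: keep the new item with probability sampleSize / seen.
                long slot = (long) (random.nextDouble() * seen);
                if (slot < sampleSize) {
                    reservoir.set((int) slot, item);
                }
            }
        }
        return reservoir;
    }
}
```

With the default from the notes, a call would look like `SubSampleSketch.subSample(basesAtSite, 10_000, new Random())`, where `basesAtSite` stands for whatever collection holds the bases observed at one site.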

2.3.3

- IterateSortedAlignmentsListImpl: Use a WarningCounter to limit warnings to 10 instances. This is needed to
avoid writing gigabytes of log output when the threshold is met.
- discover-sequence-variants somatic output: Make it possible to run a simple trio design by removing the
requirement for a germline sample.
- discover-sequence-variants somatic output: Earlier versions reported somatic variation candidates
when the two parents were homozygous and the somatic sample was heterozygous (the Fisher p-value with each parent is
very significant in this case, but does not indicate a somatic change). This also improves q-values because
there are fewer results that need to be corrected.
- discover-sequence-variants somatic output: Add an error message when a sample is misspelled in the covariates
file.
- Refactor code base to keep base counts for forward and reverse strands separately in SampleCountInfo.
- Normalize somatic priority score by number of mapped reads, and number of parents and germline samples used in
the calculation.
- Add a StrandBiasFilter in somatic analyses. The filter rejects variations that are not represented on both
strands when at least j reads support the variation. The value of j is set to 9 by default, so a variation with
10 bases needs to have both strands represented.
- Remove candidate somatic variations that can occur when the germline samples have less coverage than the
somatic sample. Now require at least twice as much coverage in the somatic sample as the minimum coverage
in the germline samples.
- Add a STRICT_SOMATIC filter that flags genomic sites where some bases appear in support of the variation
in the parents or germline samples. Please note the VCF spec semantic: PASS indicates that all filters passed.
This means that lines with the STRICT_SOMATIC value in the FILTER column failed that test.
- Fix a bug in FDR mode that would not handle VCF files with non-default FILTER values.
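
The StrandBiasFilter rule described in this list can be written down as a small predicate. This is a sketch of the rule as worded in the notes, not Goby's implementation: the class name, method name, and parameters are hypothetical, and "at least j reads" is read as total support >= j (the notes' example of 10 bases with j = 9 is consistent with that reading, but the exact boundary is an assumption).

```java
/** Hypothetical sketch (not Goby's code) of the strand-bias rule described in the notes. */
final class StrandBiasSketch {
    /** Default value of j given in the release notes. */
    static final int DEFAULT_MIN_SUPPORT = 9;

    /**
     * Returns true when the variation should be rejected: it has at least minSupport
     * supporting bases in total, yet all of them come from a single strand.
     */
    static boolean rejects(int forwardCount, int reverseCount, int minSupport) {
        int total = forwardCount + reverseCount;
        boolean enoughSupport = total >= minSupport;   // assumption: boundary is inclusive
        boolean singleStrand = forwardCount == 0 || reverseCount == 0;
        return enoughSupport && singleStrand;
    }
}
```

Under this reading, a variation seen 10 times on the forward strand only is rejected, one seen 7 times forward and 3 times reverse is kept, and one seen only 5 times is not evaluated by the filter at all.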

2.3.2

- run-parallel-mode now supports paired input files.
- fasta-to-compact: add --force-quality-encoding option to force the quality values within the specified
encoding range.
- suggest-position-slices: fix problem where first slice of genome was omitted from output (with new split
by number of bytes option introduced in 2.3).
