Goby

Latest version: v2.0

1.9.2

- Fixed a major bug in discover-sequence-variants that could sometimes confuse the group of origin of a
variation. This bug could affect between-group p-values. A JUnit test now checks for the error condition and
is part of regression testing.
- sam-to-extract mode: append ".compact-reads" to output filename when the extension is missing.
- Added a mode to display aligned reads for a region of the reference sequences. The reads are written
in fasta format, suitable for viewing with a sequence alignment viewer such as JalView, CINEMA, etc.
The mode is called alignment-to-pileup.
- ConcatenateAlignmentReader would consume excessive amounts of memory when several large alignments
(e.g., with >100 million reads) were concatenated. The reader was trying to allocate very large queryLength
arrays, even though each underlying reader indicated that its entries carried the queryLength.
The fix consists of detecting that all the concatenated readers support queryLength in entries, and
not allocating these arrays at all. This is a major bug fix that makes it possible to run more
instances of goby modes on the same server (i.e., differential expression and sequence variant discovery
modes have significantly improved memory usage).
- Mode sam-extract-reads now supports an optional --quality-encoding argument. Default is BAM encoding.
- QualityEncoding now supports BAM encoding (no offset or adjustment: the ASCII value of the character
is the Phred score).
- Fixed sam-extract-reads, which was not extracting sequences from BAM files.
- compact-to-fasta mode: now supports reading an arbitrary slice of input.
- sam-to-compact mode: draft support for importing SAM files produced by BSMAP.
- Fixed a bug that prevented running sam-to-compact mode from the command line, where an assertion stopped
the code. Clarified the text of the assertion error and now read the required parameter from the command-line
argument, so that the mode runs again on SAM files generated outside of Goby.
- reformat-compact-reads now trims quality scores in the same way that it trims the sequence. Previous
versions did not trim quality scores.
- reformat-compact-reads now correctly processes sequence pairs. Sequence pairs and quality scores can now
be trimmed in the same way as the primary sequence.
- Expose sampleFraction via API and command line for read-quality-stats mode
- Make fasta-to-compact mode easier to call via the API
- reformat-compact-reads in 'mutate' mode no longer complains when there is no sequence pair to mutate
(mutation is neither attempted nor reported when sequence.length is zero).
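
The trimming fixes above (quality scores and sequence pairs trimmed the same way as the primary sequence) come down to slicing every per-base array with the same bounds. The sketch below is a minimal illustration under that assumption only; the `Read` record and `trim_read` function are hypothetical and do not reflect how reformat-compact-reads is actually implemented.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Read:
    """Hypothetical read record: bases, per-base qualities, optional mate."""
    sequence: str
    qualities: List[int]
    pair_sequence: Optional[str] = None
    pair_qualities: Optional[List[int]] = None


def _trim(seq: str, quals: List[int], start: int, end: int) -> Tuple[str, List[int]]:
    # Slice bases and qualities with identical bounds so that quality i
    # still describes base i after trimming.
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    return seq[start:end], quals[start:end]


def trim_read(read: Read, start: int, end: int) -> Read:
    """Keep bases [start, end) of the primary sequence and, if present, of the mate."""
    seq, quals = _trim(read.sequence, read.qualities, start, end)
    pair_seq, pair_quals = read.pair_sequence, read.pair_qualities
    if pair_seq is not None and pair_quals is not None:
        pair_seq, pair_quals = _trim(pair_seq, pair_quals, start, end)
    return Read(seq, quals, pair_seq, pair_quals)
```

For example, trimming `Read("ACGTAC", [30, 31, 32, 33, 34, 35], "TTGGCC", [20, 21, 22, 23, 24, 25])` to bases 1..4 keeps "CGT" with [31, 32, 33] and "TGG" with [21, 22, 23]; skipping the quality slice would leave six scores describing three bases.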

1.9.1

- fasta-to-compact mode: fixed a bug that prevented checking that quality scores are within the range allowed
by the quality encoding. Quality scores must now convert to a score within the correct range before the
compact-reads file can be written successfully.
- Parallelized the estimation of statistics. This can speed up the alignment-to-annotation-counts mode.
- Introduced the fields spliced_alignment_link and spliced_flags in AlignmentEntry to represent the relation
between parts of reads that span exon-exon junctions.
- Introduced insert_size in AlignmentEntry to represent the size of the insert used when making
the sequence library.
- Introduced meta-data in compact-reads files. Meta-data provide a way to document how the sample
was obtained. Suggested information to record includes when the library was sequenced (useful
to detect batch effects, as suggested by a participant at the SEQC meeting on the NIH Bethesda campus),
as well as the sequencing instrument. Modes fasta-to-compact, compact-file-stats and reformat-compact-reads
have been updated to define, transfer or display meta-data when appropriate.
- Mode compact-alignment-stats now prints statistics about paired-end reads.
- Removed spurious SAM header when writing alignments in plain text format.
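
The quality-encoding notes (the range check in fasta-to-compact, and BAM encoding where the character value is the Phred score) amount to subtracting a per-encoding ASCII offset and rejecting scores outside the allowed range. The sketch below assumes the common offsets (0 for BAM, 33 for Sanger/Phred+33, 64 for Illumina 1.3+ Phred+64); the score ranges and all names are illustrative assumptions, not Goby's QualityEncoding class.

```python
from enum import Enum
from typing import List


class QualityEncoding(Enum):
    # (ASCII offset, minimum Phred score, maximum Phred score).
    # The ranges are illustrative assumptions, not Goby's exact limits.
    BAM = (0, 0, 93)        # character value is the Phred score
    SANGER = (33, 0, 93)    # Phred+33
    ILLUMINA = (64, 0, 62)  # Phred+64 (Illumina 1.3+)

    def __init__(self, offset: int, min_score: int, max_score: int) -> None:
        self.offset = offset
        self.min_score = min_score
        self.max_score = max_score


def decode_qualities(encoded: str, encoding: QualityEncoding) -> List[int]:
    """Convert an encoded quality string to Phred scores, rejecting out-of-range values."""
    scores: List[int] = []
    for position, char in enumerate(encoded):
        score = ord(char) - encoding.offset
        if not encoding.min_score <= score <= encoding.max_score:
            raise ValueError(
                f"quality {score} at position {position} is outside "
                f"[{encoding.min_score}, {encoding.max_score}] for {encoding.name}"
            )
        scores.append(score)
    return scores


def encode_qualities(scores: List[int], encoding: QualityEncoding) -> str:
    """Inverse of decode_qualities for scores already known to be in range."""
    return "".join(chr(score + encoding.offset) for score in scores)
```

Converting between encodings is then a decode with the source encoding followed by an encode with the target one; validating during decode means a file written with the wrong encoding fails early instead of producing a compact-reads file with shifted scores.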

1.9

- New fdr mode combines tab-delimited files in which some columns contain P-values, and
adjusts selected P-values for multiple testing with the Benjamini-Hochberg method. The tool is efficient
in that it keeps in memory only the P-values that need to be adjusted, while the other columns stay on disk.
This strategy is expected to scale to hundreds of millions of lines.
- Add a way to open only a slice of an indexed alignment file by position. This feature makes it possible
to retrieve all alignment entries that start between specific position boundaries. See new constructor
in AlignmentReader and ConcatSortedAlignmentReader.
- The mode discover-sequence-variants has been updated to take advantage of the alignment position slicing
feature introduced in Goby 1.9. See the new arguments --start-position and --end-position.
- Fixed a bug in skipTo that caused some alignment entries not to be returned (skipTo previously ignored
entries in the chunk just before the chunk the index points to). This behaviour is incorrect because
that preceding chunk may contain entries with positions equal to the position requested by skipTo.
The index contract is to return the chunk that starts with an entry at the requested location.
Because chunks contain multiple entries with increasing positions, the chunk immediately before the indexed
chunk must be scanned and filtered to remove entries with positions before the requested skipTo position.
A new test was written to check for this issue (TestSkipTo.testFewSkips4).
- Provide Building/Installation instructions for the Goby C++/C API.
- Implemented a fast concatenation operation for read files. The new -q flag in ConcatenateCompactReadsMode
activates the fast concatenation. Chunks of compressed data are appended without decompressing and
recompressing the entries. This makes concatenation much faster, bounded only by available I/O.
- Add mapping_quality field to AlignmentEntry protobuf schema.
- Add aligner name and version in AlignmentHeader protobuf schema.
- Added C/C++ api methods to set aligner name and version, and alignment entry mapping quality.
- Updated the C API to be more generic, less oriented toward any one
particular 3rd party tool. The read-API is now more generic, the write-API
hasn't changed. The C API files, including the .h header files, have been renamed.
- In C_Alignments.c/.h & C_CompactHelpers.h, added CSamHelper and samHelper_* methods to help
convert BWA to support CompactAlignments: the data stored in BWA just prior to writing
alignments is effectively already in SAM format. These methods make it possible to
reconstruct the aligned query and reference so data can be written as compact alignments.
- Goby C/C++ API now requires the pcre (regex) >=8.10 library. See http://www.pcre.org/
- Compact alignments now support paired-end alignments in Java / C++ / C APIs.
- alignment-to-text mode now supports paired-end alignments in both PLAIN and SAM output.
- In the alignment .stats file, renamed the statistic "number.aligned.reads" to the more accurate
"number.alignment.entries" in both the Java API and the C++ API.
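
The fdr mode adjusts selected P-value columns with the Benjamini-Hochberg method. The in-memory core of that adjustment is the standard step-up procedure sketched below; `benjamini_hochberg` is a hypothetical name, and the sketch does not model how the mode streams the other columns from disk.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Visit P-values from largest to smallest so a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For the P-values [0.01, 0.04, 0.03, 0.005] this yields [0.02, 0.04, 0.04, 0.02]. Only the selected P-values and their positions need to be held in memory, which is consistent with keeping the remaining columns on disk.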
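
The skipTo fix can be illustrated with a simplified model: each chunk is an ascending list of positions, chunks are in ascending order, and the index records the first position of each chunk. Under those assumptions, entries at the requested position may sit at the end of the chunk before the one the index points to, so the scan starts one chunk earlier and filters. The layout and the `skip_to` name are hypothetical; Goby's AlignmentReader stores entries differently.

```python
from bisect import bisect_left
from typing import Iterator, List


def skip_to(chunks: List[List[int]], target: int) -> Iterator[int]:
    """Yield every position >= target.

    chunks: hypothetical layout, each chunk a non-empty ascending list of
    positions, with the chunks themselves in ascending order.
    """
    # Index of the first position of each chunk (rebuilt here for brevity).
    index = [chunk[0] for chunk in chunks]
    # First chunk whose first entry is at or after the target.
    found = bisect_left(index, target)
    # The preceding chunk starts before the target but may end with entries
    # equal to it, so scan it too and filter out earlier positions.
    start = max(found - 1, 0)
    for chunk in chunks[start:]:
        for position in chunk:
            if position >= target:
                yield position
```

With chunks [[1, 3, 5, 5], [5, 7, 9], [10, 12]], skipping to 5 must return all three entries at position 5; starting at the indexed chunk alone would drop the two at the end of the first chunk. Going back a single chunk is enough, because every earlier chunk ends at or before the first position of that preceding chunk, which is below the target.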

1.8

- Introduced a C API to enable native Goby support in GSNAP.
- We now distribute a subset of Goby as the Goby IO API. This subset is packaged in the goby-io.jar
file and released under the LGPL3 license. This was done to make it possible to include Goby format
input output code directly into other software licensed under the LGPL3.
- Fixed a bug that prevented Goby opening large alignment files (>3Gb).
- Fixed a bug in AlignmentIterator triggered when reading alignment files with targetIndices starting at
numbers larger than zero.
- Removed the dependency on colt, because its license is not pure LGPL (it adds a restriction on military
applications).
- SGE helper scripts bz2compact.sh and keep-unique-reads.sh help process hundreds of lanes in
parallel on an SGE grid. bz2compact extracts fastq files compressed with BZip2 and converts
them to compact-reads format. keep-unique-reads.sh determines the set of reads that are unique
in each input <file>.compact-reads and writes this information to a <file>.uniqset-keep.filter file.
- Mode concatenate-compact-reads now supports read index filters. This makes it possible to
concatenate and keep only reads that are unique within each file.
- Draft helper to iterate through individual reference positions of a sorted set of alignments
(see IterateSortedAlignments).
- Alternative implementation of sequence-variation-stats mode (called sequence-variation-stats2)
that determines the number of reference bases matched at a given read index. This info is needed
to call sequence variants, but slows down the stats. The initial implementation is preserved for
compatibility.
- New mode discover-sequence-variants will either (i) identify sequence variants within a group of samples
or (ii) identify variants whose frequency is significantly enriched in one of two groups.
This mode requires sorted/indexed alignments as input.
- SamToCompact mode now populates the read quality scores for sequence variations (toQuality field).
- Update picard/samtools to version 1.25.
- In the mode "alignment-to-annotation-counts", the "--eval" option supports
a new value "counts", which outputs a format specifically designed
for use with R's DESeq, notably with the R script geneDESeqAnalysis.R,
which is used with GobyWeb.
- Fixed a bug when extracting sequence variations from SAM format, where matches on the
reverse strand got a read index one larger than the correct value.
- By default, don't use "counts" in DiffExp as it is a specialized output for preparing for DESeq.
- API interface for ReadsToWeightsMode.
- LastToCompactMode wasn't writing target lengths. Fixed.
- Read TMH in Python using Gzip.
- Fixed Python utilities so -o actually writes to a file.
- Added transcript-align.sh script to assist with aligning via transcripts.
- In MessageChunksWriter, a flush should occur on a completely empty file, but otherwise it
should only occur if entries have been added since the last flush(). This applies to both C++ and Java.
- DiffAlignmentMode can better compare alignments produced by two different aligners whose
targets have the same labels but different TargetIndex values. It does so by building a master
TargetIndex and translation maps for the two alignments. Targets are now shown by label name
instead of TargetIndex.
- CompactFileStats --verbose on a compact alignment shows the targetIndex -> targetIdentifier
map and also displays the targetLength for that targetIndex.
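
DiffAlignmentMode's approach of building a master target index plus one translation map per alignment can be sketched as follows. The function name and plain-list inputs are hypothetical; the sketch only assumes that each alignment provides its target labels ordered by its own target index.

```python
from typing import Dict, List, Sequence, Tuple


def build_master_index(
    labels_a: Sequence[str], labels_b: Sequence[str]
) -> Tuple[List[str], Dict[int, int], Dict[int, int]]:
    """Merge two target-label lists into one master index.

    Returns the master labels and, for each alignment, a map from its own
    target index to the master target index.
    """
    master: List[str] = []
    position_of: Dict[str, int] = {}

    def translate(labels: Sequence[str]) -> Dict[int, int]:
        mapping: Dict[int, int] = {}
        for local_index, label in enumerate(labels):
            if label not in position_of:
                # First time this label is seen: append it to the master index.
                position_of[label] = len(master)
                master.append(label)
            mapping[local_index] = position_of[label]
        return mapping

    map_a = translate(labels_a)
    map_b = translate(labels_b)
    return master, map_a, map_b
```

With labels ["chr1", "chr2", "chrX"] in one alignment and ["chrX", "chr1", "chrM"] in the other, target 0 of the second alignment translates to master index 2, whose label is "chrX". Comparing entries through the master index, and reporting targets by label, makes the comparison independent of each aligner's own index order.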

1.7

- Extended fasta-to-compact and compact-to-fasta to handle paired-end runs. See the new command-line
arguments --paired-end and pair-indicator in fasta-to-compact and --pair-output in
compact-to-fasta.
- Draft support for paired sequence runs. The compact file format is extended to store
sequence, sequence length and quality scores for the paired run. This extension makes
it possible to store both paired end runs in a single compact file. This should help
keep the data together.
- Implemented translation to and from Solexa quality score encoding in fasta-to-compact
and compact-to-fasta. Thanks to Cock PJA et al., NAR 2010, for the clear description of the
Solexa base quality scores.
- The sort mode now supports reading only a slice of an input alignment (see options
--start-position and --end-position).
- Refactored CompactAlignmentToAnnotationCountsMode to use IterateAlignments (provides
large speed ups when working with sorted/indexed alignments and selecting a subset of
reference sequences for DE).
- IterateAlignments now takes advantage of the skipTo method when the alignment is sorted
and indexed. This provides large performance improvements when one needs to access data
for only a few reference sequences in an alignment. All the modes that use
IterateAlignments benefit, including display-sequence-variations, and
sequence-variation-stats.
- Sorted alignments are now indexed when written. The skipTo method leverages the index
to provide fast semi-random access to entries by genomic location. This feature is used
by the IGV Goby plugin, which requires Goby 1.7+.
- Concatenating alignments now produces sorted alignments if all the input alignments
are sorted.
- Added a mode to sort alignments by reference sequence and then by position
on the reference sequence.
- Support to estimate read weights described in Hansen KD et al NAR 2010.
See http://campagnelab.org/software/goby/tutorials/estimate-heptamer-weights/
In contrast to the initial publication, Goby supports using the weights to
reweight annotation counts and transcript counts.
- Support to estimate GC content weights for reads and to reweight raw counts to
remove the dependence of counts on GC read content.
- Preliminary support for barcoded reads (barcodes in the sequence), see new
mode decode-barcodes (and tutorial online at
http://campagnelab.org/software/goby/tutorials/handling-barcoded-reads/).
- alignment-to-*-counts: New --eval argument specifies which statistics
to evaluate when comparing samples.
- alignment-to-*-counts: New eval option 'samples' writes a column per sample
for RPKM, log2(RPKM) and raw counts. RPKM and log2(RPKM) are written once per sample
and per global normalization method.
- Reduce memory requirements when concatenating many alignments. A change
introduced in 1.6 caused more memory than needed to be allocated for each
split of an alignment (as much as the number of reads in the file that
was split). Each split now uses only as much memory as needed to keep
query lengths for the split.
- Dramatically improved performance for differential expression tests with millions of
differentially expressed elements (e.g., exon+gene+other). The code previously
incorrectly grew internal arrays from zero up to the number of new DE elements described
in the annotation file.
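
The Solexa translation in fasta-to-compact and compact-to-fasta rests on the relationship between Solexa and Phred scores described by Cock et al. (NAR 2010): Q_phred = 10 log10(10^(Q_solexa/10) + 1) and Q_solexa = 10 log10(10^(Q_phred/10) - 1). The sketch below implements those two formulas; clamping the Solexa score at -5 (the low end of the Solexa range) where the logarithm is undefined or lower is an assumption of this sketch, as is leaving rounding to the caller.

```python
import math


def phred_from_solexa(solexa: float) -> float:
    """Convert a Solexa quality score to a Phred quality score."""
    return 10 * math.log10(10 ** (solexa / 10) + 1)


def solexa_from_phred(phred: float) -> float:
    """Convert a Phred quality score to a Solexa score, clamped at -5."""
    if phred < 0:
        raise ValueError("Phred scores must be non-negative")
    if phred == 0:
        # 10**0 - 1 == 0, so the logarithm is undefined; use the floor.
        return -5.0
    return max(-5.0, 10 * math.log10(10 ** (phred / 10) - 1))
```

The two scales agree closely at high quality (Solexa 40 is Phred 40 after rounding) and diverge at low quality (Solexa -5 is about Phred 1.2), which is why the conversion cannot be a simple offset change.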
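
Two related 1.7 features, sorting alignments by reference sequence and then by position, and keeping concatenated output sorted when every input is already sorted, can be illustrated with a sort key and a k-way merge. The `Entry` layout is a hypothetical simplification of an alignment entry, and `heapq.merge` stands in for whatever merge strategy Goby actually uses; query index adjustment during concatenation is not modeled.

```python
import heapq
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Entry(NamedTuple):
    """Hypothetical, simplified alignment entry."""
    target_index: int  # reference sequence
    position: int      # start position on that reference
    query_index: int


def sort_key(entry: Entry) -> Tuple[int, int]:
    # Order by reference sequence first, then by position on the reference.
    return (entry.target_index, entry.position)


def sort_alignment(entries: Iterable[Entry]) -> List[Entry]:
    return sorted(entries, key=sort_key)


def concatenate_sorted(*alignments: Iterable[Entry]) -> Iterator[Entry]:
    """Lazily merge alignments that are each sorted; the output stays sorted."""
    return heapq.merge(*alignments, key=sort_key)
```

Because each input is already ordered, the merge only compares the current head of each input, so the concatenated alignment remains sorted without re-sorting everything in memory.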

Changes that impact the compact alignment format:
- The compact file format is extended to store sequence, sequence length and quality scores
for the paired run. This extension makes it possible to store both paired end runs in a
single compact file. This should help keep the data together.
- Moved query lengths from header to alignment entries. This scales much
better when processing large alignment files (generated from more than
a few hundred million reads).
- The optional 'sorted' attribute in header indicates if an alignment has been sorted.

1.6

- First draft of the Goby Python API and demonstration tools (see
directory python).
- Fixed a bug where the compact file stats mode reported that a compact alignment
had query identifiers when it actually did not.
- Added within-group-variability mode. This mode estimates Fisher P-values
between pairs of samples taken from a group of homogeneous samples.
Summary statistics such as average p-value, or minimum p-value are
reported for each gene in each pair considered.
- Update JRI.jar to version 0.8-4 which now works properly with 64-bit
Windows.
- Update commons-lang to version 2.5.
- Optimized DE type storage.
- Fixed a race condition in CompactAlignmentToAnnotationCountsMode.java
when running in parallel by moving .reserve() out of the for loop.
- Renamed DifferentialExpression.ElementTypes enum to ElementType
- Fixed a bug in the DifferentialExpressionCalculator that reset the
ElementType for a label from its actual value to OTHER (this occurred
in CompactAlignmentToAnnotationCountsMode). Now, once the ElementType
is set for a label, it cannot be changed.
- CompactFileStatsMode now supports an optional -o to write the output
to a file. If not specified the output will be written to stdout.
- Reformatting reads now preserves read indices from the input file.
This is necessary when concatenating alignments with
--adjust-query-indices false.
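
The within-group-variability mode estimates Fisher P-values between pairs of samples from a homogeneous group and summarizes them per gene. One common way to set up such a test for a single gene is a 2x2 table of (reads on the gene, reads elsewhere) for each of the two samples. The sketch below computes a two-sided Fisher exact P-value for that table with the standard library and then the average and minimum over all sample pairs; the table layout and function names are assumptions for illustration, not a description of Goby's implementation, and exact enumeration is only practical for modest counts.

```python
from itertools import combinations
from math import comb
from typing import List, Sequence, Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def probability(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = probability(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    probabilities = [probability(x) for x in range(low, high + 1)]
    # Sum every table at least as unlikely as the observed one; the relative
    # tolerance keeps ties from being dropped by floating-point noise.
    p_value = sum(p for p in probabilities if p <= observed * (1 + 1e-7))
    return min(1.0, p_value)


def pairwise_summary(gene_counts: Sequence[int], totals: Sequence[int]) -> Tuple[float, float]:
    """Average and minimum P-value for one gene over all pairs of samples.

    gene_counts[i]: reads on the gene in sample i; totals[i]: total reads in
    sample i (hypothetical inputs for illustration).
    """
    p_values: List[float] = [
        fisher_exact_two_sided(
            gene_counts[i], totals[i] - gene_counts[i],
            gene_counts[j], totals[j] - gene_counts[j],
        )
        for i, j in combinations(range(len(gene_counts)), 2)
    ]
    return sum(p_values) / len(p_values), min(p_values)
```

In a truly homogeneous group most pairwise P-values should be large, so a low average or minimum P-value for a gene flags variability between samples that a between-group comparison would otherwise attribute to the groups.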
