Goby

Latest version: v2.0

2.1

- Improve compression of hybrid-1 codec by about 8% on average at similar speed. You can enable this improvement with
option -x AlignmentCollectionHandler:symbol-modeling=plus. This option will be made the default in a future release.
It is not currently the default since Goby 2.1 has not been integrated into IGV and will need time to propagate from
IGV dev to production builds.
- Remove import of NH:i bam tags as read-origin-index, since the NH tag seems to contain different types of data
depending on the aligner that produced the alignment.
- compact-to-sam mode: fix bug where bam tags containing a colon character (:) would be truncated after the first
colon. Thanks to Vadim Zalunin for reporting this problem.
- compact-file-stats: Add a feature to scan only alignment headers.
- VCFParser group associations: Make it possible to look up an INFO column by either INFO/colname or colname.
- NonAmbiguousAlignmentReader: fix an NPE when reading alignments where all entries have the ambiguity field.
- Fix a problem where AlignmentReaderImpl.canRead would return true when the file ended with an incorrect extension
(this problem could create subtle issues when Goby tried to access .info.txt files on a web server that did not
return 404 errors for missing content). Thanks to Jim Robinson and Helga Thorvaldsdottir for reporting this issue.
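
The compact-to-sam fix listed above concerns tag values that contain a colon. As a minimal illustration (not Goby's actual code), the sketch below parses a SAM optional field of the form TAG:TYPE:VALUE by splitting on the first two colons only, so later colons stay in the value. The function names and the XY tag are hypothetical.

```python
def parse_sam_tag(field):
    """Split one SAM optional field of the form TAG:TYPE:VALUE.

    Only the first two colons are delimiters. Any further colons
    belong to the value and are preserved.
    """
    tag, value_type, value = field.split(":", 2)
    return tag, value_type, value


def format_sam_tag(tag, value_type, value):
    """Rebuild the TAG:TYPE:VALUE text for output."""
    return f"{tag}:{value_type}:{value}"


if __name__ == "__main__":
    # Hypothetical tag whose value itself contains a colon.
    field = "XY:Z:chr1:100-200"
    print(parse_sam_tag(field))
    print(format_sam_tag(*parse_sam_tag(field)))
```

Splitting without a limit and keeping only the third piece would reproduce the truncation described in the entry.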

2.0.1

- Release Goby C/C++ APIs under the LGPL license version 3 to make it possible for companies to incorporate support
for Goby formats in their tools. Thanks to Collin Hercus for the suggestion. Please note that part of the Goby
Java APIs are already licensed under the LGPL (anything packaged under the Goby-io.jar file).
- C++ API: Add support for setting placed unmapped (i.e., a mate that does not map is recorded with the read that
mapped) and clipleft/clipright with quality scores.
- Fix problem when using a genome backed by a samtools/picard faidx file. In some cases, read bases would be returned
shifted by one position. Thanks to James Bonfield for reporting this problem.
- SAM/BAM tags start at column 12, index 11. --preserve-all-tags could skip the first tag on some datasets (e.g.,
datasets where the first tag was not an MD:Z or RG:Z tag). Thanks to James Bonfield for reporting this problem.
- Introduce interface for ReadsWriter. Introduce mock implementation to write reads to text. This is useful to write
more intelligible JUnit tests.
- mode sam-to-compact now supports option --read-names-are-query-indices to indicate that the read names are integers
(typically produced by compact-to-fasta from a chunk of a large file).
- Fix a bug in reformat-compact-reads which did not trim quality scores for paired end reads correctly.
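
The faidx-related fix above involves position arithmetic on indexed FASTA files. The changelog does not describe the cause of the shift, so the sketch below only shows the usual byte-offset calculation from the OFFSET, LINEBASES and LINEWIDTH columns of a .fai index, assuming 0-based positions. Passing a 1-based position to such a function is one typical way to get bases shifted by one. The function names and the toy sequence are hypothetical.

```python
def faidx_byte_offset(offset, line_bases, line_width, pos0):
    """Byte offset of the base at 0-based position pos0.

    offset     -- byte offset of the sequence's first base (.fai column 3)
    line_bases -- number of bases per line (.fai column 4)
    line_width -- number of bytes per line, newline included (.fai column 5)
    """
    return offset + (pos0 // line_bases) * line_width + pos0 % line_bases


def fetch_base(fasta_bytes, offset, line_bases, line_width, pos0):
    """Return the single base stored at 0-based position pos0."""
    start = faidx_byte_offset(offset, line_bases, line_width, pos0)
    return fasta_bytes[start:start + 1].decode("ascii")


# Toy FASTA content: header ">chr" plus two lines of four bases each.
DEMO_FASTA = b">chr\nACGT\nTTGA\n"
DEMO_INDEX = {"offset": 5, "line_bases": 4, "line_width": 5}

if __name__ == "__main__":
    for pos0 in range(8):
        print(pos0, fetch_base(DEMO_FASTA, pos0=pos0, **DEMO_INDEX))
```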

2.0

- Support multiple group comparisons for RNA-Seq diff exp (mode compact-alignment-to-annotation-counts).
- Added a mode sam-comparison to compare a source SAM/BAM file with one generated by running sam-to-compact followed
by compact-to-sam.
- Refactor AlignmentWriter to introduce an interface and make it easier to create facades that modify the behaviour
of the default writer. For instance, such a facade is BufferedSortingAlignmentWriter, which keeps a number of entries
in memory to re-sort these entries by genomic position. This feature is used when importing already sorted SAM/BAM
files to create sorted Goby alignments and the files contain spliced alignments that would cause mis-ordering during
conversion.
- Make default chunk-size dependent on the type of chunk codec used. This is useful because hybrid compression does
better with larger chunk sizes (default chunk size for hybrid is 30000, 20000 for bzip2 and 10000 for gzip). The
default chunk size can be overridden with -x MessageChunksWriter:chunk-size=int.
- Add ability to preserve SAM/BAM read groups. Read groups are automatically preserved if present in the input BAM file.
The concatenate mode automatically reassigns read_origin indices (see field read_origin_index) to prevent conflicts
when Goby files from different origins are concatenated. The approach we use is to keep the most specific read origin
information, and let the client decide what origins/groups are equivalent given the type of analysis at hand.
Read groups are supported by the hybrid codec (and therefore stored very efficiently), are imported from BAM with
sam-to-compact and are exported back to SAM/BAM with the compact-to-bam mode.
- Add ability to preserve all BAM attributes during import and export. Use --preserve-all-tags in mode sam-to-compact
to enable this.
- Add ability to preserve all quality scores. Use --preserve-all-mapped-qualities in mode sam-to-compact.
- Supports bzip2 compression in fasta-to-compact mode and sam-extract-reads (use the -x MessageChunksWriter:codec=bzip2
dynamic option).
- Renamed SortMode to Sort1Mode. Renamed SortLargeMode to SortMode.
- Added SortLargeMode which can sort compact alignments of any size, multithreaded.
- Fixes to sam-to-compact mode. Previous versions could fail for a variety of reasons. We have stress tested this mode
by throwing various input BAM files at it, sorted or not, and fixed the bugs we found. For instance, the --sorted option
would not work in some 1.9 versions of Goby after samtools/picard changed the semantics of the record comparator Goby
relied upon to verify the input was indeed sorted by position. This made it impossible to convert already sorted BAM
files to sorted Goby alignments.
- Moved error messages produced when parsing the command line of a mode to after usage. This is a simple change that
will make it easier to diagnose problems on a command line without having to scroll back up the console.
- Prevent logging when the log4j system has not been configured. For some reason, LOG.isDebugEnabled can return true
when the logging system is not initialized. For SamHelper, this means calling String.format millions of times to
create debug output that is never shown. This change dramatically improves the performance of the sam-to-compact mode
when logging is not properly configured.
- Refactor dynamic options with a central registry, and make GobyDriver handle option parsing.
This removes duplication of code parsing for each mode that would need dynamic options.
- methylation region can now estimate empirical p-values. Empirical p-values require biological replicates in at least
one of the groups under analysis. Two passes over the data are required. In the first pass, the empirical null
distribution is observed by comparing pairs of samples in the same group. In the second pass, this distribution is
used to estimate the p-value of observing the between group differences. Such empirical p-values can control FWER
in the strong sense.
- Support empirical p-value for individual bases (VCF output). Write a DMR INFO field that stores how many significant
sites were found in a moving window that ends at the site (significance is judged according to a configurable
threshold on the empirical p-value).
- New empirical-p mode to estimate p-values from data in text files. This makes it easier to derive p-values for
simulated data or counts generated by tools other than Goby.
- Make it possible to open Goby alignments through HTTP. Simply specify a URL as the basename argument to the goby
tools. This is supported broadly by the API, so the concatenation reader also supports URLs, for instance. TMH files
currently cannot be loaded remotely. Alignments that require upgrading will also fail to load remotely.
- Fix issues with the barcode-decode mode. Add support for processing fasta/fastq files.
- vcf methylation format: removed space in name of C and Cm group INFO fields.
- Add a draft implementation of random access sequence interface that can read a fasta file indexed with faidx.
- Introduce chunk codecs for protocol buffer encoded collection messages (supports both reads and alignments).
- Added the ability in alignment-to-text mode to output HTML (-f html), to start/end at offsets (-s/-e) in the alignments and
to limit the number of alignment entries to output (-n).
- The RandomAccessSequenceCache had problems with bases that weren't G/A/T/C/N. Such bases would be skipped silently,
causing rare, but potentially significant, problems (such as on human chr 3 of the 1000g genome reference where a
R base appears). Bases not in the group G/A/T/C/N would introduce position shifts for bases immediately following
the offending character. Now bases other than G/A/T/C are stored as N and maintain the position of the following
bases. Please note that the problem was in a library used by RandomAccessSequenceCache. We updated the library in
this release, and no change to the code of RandomAccessSequenceCache was needed to fix the problem.
- last-to-compact: add option to substitute some bases with others in the aligned read.
- Add test and fix for a bug where iteration went back to the start of the alignment file, even though the
IterateAlignments instance was created for a slice of the input. The problem only affected the IterateAlignments class
because it was calling reposition(0,0) and the method did not enforce slice limits.
- The code base was simplified by removing the now obsolete align mode.
- Fix a problem where sample names with several dots were stripped of too many extensions. For instance, a.b.c.entries
would be reduced to a, which could be non-unique across the remaining samples. Problem reported by Fang Fang in her
data on GobyWeb.
- DistinctIntValueCounterBitSet now uses LongArrayBitVector as its bit set implementation. The java BitSet implementation
was found to throw java.lang.ArrayIndexOutOfBoundsException for indices that should fit easily in a bit array (e.g.,
2,080,948, which can be stored with about 230 MB).
- AlignmentEntry field insertSize is now stored in protobuf with sint32 rather than uint32 since negative values can be
stored in this field.
- The mode sample-quality-scores now supports .sam, .sam.gz, and .bam files to make a guess at the scale of
the quality scores contained in the file.
- Fixed a problem with concatenate-compact-reads that previously transferred only specific fields of a read to the
output file. concatenate-compact-reads now transfers all fields (including pair sequence and quality score).
- version mode now prints an official version number if the jar contains a VERSION.txt file.
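
The insertSize entry above switches the field from uint32 to sint32 because it can hold negative values. Protocol buffers encode sint32 with ZigZag encoding, which maps signed integers to unsigned ones so that values of small magnitude, positive or negative, get short varints. The sketch below illustrates that mapping in plain Python. It is not code from Goby or from the protobuf library, and the function names are hypothetical.

```python
def zigzag_encode32(n):
    """Map a signed 32-bit integer to an unsigned one (ZigZag encoding)."""
    if not -(1 << 31) <= n < (1 << 31):
        raise ValueError("value does not fit in a signed 32-bit integer")
    # Python's >> on negative ints is arithmetic, like a signed 32-bit shift.
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def zigzag_decode32(z):
    """Invert zigzag_encode32."""
    return (z >> 1) ^ -(z & 1)


def varint_size(value):
    """Number of bytes a non-negative integer occupies as a base-128 varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


if __name__ == "__main__":
    for insert_size in (0, -1, 1, -250, 250):
        encoded = zigzag_encode32(insert_size)
        print(insert_size, encoded, varint_size(encoded))
```

With this mapping, an insert size of -250 is encoded as 499 and fits in two varint bytes.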

1.9.8.3.1

- Fix a bug related to writing paired end alignments in the Gsnap parser (C API).

1.9.8.3

- Added a methylation_region format capable of averaging methylation rates for different cytosine contexts over
arbitrarily defined regions.
- Added a diploid genotype filter to use when calling genotypes in a diploid genome.
- discover-sequence-variants format compare_groups: Write distinct Fisher p-values for each comparison pair.
- Fix FDR mode output for TSV format. Make option --column-selection-filter work.
- Fix bug that prevented methylation vcf output from writing any line.

1.9.8.2.1

- Fix bug in GenotypesOutputFormat that caused GenotypesOutputFormat to throw an exception when processing some sites.
