Igdiscover

Latest version: v0.15.1

0.15
------------------

* Change the algorithm used for describing how the discovered V gene differs from the
germline gene (the ``database_changes`` column). This gives more sensible descriptions when
the V gene is truncated at one end.
* Faster startup time (mostly noticeable when using ``--version`` or ``--help``)
* Ensure candidates get a unique name even if the hashes (``_Sxxxx``) collide
* Issue 108: Print a sensible error message when the GUI cannot be started.
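
The ``database_changes`` column holds descriptions such as ``93C>T; 114A>G`` (the example given in the 0.10 notes). The following is a minimal sketch of how such a string could be produced for the simplest case. It is not IgDiscover's algorithm: the function name is hypothetical, positions are assumed to be 1-based, and only substitutions between sequences of equal length are handled, so the truncation case this release addresses is out of scope.

```python
def describe_substitutions(reference: str, novel: str) -> str:
    """Describe substitutions in the style '93C>T; 114A>G'.

    Assumptions (not taken from IgDiscover): positions are 1-based, and
    both sequences have the same length and differ only by substitutions.
    """
    if len(reference) != len(novel):
        raise ValueError("this sketch only handles sequences of equal length")
    changes = [
        f"{pos}{ref_base}>{new_base}"
        for pos, (ref_base, new_base) in enumerate(zip(reference, novel), start=1)
        if ref_base != new_base
    ]
    return "; ".join(changes)
```

Handling a V gene that is truncated at one end would require aligning the two sequences first, which this sketch deliberately leaves out.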

0.14
------------------

* Fix a crash (``KeyError``) during "igdiscover augment" when region info
for a database sequence could not be obtained.

0.13
------------------

* IgDiscover now uses AIRR-formatted files:
See the `AIRR rearrangement schema <https://docs.airr-community.org/en/stable/datarep/rearrangements.html>`_
* IgBLAST is run with the appropriate parameters to produce AIRR-compliant files
* ``assigned.tab.gz`` and ``filtered.tab.gz`` contain this IgBLAST output plus extra columns
that IgDiscover needs (the AIRR schema allows extra columns)
* ``assigned.tab.gz`` and ``filtered.tab.gz`` are now called ``assigned.tsv.gz`` and
``filtered.tsv.gz`` (the ``.tsv`` extension is required by the AIRR specification)
* One downside is that, because there are more columns than before, the "assigned" and "filtered"
files are larger than before.
* The upside is that these files can be used with other tools that accept AIRR-compliant files.
* Old "assigned" and "filtered" files can still be read by most IgDiscover commands. Output will
always use new column names.
* The ``VDJ_nt`` column was removed to reduce file size somewhat. It is now recomputed when
necessary from the appropriate offsets.
* Update to IgBLAST 1.17
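
As an illustration of recomputing the removed ``VDJ_nt`` value from offsets, here is a minimal sketch. It assumes that the relevant offsets are the AIRR rearrangement fields ``v_sequence_start`` and ``j_sequence_end`` (1-based, inclusive coordinates into ``sequence``); whether IgDiscover uses exactly these columns is an assumption, and the function names are hypothetical. Rows with missing offsets are not handled.

```python
import csv
import gzip


def vdj_nt(row: dict) -> str:
    """Slice the V(D)J nucleotide sequence out of an AIRR-style row.

    Assumes AIRR 1-based, closed-interval coordinates in the fields
    'v_sequence_start' and 'j_sequence_end' (an assumption about which
    offsets are the appropriate ones).
    """
    start = int(row["v_sequence_start"])  # 1-based, inclusive
    end = int(row["j_sequence_end"])      # 1-based, inclusive
    return row["sequence"][start - 1:end]


def iter_vdj_nt(path: str):
    """Yield the recomputed V(D)J sequence for each row of a .tsv.gz file."""
    with gzip.open(path, "rt", newline="") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            yield vdj_nt(row)
```

For example, ``iter_vdj_nt("filtered.tsv.gz")`` would iterate over such a file, provided every row has both offsets filled in.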

0.12
------------------

* The ``discoverj`` command was renamed to ``discoverjd`` to reflect that it
also supports D gene discovery.
* Previously, the ``why_filtered`` column would show a generic ``is_duplicate``
reason for filters that compare candidates to each other. Now each filter
criterion can be distinguished.
* The somewhat vague “too similar sequence” germline filter criterion
incorrectly removed some candidates that have a mutation close to the 3’ end.
This was replaced with a simpler filter that only ensures that there are no
two candidates with the same sequence.
* Use IgBLAST 1.10
* Get rid of some unnecessary dependencies by no longer requiring the
unmaintained ``sqt`` library. Installation with Conda is now faster and
requires half the disk space.
* Add a *full_exact* column to ``candidates.tab``

0.11
------------------

* The IgBLAST cache is now disabled by default. We assume that, in most cases,
datasets will not be re-run with the exact same parameters, and then it only
fills up the disk. Delete your cache with ``rm -r ~/.cache/igdiscover`` to
reclaim the space. To enable the cache, create a file
``~/.config/igdiscover.conf`` with the contents ``use_cache: true``.
* If you choose to enable the cache, results from the PEAR merging step will
now also be cached. See also the :ref:`caching documentation <caching>`.
* Added detection of chimeras to the (pre-)germline filters. Any novel allele
that can be explained as a chimera of two unmodified reference alleles is
marked in the ``new_V_germline.tab`` file. This is a bit sensitive, so the
candidate is currently not discarded.
* Two additional files ``annotated_V_germline.tab`` and
``annotated_V_pregermline.tab`` are created in each iteration during the
germline filtering step. These are identical to the ``candidates.tab``
file, except that they contain a ``why_filtered`` column that describes
why a sequence was filtered. See the :ref:`documentation for this feature
<annotated_v_tab>`.
* A more realistic test dataset (v0.5), now based on human instead of rhesus
data, was prepared. The :ref:`testing instructions <test>` have been
updated accordingly.
* J discovery has been tuned to give fewer truncated sequences.
* Statistics are written to ``stats/stats.json``.
* V SHM distribution plots are created automatically and written to
``v-shm-distributions.pdf`` in each iteration folder.
* An ``igdiscover dbdiff`` subcommand was added that can compare two FASTA
files.
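
The chimera check described above (a novel allele that can be explained as a chimera of two unmodified reference alleles) can be sketched roughly as follows. This is not IgDiscover's implementation: the function names are hypothetical, and the sketch assumes a chimera is simply a prefix of one reference followed by a suffix of another, with no mismatches allowed.

```python
from itertools import permutations


def common_prefix_length(a: str, b: str) -> int:
    """Return the number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def find_chimera(candidate: str, references: dict):
    """Return (name_5prime, name_3prime) if the candidate can be explained
    as a prefix of one reference joined to a suffix of another, else None.

    Simplifying assumption: exact matches only, a single breakpoint.
    """
    if candidate in references.values():
        return None  # identical to a reference, so not a novel allele
    for (name5, seq5), (name3, seq3) in permutations(references.items(), 2):
        prefix = common_prefix_length(candidate, seq5)
        # Suffix match: compare the reversed sequences from their start
        suffix = common_prefix_length(candidate[::-1], seq3[::-1])
        if prefix > 0 and suffix > 0 and prefix + suffix >= len(candidate):
            return name5, name3
    return None
```

Because short shared stretches between alleles make such a test easy to satisfy, a hit is better treated as an annotation than as grounds for discarding the candidate, which matches the cautious behavior described in the notes.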

0.10
------------------

* When computing a consensus sequence, allow some sequences to be truncated in
the 3' end. Many of the discovered novel V alleles were truncated by one
nucleotide in the 3' end because IgBLAST does not always extend the
alignment to the end of the V sequence. If these slightly too short V
sequences were in the majority, their consensus would lead to a truncated
sequence as well. The new consensus algorithm allows for this effect at the
3' end and can therefore more often than previously find the full sequence.
Example::

    TACTGTGCGAGAGA (seq 1)
    TACTGTGCGAGAGA (seq 2)
    TACTGTGCGAGAG- (seq 3)
    TACTGTGCGAG--- (seq 4)
    TACTGTGCGAG--- (seq 5)

    TACTGTGCGAGAG  (previous consensus)
    TACTGTGCGAGAGA (new consensus)

* Add a column ``database_changes`` to the ``new_V_germline.tab`` file that
describes how the novel sequence differs from the database sequence. Example:
``93C>T; 114A>G``
* Allow filtering by ``CDR3_shared_ratio`` and do so by default (needs
documentation)
* Cache the edit distance when computing the distance matrix. Speeds up the
``discover`` command slightly.
* ``discover``: Use more than six CPU cores if available
* ``igblast``: Print progress every minute
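
The effect of the 3' truncation handling in the consensus example above can be reproduced with a small sketch. It is a simplified per-column majority vote, not IgDiscover's consensus algorithm: the function name and the ``ignore_trailing_gaps`` parameter are hypothetical, and sequences are assumed to be left-aligned with trailing ``-`` marking truncation.

```python
from collections import Counter


def consensus(sequences, ignore_trailing_gaps: bool) -> str:
    """Majority-vote consensus of left-aligned sequences.

    With ignore_trailing_gaps=False, a truncated sequence votes '-' in the
    columns it does not cover. With ignore_trailing_gaps=True, it simply
    does not vote there, so shorter sequences cannot truncate the result.
    """
    sequences = [s.rstrip("-") for s in sequences]
    length = max(len(s) for s in sequences)
    result = []
    for i in range(length):
        if ignore_trailing_gaps:
            column = [s[i] for s in sequences if i < len(s)]
        else:
            column = [s[i] if i < len(s) else "-" for s in sequences]
        base = Counter(column).most_common(1)[0][0]
        if base == "-":
            break  # the majority of sequences end before this column
        result.append(base)
    return "".join(result)
```

In this sketch a single sequence that extends further than all others would dictate the final bases; a real implementation would presumably require some minimum support per column.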
