Scikit-bio

Latest version: v0.6.0

Safety actively analyzes 629908 Python packages for vulnerabilities to keep your Python projects secure.

Page 2 of 4

0.5.4

Features

* Added `FSVD`, an alternative fast heuristic method to perform Principal Coordinates Analysis, to `skbio.stats.ordination.pcoa`.

Backward-incompatible changes [stable]

Backward-incompatible changes [experimental]

Performance enhancements

* Added optimized utility methods `f_matrix_inplace` and `e_matrix_inplace` which perform `f_matrix` and `e_matrix` computations in-place and are used by the new `center_distance_matrix` method in `skbio.stats.ordination`.

Bug fixes

Deprecated functionality [stable]

Deprecated functionality [experimental]

Miscellaneous

0.5.3

Features
* Added `unpack` and `unpack_by_func` methods to `skbio.tree.TreeNode` to unpack one or multiple internal nodes. The `unpack` operation removes an internal node and regrafts its children to its parent while retaining the overall length. ([1572](https://github.com/scikit-bio/scikit-bio/pull/1572))
* Added `support` to `skbio.tree.TreeNode` to return the support value of a node.
* Added `permdisp` to `skbio.stats.distance` to test for the homogeniety of groups. ([1228](https://github.com/scikit-bio/scikit-bio/issues/1228)).

* Added `pcoa_biplot` to `skbio.stats.ordination` to project descriptors into a PCoA plot.

* Fixed pandas to 0.22.0 due to this: https://github.com/pandas-dev/pandas/issues/20527

Backward-incompatible changes [stable]

Backward-incompatible changes [experimental]

Performance enhancements

Bug fixes

* Relaxing type checking in diversity calculations. ([1583](https://github.com/scikit-bio/scikit-bio/issues/1583)).

Deprecated functionality [stable]

Deprecated functionality [experimental]

Miscellaneous

0.5.2

Features
* Added ``skbio.io.format.embl`` for reading and writing EMBL files for ``DNA``, ``RNA`` and ``Sequence`` classes.

* Removing ValueError check in `skbio.stats._subsample.subsample_counts` when `replace=True` and `n` is greater than the number of items in counts. [1527](https://github.com/scikit-bio/scikit-bio/pull/1527)

* Added ``skbio.io.format.gff3`` for reading and writing GFF3 files for ``DNA``, ``Sequence``, and ``IntervalMetadata`` classes. ([1450](https://github.com/scikit-bio/scikit-bio/pull/1450))

* `skbio.metadata.IntervalMetadata` constructor has a new keyword argument, `copy_from`, for creating an `IntervalMetadata` object from an existing `IntervalMetadata` object with specified `upper_bound`.

* `skbio.metadata.IntervalMetadata` constructor allows `None` as a valid value for `upper_bound`. An `upper_bound` of `None` means that the `IntervalMetadata` object has no upper bound.

* `skbio.metadata.IntervalMetadata.drop` has a new boolean parameter `negate` to indicate whether to drop or keep the specified `Interval` objects.

Backward-incompatible changes [stable]

Backward-incompatible changes [experimental]

Performance enhancements
* `skbio.tree.nj` wall-clock runtime was decreased by 99% for a 500x500 distance matrix and 93% for a 100x100 distance matrix. ([1512](https://github.com/scikit-bio/scikit-bio/pull/1512), [#1513](https://github.com/scikit-bio/scikit-bio/pull/1513))

Bug fixes
* The `include_self` parameter was not being honored in `skbio.TreeNode.tips`. The scope of this bug was that if `TreeNode.tips` was called on a tip, it would always result in an empty `list` when unrolled.

* In `skbio.stats.ordination.ca`, `proportion_explained` was missing in the returned `OrdinationResults` object. ([1345](https://github.com/scikit-bio/scikit-bio/issues/1345))

* `skbio.diversity.beta_diversity` now handles qualitative metrics as expected such that `beta_diversity('jaccard', mat) == beta_diversity('jaccard', mat > 0)`. Please see [1549](https://github.com/scikit-bio/scikit-bio/issues/1549) for further detail.

* `skbio.stats.ordination.rda` The occasional column mismatch in output `biplot_scores` is fixed ([1519](https://github.com/scikit-bio/scikit-bio/issues/1519)).

Deprecated functionality [stable]

Deprecated functionality [experimental]

Miscellaneous
* scikit-bio now depends on pandas >= 0.19.2, and is compatible with newer pandas versions (e.g. 0.20.3) that were previously incompatible.
* scikit-bio now depends on `numpy >= 1.17.0, < 1.14.0` for compatibility with Python 3.4, 3.5, and 3.6 and the available numpy conda packages in `defaults` and `conda-forge` channels.
* added support for running tests from `setup.py`. Both `python setup.py nosetests` and `python setup.py test` are now supported, however `python setup.py test` will only run a subset of the full test suite. ([1341](https://github.com/scikit-bio/scikit-bio/issues/1341))

0.5.1

Features
* Added `IntervalMetadata` and `Interval` classes in `skbio.metadata` to store, query, and manipulate information of a sub-region of a sequence. ([1414](https://github.com/scikit-bio/scikit-bio/issues/1414))
* `Sequence` and its child classes (including `GrammaredSequence`, `RNA`, `DNA`, `Protein`) now accept `IntervalMetadata` in their constructor API. Some of their relevant methods are also updated accordingly. ([1430](https://github.com/scikit-bio/scikit-bio/pull/1430))
* GenBank parser now reads and writes `Sequence` or its subclass objects with `IntervalMetadata`. ([1440](https://github.com/scikit-bio/scikit-bio/pull/1440))
* `DissimilarityMatrix` now has a new constructor method called `from_iterable`. ([1343](https://github.com/scikit-bio/scikit-bio/issues/1343)).
* `DissimilarityMatrix` now allows non-hollow matrices. ([1343](https://github.com/scikit-bio/scikit-bio/issues/1343)).
* `DistanceMatrix.from_iterable` now accepts a `validate=True` parameter. ([1343](https://github.com/scikit-bio/scikit-bio/issues/1343)).
* ``DistanceMatrix`` now has a new method called ``to_series`` to create a ``pandas.Series`` from a ``DistanceMatrix`` ([1397](https://github.com/scikit-bio/scikit-bio/issues/1397)).
* Added parallel beta diversity calculation support via `skbio.diversity.block_beta_diversity`. The issue and idea is discussed in ([1181](https://github.com/scikit-bio/scikit-bio/issues/1181), while the actual code changes are in [#1352](https://github.com/scikit-bio/scikit-bio/pull/1352)).

Backward-incompatible changes [stable]
* The constructor API for `Sequence` and its child classes (including `GrammaredSequence`, `RNA`, `DNA`, `Protein`) are changed from `(sequence, metadata=None, positional_metadata=None, lowercase=False)` to `(sequence, metadata=None, positional_metadata=None, interval_metadata=None, lowercase=False)`

The changes are made to allow these classes to adopt `IntervalMetadata` object for interval features on the sequence. The `interval_metadata` parameter is added imediately after `positional_metadata` instead of appended to the end, because it is more natural and logical and, more importantly, because it is unlikely in practice to break user code. A user's code would break only if they had supplied `metadata`, `postional_metadata`, and `lowercase` parameters positionally. In the unlikely event that this happens, users will get an error telling them a bool isn't a valid `IntervalMetadata` type, so it won't silently produce buggy behavior.

Backward-incompatible changes [experimental]
* Modifying basis handling in `skbio.stats.composition.ilr_inv` prior to checking for orthogonality. Now the basis is strictly assumed to be in the Aitchison simplex.
* `DistanceMatrix.from_iterable` default behavior is now to validate matrix by computing all pairwise distances. Pass `validate=False` to get the previous behavior (no validation, but faster execution).([1343](https://github.com/scikit-bio/scikit-bio/issues/1343)).
* GenBank I/O now parses sequence features into the attribute of `interval_metadata` instead of `positiona_metadata`. And the key of `FEATURES` is removed from `metadata` attribute.

Performance enhancements
* `TreeNode.shear` was rewritten for approximately a 25% performance increase. ([1399](https://github.com/scikit-bio/scikit-bio/pull/1399))
* The `IntervalMetadata` allows dramatic decrease in memory usage in reading GenBank files of feature rich sequences. ([1159](https://github.com/scikit-bio/scikit-bio/issues/1159))

Bug fixes

* `skbio.tree.TreeNode.prune` and implicitly `skbio.tree.TreeNode.shear` were not handling a situation in which a parent was validly removed during pruning operations as may happen if the resulting subtree does not include the root. Previously, an `AttributeError` would raise as `parent` would be `None` in this situation.
* numpy linking was fixed for installation under El Capitan.
* A bug was introduced in 1398 into `TreeNode.prune` and fixed in 1416 in which, under the special case of a single descendent existing from the root, the resulting children parent references were not updated. The cause of the bug was a call made to `self.children.extend` as opposed to `self.extend` where the former is a `list.extend` without knowledge of the tree, while the latter is `TreeNode.extend` which is able to adjust references to `self.parent`.

Miscellaneous

* Removed deprecated functions from `skbio.util`: `is_casava_v180_or_later`, `remove_files`, and `create_dir`.
* Removed deprecated `skbio.Sequence.copy` method.

0.5.0

**IMPORTANT**: scikit-bio is no longer compatible with Python 2. scikit-bio is compatible with Python 3.4 and later.

Features
* Added more descriptive error message to `skbio.io.registry` when attempting to read without specifying `into` and when there is no generator reader. ([1326](https://github.com/scikit-bio/scikit-bio/issues/1326))
* Added support for reference tags to `skbio.io.format.stockholm` reader and writer. ([1348](https://github.com/scikit-bio/scikit-bio/issues/1348))
* Expanded error message in `skbio.io.format.stockholm` reader when `constructor` is not passed, in order to provide better explanation to user. ([1327](https://github.com/scikit-bio/scikit-bio/issues/1327))
* Added `skbio.sequence.distance.kmer_distance` for computing the kmer distance between two sequences. ([913](https://github.com/scikit-bio/scikit-bio/issues/913))
* Added `skbio.sequence.Sequence.replace` for assigning a character to positions in a `Sequence`. ([1222](https://github.com/scikit-bio/scikit-bio/issues/1222))
* Added support for `pandas.RangeIndex`, lowering the memory footprint of default integer index objects. `Sequence.positional_metadata` and `TabularMSA.positional_metadata` now use `pd.RangeIndex` as the positional metadata index. `TabularMSA` now uses `pd.RangeIndex` as the default index. Usage of `pd.RangeIndex` over the previous `pd.Int64Index` [should be transparent](http://pandas.pydata.org/pandas-docs/version/0.18.0/whatsnew.html#range-index), so these changes should be non-breaking to users. scikit-bio now depends on pandas >= 0.18.0 ([1308](https://github.com/scikit-bio/scikit-bio/issues/1308))
* Added `reset_index=False` parameter to `TabularMSA.append` and `TabularMSA.extend` for resetting the MSA's index to the default index after appending/extending.
* Added support for partial pairwise calculations via `skbio.diversity.partial_beta_diversity`. ([1221](https://github.com/scikit-bio/scikit-bio/issues/1221), [#1337](https://github.com/scikit-bio/scikit-bio/pull/1337)). This function is immediately deprecated as its return type will change in the future and should be used with caution in its present form (see the function's documentation for details).
* `TemporaryFile` and `NamedTemporaryFile` are now supported IO sources for `skbio.io` and related functionality. ([1291](https://github.com/scikit-bio/scikit-bio/issues/1291))
* Added `tree_node_class=TreeNode` parameter to `skbio.tree.majority_rule` to support returning consensus trees of type `TreeNode` (the default) or a type that has the same interface as `TreeNode` (e.g. `TreeNode` subclasses) ([1193](https://github.com/scikit-bio/scikit-bio/pull/1193))
* `TreeNode.from_linkage_matrix` and `TreeNode.from_taxonomy` now support constructing `TreeNode` subclasses. `TreeNode.bifurcate` now supports `TreeNode` subclasses ([1193](https://github.com/scikit-bio/scikit-bio/pull/1193))
* The `ignore_metadata` keyword has been added to `TabularMSA.iter_positions` to improve performance when metadata is not necessary.
* Pairwise aligners in `skbio.alignment` now propagate per-sequence `metadata` objects (this does not include `positional_metadata`).

Backward-incompatible changes [stable]

Backward-incompatible changes [experimental]
* `TabularMSA.append` and `TabularMSA.extend` now require one of `minter`, `index`, or `reset_index` to be provided when incorporating new sequences into an MSA. Previous behavior was to auto-increment the index labels if `minter` and `index` weren't provided and the MSA had a default integer index, otherwise error. Use `reset_index=True` to obtain the previous behavior in a more explicit way.
* `skbio.stats.composition.ancom` now returns two `pd.DataFrame` objects, where it previously returned one. The first contains the ANCOM test results, as before, and the second contains percentile abundances of each feature in each group. The specific percentiles that are computed and returned is controlled by the new `percentiles` parameter to `skbio.stats.composition.ancom`. In the future, this second `pd.DataFrame` will not be returned by this function, but will be available through the [contingency table API](https://github.com/scikit-bio/scikit-bio/issues/848). ([#1293](https://github.com/scikit-bio/scikit-bio/issues/1293))
* `skbio.stats.composition.ancom` now performs multiple comparisons correction by default. The previous behavior of not performing multiple comparisons correction can be achieved by passing ``multiple_comparisons_correction=None``.
* The ``reject`` column in the first ``pd.DataFrame`` returned from `skbio.stats.composition.ancom` has been renamed ``Reject null hypothesis`` for clarity. ([1375](https://github.com/scikit-bio/scikit-bio/issues/1375))

Bug fixes
* Fixed row and column names to `biplot_scores` in the `OrdinationResults` object from `skbio.stats.ordination`. This fix affect the `cca` and `rda` methods. ([1322](https://github.com/scikit-bio/scikit-bio/issues/1322))
* Fixed bug when using `skbio.io.format.stockholm` reader on file with multi-line tree with no id. Previously this raised an `AttributeError`, now it correctly handles this type of tree. ([1334](https://github.com/scikit-bio/scikit-bio/issues/1334))
* Fixed bug when reading Stockholm files with GF or GS features split over multiple lines. Previously, the feature text was simply concatenated because it was assumed to have trailing whitespace. There are examples of Stockholm files with and without trailing whitespace for multi-line features, so the `skbio.io.format.stockholm` reader now adds a single space when concatenating feature text without trailing whitespace to avoid joining words together. Multi-line trees stored as GF metadata are concatenated as they appear in the file; a space is not added when concatenating. ([1328](https://github.com/scikit-bio/scikit-bio/issues/1328))
* Fixed bug when using `Sequence.iter_kmers` on empty `Sequence` object. Previously this raised a `ValueError`, now it returns
an empty generator.
* Fixed minor bug where adding sequences to an empty `TabularMSA` with MSA-wide `positional_metadata` would result in a `TabularMSA` object in an inconsistent state. This could happen using `TabularMSA.append` or `TabularMSA.extend`. This bug only affects a `TabularMSA` object *without* sequences that has MSA-wide `positional_metadata` (for example, `TabularMSA([], positional_metadata={'column': []})`).
* `TreeNode.distance` now handles the situation in which `self` or `other` are ancestors. Previosly, a node further up the tree was used resulting in inflated distances. ([807](https://github.com/scikit-bio/scikit-bio/issues/807))
* `TreeNode.prune` can now handle a root with a single descendent. Previously, the root was ignored from possibly having a single descendent. ([1247](https://github.com/scikit-bio/scikit-bio/issues/1247))
* Providing the `format` keyword to `skbio.io.read` when creating a generator with an empty file will now return an empty generator instead of raising `StopIteration`. ([1313](https://github.com/scikit-bio/scikit-bio/issues/1313))
* `OrdinationResults` is now importable from `skbio` and `skbio.stats.ordination` and correctly linked from the documentation ([1205](https://github.com/scikit-bio/scikit-bio/issues/1205))
* Fixed performance bug in pairwise aligners resulting in 100x worse performance than in 0.2.4.

Deprecated functionality [stable]
* Deprecated use of the term "non-degenerate", in favor of "definite". `GrammaredSequence.nondegenerate_chars`, `GrammaredSequence.nondegenerates`, and `GrammaredSequence.has_nondegenerates` have been renamed to `GrammaredSequence.definite_chars`, `GrammaredSequence.definites`, and `GrammaredSequence.has_definites`, respectively. The old names will be removed in scikit-bio 0.5.2. Relevant affected public classes include `GrammaredSequence`, `DNA`, `RNA`, and `Protein`.

Deprecated functionality [experimental]
* Deprecated function `skbio.util.create_dir`. This function will be removed in scikit-bio 0.5.1. Please use the Python standard library
functionality described [here](https://docs.python.org/2/library/os.html#os.makedirs). ([833](https://github.com/scikit-bio/scikit-bio/issues/833))
* Deprecated function `skbio.util.remove_files`. This function will be removed in scikit-bio 0.5.1. Please use the Python standard library
functionality described [here](https://docs.python.org/2/library/os.html#os.remove). ([833](https://github.com/scikit-bio/scikit-bio/issues/833))
* Deprecated function `skbio.util.is_casava_v180_or_later`. This function will be removed in 0.5.1. Functionality moved to FASTQ sniffer.
([833](https://github.com/scikit-bio/scikit-bio/issues/833))

Miscellaneous
* When installing scikit-bio via `pip`, numpy must now be installed first ([1296](https://github.com/scikit-bio/scikit-bio/issues/1296))

0.4.2

Minor maintenance release. **This is the last Python 2.7 compatible release. Future scikit-bio releases will only support Python 3.**

Features
* Added `skbio.tree.TreeNode.bifurcate` for converting multifurcating trees into bifurcating trees. ([896](https://github.com/scikit-bio/scikit-bio/issues/896))
* Added `skbio.io.format.stockholm` for reading Stockholm files into a `TabularMSA` and writing from a `TabularMSA`. ([967](https://github.com/scikit-bio/scikit-bio/issues/967))
* scikit-bio `Sequence` objects have better compatibility with numpy. For example, calling `np.asarray(sequence)` now converts the sequence to a numpy array of characters (the same as calling `sequence.values`).
* Added `skbio.sequence.distance` subpackage for computing distances between scikit-bio `Sequence` objects ([913](https://github.com/scikit-bio/scikit-bio/issues/913))
* Added ``skbio.sequence.GrammaredSequence``, which can be inherited from to create grammared sequences with custom alphabets (e.g., for use with TabularMSA) ([1175](https://github.com/scikit-bio/scikit-bio/issues/1175))
* Added ``skbio.util.classproperty`` decorator

Backward-incompatible changes [stable]
* When sniffing or reading a file (`skbio.io.sniff`, `skbio.io.read`, or the object-oriented `.read()` interface), passing `newline` as a keyword argument to `skbio.io.open` now raises a `TypeError`. This backward-incompatible change to a stable API is necessary because it fixes a bug (more details in bug fix section below).
* When reading a FASTQ or QSEQ file and passing `variant='solexa'`, `ValueError` is now raised instead of `NotImplementedError`. This backward-incompatible change to a stable API is necessary to avoid creating a spin-locked process due to [a bug in Python](https://bugs.python.org/issue25786). See [#1256](https://github.com/scikit-bio/scikit-bio/issues/1256) for details. This change is temporary and will be reverted to `NotImplementedError` when the bug is fixed in Python.

Backward-incompatible changes [experimental]
* `skbio.io.format.genbank`: When reading GenBank files, the date field of the LOCUS line is no longer parsed into a `datetime.datetime` object and is left as a string. When writing GenBank files, the locus date metadata is expected to be a string instead of a `datetime.datetime` object ([1153](https://github.com/scikit-bio/scikit-bio/issues/1153))
* `Sequence.distance` now converts the input sequence (`other`) to its type before passing both sequences to `metric`. Previous behavior was to always convert to `Sequence`.

Bug fixes
* Fixed bug when using `Sequence.distance` or `DistanceMatrix.from_iterable` to compute distances between `Sequence` objects with differing `metadata`/`positional_metadata` and passing `metric=scipy.spatial.distance.hamming` ([1254](https://github.com/scikit-bio/scikit-bio/issues/1254))
* Fixed performance bug when computing Hamming distances between `Sequence` objects in `DistanceMatrix.from_iterable` ([1250](https://github.com/scikit-bio/scikit-bio/issues/1250))
* Changed `skbio.stats.composition.multiplicative_replacement` to raise an error whenever a large value of `delta` is chosen ([1241](https://github.com/scikit-bio/scikit-bio/issues/1241))
* When sniffing or reading a file (`skbio.io.sniff`, `skbio.io.read`, or the object-oriented `.read()` interface), passing `newline` as a keyword argument to `skbio.io.open` now raises a `TypeError`. The file format's `newline` character will be used when opening the file. Previous behavior allowed overriding the format's `newline` character but this could cause issues with readers that assume newline characters are those defined by the file format (which is an entirely reasonable assumption). This bug is very unlikely to have surfaced in practice as the default `newline` behavior is *universal newlines mode*.
* DNA, RNA, and Protein are no longer inheritable because they assume an IUPAC alphabet.
* `DistanceMatrix` constructor provides more informative error message when data contains NaNs ([1276](https://github.com/scikit-bio/scikit-bio/issues/1276))

Miscellaneous
* Warnings raised by scikit-bio now share a common subclass ``skbio.util.SkbioWarning``.

Page 2 of 4

Releases

Has known vulnerabilities

Previous Next

Scikit-bio

Page 2 of 4

0.5.4

0.5.3

0.5.2

0.5.1

0.5.0

0.4.2

Page 2 of 4

Links

Releases