Scikit-bio

Latest version: v0.6.0

0.6.0

Performance enhancements

* Launched the new scikit-bio website: https://scikit.bio. The previous domain names _scikit-bio.org_ and _skbio.org_ continue to work and redirect to the new website.
* Migrated the scikit-bio website repo from the `gh-pages` branch of the `scikit-bio` repo to a standalone repo: [`scikit-bio.github.io`](https://github.com/scikit-bio/scikit-bio.github.io).
* Replaced the [Bootstrap theme](https://sphinx-bootstrap-theme.readthedocs.io/en/latest/) with the [PyData theme](https://pydata-sphinx-theme.readthedocs.io/en/stable/) for building documentation using Sphinx. Extended this theme to the website. Customized design elements ([#1934](https://github.com/scikit-bio/scikit-bio/pull/1934)).
* Improved the calculation of Fisher's alpha diversity index (`fisher_alpha`). It is now compatible with optimizers in SciPy 1.11+. Edge cases such as all singletons can be handled correctly. Handling of errors and warnings was improved. Documentation was enriched ([1890](https://github.com/scikit-bio/scikit-bio/pull/1890)).
* Allowed `delimiter=None` which represents whitespace of arbitrary length in reading lsmat format matrices ([1912](https://github.com/scikit-bio/scikit-bio/pull/1912)).

Features

* Added biom-format Table import and updated corresponding requirement files ([1907](https://github.com/scikit-bio/scikit-bio/pull/1907)).
* Added biom-format 2.1.0 IO support ([1984](https://github.com/scikit-bio/scikit-bio/pull/1984)).
* Added `Table` support to `alpha_diversity` and `beta_diversity` drivers ([1984](https://github.com/scikit-bio/scikit-bio/pull/1984)).
* Implemented a mechanism to automatically build documentation and/or homepage and deploy them to the website ([1934](https://github.com/scikit-bio/scikit-bio/pull/1934)).
* Added the Benjamini-Hochberg method as an option for FDR correction (in addition to the existing Holm-Bonferroni method) for `ancom` and `dirmult_ttest` ([1988](https://github.com/scikit-bio/scikit-bio/pull/1988)).
* Added function `dirmult_ttest`, which performs a differential abundance test using a Dirichlet-multinomial distribution. This function mirrors the method provided by ALDEx2 ([1956](https://github.com/scikit-bio/scikit-bio/pull/1956)).
* Added method `Sequence.to_indices` to convert a sequence into a vector of indices of characters in an alphabet (can be from a substitution matrix) or unique characters observed in the sequence. Supports gap masking and wildcard substitution ([1917](https://github.com/scikit-bio/scikit-bio/pull/1917)).
* Added class `SubstitutionMatrix` to support substitution matrices for nucleotides, amino acids, or more general cases ([1913](https://github.com/scikit-bio/scikit-bio/pull/1913)).
* Added alpha diversity metric `sobs`, which is the observed species richness (S_{obs}) of a sample. `sobs` will replace `observed_otus`, which uses the historical term "OTU". Also added metric `observed_features` to be compatible with the QIIME 2 terminology. All three metrics are equivalent ([1902](https://github.com/scikit-bio/scikit-bio/pull/1902)).
* `beta_diversity` now supports use of a pandas `DataFrame` index, see issue [1808](https://github.com/scikit-bio/scikit-bio/issues/1808).
* Added alpha diversity metric `phydiv`, which is a generalized phylogenetic diversity (PD) framework permitting unrooted or rooted tree, unweighted or weighted by abundance, and an exponent parameter of the weight term ([1893](https://github.com/scikit-bio/scikit-bio/pull/1893)).
* Adopted NumPy's new random generator `np.random.Generator` (see [NEP 19](https://numpy.org/neps/nep-0019-rng-policy.html)) ([#1889](https://github.com/scikit-bio/scikit-bio/pull/1889)).
* SciPy 1.11+ is now supported ([1887](https://github.com/scikit-bio/scikit-bio/pull/1887)).
* Removed IPython as a dependency. Scikit-bio continues to support displaying plots in IPython, but it no longer requires importing IPython functionality ([1901](https://github.com/scikit-bio/scikit-bio/pull/1901)).
* Made Matplotlib an optional dependency. Scikit-bio no longer requires Matplotlib except for plotting, during which it attempts to import Matplotlib if it is present in the system, and raises an error if not ([1901](https://github.com/scikit-bio/scikit-bio/pull/1901)).
* Ported the QIIME 2 metadata object into skbio. ([1929](https://github.com/scikit-bio/scikit-bio/pull/1929))
* Python 3.12+ is now supported, thank you actapia ([1930](https://github.com/scikit-bio/scikit-bio/pull/1930))
* Introduced native character conversion ([1971](https://github.com/scikit-bio/scikit-bio/pull/1971))
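
The difference between the two multiple-testing corrections mentioned above for `ancom` and `dirmult_ttest` can be illustrated with a minimal pure-Python sketch. The helper names `holm_bonferroni` and `benjamini_hochberg` are hypothetical and are not scikit-bio functions; the sketch only follows the textbook definitions of the two procedures and does not reproduce the library's implementation.

```python
def holm_bonferroni(pvals):
    """Holm step-down adjusted p-values (illustrative helper, not skbio API)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    # Walk from the smallest p-value up; multiplier shrinks from m down to 1.
    for rank, i in enumerate(order):
        value = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, value)  # enforce monotonicity
        adjusted[i] = running_max
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (illustrative helper, not skbio API)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down; scale each by m / (1-based rank).
    for rank in range(m - 1, -1, -1):
        i = order[rank]
        value = min(1.0, pvals[i] * m / (rank + 1))
        running_min = min(running_min, value)  # enforce monotonicity
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
print("Holm-Bonferroni:   ", holm_bonferroni(pvals))
print("Benjamini-Hochberg:", benjamini_hochberg(pvals))
```

In this example every Benjamini-Hochberg adjusted value is no larger than the corresponding Holm-Bonferroni value, which reflects the usual trade-off: controlling the false discovery rate is less conservative than controlling the family-wise error rate.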

Backward-incompatible changes [experimental]

* Beta diversity metric `kulsinski` was removed. SciPy replaced this distance metric with `kulczynski1` in version 1.11 (see SciPy issue [2009](https://github.com/scipy/scipy/issues/2009)), and neither metric returns 0 for two identical vectors ([#1887](https://github.com/scikit-bio/scikit-bio/pull/1887)).
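
The remark about identical vectors can be checked by hand. The sketch below re-implements both boolean measures from their definitions as recalled from the SciPy documentation (an assumption; nothing here is copied from SciPy or scikit-bio, and the function names are local to this example). When there are no disagreements the Kulczynski 1 ratio is undefined, and the sketch returns `inf` as a marker for that case.

```python
def _bool_counts(u, v):
    """Count agreements and disagreements between two boolean vectors."""
    n = len(u)
    c_tt = sum(1 for a, b in zip(u, v) if a and b)
    c_tf = sum(1 for a, b in zip(u, v) if a and not b)
    c_ft = sum(1 for a, b in zip(u, v) if not a and b)
    return n, c_tt, c_tf, c_ft


def kulsinski_sketch(u, v):
    """Kulsinski dissimilarity: (c_TF + c_FT - c_TT + n) / (c_FT + c_TF + n)."""
    n, c_tt, c_tf, c_ft = _bool_counts(u, v)
    return (c_tf + c_ft - c_tt + n) / (c_tf + c_ft + n)


def kulczynski1_sketch(u, v):
    """Kulczynski 1 measure: c_TT / (c_TF + c_FT); undefined without disagreements."""
    _, c_tt, c_tf, c_ft = _bool_counts(u, v)
    disagreements = c_tf + c_ft
    # Assumption of this sketch: report inf when the denominator is zero.
    return float("inf") if disagreements == 0 else c_tt / disagreements


u = [True, False, True, False]
print(kulsinski_sketch(u, u))    # non-zero for identical input
print(kulczynski1_sketch(u, u))  # undefined ratio, reported as inf here
```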

Bug fixes

* Fixed documentation interface of `vlr` and relevant functions ([1934](https://github.com/scikit-bio/scikit-bio/pull/1934)).
* Fixed broken link in documentation of Simpson's evenness index. See issue [1923](https://github.com/scikit-bio/scikit-bio/issues/1923).
* Safely handle `Sequence.iter_kmers` where `k` is greater than the sequence length ([1723](https://github.com/scikit-bio/scikit-bio/issues/1723))
* Re-enabled OpenMP support, which had been mistakenly disabled in 0.5.8 ([1874](https://github.com/scikit-bio/scikit-bio/pull/1874))
* `permanova` and `permdisp` operate on a `DistanceMatrix` and a grouping object. Element IDs must be synchronized so that the correct sets of pairwise distances are compared. This failed when the grouping was provided as a `pandas.Series`, because it was interpreted as an ordered `list` and its index was ignored (see issue [1877](https://github.com/scikit-bio/scikit-bio/issues/1877) for an example). Note: `pandas.DataFrame` was handled correctly. This behavior was fixed in PR [#1879](https://github.com/scikit-bio/scikit-bio/pull/1879)
* Fixed slicing for `TabularMSALoc` on Python 3.12. See issue [1926](https://github.com/scikit-bio/scikit-bio/issues/1926).
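
The grouping bug described above comes down to whether labels are matched to samples by ID or by position. The following sketch shows the difference using only the standard library: a `dict` stands in for an indexed `pandas.Series`, and `align_grouping` is a hypothetical helper written for this illustration, not a scikit-bio function.

```python
def align_grouping(dm_ids, grouping):
    """Return group labels ordered to match the distance matrix IDs.

    A dict (standing in for an indexed pandas.Series) is looked up by ID;
    any other sequence is assumed to already be in `dm_ids` order.
    """
    if isinstance(grouping, dict):
        missing = [i for i in dm_ids if i not in grouping]
        if missing:
            raise ValueError(f"IDs missing from grouping: {missing}")
        return [grouping[i] for i in dm_ids]
    if len(grouping) != len(dm_ids):
        raise ValueError("Grouping length must match the number of IDs.")
    return list(grouping)


dm_ids = ["s1", "s2", "s3", "s4"]
# The labels are keyed by sample ID but stored in a different order.
labelled = {"s3": "control", "s1": "treated", "s4": "control", "s2": "treated"}

positional = list(labelled.values())        # ignores the IDs (the old behavior)
aligned = align_grouping(dm_ids, labelled)  # matches labels to IDs

print(positional)
print(aligned)
```

The positional reading assigns `s1` to the wrong group, so any statistic computed from within-group and between-group distances would compare the wrong pairs.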

Miscellaneous

* Replaced the historical term "OTU" with the more generic term "taxon" (plural: "taxa"). As a consequence, the parameter "otu_ids" in phylogenetic alpha and beta diversity metrics was replaced by "taxa". Meanwhile, the old parameter "otu_ids" is still kept as an alias of "taxa" for backward compatibility. However it will be removed in a future release.
* Revised contributor's guidelines.
* Renamed function `multiplicative_replacement` as `multi_replace` for conciseness ([1988](https://github.com/scikit-bio/scikit-bio/pull/1988)).
* Renamed parameter `multiple_comparisons_correction` as `p_adjust` of function `ancom` for conciseness ([1988](https://github.com/scikit-bio/scikit-bio/pull/1988)).
* Enabled code coverage reporting via Codecov. See [1954](https://github.com/scikit-bio/scikit-bio/pull/1954).
* Renamed the default branch from "master" to "main". See [1953](https://github.com/scikit-bio/scikit-bio/pull/1953).
* Enabled subclassing of DNA, RNA and Protein classes to allow secondary development.
* Dropped support for NumPy < 1.17.0 in order to utilize the new random generator.
* Use CYTHON by default during build ([1874](https://github.com/scikit-bio/scikit-bio/pull/1874))
* Implemented augmented assignments proposed in issue [1789](https://github.com/scikit-bio/scikit-bio/issues/1789)
* Incorporated Ruff's formatting and linting via pre-commit hooks and GitHub Actions. See PR [1924](https://github.com/scikit-bio/scikit-bio/pull/1924).
* Improved docstrings for functions across the entire codebase. See [1933](https://github.com/scikit-bio/scikit-bio/pull/1933) and [#1940](https://github.com/scikit-bio/scikit-bio/pull/1940)
* Removed API lifecycle decorators in favor of deprecation warnings. See [1916](https://github.com/scikit-bio/scikit-bio/issues/1916)
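
Keeping an old parameter name as an alias while signalling its future removal, as described above for `otu_ids` and `taxa`, is commonly done with a deprecation warning. The sketch below shows one way such an alias could be handled. `observed_taxa` is a hypothetical function invented for this example and does not reflect how scikit-bio implements the alias internally.

```python
import warnings


def observed_taxa(counts, taxa=None, otu_ids=None):
    """Return the IDs of taxa with non-zero counts (hypothetical example)."""
    if otu_ids is not None:
        if taxa is not None:
            raise ValueError("Pass either 'taxa' or 'otu_ids', not both.")
        # The old name still works, but the caller is told to migrate.
        warnings.warn(
            "'otu_ids' is deprecated; use 'taxa' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        taxa = otu_ids
    if taxa is None:
        raise ValueError("'taxa' is required.")
    if len(taxa) != len(counts):
        raise ValueError("'taxa' and 'counts' must have the same length.")
    return [t for t, c in zip(taxa, counts) if c > 0]


counts = [3, 0, 5]
names = ["t1", "t2", "t3"]
print(observed_taxa(counts, taxa=names))

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(observed_taxa(counts, otu_ids=names))
    print([str(w.message) for w in caught])
```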

0.5.9

Features

* Added variance log ratio estimators in `skbio.stats.composition.vlr` and `skbio.stats.composition.pairwise_vlr` ([1803](https://github.com/scikit-bio/scikit-bio/pull/1803))
* Added `skbio.stats.composition.tree_basis` to construct ILR bases from `TreeNode` objects. ([1862](https://github.com/scikit-bio/scikit-bio/pull/1862))
* `IntervalMetadata.query` now defaults to obtaining all results, see [1817](https://github.com/scikit-bio/scikit-bio/issues/1817).
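
For readers unfamiliar with the variance log ratio, the quantity behind `vlr` is the variance of the log of the element-wise ratio of two positive vectors. The sketch below computes it with the standard library only. `vlr_sketch` is a hypothetical name, and the use of the sample variance (one delta degree of freedom) is an assumption of this example rather than a statement about scikit-bio's defaults.

```python
import math
from statistics import variance


def vlr_sketch(x, y):
    """Variance of log(x_i / y_i) over paired positive values (illustrative)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length.")
    log_ratios = [math.log(a / b) for a, b in zip(x, y)]
    return variance(log_ratios)  # sample variance; an assumption of this sketch


x = [1.0, 2.0, 4.0, 8.0]
y = [2.0, 4.0, 8.0, 16.0]  # exactly proportional to x
z = [8.0, 4.0, 2.0, 1.0]   # moves in the opposite direction

print(vlr_sketch(x, y))  # proportional features give a constant log ratio
print(vlr_sketch(x, z))
```

Two features that are perfectly proportional across samples have a constant log ratio and therefore a variance log ratio of zero; larger values indicate weaker proportionality.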

Backward-incompatible changes [experimental]
* With the introduction of the `tree_basis` object, the ILR bases are now represented in log-odds coordinates rather than in probabilities to minimize issues with numerical stability. Furthermore, the `ilr` and `ilr_inv` functions now take the `basis` input parameter in terms of log-odds coordinates. This affects `skbio.stats.composition.sbp_basis` as well. ([1862](https://github.com/scikit-bio/scikit-bio/pull/1862))

Important

* Complex multiple axis indexing operations with `TabularMSA` have been removed from testing due to incompatibilities with modern versions of Pandas. ([1851](https://github.com/scikit-bio/scikit-bio/pull/1851))
* Pinning `scipy <= 1.10.1` ([1851](https://github.com/scikit-bio/scikit-bio/pull/1867))

Bug fixes

* Fixed a bug that caused build failure on the ARM64 microarchitecture due to floating-point number handling. ([1859](https://github.com/scikit-bio/scikit-bio/pull/1859))
* Never let the Gini index go below 0.0, see [1844](https://github.com/scikit-bio/scikit-bio/issues/1844).
* Fixed bug [1847](https://github.com/scikit-bio/scikit-bio/issues/1847) in which the edge from the root was inadvertently included in the calculation for `descending_branch_length`

Miscellaneous

* Replaced dependencies `CacheControl` and `lockfile` with `requests` to avoid a dependency inconsistency issue of the former. (See [1863](https://github.com/scikit-bio/scikit-bio/pull/1863), merged in [#1859](https://github.com/scikit-bio/scikit-bio/pull/1859))
* Updated installation instructions for developers in `CONTRIBUTING.md` ([1860](https://github.com/scikit-bio/scikit-bio/pull/1860))

0.5.8

Features

* Added NCBI taxonomy database dump format (`taxdump`) ([1810](https://github.com/scikit-bio/scikit-bio/pull/1810)).
* Added `TreeNode.from_taxdump` for converting taxdump into a tree ([1810](https://github.com/scikit-bio/scikit-bio/pull/1810)).
* scikit-learn has been removed as a dependency. This was a fairly heavy-weight dependency that was providing minor functionality to scikit-bio. The critical components have been implemented in scikit-bio directly, and the non-critical components are listed under "Backward-incompatible changes [experimental]".
* Python 3.11 is now supported.

Backward-incompatible changes [experimental]
* With the removal of the scikit-learn dependency, three beta diversity metric names can no longer be specified. These are `wminkowski`, `nan_euclidean`, and `haversine`. On testing, `wminkowski` and `haversine` did not work through `skbio.diversity.beta_diversity` (or `sklearn.metrics.pairwise_distances`). The former was deprecated in favor of calling `minkowski` with a vector of weights provided as kwarg `w` (example below), and the latter does not work with data of this shape. `nan_euclidean` can still be accessed from scikit-learn directly if a user installs scikit-learn in their environment (example below).

```python
counts = [[23, 64, 14, 0, 0, 3, 1],
          [0, 3, 35, 42, 0, 12, 1],
          [0, 5, 5, 0, 40, 40, 0],
          [44, 35, 9, 0, 1, 0, 0],
          [0, 2, 8, 0, 35, 45, 1],
          [0, 0, 25, 35, 0, 19, 0],
          [88, 31, 0, 5, 5, 5, 5],
          [44, 39, 0, 0, 0, 0, 0]]

# new mechanism of accessing wminkowski
from skbio.diversity import beta_diversity
beta_diversity("minkowski", counts, w=[1, 1, 1, 1, 1, 1, 2])

# accessing nan_euclidean through scikit-learn directly
import skbio
from sklearn.metrics import pairwise_distances
sklearn_dm = pairwise_distances(counts, metric="nan_euclidean")
skbio_dm = skbio.DistanceMatrix(sklearn_dm)
```

Deprecated functionality [experimental]
* `skbio.alignment.local_pairwise_align_ssw` has been deprecated ([1814](https://github.com/scikit-bio/scikit-bio/issues/1814)) and will be removed or replaced in scikit-bio 0.6.0.

Bug fixes
* Use `oldest-supported-numpy` as build dependency. This fixes problems with environments that use an older version of numpy than the one used to build scikit-bio ([1813](https://github.com/scikit-bio/scikit-bio/pull/1813)).

0.5.7

Features

* Introduce support for Python 3.10 ([1801](https://github.com/scikit-bio/scikit-bio/pull/1801)).
* Tentative support for Apple M1 ([1709](https://github.com/scikit-bio/scikit-bio/pull/1709)).
* Added support for reading and writing a binary distance matrix object format. ([1716](https://github.com/scikit-bio/scikit-bio/pull/1716))
* Added support for `np.float32` with `DissimilarityMatrix` objects.
* Added support for `method` and `number_of_dimensions` to `permdisp`, reducing the runtime by 100x at 4000 samples, see [issue 1769](https://github.com/scikit-bio/scikit-bio/pull/1769).
* An `OrdinationResults` object is now accepted as input for `permdisp`.

Performance enhancements

* Avoid an implicit data copy on construction of `DissimilarityMatrix` objects.
* Avoid validation on copy of `DissimilarityMatrix` and `DistanceMatrix` objects, see [PR 1747](https://github.com/scikit-bio/scikit-bio/pull/1747)
* Use an optimized version of symmetry check in DistanceMatrix, see [PR 1747](https://github.com/scikit-bio/scikit-bio/pull/1747)
* Avoid performing filtering when ids are identical, see [PR 1752](https://github.com/scikit-bio/scikit-bio/pull/1752)
* `center_distance_matrix` has been re-implemented in Cython for both speed and memory use, which indirectly speeds up `pcoa`, see [PR 1749](https://github.com/scikit-bio/scikit-bio/pull/1749)
* Use a memory-optimized version of permute in DistanceMatrix, see [PR 1756](https://github.com/scikit-bio/scikit-bio/pull/1756).
* Refactor pearson and spearman skbio.stats.distance.mantel implementations to drastically improve memory locality. Also cache intermediate results that are invariant across permutations, see [PR 1756](https://github.com/scikit-bio/scikit-bio/pull/1756).
* Refactor permanova to remove intermediate buffers and cythonize the internals, see [PR 1768](https://github.com/scikit-bio/scikit-bio/pull/1768).
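
As background for the `center_distance_matrix` item above, the operation usually meant by centering a distance matrix before PCoA is Gower's double centering: square the distances, scale by -1/2, then subtract row and column means and add back the grand mean. The sketch below is a naive pure-Python version of that textbook procedure, written only to show what is being computed. It is an assumption that this matches the library's definition, and it makes no attempt to reproduce the Cython implementation or its memory optimizations.

```python
def center_distance_matrix_sketch(dm):
    """Gower double centering of a square distance matrix (naive illustration)."""
    n = len(dm)
    # E = -0.5 * D ** 2, element-wise.
    e = [[-0.5 * d * d for d in row] for row in dm]
    row_means = [sum(row) / n for row in e]
    col_means = [sum(e[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    # F[i][j] = E[i][j] - row_mean[i] - col_mean[j] + grand_mean
    return [
        [e[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


dm = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.5],
    [2.0, 1.5, 0.0],
]
centered = center_distance_matrix_sketch(dm)
for row in centered:
    print([round(v, 4) for v in row])
```

Every row and column of the centered matrix sums to zero (up to floating-point error), which is the property the subsequent eigendecomposition in PCoA relies on.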

Bug fixes

* Fix Windows and 32-bit incompatibility in `unweighted_unifrac`.

Miscellaneous

* Python 3.6 has been removed from our testing matrix.
* Specify build dependencies in pyproject.toml. This allows the package to be installed without having to first manually install numpy.
* Update hdmedians package to a version which doesn't require an initial manual numpy install.
* Now buildable on non-x86 platforms due to use of the [SIMD Everywhere](https://github.com/simd-everywhere/simde) library.
* Regenerate Cython wrapper by default to avoid incompatibilities with installed CPython.
* Update documentation for the `skbio.stats.composition.ancom` function. ([1741](https://github.com/scikit-bio/scikit-bio/pull/1741))

0.5.6

Features

* Added option to return a capture group compiled regex pattern to any class inheriting ``GrammaredSequence`` through the ``to_regex`` method. ([1431](https://github.com/scikit-bio/scikit-bio/issues/1431))

* Added `DissimilarityMatrix.within` and `.between` to obtain the respective distances and express them as a `DataFrame`. ([1662](https://github.com/scikit-bio/scikit-bio/pull/1662))

* Added Kendall Tau as possible correlation method in the `skbio.stats.distance.mantel` function ([1675](https://github.com/scikit-bio/scikit-bio/issues/1675)).

* Added support for IUPAC amino acid codes U (selenocysteine), O (pyrrolysine), and J (leucine or isoleucine). ([1576](https://github.com/scikit-bio/scikit-bio/issues/1576))

Backward-incompatible changes [stable]

Backward-incompatible changes [experimental]

* Changed `skbio.tree.TreeNode.support` from a method to a property.
* Added `assign_supports` method to `skbio.tree.TreeNode` to extract branch support values from node labels.
* Modified the way a node's label is printed: `support:name` if both exist, or `support` or `name` if either exists.
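
The label rule stated above (`support:name` if both exist, otherwise whichever one exists) can be written as a small function. `format_node_label` is a hypothetical helper for illustration and is not part of `skbio.tree.TreeNode`. Returning an empty string when neither value exists is an assumption of this sketch.

```python
def format_node_label(name=None, support=None):
    """Combine branch support and node name into a printed label (illustrative)."""
    if support is not None and name is not None:
        return f"{support}:{name}"
    if support is not None:
        return str(support)
    if name is not None:
        return name
    return ""  # assumption: an unlabeled node prints nothing


print(format_node_label(name="cladeA", support=95))
print(format_node_label(support=95))
print(format_node_label(name="cladeA"))
```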

Performance enhancements

Bug fixes

* Require `Sphinx <= 3.0`. Newer Sphinx versions caused build errors. [1719](https://github.com/scikit-bio/scikit-bio/pull/1719)

* `skbio.stats.ordination` tests have been relaxed. ([1713](https://github.com/scikit-bio/scikit-bio/issues/1713))

* Fixes build errors for newer versions of NumPy, Pandas, and SciPy.

* Corrected a critical bug in `skbio.alignment.StripedSmithWaterman`/`skbio.alignment.local_pairwise_align_ssw` which would cause the formatting of the aligned sequences to misplace gap characters by the number of gap characters present in the opposing aligned sequence up to that point. This was caused by a faulty implementation of CIGAR string parsing, see [1679](https://github.com/scikit-bio/scikit-bio/pull/1679) for full details.

Deprecated functionality [stable]

Deprecated functionality [experimental]

Miscellaneous

* `skbio.diversity.beta_diversity` now accepts a pandas DataFrame as input.

* Avoid pandas 1.0.0 import warning ([1688](https://github.com/scikit-bio/scikit-bio/issues/1688))

* Added support for Python 3.8 and dropped support for Python 3.5.

* This version now depends on `scipy >= 1.3` and `pandas >= 1.0`.

0.5.5

Features

* `skbio.stats.composition` now has methods to compute additive log-ratio transformation and inverse additive log-ratio transformation (`alr`, `alr_inv`) as well as a method to build a basis from a sequential binary partition (`sbp_basis`).
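
As a reminder of what the additive log-ratio transformation does, the sketch below implements the textbook definition with the standard library: each component is divided by a chosen reference component and the logarithm is taken, and the inverse exponentiates and renormalizes. The names `alr_sketch` and `alr_inv_sketch` and the `denominator_idx` argument are choices made for this illustration and should not be read as the exact signatures of `alr` and `alr_inv`.

```python
import math


def alr_sketch(composition, denominator_idx=0):
    """Additive log-ratio transform of a strictly positive composition."""
    ref = composition[denominator_idx]
    return [
        math.log(p / ref)
        for i, p in enumerate(composition)
        if i != denominator_idx
    ]


def alr_inv_sketch(coords, denominator_idx=0):
    """Inverse of alr_sketch: returns proportions that sum to 1."""
    exps = [math.exp(c) for c in coords]
    exps.insert(denominator_idx, 1.0)  # the reference component has log-ratio 0
    total = sum(exps)
    return [e / total for e in exps]


composition = [0.1, 0.3, 0.4, 0.2]
coords = alr_sketch(composition)
print(coords)
print(alr_inv_sketch(coords))
```

A composition with D parts maps to D - 1 coordinates, and applying the inverse recovers the original proportions up to floating-point error.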

Backward-incompatible changes [stable]

Backward-incompatible changes [experimental]

Performance enhancements

Bug fixes

Deprecated functionality [stable]

Deprecated functionality [experimental]

Miscellaneous
* Python 3.6 and 3.7 are now supported

* A pytest runner is shipped with every installation ([1633](https://github.com/scikit-bio/scikit-bio/pull/1633))

* The nosetest framework has been replaced by pytest ([1624](https://github.com/scikit-bio/scikit-bio/pull/1624))

* The numpy docs are deprecated in favor of [Napoleon](http://www.sphinx-doc.org/en/master/usage/extensions/napoleon.html) ([#1629](https://github.com/scikit-bio/scikit-bio/pull/1629))

* This version is now compatible with numpy >= 1.17.0 and Pandas >= 0.23. ([1627](https://github.com/scikit-bio/scikit-bio/pull/1627))
