This is an alpha release of scikit-bio. At this stage, major backwards-incompatible API changes can and will happen. Unified I/O with the scikit-bio I/O registry was the focus of this release.
Features
* Added ``strict`` and ``lookup`` optional parameters to ``skbio.stats.distance.mantel`` for handling reordering and matching of IDs when provided ``DistanceMatrix`` instances as input (these parameters were previously only available in ``skbio.stats.distance.pwmantel``).
* ``skbio.stats.distance.pwmantel`` now accepts an iterable of ``array_like`` objects. Previously, only ``DistanceMatrix`` instances were allowed.
* Added ``plot`` method to ``skbio.stats.ordination.OrdinationResults`` for creating basic 3-D matplotlib scatterplots of ordination results, optionally colored by metadata in a ``pandas.DataFrame`` (see [518](https://github.com/scikit-bio/scikit-bio/issues/518)). Also added ``_repr_png_`` and ``_repr_svg_`` methods for automatic display in the IPython Notebook, with ``png`` and ``svg`` properties for direct access.
* Added ``skbio.stats.ordination.assert_ordination_results_equal`` for comparing ``OrdinationResults`` objects for equality in unit tests.
* ``BiologicalSequence`` (and its subclasses) now optionally store Phred quality scores. A biological sequence's quality scores are stored as a 1-D ``numpy.ndarray`` of nonnegative integers that is the same length as the biological sequence. Quality scores can be provided upon object instantiation via the keyword argument ``quality``, and can be retrieved via the ``BiologicalSequence.quality`` property. ``BiologicalSequence.has_quality`` is also provided for determining whether a biological sequence has quality scores or not. See [616](https://github.com/scikit-bio/scikit-bio/issues/616) for more details.
* Added ``BiologicalSequence.sequence`` property for retrieving the underlying string representing the sequence characters. This was previously (and still is) accessible via ``BiologicalSequence.__str__``. It is provided via a property for convenience and explicitness.
* Added ``BiologicalSequence.equals`` for full control over equality testing of biological sequences. By default, biological sequences must have the same type, underlying sequence of characters, identifier, description, and quality scores to compare equal. These properties can be ignored via the keyword argument ``ignore``. The behavior of ``BiologicalSequence.__eq__``/``__ne__`` remains unchanged (only type and underlying sequence of characters are compared).
* Added ``BiologicalSequence.copy`` for creating a copy of a biological sequence, optionally with one or more attributes updated.
* ``BiologicalSequence.__getitem__`` now supports specifying a sequence of indices to take from the biological sequence.
* Methods to read and write taxonomies are now available under ``skbio.tree.TreeNode.from_taxonomy`` and ``skbio.tree.TreeNode.to_taxonomy`` respectively.
* Added ``SequenceCollection.update_ids``, which provides a flexible way of updating sequence IDs on a ``SequenceCollection`` or ``Alignment`` (note that a new object is returned, since instances of these classes are immutable). Deprecated ``SequenceCollection.int_map`` in favor of this new method; it will be removed in scikit-bio 0.3.0.
* Added ``skbio.util.cardinal_to_ordinal`` for converting a cardinal number to an ordinal string (useful, e.g., in error messages).
* New I/O registry: supports multiple file formats, automatic file format detection when reading, and unified procedural ``skbio.io.read`` and ``skbio.io.write`` functions in addition to OOP interfaces (``read``/``write`` methods) on the objects listed below. See ``skbio.io`` for more details.
    - Added "clustal" format support:
        * Has sniffer
        * Readers: ``Alignment``
        * Writers: ``Alignment``
    - Added "lsmat" format support:
        * Has sniffer
        * Readers: ``DissimilarityMatrix``, ``DistanceMatrix``
        * Writers: ``DissimilarityMatrix``, ``DistanceMatrix``
    - Added "ordination" format support:
        * Has sniffer
        * Readers: ``OrdinationResults``
        * Writers: ``OrdinationResults``
    - Added "newick" format support:
        * Has sniffer
        * Readers: ``TreeNode``
        * Writers: ``TreeNode``
    - Added "phylip" format support:
        * No sniffer
        * Readers: None
        * Writers: ``Alignment``
    - Added "qseq" format support:
        * Has sniffer
        * Readers: generator of ``BiologicalSequence`` or its subclasses, ``SequenceCollection``, ``BiologicalSequence``, ``NucleotideSequence``, ``DNASequence``, ``RNASequence``, ``ProteinSequence``
        * Writers: None
    - Added "fasta"/QUAL format support:
        * Has sniffer
        * Readers: generator of ``BiologicalSequence`` or its subclasses, ``SequenceCollection``, ``Alignment``, ``BiologicalSequence``, ``NucleotideSequence``, ``DNASequence``, ``RNASequence``, ``ProteinSequence``
        * Writers: same as readers
    - Added "fastq" format support:
        * Has sniffer
        * Readers: generator of ``BiologicalSequence`` or its subclasses, ``SequenceCollection``, ``Alignment``, ``BiologicalSequence``, ``NucleotideSequence``, ``DNASequence``, ``RNASequence``, ``ProteinSequence``
        * Writers: same as readers
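To illustrate the idea behind a format registry with sniffers, readers, and writers, here is a minimal sketch in plain Python. It is not scikit-bio's implementation: the helper names (``register_sniffer``, ``register_reader``, ``register_writer``, ``ToySequences``) and the toy sniffing rules are assumptions made for illustration only, and the functions operate on strings rather than files.

```python
# Hypothetical, simplified registry; not skbio.io's actual internals.
_sniffers = {}  # format name -> function(text) -> bool
_readers = {}   # (format name, target class) -> function(text) -> object
_writers = {}   # (format name, object class) -> function(obj) -> text


def register_sniffer(fmt):
    def decorator(func):
        _sniffers[fmt] = func
        return func
    return decorator


def register_reader(fmt, cls):
    def decorator(func):
        _readers[(fmt, cls)] = func
        return func
    return decorator


def register_writer(fmt, cls):
    def decorator(func):
        _writers[(fmt, cls)] = func
        return func
    return decorator


def sniff(text):
    """Return the single format whose sniffer accepts the text."""
    matches = [fmt for fmt, sniffer in _sniffers.items() if sniffer(text)]
    if len(matches) != 1:
        raise ValueError("could not detect format unambiguously: %r" % matches)
    return matches[0]


def read(text, into, format=None):
    """Parse text into an instance of `into`, sniffing the format if omitted."""
    if format is None:
        format = sniff(text)
    if (format, into) not in _readers:
        raise ValueError("no %r reader for %s" % (format, into.__name__))
    return _readers[(format, into)](text)


def write(obj, format):
    """Serialize obj to text using the writer registered for its class."""
    if (format, type(obj)) not in _writers:
        raise ValueError("no %r writer for %s" % (format, type(obj).__name__))
    return _writers[(format, type(obj))](obj)


class ToySequences(list):
    """Hypothetical stand-in for a sequence collection: (id, sequence) pairs."""


@register_sniffer("fasta")
def _sniff_fasta(text):
    return text.lstrip().startswith(">")


@register_sniffer("newick")
def _sniff_newick(text):
    return text.rstrip().endswith(";")


@register_reader("fasta", ToySequences)
def _read_fasta(text):
    records = ToySequences()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append((line[1:], ""))
        elif not records:
            raise ValueError("sequence data found before the first header")
        else:
            seq_id, seq = records[-1]
            records[-1] = (seq_id, seq + line)
    return records


@register_writer("fasta", ToySequences)
def _write_fasta(obj):
    return "".join(">%s\n%s\n" % record for record in obj)
```

With this sketch, ``read(">a\nACGT\n", into=ToySequences)`` detects the toy "fasta" format through its sniffer, while a format that has no sniffer registered (as with "phylip" above) would have to be named explicitly.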
Bug fixes
* Removed ``constructor`` parameter from ``Alignment.k_word_frequencies``, ``BiologicalSequence.k_words``, ``BiologicalSequence.k_word_counts``, and ``BiologicalSequence.k_word_frequencies`` as it had no effect (it was never hooked up in the underlying code). ``BiologicalSequence.k_words`` now returns a generator of ``BiologicalSequence`` objects instead of strings.
* The ``Alignment`` constructor now verifies that all sequences have the same length and raises an ``AlignmentError`` if they do not. ``Alignment.subalignment`` now calculates the indices only once, which is possible because identical sequence lengths are guaranteed.
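The length check and its effect on ``subalignment`` can be sketched as follows. This is an illustration under stated assumptions, not the library's code: ``ToyAlignment`` is a hypothetical class that stores plain strings, and the ``AlignmentError`` defined here is a local stand-in for the scikit-bio exception.

```python
class AlignmentError(ValueError):
    """Local stand-in for scikit-bio's AlignmentError (assumption)."""


class ToyAlignment:
    """Hypothetical alignment of plain strings, used only for illustration."""

    def __init__(self, sequences):
        sequences = list(sequences)
        lengths = {len(seq) for seq in sequences}
        if len(lengths) > 1:
            raise AlignmentError(
                "all sequences must have the same length, got lengths %s"
                % sorted(lengths))
        self.sequences = sequences
        self.length = lengths.pop() if lengths else 0

    def subalignment(self, positions_to_keep):
        # Every sequence has the same length, so the positions are
        # validated and sorted once and then reused for all sequences.
        indices = sorted(set(positions_to_keep))
        for index in indices:
            if not 0 <= index < self.length:
                raise IndexError("position %d is out of range" % index)
        return ToyAlignment(
            "".join(seq[i] for i in indices) for seq in self.sequences)
```

For example, ``ToyAlignment(["ACGT", "AC"])`` raises ``AlignmentError``, while ``ToyAlignment(["ACGT", "A-GT"]).subalignment([0, 2])`` keeps the first and third columns of both sequences.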
Deprecated functionality
* Deprecated ``constructor`` parameter in ``Alignment.majority_consensus`` in favor of having users call ``str`` on the returned ``BiologicalSequence``. This parameter will be removed in scikit-bio 0.3.0.
* Deprecated existing I/O functionality in favor of the I/O registry; the old functionality will be removed in scikit-bio 0.3.0. Replacements are available via ``skbio.io.read``, ``skbio.io.write``, and the methods listed below:
    * Deprecated the following "clustal" readers/writers:
        - ``write_clustal`` -> ``Alignment.write``
        - ``parse_clustal`` -> ``Alignment.read``
    * Deprecated the following distance matrix format ("lsmat") readers/writers:
        - ``DissimilarityMatrix.from_file`` -> ``DissimilarityMatrix.read``
        - ``DissimilarityMatrix.to_file`` -> ``DissimilarityMatrix.write``
        - ``DistanceMatrix.from_file`` -> ``DistanceMatrix.read``
        - ``DistanceMatrix.to_file`` -> ``DistanceMatrix.write``
    * Deprecated the following ordination format ("ordination") readers/writers:
        - ``OrdinationResults.from_file`` -> ``OrdinationResults.read``
        - ``OrdinationResults.to_file`` -> ``OrdinationResults.write``
    * Deprecated the following "newick" readers/writers:
        - ``TreeNode.from_file`` -> ``TreeNode.read``
        - ``TreeNode.from_newick`` -> ``TreeNode.read``
        - ``TreeNode.to_newick`` -> ``TreeNode.write``
    * Deprecated the following "phylip" writers:
        - ``Alignment.to_phylip`` -> ``Alignment.write``
    * Deprecated the following "fasta"/QUAL readers/writers:
        - ``SequenceCollection.from_fasta_records`` -> ``SequenceCollection.read``
        - ``SequenceCollection.to_fasta`` -> ``SequenceCollection.write``
        - ``fasta_from_sequences`` -> ``skbio.io.write(obj, into=<file>, format='fasta')``
        - ``fasta_from_alignment`` -> ``Alignment.write``
        - ``parse_fasta`` -> ``skbio.io.read(<fasta>, format='fasta')``
        - ``parse_qual`` -> ``skbio.io.read(<fasta>, format='fasta', qual=<file>)``
        - ``BiologicalSequence.to_fasta`` -> ``BiologicalSequence.write``
    * Deprecated the following "fastq" readers/writers:
        - ``parse_fastq`` -> ``skbio.io.read(<fastq>, format='fastq')``
        - ``format_fastq_record`` -> ``skbio.io.write(obj, into=<file>, format='fastq')``
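When migrating a code base, it can help to locate uses of the deprecated names mechanically. The following sketch scans source text for a subset of the old names listed above and reports the suggested replacement. The helper ``find_deprecated_calls`` is hypothetical and not part of scikit-bio; it only matches names written out literally, so calls made on instances (for example ``dm.to_file(...)``) are not detected.

```python
import re

# Subset of the old -> new mapping from the list above.
REPLACEMENTS = {
    "parse_clustal": "Alignment.read",
    "write_clustal": "Alignment.write",
    "DistanceMatrix.from_file": "DistanceMatrix.read",
    "TreeNode.from_newick": "TreeNode.read",
    "parse_fasta": "skbio.io.read(<fasta>, format='fasta')",
    "parse_fastq": "skbio.io.read(<fastq>, format='fastq')",
}

# Word boundaries keep names such as "my_parse_fasta_helper" from matching.
_PATTERN = re.compile(
    r"\b(%s)\b" % "|".join(re.escape(name) for name in REPLACEMENTS))


def find_deprecated_calls(source):
    """Return (line number, old name, replacement) for each literal match."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            old = match.group(1)
            hits.append((line_no, old, REPLACEMENTS[old]))
    return hits
```

The mapping is deliberately incomplete; extending ``REPLACEMENTS`` with the remaining entries from the list is straightforward.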
Backward-incompatible changes
* ``skbio.stats.distance.mantel`` now returns a 3-element tuple containing the correlation coefficient, the p-value, and the number of matching rows/columns in the distance matrices (``n``). The return value was previously a 2-element tuple containing only the correlation coefficient and p-value.
* ``skbio.stats.distance.mantel`` reorders input ``DistanceMatrix`` instances based on matching IDs (see optional parameters ``strict`` and ``lookup`` for controlling this behavior). In the past, ``DistanceMatrix`` instances were treated the same as ``array_like`` input and no reordering took place, regardless of ID (mis)matches. ``array_like`` input behavior remains the same.
* If mismatched types are provided to ``skbio.stats.distance.mantel`` (e.g., a ``DistanceMatrix`` and ``array_like``), a ``TypeError`` will be raised.
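The ID matching and the 3-element return value can be pictured with a small, self-contained sketch. It is not scikit-bio's implementation: ``toy_mantel`` is a hypothetical function that takes nested lists plus ID lists instead of ``DistanceMatrix`` objects, uses a Pearson correlation with a two-sided permutation p-value, and assumes that ``strict=True`` raises an error on nonmatching IDs, that ``strict=False`` drops them, and that ``lookup`` maps the IDs of both matrices to common names. Consult the ``mantel`` documentation for the exact semantics.

```python
import math
import random
from itertools import combinations


def _condensed(matrix, ids, order):
    """Distances for every pair of IDs in `order`, looked up by ID."""
    pos = {id_: k for k, id_ in enumerate(ids)}
    return [matrix[pos[a]][pos[b]] for a, b in combinations(order, 2)]


def _pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant distances")
    return cov / math.sqrt(var_x * var_y)


def toy_mantel(x, x_ids, y, y_ids, strict=True, lookup=None,
               permutations=99, seed=0):
    """Return (correlation, p-value, n) for two ID-labelled matrices."""
    if lookup is not None:
        x_ids = [lookup[i] for i in x_ids]
        y_ids = [lookup[i] for i in y_ids]
    y_set = set(y_ids)
    shared = [i for i in x_ids if i in y_set]  # keeps the order of x
    if strict and (len(shared) != len(x_ids) or len(shared) != len(y_ids)):
        raise ValueError("IDs do not match; use strict=False to drop them")
    n = len(shared)
    if n < 3:
        raise ValueError("need at least 3 matching IDs")

    x_flat = _condensed(x, x_ids, shared)
    observed = _pearson(x_flat, _condensed(y, y_ids, shared))

    # Permute the labels of y (rows and columns together) and count how
    # often the permuted correlation is at least as extreme as observed.
    rng = random.Random(seed)
    extreme = 0
    for _ in range(permutations):
        shuffled = shared[:]
        rng.shuffle(shuffled)
        permuted = _pearson(x_flat, _condensed(y, y_ids, shuffled))
        if abs(permuted) >= abs(observed):
            extreme += 1
    p_value = (extreme + 1) / (permutations + 1)
    return observed, p_value, n
```

Callers that previously unpacked two values need to unpack three, for example ``coeff, p_value, n = toy_mantel(x, x_ids, y, y_ids)``.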
Miscellaneous
* Added git timestamp checking to checklist.py, ensuring that when changes are made to Cython (.pyx) files, their corresponding generated C files are also updated.
* Fixed performance bug when instantiating ``BiologicalSequence`` objects. The previous runtime scaled linearly with sequence length; it is now constant time when the sequence is already a string. See [623](https://github.com/scikit-bio/scikit-bio/issues/623) for details.
* IPython and six are now required dependencies.