Msprime

Latest version: v1.3.1

0.5.0

This is a major update to the underlying data structures in msprime to generalise the information that can be modelled, and allow for data from external sources to be efficiently processed. The new Tables API enables efficient interchange of tree sequence data using numpy arrays. Many updates have also been made to the tree sequence API to make it more Pythonic and general. Most changes are backwards compatible, however.

**Breaking changes**:

- The ``SparseTree.mutations()`` and ``TreeSequence.mutations()`` iterators no longer support tuple-like access to values. For example, code like

    for x, u, j in ts.mutations():
        print("mutation at position", x, "node = ", u)

will no longer work. Code using the old ``Mutation.position`` and ``Mutation.index`` will still work through deprecated aliases, but new code should access these values through ``Site.position``
and ``Site.id``, respectively.

- The ``TreeSequence.diffs()`` method no longer works. Please use the ``TreeSequence.edge_diffs()`` method instead.

- ``TreeSequence.get_num_records()`` no longer works. Any code using this or the ``records()`` iterator should be rewritten to work with the ``edges()`` iterator and ``num_edges`` instead.

- Files stored in the HDF5 format will need to be upgraded using the ``msp upgrade`` command.
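
As a rough illustration of the first breaking change, the sketch below uses hypothetical stand-in classes rather than msprime itself. Only ``Site.position``, ``Site.id`` and the mutation's node are taken from the notes above; every other field name is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-ins for illustration only; these are not msprime's classes.
@dataclass
class Mutation:
    site: int            # ID of the site the mutation occurs at (assumed name)
    node: int            # node associated with the mutation
    derived_state: str   # assumed field name


@dataclass
class Site:
    id: int
    position: float
    ancestral_state: str  # assumed field name
    mutations: List[Mutation] = field(default_factory=list)


def describe(sites):
    """Return one line per mutation, using attribute access only."""
    lines = []
    for site in sites:
        for mutation in site.mutations:
            lines.append(
                "mutation at position {} node = {}".format(site.position, mutation.node)
            )
    return lines


sites = [
    Site(id=0, position=0.25, ancestral_state="0", mutations=[Mutation(0, 3, "1")]),
    Site(id=1, position=0.75, ancestral_state="0", mutations=[Mutation(1, 5, "1")]),
]
for line in describe(sites):
    print(line)
```

Because the stand-in objects are not tuples, an attempt such as ``x, u, j = mutation`` fails, which mirrors the kind of breakage described for the old loop.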

**New features**:

- The API has been made more Pythonic by replacing (e.g.) ``tree.get_parent(u)`` with ``tree.parent(u)``, and
``tree.get_total_branch_length()`` with ``tree.total_branch_length``. The old forms have been maintained as deprecated aliases. (64)

- Efficient interchange of tree sequence data using the new Tables API. This consists of classes representing the various tables (e.g. ``NodeTable``) and some utility functions (such as ``load_tables``, ``sort_tables``, etc).

- Support for a much more general class of tree sequence topologies. For example, trees with multiple roots are fully supported.

- Substantially generalised mutation model. Mutations now occur at specific sites, which can be associated with zero to many mutations. Each site has an ancestral state (any character string) and each mutation a derived state (any character string).

- Substantially updated documentation to rigorously define the underlying data model and requirements for imported data.

- The ``variants()`` method now returns a list of alleles for each site, and genotypes are indexes into this array. This is both consistent with existing usage and works with the newly generalised mutation model, which allows arbitrary strings of characters as mutational states.

- Added the formal concept of a sample, as distinguished from 'leaves'. Changed ``tracked_leaves``, etc. to ``tracked_samples`` (225). Also renamed ``sample_size`` to ``num_samples`` for consistency (227).

- The simplify() method returns subsets of a large tree sequence.

- TreeSequence.first() returns the first tree in sequence.

- Windows support. Msprime is now routinely tested on Windows as part of the suite of continuous integration tests.

- Newick output is now supported for more general trees. (117)

- The ``genotype_matrix`` method allows efficient access to the full genotype matrix. (306)

- The variants iterator no longer uses a single buffer for genotype data, removing a common source of error (253).

- Unicode and ASCII output formats for ``SparseTree.draw()``.

- ``SparseTree.draw()`` renders tree in the more conventional 'square shoulders' format.

- ``SparseTree.draw()`` by default returns an SVG string, so it can be easily displayed in a Jupyter notebook. (204)

- Preliminary support for a broad class of site-based statistics, including Patterson's f-statistics, has been added through the `SiteStatCalculator` and its branch length analog, `BranchLengthStatCalculator`. The interface is still in development and may change.
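
The allele and genotype encoding mentioned for ``variants()`` above can be pictured with a small pure-Python sketch. The helper ``encode_site`` is hypothetical and is not part of msprime. Placing the ancestral state at index 0 and ordering derived states by first appearance are assumptions made for the example.

```python
def encode_site(ancestral_state, sample_states):
    """Return (alleles, genotypes) for a single site.

    alleles[0] is the ancestral state; other states follow in order of
    first appearance. genotypes[j] is the index into alleles for sample j.
    """
    alleles = [ancestral_state]
    genotypes = []
    for state in sample_states:
        if state not in alleles:
            alleles.append(state)
        genotypes.append(alleles.index(state))
    return alleles, genotypes


# States are arbitrary strings, not just "0" and "1".
alleles, genotypes = encode_site("A", ["A", "T", "A", "ACG", "T"])
print(alleles)
print(genotypes)
# Decoding is a plain index lookup.
print([alleles[g] for g in genotypes])
```

Storing small integer indexes keeps the per-sample data compact even when the allelic states themselves are long strings.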

**Bug fixes**:

- Duplicate sites are no longer possible (159).

- Fix for incorrect population sizes in DemographyDebugger (66).

**Deprecated**:

- The ``records`` iterator has been deprecated, and the underlying data model has moved away from the concept of coalescence records. The structure of a tree sequence is now defined in terms of a set of nodes and edges, essentially a normalised version of coalescence records.

- Changed ``population_id`` to ``population`` in various DemographicEvent classes for consistency. The old ``population_id`` argument is kept as a deprecated alias.

- Changed ``destination`` to ``dest`` in MassMigrationEvent. The old ``destination`` argument is retained as a deprecated alias.

- Changed ``sample_size`` to ``num_samples`` in TreeSequence and SparseTree. The older versions are retained as deprecated aliases.

- Changed ``get_num_leaves`` to ``num_samples`` in SparseTree. The ``get_num_leaves`` method (and other related methods) that have been retained for backwards compatibility are semantically incorrect, in that they now return the number of **samples**. This should have no effect on existing code, since samples and leaves were synonymous. New code should use the documented ``num_samples`` form.

- Accessing the ``position`` attribute on a ``Mutation`` or ``Variant`` object is now deprecated, as this is a property of a ``Site``.

- Accessing the ``index`` attribute on a ``Mutation`` or ``Variant`` object is now deprecated. Please use ``variant.site.id`` instead. In general, objects with IDs (i.e., derived from tables) now have an ``id`` field.

- Various ``get_`` methods in TreeSequence and SparseTree have been replaced by more Pythonic alternatives.
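
To make the "normalised version of coalescence records" idea from the ``records`` deprecation concrete, here is a minimal sketch using hypothetical namedtuples. The field names and the ``normalise`` helper are assumptions for illustration and are not msprime's own types or functions.

```python
from collections import namedtuple

# Hypothetical stand-ins; field names are assumptions for this example.
CoalescenceRecord = namedtuple(
    "CoalescenceRecord", ["left", "right", "node", "children", "time"]
)
Node = namedtuple("Node", ["time"])
Edge = namedtuple("Edge", ["left", "right", "parent", "child"])


def normalise(records, num_samples):
    """Split records into a node table and an edge table."""
    # Samples are assumed to be nodes 0..num_samples-1, all at time 0.
    nodes = {u: Node(time=0.0) for u in range(num_samples)}
    edges = []
    for r in records:
        nodes[r.node] = Node(time=r.time)  # the time is stored once per node
        for child in r.children:
            edges.append(Edge(r.left, r.right, r.node, child))
    node_table = [nodes[u] for u in sorted(nodes)]
    return node_table, edges


records = [
    CoalescenceRecord(0.0, 0.4, 3, (0, 1), 0.5),
    CoalescenceRecord(0.4, 1.0, 3, (0, 2), 0.5),
    CoalescenceRecord(0.0, 0.4, 4, (2, 3), 1.2),
    CoalescenceRecord(0.4, 1.0, 4, (1, 3), 1.2),
]
node_table, edges = normalise(records, num_samples=3)
print(len(node_table), "nodes,", len(edges), "edges")
```

Each record repeats its node's time, while the node table holds it once. The sketch does not merge adjacent edges that share a parent and child, which a fuller implementation would probably do.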

0.5.0b2

This release completes the documentation and API changes for the 0.5.0 series, and is a pre-release for testing purposes.

0.5.0b1

This is a pre-release for version 0.5.0, which is a major update to the msprime API. This beta release is intended as a preview for the new [tree sequence interchange APIs](http://msprime.readthedocs.io/en/latest/interchange.html), and also a means for existing users to test their code.

Large changes have been made under the hood to enable us to handle external input and much more general tree sequences. There have also been many updates to the existing API, which will be listed in the final release. There should be no breaking changes to existing code, **except for** one case.

The ``set_mutations`` method is no longer supported, but is replaced by the much more powerful and general tables API. Please see the [tutorial](http://msprime.readthedocs.io/en/latest/tutorial.html#editing-tree-sequences) for an example of how to use this new API.

0.4.0

Major release providing new functionality and laying groundwork for
upcoming functionality.

**Breaking changes**:
- The HDF5 file format has been changed to allow for non-binary trees
and to improve performance. It is now both smaller and faster to
load. However, msprime cannot directly load tree sequence files
written by older versions. The `msp upgrade` utility has been
developed to provide an upgrade path for existing users, so that
files written by older versions of msprime can be converted to the
newer format and read by version 0.4.x of msprime.
- The tuples returned by the `mutations` method now contain an additional element.
This will break code doing things like

    for pos, node in ts.mutations():
        print(pos, node)

For better forward compatibility, code should use named attributes
rather than positional access:

    for mutation in ts.mutations():
        print(mutation.position, mutation.node)

- Similarly, the undocumented `variants` method has some major changes:
    1. The returned tuple has two new values, `node` and `index`
       in the middle of the tuple (but see the point above about using
       named attributes).
    2. The returned genotypes are by default numpy arrays. To revert
       to the old behaviour of returning Python bytes objects, use the
       `as_bytes` argument to the `variants()` method.
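
The reason positional unpacking of `mutations` values breaks, while named access does not, can be shown with a plain namedtuple. The `Mutation` type below is a hypothetical stand-in built from the attribute names in these notes, not msprime's own class.

```python
from collections import namedtuple

# Hypothetical stand-in for the values yielded by ts.mutations().
Mutation = namedtuple("Mutation", ["position", "node", "index"])

mutations = [Mutation(0.1, 4, 0), Mutation(0.7, 6, 1)]

# Unpacking into two names fails because each tuple has three elements.
try:
    for pos, node in mutations:
        pass
    unpacking_failed = False
except ValueError:
    unpacking_failed = True

# Named access is unaffected by how many fields the tuple has.
pairs = [(m.position, m.node) for m in mutations]
print(unpacking_failed, pairs)
```

Code written against attribute names keeps working if further fields are appended later, which is the forward-compatibility argument made above.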

**New features**:
- Historical samples. Using the `samples` argument to `simulate`
users can specify the location and time of all samples explicitly.
- HDF5 file upgrade utility `msp upgrade`
- Support for non-binary trees in the tree sequence, and relaxation
of the requirements on input tree sequences using the read_txt()
function.
- Integration with numpy, with zero-copy access to the low-level C API.
- Documented the variants() method that provides access to the sample
genotypes as either numpy arrays or Python bytes objects.
- New LdCalculator class that allows very fast calculation of r^2 values.
- Initial support for threading.
- The values returned by the mutations() method now also contain an `index`
attribute. This makes many operations simpler.
- New TreeSequence.get_time() method that returns the time a sample
was sampled at.
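
For readers unfamiliar with the r^2 statistic that LdCalculator reports, the sketch below computes it naively from haplotype strings. It is not LdCalculator's algorithm, and the `r_squared` helper and its input format are assumptions for illustration.

```python
def r_squared(haplotypes, site_a, site_b):
    """Naive r^2 between two biallelic sites.

    haplotypes is a list of equal-length strings of '0' and '1',
    one string per sample.
    """
    n = len(haplotypes)
    # Frequencies of the '1' allele at each site, and of carrying both.
    p_a = sum(h[site_a] == "1" for h in haplotypes) / n
    p_b = sum(h[site_b] == "1" for h in haplotypes) / n
    p_ab = sum(h[site_a] == "1" and h[site_b] == "1" for h in haplotypes) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


haplotypes = ["110", "110", "001", "000"]
print(r_squared(haplotypes, 0, 1))  # sites 0 and 1 are perfectly correlated
print(r_squared(haplotypes, 0, 2))
```

This direct count costs time proportional to the number of samples for every pair of sites, which is the kind of work a dedicated calculator aims to reduce.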

**Performance improvements**:
- File load times substantially reduced by pre-computing and storing
traversal indexes.
- O(1) implementation of TreeSequence.get_num_trees()
- Improved control of enabled tree features in TreeSequence.trees()
method using the `leaf_lists` and `leaf_counts` arguments.

**Bug fixes**:
- Fixed a precision problem with DemographyDebugger (37).
- Fixed a segfault on large haplotypes (29).

0.3.2

Feature release adding new import and export features to the API
and CLI.
- New `TreeSequence.write_records` and `TreeSequence.write_mutations`
methods to serialise a tree sequence in a human readable text format.
- New `msprime.load_txt()` method that parses the above formats, and
allows msprime to read in data from external sources.
- New `TreeSequence.write_vcf` method to write mutation information
in VCF format.
- Miscellaneous documentation fixes.

0.3.1

Feature release adding population related methods to the API.
- New `TreeSequence.get_population(sample_id)` method.
- New `TreeSequence.get_samples(population_id)` method.
- Added the optional `samples` argument to the
`TreeSequence.get_pairwise_diversity` method.
- Fixed a potential low-level buffer overrun problem.
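
The effect of restricting pairwise diversity to a subset of samples can be sketched in plain Python. The `pairwise_diversity` function below is a hypothetical helper that works on haplotype strings. It is not the implementation behind `TreeSequence.get_pairwise_diversity`, and any normalisation the real method applies is not modelled.

```python
from itertools import combinations


def pairwise_diversity(haplotypes, samples=None):
    """Mean number of differing sites over all pairs of the chosen samples."""
    if samples is None:
        samples = range(len(haplotypes))
    chosen = [haplotypes[j] for j in samples]
    pairs = list(combinations(chosen, 2))
    if not pairs:
        raise ValueError("at least two samples are required")
    # Count mismatching positions for every pair, then average.
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


haplotypes = ["0000", "0001", "0111", "1111"]
print(pairwise_diversity(haplotypes))                  # all samples
print(pairwise_diversity(haplotypes, samples=[0, 1]))  # a subset
```

Passing sample IDs that belong to one population, for instance those returned by a lookup such as `get_samples`, would give a within-population value under the same assumptions.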
