Recordlinkage

Latest version: v0.16

0.11.1

- Fix installation issue. Submodule 'preprocessing' was not added to the
source distribution.

0.11.0

- The submodule 'standardise' is renamed to 'preprocessing'. The old name
'standardise' will be deprecated in a future version.
- Deprecation errors were not visible to many users. In this version, they
are more visible.
- Improved and new logs for indexing, comparing and classification.
- Faster comparing of string variables. Thanks Joel Becker.
- Changes make it possible to pickle Compare and Index objects. This makes it
easier to run code in parallel. Tests were added to ensure that pickling
remains possible.
- Important change. Previously, MultiIndex objects with many record pairs
were automatically split into pieces to lower memory usage. This automatic
splitting is removed in this version. Please split the data yourself.
- Integer indexing. Blog post will follow on this.
- The metrics submodule has changed heavily. This breaks compatibility with
the previous version.
- repr() and str() now return informative descriptions of index and compare
objects.
- It is possible to use abbreviations for string similarity methods. For example
'jw' for the Jaro-Winkler method.
- The FEBRL dataset loaders can now return the true links as a
pandas.MultiIndex for each FEBRL dataset. This option is disabled by default.
See the [FEBRL datasets][febrl_datasets] for details.
- Fix issue with automatic recognition of the license on GitHub.
- Various small improvements.
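
Since automatic splitting of large sets of record pairs is removed, the splitting has to happen in user code. The following is a minimal sketch of one way to do that, not the library's former implementation. The helper name `split_pairs` is hypothetical. It relies only on `len()` and positional slicing, which a pandas.MultiIndex also supports, so a plain list of tuples stands in for the record pairs here.

```python
from typing import Iterator, Sequence


def split_pairs(pairs: Sequence, chunk_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `pairs` with at most `chunk_size` items each.

    Hypothetical helper: works on any object that supports len() and
    positional slicing, such as a list or a pandas.MultiIndex.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(pairs), chunk_size):
        yield pairs[start:start + chunk_size]


# Stand-in for a MultiIndex of candidate record pairs (12 pairs in total).
pairs = [(i, j) for i in range(3) for j in range(4)]

# Split into chunks of at most 5 pairs; the last chunk holds the remainder.
chunks = list(split_pairs(pairs, 5))
chunk_sizes = [len(chunk) for chunk in chunks]
```

Each chunk can then be compared separately and the resulting comparison vectors concatenated afterwards, which keeps peak memory usage bounded by the chunk size.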

[febrl_datasets]: http://recordlinkage.readthedocs.io/en/latest/ref-datasets.html#recordlinkage.datasets.load_febrl1

Note: In the next release, the Pairs class will be removed. Migrate now.

0.10.1

- Removed a print statement from the geo compare algorithm.
- String, numeric and geo compare functions now raise directly when an
incorrect algorithm name is passed.
- Fix unit test that failed on Python 2.7.

0.10.0

- A new compare API. The new Compare class no longer takes the datasets and
pairs as arguments. The actual computation is now performed when calling
`.compute(PAIRS, DF1, DF2)`. The documentation is updated as well, but
still needs improvement.
- Two new string similarity measures are added: Smith-Waterman
(smith_waterman) and Longest Common Substring (lcs). Thanks to Joel Becker
and Jillian Anderson from the Networks Lab of the University of Waterloo.
- Added and/or updated a large number of unit tests.
- Various small improvements.
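
To illustrate the idea behind the Longest Common Substring measure mentioned above, here is a minimal dynamic-programming sketch that finds the longest contiguous run of characters shared by two strings. It is not the library's implementation: the function name `longest_common_substring` is hypothetical, and how the library turns such matches into a similarity score is not shown here.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous substring shared by `a` and `b`.

    On ties, the match that ends earliest in `a` is returned.
    """
    best_len = 0  # length of the best match found so far
    best_end = 0  # end position (exclusive) of that match in `a`
    # prev[j] holds the length of the common suffix of a[:i-1] and b[:j].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix that ended one character earlier.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


match = longest_common_substring("jonathan", "johnathan")
```

The sketch runs in time proportional to the product of the two string lengths and keeps only two rows of the table in memory.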

0.9.0

- A new index API. The new index API is no longer a single class
(``recordlinkage.Pairs(...)``) with all the functionality in it. The new API
is based on TensorFlow and FEBRL. With the new structure, it is easier to
parallelise the record linkage process. In future releases, this will be
implemented natively. `See the reference page for more information and migration instructions. <http://recordlinkage.readthedocs.io/en/latest/ref-index.html>`_
- Significant speed improvement of the Sorted Neighbourhood Indexing
algorithm. Thanks to perryvais (PR 32).
- The function ``binary_comparisons`` is renamed. The new name of the function
is ``binary_vectors``. Documentation added to RTD.
- Added unit tests to test the generation of random comparison vectors.
- Logging module added to separate module logs from user logs. The
implementation is based on TensorFlow.
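
The separation of module logs from user logs can be sketched with Python's standard `logging` module. This is only an illustration of the general pattern, not the library's actual logging code: the logger names `mylinkage` and `my_app` and the helper `set_library_verbosity` are hypothetical.

```python
import logging

# Hypothetical name standing in for a library's own logger.
LIB_LOGGER_NAME = "mylinkage"

# The library logs to its own named logger. A NullHandler keeps the library
# silent unless the user configures logging.
lib_logger = logging.getLogger(LIB_LOGGER_NAME)
lib_logger.addHandler(logging.NullHandler())


def set_library_verbosity(level: int) -> None:
    """Set the level of the library logger without touching user loggers."""
    lib_logger.setLevel(level)


# The user's application has a separate logger with its own level.
app_logger = logging.getLogger("my_app")
app_logger.setLevel(logging.DEBUG)

# Show only warnings and errors from the library, while the application
# logger keeps its own DEBUG level.
set_library_verbosity(logging.WARNING)
```

Because the two loggers have different names, changing the verbosity of one does not affect the other.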

0.8.1

- Solved issues with rendering the docs on ReadTheDocs. It is still not clear
what is going on with `autodoc_mock_imports` in the Sphinx conf.py file. It
may be a bug in Sphinx.
- Move six to dependencies.
- The reference part of the docs is split into separate subsections. This
makes the reference more readable.
- The landing page of the docs is slightly changed.
