Recordlinkage

Latest version: v0.16

0.8.0

- Add arguments to the function that downloads and loads the Krebsregister
data. The argument `missing_values` fills missing values (default: missing
values are left as they are). The argument `shuffle` shuffles the records
(default: True).
- Remove the last remaining traces of the old package name. The new package
name is 'Python Record Linkage Toolkit'.
- Better error messages when only matches or only non-matches are passed
to train the classifier.
- Add airspeed velocity (asv) tests to measure performance.
- Fix `Compare` for deduplication, which was broken.
- Parameterized tests for the `Compare` class and its algorithms, using the
`nose-parameterized` module.
- Update documentation about contributing.
- Bugfix/improvement when blocking on multiple columns with missing values.
- Fix bug 29: the package did not work with pandas 0.17 and 0.18. Dropped
support for pandas 0.17 and fixed support for 0.18. Also added
multi-dependency tests for Travis CI.
- Support for dedicated deduplication algorithms
- Special algorithm for the full index when finding duplicates. Performance is
100x better.
- Function `max_number_of_pairs` to get the maximum number of pairs.
- `low_memory` option for the `Compare` class.
- Improved performance in case of comparing a large number of record pairs.
- New documentation about custom algorithms
- New documentation about the use of classifiers.
- Possible to compare arrays and series directly without using labels.
- Make a DataFrame with random comparison vectors using
`binary_comparisons` in the `recordlinkage.datasets.random` module.
- Set KMeans cluster centers by hand.
- Various documentation updates and improvements.
- Jellyfish is now a required dependency. Fixes bug 30.
- Added `tox.ini` to test packaging and installation of the package.
- Drop requirements.txt file.
- Many small fixes and changes. Most of the changes cover the `Compare`
module; label handling in particular is improved.
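The entries on `max_number_of_pairs` and the dedicated full-index algorithm for deduplication rest on a simple count. Linking two datasets of sizes n and m with a full index gives n × m pairs, while deduplicating one dataset of size n gives n(n − 1)/2 pairs, because a record is never paired with itself and (a, b) is the same pair as (b, a). The sketch below illustrates that count with a hypothetical helper `max_pairs`; it is not the toolkit's function, signature, or implementation.

```python
from itertools import combinations, product
from typing import Optional


def max_pairs(n_left: int, n_right: Optional[int] = None) -> int:
    """Count candidate pairs in a full index (hypothetical helper).

    With two datasets, every left record is paired with every right record.
    With one dataset (deduplication), each record is paired once with every
    other record, ignoring order: n * (n - 1) / 2 pairs.
    """
    if n_right is None:
        return n_left * (n_left - 1) // 2
    return n_left * n_right


# Brute-force checks of both formulas on small inputs.
records = ["a", "b", "c", "d"]
assert max_pairs(len(records)) == len(list(combinations(records, 2)))
assert max_pairs(3, 5) == len(list(product(range(3), range(5))))

print(max_pairs(1000))        # 499500 pairs when deduplicating 1,000 records
print(max_pairs(1000, 1000))  # 1000000 pairs when linking two sets of 1,000
```

Deduplication therefore needs roughly half the pairs of linking a dataset against a copy of itself, which is one reason a dedicated deduplication path can pay off.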

0.7.2

0.7.1

0.6.0

This version includes the following updates:
- Reformat the code to follow PEP 8.
- Add Travis-CI and codecov support.
- Switch to distributing wheels.
- Fix bugs with deprecated pandas functions. `__sub__` is no longer used for computing the difference of Index objects. It is now replaced by `INDEX.difference(OTHER_INDEX)`.
- Exclude pairs with NaNs in the index key in Q-gram indexing.
- Add tests for krebsregister dataset.
- Fix Python3 bug on krebsregister dataset.
- Improve unicode handling in phonetic encoding functions.
- Strip accents with the `clean` function.
- Add documentation
- Bug for random indexing with incorrect arguments fixed and tests added.
- Improved deployment workflow
- And much more
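The pandas change noted above concerns set difference on index objects. Older pandas versions treated `-` between two Index objects as a set difference; that use of the operator was deprecated, and `Index.difference` is the explicit replacement. The sketch below shows the replacement on a small `MultiIndex` of record pairs; the pairs and level names are illustrative assumptions, not data from the toolkit.

```python
import pandas as pd

# Candidate record pairs stored as a MultiIndex (illustrative data).
all_pairs = pd.MultiIndex.from_tuples(
    [(0, 1), (0, 2), (1, 2)], names=["rec_a", "rec_b"]
)
known_pairs = pd.MultiIndex.from_tuples([(0, 2)], names=["rec_a", "rec_b"])

# Explicit set difference instead of the deprecated `all_pairs - known_pairs`.
remaining = all_pairs.difference(known_pairs)

print(list(remaining))  # [(0, 1), (1, 2)]
```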

0.5.0

- Batch comparing added. Significant speed improvement.
- rldatasets are now included in the package itself.
- Added an experimental gender imputation tool.
- Blocking and sorted neighbourhood indexing (SNI) skip missing values
- Different index names are no longer needed
- FEBRL datasets included
- Unit tests for indexing and comparing improved
- Documentation updated

0.4.0

- Fixes a serious bug with deduplication (thanks to https://github.com/dserban).
- Fixes undesired behaviour for sorted neighbourhood indexing with missing values.
- Add new datasets to the package, such as the Febrl datasets
- Move Krebsregister dataset to this package.
- Improve and add some tests
- Various documentation updates

