- Add arguments to the function that downloads and loads the krebsregister
data. The `missing_values` argument sets the value used to fill missing
values; by default, missing values are left as they are. The `shuffle`
argument controls whether the records are shuffled; the default is True.
- Remove the last traces of the old package name. The new package name is
'Python Record Linkage Toolkit'.
- Better error messages when only matches or only non-matches are passed to
train the classifier.
- Add AirSpeedVelocity tests to test the performance.
- Fix `Compare` for deduplication, which was broken.
- Parameterized tests for the `Compare` class and its algorithms, making use
of the `nose-parameterized` module.
- Update documentation about contributing.
- Bugfix/improvement when blocking on multiple columns with missing values.
- Fix bug 29: the package did not work with pandas 0.18 and 0.17. Dropped
support for pandas 0.17 and fixed support for 0.18. Also added
multi-dependency tests for TravisCI.
- Support for dedicated deduplication algorithms
- Special algorithm for the full index when finding duplicates. Performance is
100x better.
- Function `max_number_of_pairs` to get the maximum number of pairs.
- `low_memory` for the `Compare` class.
- Improved performance in case of comparing a large number of record pairs.
- New documentation about custom algorithms
- New documentation about the use of classifiers.
- Possible to compare arrays and series directly without using labels.
- Make a dataframe with random comparison vectors using
`binary_comparisons` in the `recordlinkage.datasets.random` module.
- Set KMeans cluster centers by hand.
- Various documentation updates and improvements.
- Jellyfish is now a required dependency. Fixes bug 30.
- Added `tox.ini` to test packaging and installation of package.
- Drop requirements.txt file.
- Many small fixes and changes. Most of the changes cover the `Compare`
module. Label handling in particular is improved.
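
As a rough illustration of the pair counts behind `max_number_of_pairs` and the
full index for duplicates, the sketch below computes the upper bounds in plain
Python. The helper names (`max_pairs_linking`, `max_pairs_dedup`,
`full_index_dedup`) are hypothetical and are not the toolkit's API; the
toolkit's own function may take different arguments.

```python
from itertools import combinations


def max_pairs_linking(n_a, n_b):
    """Upper bound on candidate pairs when linking two datasets (hypothetical helper)."""
    return n_a * n_b


def max_pairs_dedup(n):
    """Upper bound on candidate pairs within one dataset (hypothetical helper)."""
    # Each unordered pair is counted once and self-pairs are excluded.
    return n * (n - 1) // 2


def full_index_dedup(record_ids):
    """Enumerate every unordered pair of records once, skipping self-pairs."""
    return list(combinations(record_ids, 2))


pairs = full_index_dedup(["a", "b", "c", "d"])
assert len(pairs) == max_pairs_dedup(4)
```

For deduplication the full index holds roughly half as many pairs as the
product of the dataset with itself, because each pair is visited only once.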