Fixed
- Fixed the result conversion in process.extract (see #79)
Changed
- string_metric.normalized_levenshtein now supports all weights. When different weights are used for insertion and deletion, the strings are no longer swapped inside the Levenshtein implementation, so different insertion and deletion weights are now supported.
- Replaced the C++ implementation with a Cython implementation. This has the following advantages:
  - The implementation is less error-prone, since a lot of the complex work is done by Cython
  - Slightly faster than the previous implementation (up to 10% for some parts)
  - About 33% smaller binary size
  - Reduced compile time
- Added a `**kwargs` argument to process.extract/extractOne/extract_iter that is passed to the scorer
- Added a max argument to the Hamming distance
- Added support for the whole Unicode range to utils.default_process

Performance
- Replaced the Wagner-Fischer implementation of the plain Levenshtein distance with a bit-parallel implementation
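The max argument for the Hamming distance allows an early exit once the distance is known to exceed a bound. A minimal plain-Python sketch of the idea (the names `hamming` and `max_dist` are illustrative, not RapidFuzz's exact signature):

```python
def hamming(s1, s2, max_dist=None):
    """Count positions at which two equal-length strings differ.

    If max_dist is given, stop early and return max_dist + 1 as soon as
    the distance is known to exceed the bound (illustrative semantics).
    """
    if len(s1) != len(s2):
        raise ValueError("Hamming distance requires equal-length strings")
    dist = 0
    for c1, c2 in zip(s1, s2):
        if c1 != c2:
            dist += 1
            if max_dist is not None and dist > max_dist:
                return max_dist + 1  # caller only needs to know the bound broke
    return dist
```

The early exit pays off in process.extract-style loops, where most candidates are far from the query and can be rejected after a few mismatches.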
Fixed
- The bit-parallel LCS algorithm in fuzz.partial_ratio did not always find the longest common substring. The old algorithm is used again until this bug is fixed.
Changed
- string_metric.normalized_levenshtein now supports the weights (1, 1, N) with N >= 1

Performance Improvements
- The Levenshtein distance with the weights (1, 1, >2) now uses the same implementation as the weights (1, 1, 2), since `Substitution > Insertion + Deletion` has no effect: a substitution can always be replaced by one insertion plus one deletion

Fixed
- Fixed an uninitialized variable in the bit-parallel Levenshtein distance with the weights (1, 1, 1)
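Why `Substitution > Insertion + Deletion` has no effect can be seen in a Wagner-Fischer style sketch: any substitution can be emulated by one deletion plus one insertion, so the effective substitution weight is capped. The function below is a hypothetical helper for illustration, not RapidFuzz's implementation:

```python
def weighted_levenshtein(s1, s2, weights=(1, 1, 1)):
    """Levenshtein distance with (insertion, deletion, substitution)
    weights, computed with the classic Wagner-Fischer dynamic program."""
    ins, dele, sub = weights
    # An expensive substitution is never chosen over delete + insert,
    # so weights (1, 1, N) with N > 2 behave exactly like (1, 1, 2).
    sub = min(sub, ins + dele)
    prev = [j * ins for j in range(len(s2) + 1)]
    for i, c1 in enumerate(s1, 1):
        cur = [i * dele]
        for j, c2 in enumerate(s2, 1):
            if c1 == c2:
                cur.append(prev[j - 1])
            else:
                cur.append(min(prev[j] + dele,      # delete c1
                               cur[j - 1] + ins,    # insert c2
                               prev[j - 1] + sub))  # substitute c1 -> c2
        prev = cur
    return prev[-1]
```

With the weights (1, 1, 2) this is the InDel distance (length of both strings minus twice the length of their longest common subsequence), which is why (1, 1, 3) gives the same result.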
Changed
- All normalized string_metrics can now be used as scorer for process.extract/extractOne
- The implementation of the C++ wrapper was completely refactored to make it easier to add more scorers, processors and string matching algorithms in the future
- Increased test coverage, which already helped to fix some bugs and will help to prevent regressions in the future
- Improved docstrings of functions

Performance Improvements
- Added a bit-parallel implementation of the Levenshtein distance for the weights (1,1,1) and (1,1,2)
- Added a specialized implementation of the Levenshtein distance for cases with a small maximum edit distance that is even faster than the bit-parallel implementation
- Improved the performance of `fuzz.partial_ratio`. Since `fuzz.ratio` and `fuzz.partial_ratio` are used in most scorers, this improves the overall performance
- Improved the performance of `process.extract` and `process.extractOne`

Deprecated
- The `rapidfuzz.levenshtein` module is now deprecated and will be removed in v2.0.0. These functions are now placed in `rapidfuzz.string_metric`. `distance`, `normalized_distance`, `weighted_distance` and `weighted_normalized_distance` are combined into `levenshtein` and `normalized_levenshtein`

Added
- Added a normalized version of the Hamming distance in `string_metric.normalized_hamming`
- Added process.extract_iter as a generator that yields the similarity of all elements with a similarity >= score_cutoff

Fixed
- Fixed multiple bugs in extractOne when used with a scorer that is not from RapidFuzz
- Fixed a bug in `token_ratio`
- Fixed a bug in result normalization causing a zero division
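Bit-parallel Levenshtein implementations for the uniform weights (1,1,1) are typically based on Myers' algorithm, which encodes a whole column of the dynamic-programming matrix as bit vectors and processes one text character per loop iteration. A rough Python sketch (Python ints are arbitrary precision, so the usual 64-character word limit does not apply here):

```python
def bitparallel_levenshtein(s1, s2):
    """Levenshtein distance via Myers' bit-parallel algorithm
    (uniform weights), following Hyyro's formulation."""
    if not s1:
        return len(s2)
    m = len(s1)
    mask = (1 << m) - 1
    last = 1 << (m - 1)
    # Bitmask of the positions of each character in s1.
    peq = {}
    for i, c in enumerate(s1):
        peq[c] = peq.get(c, 0) | (1 << i)
    pv = mask  # vertical positive deltas (+1 down each column)
    mv = 0     # vertical negative deltas (-1 down each column)
    dist = m
    for c in s2:
        eq = peq.get(c, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) & mask) ^ pv) | eq
        ph = mv | (~(xh | pv) & mask)   # horizontal +1 deltas
        mh = pv & xh                    # horizontal -1 deltas
        if ph & last:
            dist += 1
        if mh & last:
            dist -= 1
        ph = ((ph << 1) | 1) & mask     # the |1 accounts for the 0,1,2,... first row
        mh = (mh << 1) & mask
        pv = mh | (~(xv | ph) & mask)
        mv = ph & xv
    return dist
```

The inner loop is a fixed number of word operations regardless of the pattern length (per word), which is what makes this much faster than the cell-by-cell Wagner-Fischer computation.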
Fixed
- UTF-8 usage in the copyright header caused problems with Python 2.7 on some platforms (see #70)
Fixed
- When a custom processor like `lambda s: s` was used with any of the methods inside fuzz.*, a score of 100 was always returned. This release fixes this bug and adds better test coverage to prevent it in the future.
Added
- Added a Hamming distance metric in the levenshtein module

Changed
- Improved the performance of default_process by using a lookup table
Fixed
- Added a missing virtual destructor that caused a segmentation fault on macOS
Added
- C++11 support
- manylinux wheels
Fixed
- Levenshtein was not imported in `__init__`
- The reference count of a Python object inside process.extractOne was decremented too early
Improved
- process.extractOne exits early when a score of 100 is found, so the remaining strings no longer have to be preprocessed
Fixed
- String objects passed to scorers had to be strings even before preprocessing. They now only have to be strings after preprocessing, matching process.extract/process.extractOne

Improved
- process.extractOne is now implemented in C++, making it a lot faster
- When token_sort_ratio or partial_token_sort_ratio is used in process.extractOne, the words in the query are only sorted once to improve the runtime

Changed
- process.extractOne/process.extract now return the index of the match when the choices are a list
- process.extractIndices was removed, since the indices are now returned by process.extractOne/process.extract
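The new return shape, with the list index included, can be sketched in plain Python. `extract_one` and `ratio` below are illustrative stand-ins (using difflib as the scorer), not RapidFuzz's actual implementation:

```python
from difflib import SequenceMatcher

def ratio(s1, s2):
    # Stand-in scorer on a 0-100 scale, like fuzz.ratio.
    return SequenceMatcher(None, s1, s2).ratio() * 100

def extract_one(query, choices, score_cutoff=0):
    """Return (choice, score, index) for the best match in a list,
    or None when nothing reaches score_cutoff."""
    best = None
    for index, choice in enumerate(choices):
        score = ratio(query, choice)
        if score >= score_cutoff and (best is None or score > best[1]):
            best = (choice, score, index)
    return best

match = extract_one("new york", ["new york city", "boston", "chicago"])
# match[0] == "new york city", match[2] == 0
```

Returning the index directly removes the need for a separate extractIndices call when the caller has to look the winner up in a parallel data structure.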
Fixed
- Fixed the documentation of process.extractOne (see #48)
Changed
- Added wheels for:
  - CPython 2.7 on Windows 64 bit
  - CPython 2.7 on Windows 32 bit
  - PyPy 2.7 on Windows 32 bit
Fixed
- Fixed a bug in partial_ratio (see #43)
Fixed
- Fixed an inconsistency with fuzzywuzzy in partial_ratio when using strings of equal length
Fixed
- MSVC has a bug and therefore crashed on some of the templates used. This release simplifies the templates so compiling with MSVC works again
Improved
- partial_ratio now uses the Levenshtein distance, which is a lot faster. Since many of the other algorithms use partial_ratio, this improves the overall performance
Fixed
- Fixed partial_token_set_ratio returning 100 all the time
Changed
- Added `rapidfuzz.__author__`, `rapidfuzz.__license__` and `rapidfuzz.__version__`
Fixed
- Do not use autojunk when searching for the optimal alignment in partial_ratio
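"Autojunk" here refers to difflib's heuristic: by default, `SequenceMatcher` treats characters that occur in more than 1% of a second sequence longer than 200 characters as junk and ignores them when searching for matches, which can skew the alignment that partial_ratio relies on. A small illustration of the difference:

```python
from difflib import SequenceMatcher

# 'a' and 'b' each occur 150 times in a 300-character sequence, so the
# default autojunk heuristic marks both as "popular" and ignores them.
long_text = "ab" * 150
query = "xyab"  # the real match "ab" is not at the start

with_junk = SequenceMatcher(None, query, long_text)                  # autojunk=True
without_junk = SequenceMatcher(None, query, long_text, autojunk=False)

with_junk_size = with_junk.find_longest_match(0, 4, 0, 300).size       # misses "ab"
without_junk_size = without_junk.find_longest_match(0, 4, 0, 300).size # finds "ab"
```

Disabling the heuristic makes the alignment search consider every character, at the cost of more work on long, repetitive inputs.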
Changed
- Added support for Python 2.7 (see #40)
- Added wheels for Python 2.7 (both PyPy and CPython) on macOS and Linux
Changed
- Wheels are now built for Python 3.9 as well

Fixed
- Tuple scores in process.extractOne are now supported (see #39)