Errant

Latest version: v3.0.0


3.0.0

1. Finally updated ERRANT to support spaCy 3!
* I specifically tested spaCy 3.2 to 3.7 and found a negligible difference in performance on the BEA19 dev set.
* This update also comes with an unexpected 10-20% speed gain.

2. Added a `.gitignore` file. [#39](https://github.com/chrisjbryant/errant/issues/39)

3. Renamed `master` branch to `main`.

2.3.3

1. Missed one case of changing Levenshtein to rapidfuzz... Now fixed.

2.3.2

1. Added more details to verbose ERRANT scoring. [#29](https://github.com/chrisjbryant/errant/pull/29)
2. Simplified the new rapidfuzz functions. [#35](https://github.com/chrisjbryant/errant/pull/35)

2.3.1

1. Replaced the dependency on [python-Levenshtein](https://pypi.org/project/python-Levenshtein/) with [rapidfuzz](https://pypi.org/project/rapidfuzz/) to overcome a licensing conflict. ERRANT and its dependencies now all use the MIT license. [#34](https://github.com/chrisjbryant/errant/issues/34)

2.3.0

1. Added some new rules to reduce the number of OTHER-type 1:1 edits and classify them as something else. Specifically, there are now ~40% fewer 1:1 OTHER edits and ~15% fewer n:n OTHER edits overall (tested on the FCE and W&I training sets combined). The changes are as follows:

* A possessive suffix at the start of a merge sequence is now always split:

| Example | people life -> people 's lives |
|---------|------------------------------------------------------------|
| Old | _life_ -> _'s lives_ (R:OTHER) |
| New | _ε_ -> _'s_ (M:NOUN:POSS), _life_ -> _lives_ (R:NOUN:NUM) |

* NUM <-> DET edits are now classified as R:DET; e.g. _one (cat)_ -> _a (cat)_. Thanks to [katkorre](https://github.com/katkorre/ERRANT-reclassification)!

* Changed the string similarity score in the classifier from the Levenshtein ratio to the normalised Levenshtein distance based on the length of the longest input string. This is because we felt some ratio scores were unintuitive; e.g. _smt_ -> _something_ has a ratio score of 0.5 despite the insertion of 6 characters (the new normalised score is 0.33).

* The non-word spelling error rules were updated slightly to take the new normalised Levenshtein score into account. Additionally, dissimilar strings are now classified based on the POS tag of the correction rather than as OTHER; e.g. _amougnht_ -> _number_ (R:NOUN).

* The new normalised Levenshtein score is also used to classify many of the remaining 1:1 replacement edits that were previously classified as OTHER. Many of these are real-word spelling errors (e.g. _their_ <-> _there_), but there are also some morphological errors (e.g. _health_ -> _healthy_) and POS-based errors (e.g. _transport_ -> _travel_). Note that these rules are a little complex and depend on both the similarity score and the lengths of the original and corrected strings. For example, _form_ -> _from_ (R:SPELL) and _eventually_ -> _finally_ (R:ADV) have the same similarity score of 0.5 yet are assigned different error types based on their string lengths.
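
The two similarity scores discussed above can be reproduced with a minimal pure-Python sketch. This is an illustration only, not ERRANT's actual code (ERRANT uses rapidfuzz). The function names are hypothetical, and the ratio is assumed to follow the common definition in which a substitution costs 2.

```python
def edit_distance(a: str, b: str, sub_cost: int = 1) -> int:
    """Edit distance by dynamic programming; insertions and deletions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                                  # delete ca
                curr[j - 1] + 1,                              # insert cb
                prev[j - 1] + (0 if ca == cb else sub_cost),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """Old score (assumed definition): substitutions cost 2, normalised by the summed lengths."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return (total - edit_distance(a, b, sub_cost=2)) / total


def normalised_similarity(a: str, b: str) -> float:
    """New score: 1 - distance / length of the longest input string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - edit_distance(a, b) / longest


if __name__ == "__main__":
    for orig, cor in [("smt", "something"), ("form", "from"), ("eventually", "finally")]:
        print(orig, cor,
              round(levenshtein_ratio(orig, cor), 2),
              round(normalised_similarity(orig, cor), 2))
```

Under these assumptions, _smt_ -> _something_ scores 0.5 with the ratio and 0.33 with the normalised score, while _form_ -> _from_ and _eventually_ -> _finally_ both get a normalised score of 0.5, which is why the classifier also has to look at string length.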

2. Various minor updates:
* `out_m2` in `parallel_to_m2.py` and `m2_to_m2.py` is now opened and closed properly. [#20](https://github.com/chrisjbryant/errant/pull/20)
* Fixed a bracketing error that deleted a valid edit in rare circumstances. [#26](https://github.com/chrisjbryant/errant/issues/26) [#28](https://github.com/chrisjbryant/errant/issues/28)
* Updated the English wordlist.
* Minor changes to the readme.
* Tidied up some code comments.

2.2.3

1. Changed the dependency version requirements in `setup.py` since ERRANT v2.2.x is not compatible with spaCy 3.

