Errant

Latest version: v3.0.0

2.2.2

1. Added a copy of the NLTK Lancaster stemmer to `errant.en.lancaster` and removed the NLTK dependency. It was overkill to require the entire NLTK package just for this stemmer, so we now bundle it with ERRANT.

2. Replaced the deprecated `tokens_from_list` function from spaCy v1 with the spaCy v2 `Doc` constructor in `Annotator.parse`.

2.2.1

Fixed a `KeyError` in the classifier for rare spaCy 2 POS tags: _SP, BES, HVS.

2.2.0

1. ERRANT now works with spaCy v2.2. It is 4x slower, but this change was necessary to make it work on Python 3.7.

2. spaCy 2 uses slightly different POS tags from spaCy 1 (e.g. auxiliary verbs are now tagged AUX rather than VERB), so I updated some of the merging rules to maintain performance.

2.1.0

1. The character-level cost in the sentence alignment function is now computed by the much faster [python-Levenshtein](https://pypi.org/project/python-Levenshtein/) library instead of Python's native `difflib.SequenceMatcher`. This makes ERRANT 3x faster!

2. Various minor updates:
* Updated the English wordlist.
* Fixed a broken rule for classifying contraction errors.
* Changed a condition in the calculation of transposition errors to be more intuitive.
* Partially updated the ERRANT POS tag map to match the updated [Universal POS tag map](https://universaldependencies.org/tagset-conversion/en-penn-uposf.html). Specifically, EX now maps to PRON rather than ADV, LS maps to X rather than PUNCT, and CONJ has been renamed CCONJ. I did not change the mapping of RP from PART to ADP yet because this breaks several rules involving phrasal verbs.
* Added an `errant.__version__` attribute.
* Added a warning about using ERRANT with spaCy 2.
* Tidied some code in the classifier.

2.0.0

1. ERRANT has been significantly refactored to accommodate a new API (see README). It should now also be much easier to extend to other languages.

2. Added a `setup.py` script to make ERRANT `pip` installable.

3. The Damerau-Levenshtein alignment code has been rewritten in a much cleaner Python implementation. This also makes ERRANT ~20% faster.

Note: All these changes do **not** affect system output compared with the previous version. For the first `pip` release, we wanted to make sure v2.0.0 was fully compatible with the [BEA-2019 shared task](https://www.cl.cam.ac.uk/research/nl/bea2019st/) on Grammatical Error Correction.

Thanks to [sai-prasanna](https://github.com/sai-prasanna) for inspiring some of these changes!
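For readers unfamiliar with Damerau-Levenshtein alignment (item 3 above), the following is a minimal sketch of the underlying distance. It implements the restricted "optimal string alignment" variant with unit costs; it is not ERRANT's implementation, and the function name is hypothetical.

```python
def damerau_levenshtein(source, target):
    """Edit distance with insertions, deletions, substitutions and
    adjacent transpositions (optimal string alignment variant).

    Works on any sequences, e.g. strings or lists of tokens.
    Unit costs are an assumption made for this sketch.
    """
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete everything from source
    for j in range(cols):
        dist[0][j] = j  # insert everything from target
    for i in range(1, rows):
        for j in range(1, cols):
            sub_cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # deletion
                dist[i][j - 1] + 1,             # insertion
                dist[i - 1][j - 1] + sub_cost,  # match or substitution
            )
            # Adjacent transposition, e.g. "only can" -> "can only".
            if (i > 1 and j > 1
                    and source[i - 1] == target[j - 2]
                    and source[i - 2] == target[j - 1]):
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]
```

Because the function accepts lists, it can be applied at the token level: `damerau_levenshtein(["I", "only", "can", "go"], ["I", "can", "only", "go"])` treats the swapped words as a single transposition.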

1.4

1. The `compare_m2.py` evaluation script was refactored to make it easier to use.

2. We tweaked the alignment code and merging rules to not only make ERRANT ~700% faster, but also slightly more accurate.

Specifically, we simplified the lemma cost so that it no longer calls the lemmatiser repeatedly for different parts-of-speech, and we now compute the character cost with Python's native `difflib.SequenceMatcher` instead of a character-based Damerau-Levenshtein alignment.

This significantly increased the speed, but also slightly decreased accuracy (~0.5 F1 worse), so we additionally revisited the merging rules. The new implementation now processes the largest combinations of adjacent non-matches first, instead of processing one alignment at a time, and now also features some new or slightly modified rules (see `scripts/align_text.py` for more information).
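As a rough illustration of a character cost based on `difflib.SequenceMatcher`, the sketch below turns the matcher's similarity ratio into a cost between 0 and 1. The function name and the exact formula (one minus the ratio) are assumptions for this example; the formula used in `scripts/align_text.py` may differ.

```python
from difflib import SequenceMatcher

def char_cost(a, b):
    """Character-level cost between two tokens.

    Returns 0.0 for identical strings and 1.0 for strings that share
    no matching characters. Assumed formula: 1 - similarity ratio.
    """
    return 1.0 - SequenceMatcher(None, a, b).ratio()
```

Under this sketch, near-identical spellings such as "color" and "colour" receive a small cost, while unrelated tokens receive a cost close to 1.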

The differences between the old and new version are summarised in the following table.

| Dataset | Sents | Setting | P | R | F1 | Time<br>(secs) |
|--------------|------:|-----------:|---------------:|---------------:|-------------------:|----------------:|
| FCE Dev | 2371 | Old<br>New | 82.77<br>84.00 | 85.22<br>85.52 | 83.98<br>**84.75** | 260<br>**40** |
| FCE Test | 2805 | Old<br>New | 83.88<br>85.17 | 85.84<br>85.93 | 84.85<br>**85.55** | 300<br>**45** |
| FCE Train | 30200 | Old<br>New | 82.69<br>84.06 | 85.12<br>85.38 | 83.89<br>**84.72** | 2965<br>**340** |
| CoNLL-2013 | 1381 | Old<br>New | 82.64<br>83.27 | 82.45<br>82.24 | 82.54<br>**82.75** | 315<br>**45** |
| CoNLL-2014.0 | 1312 | Old<br>New | 78.48<br>79.02 | 80.38<br>80.18 | 79.42<br>**79.59** | 350<br>**45** |
| CoNLL-2014.1 | 1312 | Old<br>New | 82.50<br>84.04 | 82.73<br>82.85 | 82.61<br>**83.44** | 385<br>**50** |
| NUCLE | 57151 | Old<br>New | 70.14<br>73.20 | 80.27<br>81.16 | 71.95<br>**76.97** | 7565<br>**725** |
