SoMaJo

Latest version: v2.4.2

2.2.3

- Improvements to tokenization: Roman ordinals, abbreviation “Art.”
preceding a number, certain units of measurement at the end of a
sentence (e.g. km/h).

2.2.2

- Bugfix: Command-line option --sentence_tag implies option --split_sentences.

2.2.1

- Bugfix: Command-line option --strip-tags implies option --xml.

2.2.0

- New feature: Prune XML tags and their contents from the input before
tokenization (via the command-line option --prune TAGNAME1 --prune
TAGNAME2 … or by passing prune_tags=["TAGNAME1", "TAGNAME2", …] to
tokenize_xml or tokenize_xml_file). This can be useful when
processing HTML files, e.g. for removing all <script> and <style>
elements from the input.
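
To illustrate what pruning means here, the following is a minimal standard-library sketch that removes the named elements together with their contents while keeping the text around them. It is not SoMaJo's implementation; the helper name prune_elements and the sample input are hypothetical and exist only for this illustration, and the sketch assumes well-formed XML input.

```python
import xml.etree.ElementTree as ET


def prune_elements(xml_data, prune_tags):
    """Return xml_data without the elements named in prune_tags.

    Hypothetical helper for illustration only (not part of SoMaJo).
    Each matching element is dropped together with its contents; text
    that follows a dropped element is preserved. The root element
    itself is never removed.
    """
    root = ET.fromstring(xml_data)
    prune = set(prune_tags)

    def visit(parent):
        previous = None  # last sibling that was kept
        for child in list(parent):
            if child.tag in prune:
                # ElementTree stores the text after an element in its
                # tail, so move it to the preceding node before removal.
                tail = child.tail or ""
                if previous is None:
                    parent.text = (parent.text or "") + tail
                else:
                    previous.tail = (previous.tail or "") + tail
                parent.remove(child)
            else:
                visit(child)
                previous = child

    visit(root)
    return ET.tostring(root, encoding="unicode")


sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><p>Hallo Welt!</p><script>alert(1)</script> danach"
    "<p>Tschüss.</p></body></html>"
)
pruned = prune_elements(sample, ["script", "style"])
print(pruned)
```

With SoMaJo itself, no such helper should be needed: according to the entry above, passing prune_tags=["script", "style"] to tokenize_xml or tokenize_xml_file, or using --prune script --prune style on the command line, has the tokenizer discard these elements before tokenization.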

2.1.6

- Recognize more URLs without protocol.
- Fix a small bug in the implementation of doubly linked lists.

2.1.5

- Split sequences of hashtags without spaces.
- Add legal abbreviations (issue 21).
