SoMaJo

Latest version: v2.4.2


1.2.0

Two new options have been added. With -s/--paragraph_separator, you can
specify how paragraphs are delimited in the input data, i.e. by empty
lines or by single newlines. The --parallelization option makes it
possible to use a pool of worker processes to speed up tokenization.
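
The two ideas behind these options can be sketched in plain Python. This is a minimal illustration using only the standard library, not SoMaJo's actual implementation: the function names (split_paragraphs, toy_tokenize, tokenize_parallel) and the separator values are hypothetical, and the whitespace-splitting tokenizer is only a stand-in for a real one.

```python
import re
from multiprocessing import Pool


def split_paragraphs(text, separator="empty_lines"):
    """Split raw text into paragraphs (hypothetical helper).

    "empty_lines": paragraphs are separated by one or more blank lines.
    "single_newlines": every non-empty line is its own paragraph.
    """
    if separator == "empty_lines":
        chunks = re.split(r"\n\s*\n", text)
    elif separator == "single_newlines":
        chunks = text.split("\n")
    else:
        raise ValueError(f"unknown separator: {separator!r}")
    # Collapse internal line breaks and drop empty chunks.
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]


def toy_tokenize(paragraph):
    """Stand-in for a real tokenizer: split on whitespace only."""
    return paragraph.split()


def tokenize_parallel(paragraphs, processes=2):
    """Tokenize paragraphs in a pool of worker processes, keeping input order."""
    with Pool(processes=processes) as pool:
        return pool.map(toy_tokenize, paragraphs)


if __name__ == "__main__":
    sample = "Erste Zeile\nzweite Zeile\n\nNeuer Absatz"
    for mode in ("empty_lines", "single_newlines"):
        paragraphs = split_paragraphs(sample, separator=mode)
        print(mode, tokenize_parallel(paragraphs))
```

Paragraphs are a natural unit of parallel work because each one can be tokenized independently. Pool.map returns results in input order, so the output stays aligned with the input.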

1.1.2

The example in the documentation is now self-contained: Sample input
has been added and the output will be printed.

1.1.1

The link in the Evaluation section of the Readme now points to the
complete gold standard data.

1.1.0

SoMaJo can now output additional information about the original
spelling of the tokens, namely whether a token was followed by
whitespace and whether it contained internal whitespace (according to
the tokenization guidelines, things like “: )” get normalized to “:)”).
To use this feature, provide the tokenizer script with the -e option.
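
The kind of information described here can be illustrated with a toy tokenizer. This is a sketch under stated assumptions, not SoMaJo's code or output format: the Token class, its field names, the regular expression and the function tokenize_with_spelling are all hypothetical and cover only a handful of emoticons.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Toy pattern: an emoticon with optional internal whitespace,
# a run of characters that are neither whitespace nor ":" / ";",
# or a lone ":" / ";".
TOKEN_RE = re.compile(r"[:;]\s*[()DP]|[^\s:;]+|[:;]")


@dataclass
class Token:
    text: str                                # normalized token text
    space_after: bool                        # was the token followed by whitespace?
    original_spelling: Optional[str] = None  # set only if normalization changed the text


def tokenize_with_spelling(paragraph: str) -> List[Token]:
    """Tokenize and record what whitespace normalization would otherwise lose."""
    tokens = []
    for match in TOKEN_RE.finditer(paragraph):
        raw = match.group()
        text = "".join(raw.split())  # remove internal whitespace, e.g. ": )" -> ":)"
        end = match.end()
        space_after = end < len(paragraph) and paragraph[end].isspace()
        tokens.append(Token(text, space_after, raw if raw != text else None))
    return tokens


if __name__ == "__main__":
    for token in tokenize_with_spelling("Das ist toll: ) Bis bald!"):
        print(token)
```

With both pieces of information stored per token, the original character sequence can be reconstructed from the normalized tokens, apart from the exact kind and amount of whitespace between them.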

1.0.3

This version works around a bug in the regex module that caused
exponential runtimes on certain inputs.

