Uttut

Latest version: v1.4.10

1.3.3

Add a new operation: CustomWordTokenizer

This operation tokenizes the input string according to a given list of user words.
If a substring of the input matches one of the user words, it is kept as a single token.
Otherwise, the substring is tokenized into a list of characters.
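
The behavior described above can be sketched in plain Python. This is only an illustration, not uttut's implementation: the function name `tokenize_with_user_words` is hypothetical, and preferring the longest user word when several match at the same position is an assumption the release note does not state.

```python
from typing import Iterable, List


def tokenize_with_user_words(text: str, user_words: Iterable[str]) -> List[str]:
    """Keep user words as single tokens; split everything else into characters."""
    # Assumption: try longer words first, so "new york" wins over "new".
    words = sorted({w for w in user_words if w}, key=len, reverse=True)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for word in words:
            if text.startswith(word, i):
                tokens.append(word)
                i += len(word)
                break
        else:
            # No user word starts at position i, so emit one character.
            tokens.append(text[i])
            i += 1
    return tokens


print(tokenize_with_user_words("I like KFC", ["like", "KFC"]))
# ['I', ' ', 'like', ' ', 'KFC']
```

Whitespace is emitted as a character token here, which may differ from how the real operation treats it.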

1.3.2

Add a new operation: Token2IndexwithHash.

This operation maps input tokens to indices using a given token2index dictionary.
If a token is not in the dictionary, the token is hashed and the hash is taken
modulo the size of the dictionary.
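
A minimal sketch of this lookup-with-fallback idea, assuming a stable hash function. The name `tokens_to_indices` is hypothetical, and MD5 is chosen here only because Python's built-in `hash()` for strings varies between interpreter runs; the hash that uttut actually uses is not stated in the release note.

```python
import hashlib
from typing import Dict, List


def tokens_to_indices(tokens: List[str], token2index: Dict[str, int]) -> List[int]:
    """Look tokens up in token2index; hash unknown tokens into the same range."""
    size = len(token2index)
    if size == 0:
        raise ValueError("token2index must not be empty")
    indices: List[int] = []
    for token in tokens:
        if token in token2index:
            indices.append(token2index[token])
        else:
            # Assumption: MD5 serves only as a deterministic hash, not for security.
            digest = hashlib.md5(token.encode("utf-8")).hexdigest()
            indices.append(int(digest, 16) % size)
    return indices


vocab = {"hello": 0, "world": 1, "foo": 2}
indices = tokens_to_indices(["hello", "unseen", "foo"], vocab)
```

With this scheme an unknown token always receives an index in `range(len(token2index))`, so it can collide with the index of a known token.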

1.3.1

Main modification:
Add a new operator: `PunctuationExceptEndpointToWhitespace`.

1.3.0

1. The interface of `Operator` has major breaking changes.
   - The inputs of `Operator.transform` are changed: (output_sequence, labels -> output_sequence).
   - The outputs of `Operator.transform` are changed: (output_sequence, output_labels, realigner -> output_sequence, label_aligner).
2. `LabelAligner` replaces `Realigner`.
3. Add a lighter Pipe transformation: `transform_sequence`.
4. Cythonize the elements of edit, including `Replacement`, `ReplacementGroup`, `Span` and `SpanGroup`.
5. Cythonize label propagation.
6. Add documentation for `Transformer`.

Note: changes 1 to 5 can cut the time spent on Pipe transformation by more than 80%.
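
To make the new shape of `Operator.transform` concrete, here is a toy sketch in which a sequence goes in and a pair of (output sequence, label aligner) comes out. Everything in it is hypothetical: `ToyLeftStrip` and `ToyLabelAligner` are not uttut classes, and the aligner method names `transform` and `inverse_transform` are assumptions made for illustration.

```python
from typing import List, Tuple


class ToyLabelAligner:
    """Hypothetical aligner: remembers how many leading characters were dropped."""

    def __init__(self, n_dropped: int, pad_label: int = 0) -> None:
        self.n_dropped = n_dropped
        self.pad_label = pad_label

    def transform(self, labels: List[int]) -> List[int]:
        # Map input-side labels to output-side labels.
        return labels[self.n_dropped:]

    def inverse_transform(self, labels: List[int]) -> List[int]:
        # Map output-side labels back, padding the dropped positions.
        return [self.pad_label] * self.n_dropped + labels


class ToyLeftStrip:
    """Hypothetical operator: takes only a sequence, returns (sequence, aligner)."""

    def transform(self, input_sequence: str) -> Tuple[str, ToyLabelAligner]:
        output_sequence = input_sequence.lstrip()
        n_dropped = len(input_sequence) - len(output_sequence)
        return output_sequence, ToyLabelAligner(n_dropped)


output_sequence, label_aligner = ToyLeftStrip().transform("  hi")
print(output_sequence)                              # hi
print(label_aligner.transform([0, 0, 1, 1]))        # [1, 1]
print(label_aligner.inverse_transform([5, 5]))      # [0, 0, 5, 5]
```

One plausible reading of the change is that labels no longer travel through every operator; they are aligned separately, and only when needed, through the returned aligner.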

1.2.0

1. Add building blocks for BERT tokenizer construction, including `AddWhitespaceAroundCJK`, `AddWhitespaceAroundPunctuation`, `MergeWhiteSpaceCharacters`, `StripWhiteSpaceCharacters`, `StripAccentToken`, `WhiteSpaceTokenizer` and `SpanSubwords`.
2. BERT pipes are created in `uttut/pipeline/bert/`.

1.1.0

1. Implement Operator `__eq__`: Operators can be compared for equality.
2. Refactor the common tests for Operators so that new test functions are easier to add.
3. Add the Lowercase Operator, which converts characters to lowercase.
4. Add checkpoints: a uttut pipe can output intermediate results at named checkpoints.
For example:

    >>> from uttut.pipeline.pipe import Pipe
    >>> p = Pipe()
    >>> p.add('op_1', checkpoint='result_of_1')
    >>> p.add('op_2')
    >>> _, _, _, _, intermediate = p.transform(...)
    >>> intermediate.get_from_checkpoint('result_of_1')
    # outputs the intermediate result of op_1, including output_sequence and entity_labels
