Spacy-conll

Latest version: v3.4.0

3.4.0

User matgrioni rightly pointed out that the default fields in the library did not match the real CoNLL-U field names:
they should be in all caps, `xpostag` should be `XPOS`, and `upostag` should be `UPOS`. The user also asked for more
control over these fields. This release accommodates both requests.

- **[formatter]** Breaking change: `CONLL_FIELD_NAMES` is now in line with the CoNLL-U descriptor field names:
"ID", "FORM", "LEMMA", "UPOS", "XPOS", "FEATS", "HEAD", "DEPREL", "DEPS", "MISC". These are the new default fields.
- **[formatter]** New parameter: `field_names`, which allows you to override the default fields described above.
Pass a dictionary of `{"default field name": "new field name"}`, e.g. `{"UPOS": "upostag"}`.
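
To illustrate the `{"default field name": "new field name"}` mapping, here is a minimal pure-Python sketch. It does not use the library itself: `rename_fields` is a hypothetical helper, and rejecting unknown keys is a choice made for this sketch, not documented library behaviour.

```python
# Default CoNLL-U field names, as listed in the 3.4.0 notes above.
CONLL_FIELD_NAMES = [
    "ID", "FORM", "LEMMA", "UPOS", "XPOS",
    "FEATS", "HEAD", "DEPREL", "DEPS", "MISC",
]


def rename_fields(overrides=None):
    """Hypothetical helper: apply {"default name": "new name"} overrides."""
    overrides = overrides or {}
    # Assumption of this sketch: keys must be existing default field names.
    unknown = set(overrides) - set(CONLL_FIELD_NAMES)
    if unknown:
        raise KeyError(f"Not a default CoNLL-U field: {sorted(unknown)}")
    # Keep the original column order; only the names change.
    return [overrides.get(name, name) for name in CONLL_FIELD_NAMES]


print(rename_fields({"UPOS": "upostag", "XPOS": "xpostag"}))
```

Fields that are not mentioned in the dictionary keep their default name, so restoring the pre-3.4.0 naming for a single column only requires a one-entry dictionary.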

3.3.0

Since spaCy 3.2.0, spaCy has been stricter about the data that can be passed to a pipeline. Passing a list of
pretokenized tokens (`["This", "is", "a", "pretokenized", "sentence"]`) is no longer accepted, so the `is_tokenized`
option had to be adapted to reflect this. It is still possible to pass a string in which tokens are separated by
whitespace, e.g. `"This is a pretokenized sentence"`, which will continue to work for spaCy and stanza. Support for
pretokenized data has been dropped for UDPipe.

Specific changes:

- **[conllparser]** Breaking change: `is_tokenized` is not a valid argument to `ConllParser` any more.
- **[utils/conllparser]** Breaking change: when using UDPipe, pretokenized data is not supported any more.
- **[utils]** Breaking change: `SpacyPretokenizedTokenizer.__call__` does not support a list of tokens any more.
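
For code that still holds token lists, one way to adapt is to join them into a whitespace-separated string before handing the text to the pipeline. The following is a minimal sketch; `to_pretokenized_string` is a hypothetical helper and not part of the library.

```python
def to_pretokenized_string(tokens):
    """Hypothetical helper: join a token list into one whitespace-separated string."""
    for token in tokens:
        # An empty token, or one containing whitespace, would not survive
        # being split on whitespace again, so reject it up front.
        if not token or any(ch.isspace() for ch in token):
            raise ValueError(f"Token cannot be empty or contain whitespace: {token!r}")
    return " ".join(tokens)


tokens = ["This", "is", "a", "pretokenized", "sentence"]
text = to_pretokenized_string(tokens)
print(text)
```

Splitting the resulting string on whitespace returns the original token list, which is the property a whitespace-based tokenizer would rely on.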

3.2.0

- **[conllformatter]** Fixed an issue where `SpaceAfter=No` was not added correctly to tokens
- **[conllformatter]** Added `ConllFormatter` as an entry point, which means that you do not have to import
`spacy_conll` anymore when you want to add the pipe to a parser! spaCy will know where to look for the CoNLL
formatter when you use `nlp.add_pipe("conll_formatter")` without you having to import the component manually
- **[conllformatter]** Now registers the component constructor on a construction function rather than directly on the
class, as recommended by spaCy. The formatter has also been rewritten as a dataclass
- **[conllformatter/utils]** Moved `merge_dicts_strict` to utils, outside the formatter class
- **[conllparser]** Made `ConllParser` directly importable from the root of the library, i.e.,
`from spacy_conll import ConllParser`
- **[init_parser]** Allow users to exclude pipeline components when using the spaCy parser with the
`exclude_spacy_components` argument
- **[init_parser]** Fixed an issue where disabling sentence segmentation would not work if your model does
not have a parser
- **[init_parser]** Enable more options when using stanza in terms of pre-segmented text. Now you can also disable
sentence segmentation for stanza (but still do tokenization) with the `disable_sbd` option
- **[utils]** Added `SpacyDisableSentenceSegmentation` as an entry-point custom component so that you can use it in
your own code, by calling `nlp.add_pipe("disable_sbd", before="parser")`
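
As background for the `SpaceAfter=No` fix listed above: in CoNLL-U, the MISC column carries `SpaceAfter=No` when a token is not followed by whitespace in the original text. The sketch below shows that convention in plain Python. `misc_fields` is a hypothetical helper, not the formatter's actual code, and treating the end of the text as "no space after" is a choice made for this sketch.

```python
def misc_fields(text, tokens):
    """Hypothetical helper: return "SpaceAfter=No" or "_" for each token.

    `tokens` must occur in `text` in the given order.
    """
    misc = []
    position = 0
    for token in tokens:
        start = text.index(token, position)  # raises ValueError if not found
        end = start + len(token)
        # "_" marks an empty MISC field, used when whitespace follows the token.
        followed_by_space = end < len(text) and text[end].isspace()
        misc.append("_" if followed_by_space else "SpaceAfter=No")
        position = end
    return misc


print(misc_fields("Hello, world!", ["Hello", ",", "world", "!"]))
```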

3.1.0

- **[conllparser]** The `ConllParser` can now parse a given CoNLL string or text file into a spaCy Doc.
([#14](https://github.com/BramVanroy/spacy_conll/pull/14) "Parse conllu 2 spacy object", contributed by
[shaked571](https://github.com/shaked571))
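
To give a sense of what reading a CoNLL string involves, here is a rough pure-Python sketch of the first step only: splitting CoNLL-U text into sentences of per-token fields. It does not build a spaCy Doc and is not the library's implementation. `parse_conllu` is a hypothetical helper, and the sample annotation is made up for illustration.

```python
FIELDS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS", "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def parse_conllu(text):
    """Hypothetical helper: split CoNLL-U text into sentences of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
        elif line.startswith("#"):
            continue  # sentence-level comment such as "# text = ..."
        else:
            values = line.split("\t")
            if len(values) != len(FIELDS):
                raise ValueError(f"Expected {len(FIELDS)} tab-separated fields: {line!r}")
            current.append(dict(zip(FIELDS, values)))
    if current:
        sentences.append(current)
    return sentences


# Made-up sample sentence, for illustration only.
sample = (
    "# text = Hello world\n"
    "1\tHello\thello\tINTJ\tUH\t_\t2\tdiscourse\t_\t_\n"
    "2\tworld\tworld\tNOUN\tNN\tNumber=Sing\t0\troot\t_\tSpaceAfter=No\n"
)
parsed = parse_conllu(sample)
print(parsed[0][1]["FORM"], parsed[0][1]["HEAD"])
```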

3.0.2

- **[conllparser]** Fix: resolved an issue with `no_split_on_newline` in combination with `nlp.pipe`

3.0.1

- **[conllparser]** Fix: make sure the parser also runs if stanza and UDPipe are not installed
