Spacy-conll

Latest version: v3.4.0



3.0.0

- **[general]** Breaking change: spaCy v3 required (closes [8](https://github.com/BramVanroy/spacy_conll/issues/8))
- **[init_parser]** Breaking change: in all cases, `is_tokenized` now disables sentence segmentation
- **[init_parser]** Breaking change: no more default values for parser or model anywhere. Note that spaCy no longer
accepts short-hand codes such as `en`. You have to provide the full model name, e.g. `en_core_web_sm`
- **[init_parser]** Improvement: models are automatically downloaded for Stanza and UDPipe
- **[cli]** Reworked the position of the CLI script in the directory structure as well as the arguments. Run
`parse-as-conll -h` for more information.
- **[conllparser]** Made the [`ConllParser`](spacy_conll/parser.py) class available as a utility to easily create a
wrapper for a spaCy-like parser which can return the parsed CoNLL output of a given file or text
- **[conllparser,cli]** Improvements to usability of `n_process`. It will try to figure out whether multiprocessing
is available for your platform and, if not, tell you so. Such a priori error messages can be disabled with
`ignore_pipe_errors`, both on the command line and in `ConllParser`'s parse methods

2.1.0

Preparing for v3 release

- Last version to support spaCy v2. New versions will require spaCy v3
- Last version to support ``spacy-stanfordnlp``. ``spacy-stanza`` is still supported

2.0.0

**Fully reworked version!**

- Tested support for both `spacy-stanza` and `spacy-udpipe`! (Not included as a dependency, install manually)
- Added a useful utility function `init_parser` that can easily initialise a parser together with the custom
pipeline component. (See the README or [`examples`](examples/).)
- Added the `disable_pandas` flag to the formatter class in case you want to disable setting the pandas
attribute even when pandas is installed.
- Added custom properties for Tokens as well. So now a Doc, its sentence Spans as well as Tokens have custom attributes
- Reworked datatypes of output. In version 2.0.0 the data types are as follows:
  - `._.conll`: raw CoNLL format
    - in `Token`: a dictionary containing all the expected CoNLL fields as keys and the parsed properties as
      values.
    - in sentence `Span`: a list of its tokens' `._.conll` dictionaries (list of dictionaries).
    - in a `Doc`: a list of its sentences' `._.conll` lists (list of lists of dictionaries).
  - `._.conll_str`: string representation of the CoNLL format
    - in `Token`: tab-separated representation of the contents of the CoNLL fields ending with a newline.
    - in sentence `Span`: the expected CoNLL format where each row represents a token. When
      `ConllFormatter(include_headers=True)` is used, two header lines are included as well, as per the
      [`CoNLL format`](https://universaldependencies.org/format.html#sentence-boundaries-and-comments).
    - in `Doc`: all its sentences' `._.conll_str` combined and separated by new lines.
  - `._.conll_pd`: `pandas` representation of the CoNLL format
    - in `Token`: a `Series` representation of this token's CoNLL properties.
    - in sentence `Span`: a `DataFrame` representation of this sentence, with the CoNLL names as column
      headers.
    - in `Doc`: a concatenation of its sentences' `DataFrame`s, leading to a new `DataFrame` whose
      index is reset.
- `field_names` has been removed, assuming that you do not need to change the column names of the CoNLL properties
- Removed the `Spacy2ConllParser` class
- Many doc changes, added tests, and a few examples
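
The nesting of the 2.0.0 data types described above can be pictured with plain Python structures. The following is only a sketch of the shapes, not output from the library: the field names follow the CoNLL-U column names and may differ from the keys the package actually uses, and `token_row`, `token_str`, `sentence_str` and `doc_str` are hypothetical helpers written for this illustration.

```python
# Illustration only: plain Python stand-ins for the shapes described above.
# The field names are the CoNLL-U column names (an assumption; the package's
# own keys may differ), and all helper names here are hypothetical.
CONLL_FIELDS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS", "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def token_row(values):
    """Token level: a dictionary mapping every CoNLL field to its value."""
    return dict(zip(CONLL_FIELDS, values))


def token_str(token):
    """Token level string: tab-separated field values ending with a newline."""
    return "\t".join(str(token[field]) for field in CONLL_FIELDS) + "\n"


def sentence_str(sentence):
    """Sentence level string: one row per token (header lines omitted here)."""
    return "".join(token_str(token) for token in sentence)


def doc_str(doc):
    """Doc level string: the sentence strings separated by new lines."""
    return "\n".join(sentence_str(sentence) for sentence in doc)


# Sentence level: a list of token dictionaries.
sentence_1 = [
    token_row([1, "Hello", "hello", "INTJ", "UH", "_", 0, "ROOT", "_", "_"]),
    token_row([2, "!", "!", "PUNCT", ".", "_", 1, "punct", "_", "_"]),
]
sentence_2 = [
    token_row([1, "Bye", "bye", "INTJ", "UH", "_", 0, "ROOT", "_", "_"]),
]

# Doc level: a list of sentences, i.e. a list of lists of dictionaries.
doc = [sentence_1, sentence_2]

print(doc_str(doc))
```

With pandas installed, the same nesting would correspond to a `Series` per token and a `DataFrame` per sentence, as the list above describes.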

1.3.0

- **IMPORTANT**: This will be the last release that supports the deprecated `Spacy2ConllParser` class!
- Community addition (KoichiYasuoka): add `SpaceAfter=No` to the Misc field when applicable.
- Fixed failing tests
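
As a rough illustration of the `SpaceAfter=No` addition: in CoNLL-U, the Misc column carries `SpaceAfter=No` when a token is not followed by whitespace, and `_` when there is nothing to report. The sketch below reproduces that rule in plain Python; `misc_field` and `tokens_with_misc` are hypothetical helpers for this example and are not part of the package.

```python
# Sketch of the SpaceAfter=No rule, independent of spaCy.
# Both function names are hypothetical and exist only for this illustration.

def misc_field(followed_by_space):
    """Return the Misc value: '_' if whitespace follows, else 'SpaceAfter=No'."""
    return "_" if followed_by_space else "SpaceAfter=No"


def tokens_with_misc(text, tokens):
    """Locate each token in the text and inspect the character right after it."""
    rows = []
    position = 0
    for token in tokens:
        start = text.index(token, position)
        end = start + len(token)
        followed_by_space = end < len(text) and text[end].isspace()
        rows.append((token, misc_field(followed_by_space)))
        position = end
    return rows


rows = tokens_with_misc("Hello, world!", ["Hello", ",", "world", "!"])
for form, misc in rows:
    print(f"{form}\t{misc}")
```

In this sketch the final token of the text also receives `SpaceAfter=No`, because no whitespace follows it.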

1.2.0

- **BREAKING**: `._.conll` now outputs a dictionary for sentences `fieldname: [value1, value2...]`, and
a list of such dictionaries for a `Doc`
- Added a `conversion_maps` argument where one can define a mapping to have better control over the model's tagset
(see the advanced example in README.md)
- Tests for usage with `spacy-stanfordnlp`
- Better documentation, including advanced example
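
To make the 1.2.0 changes more concrete, here is a minimal sketch of a sentence-level dictionary in the `fieldname: [value1, value2...]` shape, together with one way a conversion mapping could be applied to it. The nested `{field: {old_value: new_value}}` layout of `conversion_maps` is an assumption made for this example, and `apply_conversion_maps` is a hypothetical helper rather than the library's implementation; the README's advanced example is the authoritative reference.

```python
# Sketch only: the sentence-level shape from 1.2.0 plus a hypothetical helper
# that applies a conversion mapping. The {field: {old: new}} layout is assumed.

def apply_conversion_maps(sentence, conversion_maps):
    """Return a copy of the sentence with mapped values replaced per field."""
    converted = {}
    for field, values in sentence.items():
        mapping = conversion_maps.get(field, {})
        # Values without an entry in the mapping are kept unchanged.
        converted[field] = [mapping.get(value, value) for value in values]
    return converted


# One sentence: each field name maps to a list with one value per token.
sentence = {
    "form": ["She", "sleeps"],
    "deprel": ["nsubj", "ROOT"],
}

converted = apply_conversion_maps(sentence, {"deprel": {"nsubj": "subj"}})
print(converted)

# A Doc would then be a list of such sentence dictionaries.
doc = [converted]
```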

1.1.0

Include dependencies in `setup.py` rather than expecting users to install dependencies manually.

