- find_ngrams in the util module did not properly match
- conllable is now properly included in wildcard imports from pyconll.
- Issue when loading a CoNLL file over a network if the file contained
UTF-8 characters. requests by default assumes ASCII encoding on
- The Token columns deps and feats were not properly sorted by
attribute (either by numeric index or by case-insensitive
lexicographic order) on serialization
- Clearer and more concise documentation
- find_ngrams now returns the matched tokens as the last element of
the yielded tuple.
- Document and paragraph ids on Sentences
- Line numbers on Tokens and Sentences
- Equality comparison on Tokens and Sentences. These types are mutable,
and implementing equality (with no hash overriding) causes issues
for API clients.
- SentenceTree module. This functionality was moved to the to_tree
method of the Sentence class.
- to_tree method on Sentence that returns the Tree representing the
Sentence dependency structure
- Updates to requirements.txt to patch Jinja2 and requests
- Parsing of underscores for the form and lemma fields would
automatically default to None, rather than the intended behavior.
- When used on Windows, the default encoding of Windows-1252 was used
when loading CoNLL-U files; however, CoNLL-U is UTF-8. This is
- _Getting Started_ page in the documentation to make it easier for
- Versioning on docs page which had not been properly updated
- Some documentation errors
- requests version used in requirements.txt was insecure and has been
updated to a newer version
- The pyconll.tree module was not previously included properly in setup.py
- pylint to build process
- Conllable abstract base class to mark CoNLL serializable components
- Tree data type construction of a sentence
- Linting patches suggested by pylint.
- Removed _end_line_number from Sentence constructor. This is an
internal patch, as this parameter was not meant to be used
- New, improved, and clearer documentation
- Update of requests dependency due to security flaw
- Removed test packages from final shipped package.
- There is now a FormatError to help make debugging easier if the
internal data of a Token is put into an invalid state. This error
will be seen on running Token.conll.
- Certain token fields with empty values were not output when calling
Token.conll and were instead ignored. This situation now causes
- Stricter parsing and validation of general CoNLL guidelines.
- DEPS parsing was broken before and assumed that there was less
information than is actually possible in the UD format. This means
that now deps is a tuple with cardinality 4.
- Fixed issue with submodules not being packaged in build
- Ability to easily load CoNLL files from a network path (url)
- Some parsing validation. Previously, errors were not caught up front,
so they could surface unexpectedly later.
- Sentence slicing had an issue before if either the start or end
- More documentation and examples.
- Conll is now a MutableSequence, so it supports the methods Python
defines for that type in addition to those it implements itself.
- Some small bug fixes with parsing the token dicts.
- Issues with documentation since docstrings were not in RST. Fixed by
using the napoleon Sphinx extension
- A little more docs
- More README info
- Better examples
- Installation issues again with wheel when using pip.
- Installation issues when using pip
- More documentation
- Util package for convenient and common logic
- Documentation which can be found here.
- Small documentation changes on methods.
- Everything. This is the first release of this package. The most
notable absence is documentation which will be coming in a