- Sanity-check input: Warn if there are extremely long sentences (≥ 500 words) in the input, as this might indicate missing sentence boundaries.
- Use np.frombuffer() instead of np.fromstring() to fix a DeprecationWarning.
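
The sanity check described above could look roughly like the following. This is a minimal sketch, not SoMeWeTa's actual code: the function name `check_sentence_lengths`, the constant `MAX_SENTENCE_LENGTH`, and the representation of sentences as lists of tokens are assumptions made for illustration. Only the threshold of 500 comes from the entry itself.

```python
import warnings

# Threshold taken from the changelog entry; the name is hypothetical.
MAX_SENTENCE_LENGTH = 500


def check_sentence_lengths(sentences, max_length=MAX_SENTENCE_LENGTH):
    """Warn about every sentence with at least max_length tokens.

    `sentences` is assumed to be a list of token lists.
    Returns the indices of the offending sentences.
    """
    too_long = [i for i, sent in enumerate(sentences) if len(sent) >= max_length]
    for i in too_long:
        warnings.warn(
            f"Sentence {i} has {len(sentences[i])} tokens; "
            "sentence boundaries may be missing in the input."
        )
    return too_long
```

Emitting a warning rather than raising an error lets tagging continue while still pointing the user at input that was probably not split into sentences.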
1.6.1
- New option -v/--version to output version information.
- Explicitly specify input encoding as UTF-8.
- Fixed a bug in progress display.
1.6.0
- New method tag_xml_sentence for simplified processing of SoMaJo's output for XML files.
- Updated regular expressions for emojis (taken from SoMaJo).
- Fixed a bug where SoMeWeTa could not be installed when numpy was not already there.
1.5.1
- Got rid of FutureWarning about possible nested sets in regular expression.
1.5.0
- Added support for parallel tagging of XML input.
- New option --progress for showing tagging progress and remaining time.
- Fixed calculation of confidence interval when reporting cross-validation results.
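
For readers unfamiliar with the last item, a confidence interval over per-fold accuracies can be sketched as follows. The entry does not say which formula SoMeWeTa uses, so this is only an illustration under stated assumptions: a normal approximation around the mean of the fold accuracies, with the hypothetical function name `confidence_interval`. With few folds, a Student's t quantile would give a wider and more appropriate interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(fold_accuracies, level=0.95):
    """Return (lower, upper) bounds for the mean accuracy across folds.

    Assumption: normal approximation, i.e. mean +/- z * s / sqrt(n),
    where s is the sample standard deviation of the fold accuracies.
    """
    n = len(fold_accuracies)
    if n < 2:
        raise ValueError("need at least two folds to estimate the variance")
    # Two-sided quantile of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * stdev(fold_accuracies) / sqrt(n)
    center = mean(fold_accuracies)
    return center - margin, center + margin
```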
1.4.0
- Replaced XML parsing with a shallower approach. When tagging an XML file, we no longer have to keep the whole file in memory.
- Minor improvements regarding URLs and emojis.