- Clean word lists in `pythainlp.corpus` (remove duplicates, etc.)
- Fix/add return type hinting for functions in `pythainlp.corpus`
- Fix deprecated inline flag in a regular expression in `pythainlp.corpus.tnc` (Thai National Corpus)
- Bug fix: reorder condition checks in `pythainlp.tokenize.dict_trie` so it catches `Trie` before `Iterable`
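
The `dict_trie` fix above comes down to the order of type checks: a trie object is itself iterable, so testing for `Iterable` first would also match an existing trie. The following is a minimal sketch of that ordering only. The `Trie` class and the `to_trie` helper are hypothetical stand-ins written for illustration, not PyThaiNLP's actual code.

```python
from collections.abc import Iterable


class Trie:
    """Toy stand-in for a trie (hypothetical); it is iterable over its words."""

    def __init__(self, words):
        self._words = set(words)

    def __iter__(self):
        return iter(self._words)

    def __contains__(self, word):
        return word in self._words


def to_trie(source):
    """Return a Trie built from `source`, or `source` itself if already a Trie."""
    # Check the most specific type first: a Trie also satisfies Iterable,
    # so testing Iterable first would needlessly rebuild an existing trie.
    if isinstance(source, Trie):
        return source
    if isinstance(source, Iterable):
        return Trie(source)
    raise TypeError("source must be a Trie or an iterable of str")


existing = Trie(["แมว", "หมา"])
same = to_trie(existing)          # returned unchanged
built = to_trie(["แมว", "หมา"])   # built from a plain list
```

With the checks in this order, an existing trie is passed through as-is, and only plain iterables trigger construction of a new one.
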
2.0.4
- `word_tokenize()`'s `whitespaces` argument is now `keep_whitespace` to make it more explicit; the default behavior is to keep whitespace
- `word_tokenize()` can now take a custom dictionary through the `custom_dict` parameter
- `dict_word_tokenize()` will be deprecated soon
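
To illustrate what the two parameters above mean, here is a self-contained toy sketch. `toy_word_tokenize` is a hypothetical function using greedy longest matching over a caller-supplied word list; it is not PyThaiNLP's tokenizer and only mirrors the parameter names `custom_dict` and `keep_whitespace` for illustration.

```python
def toy_word_tokenize(text, custom_dict, keep_whitespace=True):
    """Greedy longest-match tokenizer over a caller-supplied word list (toy)."""
    words = set(custom_dict)
    max_len = max((len(w) for w in words), default=1)
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            # Group a run of whitespace into a single token.
            j = i
            while j < len(text) and text[j].isspace():
                j += 1
            if keep_whitespace:
                tokens.append(text[i:j])
            i = j
            continue
        # Try the longest dictionary word starting at position i.
        for size in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + size] in words:
                break
        else:
            size = 1  # no dictionary match: emit a single character
        tokens.append(text[i:i + size])
        i += size
    return tokens


my_words = ["ฉัน", "รัก", "แมว"]
with_spaces = toy_word_tokenize("ฉันรัก แมว", my_words)
without_spaces = toy_word_tokenize("ฉันรัก แมว", my_words, keep_whitespace=False)
```

In this sketch, `with_spaces` keeps the space as its own token (the default), while `without_spaces` drops it.
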
2.0.3
- Fix issue where the TTC (Thai Textbook Corpus) file was always downloaded again
- Words and their frequencies from TTC (Thai Textbook Corpus) now have a local copy at `ttc_freq.txt` inside `pythainlp.corpus`
- Other refactoring and code improvements, including ones related to subword tokenization (Thai Character Cluster / TCC and ETCC); see #193
2.0.2
- Fixed tree map
- Subword tokeniser documentation improvement https://github.com/PyThaiNLP/pythainlp/pull/190
2.0.1
- Add `Tokenizer`, available as `pythainlp.tokenize.Tokenizer` (79432c2)
- NER fixes, code cleaning, and type hinting (#186)