pythainlp Changelog

2.3.1dev0

**Bug fixes**
- Fix gensim 546

Documentation: [https://pythainlp.github.io/dev-docs/index.html](https://pythainlp.github.io/dev-docs/index.html)
Report bug: [https://github.com/PyThaiNLP/pythainlp/issues](https://github.com/PyThaiNLP/pythainlp/issues)

See [PyThaiNLP 2.3 change log](https://github.com/PyThaiNLP/pythainlp/issues/445) #445

2.3.0

Documentation: [https://pythainlp.github.io/docs/2.3/index.html](https://pythainlp.github.io/docs/2.3/index.html)
Report bug: [https://github.com/PyThaiNLP/pythainlp/issues](https://github.com/PyThaiNLP/pythainlp/issues)

You can install or upgrade using `pip install -U pythainlp`

See [PyThaiNLP 2.3 change log](https://github.com/PyThaiNLP/pythainlp/issues/445) #445

Deprecation and other API changes
- NER: the ThaiNER model changes from ThaiNER 1.4 to ThaiNER 1.5. If you need the ThaiNER 1.4 model, pass the `version` argument to the `ThaiNameTagger` class: `pythainlp.tag.named_entity.ThaiNameTagger(version: str = '1.4')` (Docs: https://pythainlp.github.io/dev-docs/api/tag.html#pythainlp.tag.named_entity.ThaiNameTagger)
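
As a minimal sketch of pinning the model version described above: `load_name_tagger` and `SUPPORTED_VERSIONS` are hypothetical names introduced here for illustration, and the set of accepted versions is an assumption taken from this note rather than from the library. Only the `ThaiNameTagger(version=...)` call comes from the entry itself.

```python
# Assumption: the two model versions named in this changelog entry.
SUPPORTED_VERSIONS = ("1.4", "1.5")


def load_name_tagger(version="1.5"):
    """Return a ThaiNameTagger for the requested ThaiNER model version."""
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported ThaiNER version: {version!r}")
    # Imported lazily so the version check works without PyThaiNLP installed.
    from pythainlp.tag.named_entity import ThaiNameTagger

    return ThaiNameTagger(version=version)
```

Calling `load_name_tagger("1.4")` would keep existing pipelines on the older model. The first call may need to download the model files.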

Tokenizer
- 484 Add: model option for `attacut.tokenize()`
- 502 Add: `corpus.util.revise_wordset()` to revise tokenization dictionary
- 503 Add: `NERCut` tokenization engine

Corpus
- **License change:**
  - All corpora, datasets, and documentation created by PyThaiNLP project are now released under [Creative Commons Zero 1.0 Universal Public Domain Dedication License](https://creativecommons.org/publicdomain/zero/1.0/) (CC0).
  - All language models created by PyThaiNLP project are released under [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/) (CC-by).
- 449 Fix: remove instances with `[` or `]` from etcc.txt
- 467 Add: `corpus.common.provinces()` can now return romanized names
- 476 Add: `thai_family_names()` to get a set of Thai family names
- 487 Fix: `thailand_provinces_th.csv` not found issue
- 492 Fix: remove erroneous `AITT` tag from ORCHID to UD table -- thanks c4n for the fix

POS Tagger
- 464 Add: `LST20` language model for part-of-speech tagging
- 468 Add: port `PerceptronTagger` from NLTK. POS tagging no longer depends on NLTK.
- 478 Update: ORCHID POS tags documentation

Named Entity Tagging
- 526 Update: ThaiNER 1.4 to ThaiNER 1.5
- 538 Add: `version` option for `ThaiNameTagger`, with support for ThaiNER 1.4

Transliterate
- 485 Fix: romanization failed on some examples
- 511 Add: Thai W2P (Thai Word-to-Phoneme converter)

Text Summarization
- 523 Add: mT5 text summarization to `pythainlp.summarize`

Chunk parser
- 524 Add: `pythainlp.tag.chunk`

Util
- 481 Fix: `remove_repeat_vowels()` bug that removed spaces between different vowels
- 483 Add: `remove()` method to remove a word from a trie -- thanks korakot
- 490 Fix: `thai_strftime()` - normalize output for unsupported directive (running in glibc and musl should produce the same output)
- 512 Add: `emoji_to_thai()` to convert emoji to Thai description -- thanks ppirch for the development
- 513 Add: `thai_keyboard_dist()` to calculate euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks ppirch for the development
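
To illustrate what removing a word from a trie (entry 483 above) involves, here is a minimal, self-contained sketch. It is not PyThaiNLP's own `Trie` class: the class layout and the pruning strategy are assumptions chosen for illustration. The idea is to clear the end-of-word flag, then delete any nodes that no longer lead to a stored word.

```python
class Trie:
    """Minimal character trie; an illustration, not PyThaiNLP's own class."""

    class Node:
        __slots__ = ("children", "end")

        def __init__(self):
            self.children = {}  # maps a character to its child Node
            self.end = False    # True if a stored word ends at this node

    def __init__(self, words=()):
        self.root = Trie.Node()
        for word in words:
            self.add(word)

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, Trie.Node())
        node.end = True

    def __contains__(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.end

    def remove(self, word):
        # Walk down, remembering the path so unused branches can be pruned.
        path = []
        node = self.root
        for ch in word:
            child = node.children.get(ch)
            if child is None:
                return  # the word is not stored; nothing to do
            path.append((node, ch))
            node = child
        if not node.end:
            return  # only a prefix of other words, not a stored word
        node.end = False
        # Prune from the leaf upward until a node is still in use.
        for parent, ch in reversed(path):
            child = parent.children[ch]
            if child.end or child.children:
                break
            del parent.children[ch]
```

Removing a word leaves its prefixes intact: after `Trie(["กา", "การ"]).remove("การ")`, the word `"กา"` is still found.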

Thanks to all the [contributors](https://github.com/PyThaiNLP/pythainlp/graphs/contributors). (Image made with [contributors-img](https://contributors-img.firebaseapp.com))
<a href="https://github.com/PyThaiNLP/pythainlp/graphs/contributors">
<img src="https://contributors-img.firebaseapp.com/image?repo=PyThaiNLP/pythainlp" />
</a>

We build Thai NLP.

PyThaiNLP

2.3.0beta1

Documentation: [https://pythainlp.github.io/dev-docs/index.html](https://pythainlp.github.io/dev-docs/index.html)
Report bug: [https://github.com/PyThaiNLP/pythainlp/issues](https://github.com/PyThaiNLP/pythainlp/issues)

See [PyThaiNLP 2.3 change log](https://github.com/PyThaiNLP/pythainlp/issues/445) #445

Deprecation and other API changes
- NER: the ThaiNER model changes from ThaiNER 1.4 to ThaiNER 1.5. If you need the ThaiNER 1.4 model, pass the `version` argument to the `ThaiNameTagger` class: `pythainlp.tag.named_entity.ThaiNameTagger(version: str = '1.4')` (Docs: https://pythainlp.github.io/dev-docs/api/tag.html#pythainlp.tag.named_entity.ThaiNameTagger)

Tokenizer
- 484 Add: model option for `attacut.tokenize()`
- 502 Add: `corpus.util.revise_wordset()` to revise tokenization dictionary
- 503 Add: `NERCut` tokenization engine

Corpus
- **License change:**
  - All corpora, datasets, and documentation created by PyThaiNLP project are now released under [Creative Commons Zero 1.0 Universal Public Domain Dedication License](https://creativecommons.org/publicdomain/zero/1.0/) (CC0).
  - All language models created by PyThaiNLP project are released under [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/) (CC-by).
- 449 Fix: remove instances with `[` or `]` from etcc.txt
- 467 Add: `corpus.common.provinces()` can now return romanized names
- 476 Add: `thai_family_names()` to get a set of Thai family names
- 487 Fix: `thailand_provinces_th.csv` not found issue
- 492 Fix: remove erroneous `AITT` tag from ORCHID to UD table -- thanks c4n for the fix

POS Tagger
- 464 Add: `LST20` language model for part-of-speech tagging
- 468 Add: port `PerceptronTagger` from NLTK. POS tagging no longer depends on NLTK.
- 478 Update: ORCHID POS tags documentation

Named Entity Tagging
- 526 Update: ThaiNER 1.4 to ThaiNER 1.5
- 538 Add: `version` option for `ThaiNameTagger`, with support for ThaiNER 1.4

Transliterate
- 485 Fix: romanization failed on some examples
- 511 Add: Thai W2P (Thai Word-to-Phoneme converter)

Text Summarization
- 523 Add: mT5 text summarization to `pythainlp.summarize`

Chunk parser
- 524 Add: `pythainlp.tag.chunk`

Util
- 481 Fix: `remove_repeat_vowels()` bug that removed spaces between different vowels
- 483 Add: `remove()` method to remove a word from a trie -- thanks korakot
- 490 Fix: `thai_strftime()` - normalize output for unsupported directive (running in glibc and musl should produce the same output)
- 512 Add: `emoji_to_thai()` to convert emoji to Thai description -- thanks ppirch for the development
- 513 Add: `thai_keyboard_dist()` to calculate euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks ppirch for the development

Links

- Website: https://pythainlp.github.io
- Docs: https://pythainlp.github.io/dev-docs/
- GitHub: https://github.com/PyThaiNLP/pythainlp
- Issues: https://github.com/PyThaiNLP/pythainlp/issues

<a href="https://github.com/PyThaiNLP/pythainlp/graphs/contributors">
<img src="https://contributors-img.firebaseapp.com/image?repo=PyThaiNLP/pythainlp" />
</a>

Thanks to all the [contributors](https://github.com/PyThaiNLP/pythainlp/graphs/contributors). (Image made with [contributors-img](https://contributors-img.firebaseapp.com))

We build Thai NLP.

PyThaiNLP

2.3.0dev1

Documentation: [https://pythainlp.github.io/dev-docs/index.html](https://pythainlp.github.io/dev-docs/index.html)
Report bug: [https://github.com/PyThaiNLP/pythainlp/issues](https://github.com/PyThaiNLP/pythainlp/issues)

See [PyThaiNLP 2.3 change log](https://github.com/PyThaiNLP/pythainlp/issues/445) #445

Deprecation and other API changes
- NER: the ThaiNER model changes from ThaiNER 1.4 to ThaiNER 1.5. If you need the ThaiNER 1.4 model, pass the `version` argument to the `ThaiNameTagger` class: `pythainlp.tag.named_entity.ThaiNameTagger(version: str = '1.4')` (Docs: https://pythainlp.github.io/dev-docs/api/tag.html#pythainlp.tag.named_entity.ThaiNameTagger)

Tokenizer
- 484 Add: model option for `attacut.tokenize()`
- 502 Add: `corpus.util.revise_wordset()` to revise tokenization dictionary
- 503 Add: `NERCut` tokenization engine

Corpus
- **License change:**
  - All corpora, datasets, and documentation created by PyThaiNLP project are now released under [Creative Commons Zero 1.0 Universal Public Domain Dedication License](https://creativecommons.org/publicdomain/zero/1.0/) (CC0).
  - All language models created by PyThaiNLP project are released under [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/) (CC-by).
- 449 Fix: remove instances with `[` or `]` from etcc.txt
- 467 Add: `corpus.common.provinces()` can now return romanized names
- 476 Add: `thai_family_names()` to get a set of Thai family names
- 487 Fix: `thailand_provinces_th.csv` not found issue
- 492 Fix: remove erroneous `AITT` tag from ORCHID to UD table -- thanks c4n for the fix

POS Tagger
- 464 Add: `LST20` language model for part-of-speech tagging
- 468 Add: port `PerceptronTagger` from NLTK. POS tagging no longer depends on NLTK.
- 478 Update: ORCHID POS tags documentation

Named Entity Tagging
- 526 Update: ThaiNER 1.4 to ThaiNER 1.5
- 538 Add: `version` option for `ThaiNameTagger`, with support for ThaiNER 1.4

Transliterate
- 485 Fix: romanization failed on some examples
- 511 Add: Thai W2P (Thai Word-to-Phoneme converter)

Text Summarization
- 523 Add: mT5 text summarization to `pythainlp.summarize`

Chunk parser
- 524 Add: `pythainlp.tag.chunk`

Util
- 481 Fix: `remove_repeat_vowels()` bug that removed spaces between different vowels
- 483 Add: `remove()` method to remove a word from a trie -- thanks korakot
- 490 Fix: `thai_strftime()` - normalize output for unsupported directive (running in glibc and musl should produce the same output)
- 512 Add: `emoji_to_thai()` to convert emoji to Thai description -- thanks ppirch for the development
- 513 Add: `thai_keyboard_dist()` to calculate euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks ppirch for the development

2.3.0dev0

Documentation: [https://pythainlp.github.io/dev-docs/index.html](https://pythainlp.github.io/dev-docs/index.html)
Report bug: [https://github.com/PyThaiNLP/pythainlp/issues](https://github.com/PyThaiNLP/pythainlp/issues)

See [PyThaiNLP 2.3 change log](https://github.com/PyThaiNLP/pythainlp/issues/445) #445

Deprecation and other API changes
- NER: the ThaiNER model changes from ThaiNER 1.4 to ThaiNER 1.5. If you need the ThaiNER 1.4 model, pass the `version` argument to the `ThaiNameTagger` class: `pythainlp.tag.named_entity.ThaiNameTagger(version: str = '1.4')` (Docs: https://pythainlp.github.io/dev-docs/api/tag.html#pythainlp.tag.named_entity.ThaiNameTagger)

Tokenizer
- 484 Add: model option for `attacut.tokenize()`
- 502 Add: `corpus.util.revise_wordset()` to revise tokenization dictionary
- 503 Add: `NERCut` tokenization engine

Corpus
- **License change:**
  - All corpora, datasets, and documentation created by PyThaiNLP project are now released under [Creative Commons Zero 1.0 Universal Public Domain Dedication License](https://creativecommons.org/publicdomain/zero/1.0/) (CC0).
  - All language models created by PyThaiNLP project are released under [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/) (CC-by).
- 449 Fix: remove instances with `[` or `]` from etcc.txt
- 467 Add: `corpus.common.provinces()` can now return romanized names
- 476 Add: `thai_family_names()` to get a set of Thai family names
- 487 Fix: `thailand_provinces_th.csv` not found issue
- 492 Fix: remove erroneous `AITT` tag from ORCHID to UD table -- thanks c4n for the fix

POS Tagger
- 464 Add: `LST20` language model for part-of-speech tagging
- 468 Add: port `PerceptronTagger` from NLTK. POS tagging no longer depends on NLTK.
- 478 Update: ORCHID POS tags documentation

Named Entity Tagging
- 526 Update: ThaiNER 1.4 to ThaiNER 1.5
- 538 Add: `version` option for `ThaiNameTagger`, with support for ThaiNER 1.4

Transliterate
- 485 Fix: romanization failed on some examples
- 511 Add: Thai W2P (Thai Word-to-Phoneme converter)

Text Summarization
- 523 Add: mT5 text summarization to `pythainlp.summarize`

Chunk parser
- 524 Add: `pythainlp.tag.chunk`

Util
- 481 Fix: `remove_repeat_vowels()` bug that removed spaces between different vowels
- 483 Add: `remove()` method to remove a word from a trie -- thanks korakot
- 490 Fix: `thai_strftime()` - normalize output for unsupported directive (running in glibc and musl should produce the same output)
- 512 Add: `emoji_to_thai()` to convert emoji to Thai description -- thanks ppirch for the development
- 513 Add: `thai_keyboard_dist()` to calculate euclidean distance between two characters according to their location on a Thai keyboard layout -- thanks ppirch for the development

2.2.6

This release is a bug fix release.
- Update `pythainlp.tag` docs 492
- `thai_strftime`: normalize output for unsupported directive 490
- Port pickle to JSON and add the LST20 POS tagging model to `pythainlp.corpus` 488

Thanks to the following contributors to 2.2.6: c4n

Thanks to other contributors listed here: https://github.com/PyThaiNLP/pythainlp/blob/dev/CONTRIBUTING.md

You can install or upgrade using `pip install -U pythainlp`

- GitHub Releases: https://github.com/PyThaiNLP/pythainlp/releases/tag/v2.2.6
- Documentation: https://www.thainlp.org/pythainlp/docs/2.2/
- Tutorials: https://thainlp.org/pythainlp/tutorials/
- GitHub: https://github.com/PyThaiNLP/pythainlp

We build Thai NLP
PyThaiNLP Team
