Install with `pip install pythainlp==3.0.0b0`.
Documentation: [https://pythainlp.github.io/dev-docs/index.html](https://pythainlp.github.io/dev-docs/index.html)
Report bugs: [https://github.com/PyThaiNLP/pythainlp/issues](https://github.com/PyThaiNLP/pythainlp/issues)
See the [PyThaiNLP 3.0 change log](https://github.com/PyThaiNLP/pythainlp/issues/545) (#545).
If you want to contribute to PyThaiNLP, you can read [Contributing to PyThaiNLP](https://github.com/PyThaiNLP/pythainlp/blob/dev/CONTRIBUTING.md).
News
> Starting with PyThaiNLP 3.0, we no longer support Python 3.6. Python 3.6 users can use PyThaiNLP 2.3.2.
> We have updated the dictionary and rules for newmm. If you use newmm for word tokenization in your model, we recommend that you retrain your model.
What is new?
Deprecation and other API changes
- `syllable_tokenize` is deprecated; use `subword_tokenize` instead.
- `pythainlp.tag.named_entity.ThaiNameTagger` has moved to `pythainlp.tag.thainer.ThaiNameTagger`. The old class will be deprecated in PyThaiNLP version 3.1.
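
Code that has to run against both the old and the new module layout can try the new location first and fall back to the old one. The following is a minimal sketch of that pattern: the helper `import_first` is hypothetical (it is not part of PyThaiNLP), and only the two module paths are taken from the note above. If PyThaiNLP is not installed, the lookup returns `None` rather than raising.

```python
import importlib
from typing import Any, Iterable, Optional


def import_first(name: str, module_paths: Iterable[str]) -> Optional[Any]:
    """Return attribute `name` from the first module in `module_paths`
    that imports successfully and defines it, or None if none does.

    Hypothetical helper for illustration; not part of PyThaiNLP.
    """
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            # Module (or one of its dependencies) is unavailable; try the next.
            continue
        if hasattr(module, name):
            return getattr(module, name)
    return None


# Prefer the new location, fall back to the old one.
ThaiNameTagger = import_first(
    "ThaiNameTagger",
    ["pythainlp.tag.thainer", "pythainlp.tag.named_entity"],
)
```

Callers should check for `None` before instantiating the class, since that value means neither module could be imported.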
Augment
- Add Thai Text Augmentation
Corpus
- Fix lots of misspellings in dictionary (words_th.txt)
- Add `get_corpus_default_db` and the thainer 1.5 model. You can now add corpora to `default_db.json`, and the latest thainer model no longer has to be downloaded from the Internet.
Tag
- Add tltk (pos_tag and ner): a tltk wrapper for pythainlp functions such as ner, word_tokenize, and more.
- Add `NER` class for named-entity recognition tasks.
Translate
- Add `pythainlp.translate.Translate` Class
- Add Chinese-Thai Machine Translation
Tokenization
- Fix token_max_len bug that makes it always zero
- Tokenize repeating dots and commas from numbers (fix 461)
- Retrained sentenceseg_crfcut.model for PyThaiNLP 2.4
- Add SEFR CUT to pythainlp
- Add tltk (sentence_tokenize and word_tokenize): a tltk wrapper for pythainlp functions such as ner, word_tokenize, and more.
- Add nlpo3
Transliterate
- Refactor Royin Transliterate: Avoid embedded if blocks and simplified consonant replacing operations
- Manually merge update-royin branch with dev branch to add O-ANG rule
- Add tltk (g2p and ipa): a tltk wrapper for pythainlp functions such as ner, word_tokenize, and more.
- Add pythainlp.transliterate.puan
Word Vector
- Fix token_max_len bug that makes it always zero
- Add `pythainlp.word_vector.WordVector`
Spell
- Add more spelling engines
- Add tltk (spell): a tltk wrapper for pythainlp functions such as ner, word_tokenize, and more.
Generate
- Add pythainlp.generate
Tool
- Add misspell module
Other
- Add tltk: a tltk wrapper for pythainlp functions such as ner, word_tokenize, and more.
- Update requirements from ssg 0.0.6 to ssg 0.0.8
- Spoonerism: add support for words with more than 3 syllables
- Add `maiyamok`: this function preprocesses mai yamok (ๆ) in Thai sentences.
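
To illustrate what mai yamok preprocessing means, here is a minimal sketch that expands the repetition mark ๆ by repeating the preceding token. It is not PyThaiNLP's implementation: the function name `expand_mai_yamok` is hypothetical, and the sketch assumes the sentence has already been split into tokens with ๆ as its own token.

```python
from typing import List

MAI_YAMOK = "\u0e46"  # the Thai repetition mark ๆ


def expand_mai_yamok(tokens: List[str]) -> List[str]:
    """Replace each mai yamok token with a copy of the token before it.

    Hypothetical illustration; assumes pre-tokenized input.
    """
    expanded: List[str] = []
    for token in tokens:
        if token == MAI_YAMOK and expanded:
            # Repeat the most recent token in place of the mark.
            expanded.append(expanded[-1])
        else:
            # Ordinary token, or a leading mark with nothing to repeat.
            expanded.append(token)
    return expanded


print(expand_mai_yamok(["เด็ก", "ๆ", "ชอบ", "เล่น"]))
# ['เด็ก', 'เด็ก', 'ชอบ', 'เล่น']
```

A mark at the very start of the list has no preceding token, so this sketch leaves it unchanged.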
Contributors
<a href="https://github.com/PyThaiNLP/pythainlp/graphs/contributors">
<img src="https://contributors-img.firebaseapp.com/image?repo=PyThaiNLP/pythainlp" />
</a>
Thanks to all the [contributors](https://github.com/PyThaiNLP/pythainlp/graphs/contributors). (Image made with [contributors-img](https://contributors-img.firebaseapp.com))