Toiro

Latest version: v0.0.8

0.0.8

- add chABSA_dataset to the `download_corpus` method
`datadownloader.download_corpus('chABSA_dataset')`
`train_df, dev_df, test_df = datadownloader.load_corpus('chABSA_dataset')`
- add Python 3.8 to Travis CI and GitHub Actions
- fix preprocess.py and test_datadownloader.py

0.0.7

- add three tokenizers ([fugashi-ipadic](https://github.com/polm/fugashi), [fugashi-unidic](https://github.com/polm/fugashi), [tinysegmenter](https://github.com/SamuraiT/tinysegmenter)) to toiro
- add `additional_tokenizers` to `tokenizers.compare`
`tokenizers.compare(filename, additional_tokenizers)`
- add [sample code](https://github.com/taishi-i/toiro/blob/develop/PyConJP2020/PyConJP2020_Online.ipynb) and [slides](https://speakerdeck.com/taishii/pycon-jp-2020) from [PyCon JP 2020](https://pycon.jp/2020/)
- add python-package.yml to .github/workflows
- fix `toiro.__version__`
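
The `additional_tokenizers` option lets you benchmark your own tokenizers alongside the built-in ones. Below is a minimal, stdlib-only sketch of that idea, not toiro's actual implementation: `compare`, `char_tokenize`, `space_tokenize`, and the bigram tokenizer are hypothetical stand-ins, and the real `tokenizers.compare` takes a filename and may report results in a different format.

```python
import time
from typing import Callable, Dict, List, Optional

# A tokenizer is assumed here to be any function from text to a list of tokens.
Tokenizer = Callable[[str], List[str]]


def char_tokenize(text: str) -> List[str]:
    # Hypothetical baseline: every character is its own token.
    return list(text)


def space_tokenize(text: str) -> List[str]:
    # Hypothetical baseline: split on whitespace.
    return text.split()


def compare(texts: List[str],
            additional_tokenizers: Optional[Dict[str, Tokenizer]] = None) -> Dict[str, float]:
    """Return elapsed seconds per tokenizer over all texts (hypothetical sketch)."""
    tokenizers: Dict[str, Tokenizer] = {"char": char_tokenize, "space": space_tokenize}
    if additional_tokenizers:
        # User-supplied tokenizers are benchmarked next to the built-in ones.
        tokenizers.update(additional_tokenizers)
    report: Dict[str, float] = {}
    for name, tokenize in tokenizers.items():
        start = time.perf_counter()
        for text in texts:
            tokenize(text)
        report[name] = time.perf_counter() - start
    return report


if __name__ == "__main__":
    texts = ["都庁所在地は新宿区。"] * 1000
    bigram = lambda t: [t[i:i + 2] for i in range(len(t) - 1)]
    report = compare(texts, additional_tokenizers={"bigram": bigram})
    # Print tokenizers from fastest to slowest.
    for name, seconds in sorted(report.items(), key=lambda kv: kv[1]):
        print(f"{name}: {seconds:.4f} s")
```

Passing tokenizers as plain callables keeps the timing loop independent of any particular library; a real comparison would wrap Janome, fugashi, and so on in the same signature.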

0.0.6

- fix a generator error in tokenizer_janome.py caused by the janome v0.4.0 update (https://github.com/taishi-i/toiro/commit/e2b3e7388feaa8d22464bab458fd9266cdf4bae5)
- fix the failed "Build and publish" step for v0.0.5
- add 05_svm_vs_bert_benchmarking_application_tasks_ja.ipynb to examples

0.0.4

- add `disable_tokenizers` to `tokenizers.compare`
- fix a bug in the initial release.
- fix an error with long input text in Jumanpp
- add 01_getting_started_ja.ipynb to README.md

0.0.3

- fix a bug in the initial release.
- fix typo: SVMClassifitionModel to SVMClassificationModel
- fix docker example in README.md

0.0.2

This is the first release of this library.

Toiro is a tool for comparing Japanese tokenizers.
- Compare the processing speed of tokenizers
- Compare the words segmented by each tokenizer
- Compare the performance of tokenizers by benchmarking application tasks (e.g., text classification)

It also provides useful functions for natural language processing in Japanese.
- Data downloader for Japanese text corpora
- Preprocessors for these corpora
- Text classifier for Japanese text (e.g., SVM, BERT)
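
One way to "compare the words segmented" by two tokenizers is to map each token sequence to character-offset spans and diff the span sets. The sketch below is a hypothetical illustration of that technique, not necessarily how toiro does it; `to_spans`, `compare_segmentation`, and the two example segmentations of 外国人参政権 are assumptions made for the example.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]


def to_spans(tokens: List[str]) -> Set[Span]:
    """Convert a token sequence into (start, end) character offsets."""
    spans: Set[Span] = set()
    pos = 0
    for token in tokens:
        spans.add((pos, pos + len(token)))
        pos += len(token)
    return spans


def compare_segmentation(a: List[str], b: List[str]) -> Tuple[Set[Span], Set[Span], Set[Span], float]:
    """Return (shared spans, spans only in a, spans only in b, span-level F1)."""
    if "".join(a) != "".join(b):
        raise ValueError("both token sequences must cover the same text")
    spans_a, spans_b = to_spans(a), to_spans(b)
    shared = spans_a & spans_b
    # With precision |shared|/|A| and recall |shared|/|B|,
    # F1 reduces to the symmetric 2|shared| / (|A| + |B|).
    f1 = 2 * len(shared) / (len(spans_a) + len(spans_b)) if spans_a or spans_b else 1.0
    return shared, spans_a - spans_b, spans_b - spans_a, f1


if __name__ == "__main__":
    text = "外国人参政権"
    a = ["外国", "人", "参政", "権"]  # hypothetical output of tokenizer A
    b = ["外国", "人参", "政権"]      # hypothetical output of tokenizer B
    shared, only_a, only_b, f1 = compare_segmentation(a, b)
    print("shared:", [text[s:e] for s, e in sorted(shared)])  # ['外国']
    print("only A:", [text[s:e] for s, e in sorted(only_a)])  # ['人', '参政', '権']
    print("only B:", [text[s:e] for s, e in sorted(only_b)])  # ['人参', '政権']
    print(f"F1: {f1:.3f}")  # 0.286
```

Comparing spans rather than token strings means a word such as 人 counts as a match only when both tokenizers place it at the same position in the text.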
