Konoha

Latest version: v5.5.6


5.2.0

Not secure
konoha v5.2.0 is now available.
This version introduces a change that makes the sentence tokenizer configurable (159).
You can see the details in the [README](https://github.com/himkt/konoha#sentence-level-tokenization).

1. sentence splitter

```python
from konoha import SentenceTokenizer

sentence = "私は猫だ。名前なんてものはない.だが,「かわいい。それで十分だろう」。"

tokenizer = SentenceTokenizer(period=".")
print(tokenizer.tokenize(sentence))
# => ['私は猫だ。名前なんてものはない.', 'だが,「かわいい。それで十分だろう」。']
```


2. bracket expression

```python
import re

from konoha import SentenceTokenizer

sentence = "私は猫だ。名前なんてものはない。だが,『かわいい。それで十分だろう』。"

tokenizer = SentenceTokenizer(
    patterns=SentenceTokenizer.PATTERNS + [re.compile(r"『.*?』")],
)
print(tokenizer.tokenize(sentence))
# => ['私は猫だ。', '名前なんてものはない。', 'だが,『かわいい。それで十分だろう』。']
```


core feature
- Configurable sentence tokenizer (159)

dependencies
- Remove `poetry-dynamic-versioning` (158)

5.1.0

Not secure
Other
- Update notebook (157)

Integration
- Breaking/drop integration (153)

Bug
- Fix poetry/setuptools version in Dockerfiles (151)

Dependencies
- Make requests required (156)
- Bump fastapi from 0.54.2 to 0.65.2 (155)

5.0.1

Not secure
This release contains several minor fixes.

api
- Use async methods to avoid resource contention (147)

other
- Skip tests using AWS credential when it is not provided (149)
- Upgrade mypy (146)

5.0.0

Not secure
We have released konoha v5.0.0. This version includes several major interface changes.

🚨 Breaking changes

Remove the `with_postag` option

Before

```python
from konoha import WordTokenizer

tokenizer_with_postag = WordTokenizer(tokenizer="mecab", with_postag=True)
tokenizer_without_postag = WordTokenizer(tokenizer="mecab", with_postag=False)
```


After

`with_postag` has simply been removed.
Note that the option was also removed from the API.

```python
from konoha import WordTokenizer

tokenizer = WordTokenizer(tokenizer="mecab")
```
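
Part-of-speech information is still attached to each token; it just no longer appears in `__repr__` (see PR 144 below). A minimal sketch of reading it back, assuming MeCab is installed and the `Token.surface` / `Token.postag` attributes:

```python
from konoha import WordTokenizer

tokenizer = WordTokenizer(tokenizer="mecab")
for token in tokenizer.tokenize("自然言語処理"):
    # Each Token carries its surface form and (for MeCab) a part-of-speech tag.
    print(token.surface, token.postag)
```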



Add `/api/v1/batch_tokenize` and prohibit passing `texts` to `/api/v1/tokenize`

Konoha 4.x.x allowed users to pass `texts` to `/api/v1/tokenize`.
From 5.0.0, batch tokenization goes through the new endpoint `/api/v1/batch_tokenize`.

Before

```bash
curl localhost:8000/api/v1/tokenize \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"tokenizer": "mecab", "texts": ["自然言語処理"]}'
```


After

```bash
curl localhost:8000/api/v1/batch_tokenize \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"tokenizer": "mecab", "texts": ["自然言語処理"]}'
```
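
The same request can also be sent from Python with the `requests` library (made a required dependency in v5.1.0 above); a minimal sketch, assuming the API server runs at `localhost:8000`:

```python
import requests

# Assumes a konoha API server running locally on port 8000.
response = requests.post(
    "http://localhost:8000/api/v1/batch_tokenize",
    json={"tokenizer": "mecab", "texts": ["自然言語処理"]},
)
print(response.json())
```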


---

core feature
- Remove postag information from `__repr__` (144)
- Remove `with_postag` from WordTokenizer (141)
- Remove `konoha.konoha_token` (140)
- Extract batch tokenization from WordTokenizer.tokenize (137)
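
PR 137 splits batch tokenization out of `WordTokenizer.tokenize` into a dedicated method. A minimal sketch, assuming the method mirrors the new API endpoint and is named `batch_tokenize`:

```python
from konoha import WordTokenizer

tokenizer = WordTokenizer(tokenizer="mecab")

# tokenize handles a single string; batch_tokenize handles a list of strings.
tokens = tokenizer.tokenize("自然言語処理")
batch_tokens = tokenizer.batch_tokenize(["自然言語処理", "私は猫だ"])
```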

other
- Introduce rich (143)
- Import libraries in initializers of tokenizer classes (142)
- Update tests (136)

api
- Change way to receive endpoint (139)
- Add endpoint `/v1/api/batch_tokenize` to konoha API (138)
- Support all options available for WordTokenizer in API server (135)

4.6.5

Not secure
Thanks to a contribution by altescy, konoha v4.6.5 supports UniDic for MeCab!

```python
>>> from konoha import WordTokenizer
>>> # [important] the file path has to include `unidic`
>>> tk = WordTokenizer(system_dictionary_path="mecab/dict/mecab-unidic")
>>> tk.tokenize("オレンジジュースを飲む")
[オレンジ, ジュース, を, 飲む]
```


core feature
- Support UniDic format for MeCabTokenizer (132), thanks to altescy!

documentation
- Add colab badge (130)
- chore: Fix typo in README (124)

integration
- Upgrade dependency for AllenNLP (131)

other
- Refactoring: cleanup word_tokenizers (129)
- Cleanup Dockerfiles (125)

api
- Use `app.state` for caching objects (128)
- Use `--factory` to launch app server with factory-style imports (127)
- Return detailed information of token in konoha API (126)

4.6.4

Not secure
[beta] New feature

![output-palette](https://user-images.githubusercontent.com/5164000/110263957-80d69f00-7ffb-11eb-864a-0d6e5b982971.gif)

`WordTokenizer` now supports a new argument `endpoint`.
You can use konoha without installing tokenizers on your computer.
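
A minimal sketch of the remote setup, assuming a konoha API server is reachable at `localhost:8000` (the exact value expected by `endpoint` is an assumption here):

```python
from konoha import WordTokenizer

# Tokenization is delegated to the remote konoha API server,
# so MeCab and friends do not need to be installed locally.
tokenizer = WordTokenizer(tokenizer="mecab", endpoint="localhost:8000")
print(tokenizer.tokenize("自然言語処理"))
```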


Changes

api

- Update API module: change endpoint path and upgrade Ubuntu to 20.04 (120)
- Feature/remote tokenization (123)

other

- Packaging/poetry dynamic versioning (118)
- Update max-line-length for linters and remove poetry-dynamic-versioning.substitution (119)
- Update Dockerfile to reduce image size (122)

documentation

- Update README to add description of breaking change (121)

