Nagisa

Latest version: v0.2.11


0.2.10

- Fix the hard-coded noun ID conversion process in tagger.py https://github.com/taishi-i/nagisa/commit/e638343e21559a072471c03ff672a67cf8196b77
- [Hide the dynet log](https://github.com/taishi-i/dynet/blob/master/dynet/init.cc) that appears when `import nagisa` is run in Python 3.8, 3.9, 3.10, 3.11, and 3.12 on Linux
- Provide [the nagisa-demo page](https://huggingface.co/spaces/taishi-i/nagisa-demo) in Hugging Face Spaces
- Provide [the stopwords for nagisa](https://huggingface.co/datasets/taishi-i/nagisa_stopwords) in Hugging Face Datasets (see the loading sketch after this list)
- Update [the Read the Docs documentation](https://nagisa.readthedocs.io/en/latest)
- Compatible with Python 3.12 on Linux

- Add Python wheels (3.6, 3.7, 3.8, 3.9, 3.10, 3.11, **3.12**) to [PyPI](https://pypi.org/project/nagisa/0.2.10/) for Linux
- Add Python wheels (3.6, 3.7, 3.8, 3.9, 3.10, **3.11**) to [PyPI](https://pypi.org/project/nagisa/0.2.10/) for macOS Intel
- Add Python wheels (3.6, 3.7, 3.8) to [PyPI](https://pypi.org/project/nagisa/0.2.10/) for Windows
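
A minimal sketch of loading the stopwords dataset mentioned above with the Hugging Face `datasets` library; the dataset id comes from the link above, and the dataset's column layout is not described here, so the loaded object is simply printed for inspection.

```python
# Load the nagisa stopwords dataset published on Hugging Face Datasets.
# The repository id is taken from the release note above; the column names
# are not documented here, so the loaded object is printed for inspection.
from datasets import load_dataset

dataset = load_dataset("taishi-i/nagisa_stopwords")
print(dataset)  # shows the available splits and columns
```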

0.2.9

- Remove a bottleneck in part-of-speech tagging caused by using `list` and `append`; the problem is resolved by using `set` and `add`
![nagisa_v0 2 9](https://github.com/taishi-i/nagisa/assets/12726223/83602d46-85a5-4ed4-b78f-8f4d87afc46d)

Until now, the processing time of [tagger.py](https://github.com/taishi-i/nagisa/blob/master/nagisa/tagger.py) would gradually slow down as more text was analyzed: in the following code, `w2p` refers to the list cached in `self._word2postags`, so each `append` grows that cached list and every later lookup of the same word has more work to do.

```python
tids = []
for w in words:
    if w in self._word2postags:
        w2p = self._word2postags[w]
    else:
        w2p = [0]
    if self.use_noun_heuristic is True:
        if w.isalnum() is True:
            if w2p == [0]:
                w2p = [self._pos2id[u'名詞']]
            else:
                # bottleneck is here!
                w2p.append(self._pos2id[u'名詞'])
    w2p = list(set(w2p))
    tids.append(w2p)
```

By changing to [the following code](https://github.com/taishi-i/nagisa/blob/4179ebe94bf5743a6d7a4965156dff185a359803/nagisa/tagger.py#L122), which builds a fresh `set` for each word instead of appending to the cached list, the slowdown has been resolved.

```python
tids = []
for w in words:
    w2p = set(self._word2postags.get(w, [0]))
    if self.use_noun_heuristic and w.isalnum():
        if 0 in w2p:
            w2p.remove(0)
        w2p.add(2)  # nagisa.tagger._pos2id["名詞"] = 2
    tids.append(list(w2p))
```
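
For illustration only, here is a self-contained sketch of the same pattern with a toy dictionary; the data and the `NOUN_ID` constant are made up for this example and are not nagisa's actual internals.

```python
# Standalone illustration of the fix, using toy data (not nagisa's internals).
word2postags = {"東京": [1], "京都": [1]}  # word -> list of POS ids
NOUN_ID = 2  # stands in for self._pos2id[u'名詞']


def to_tag_ids(words, use_noun_heuristic=True):
    tids = []
    for w in words:
        # A fresh set is built for every word, so the cached lists in
        # word2postags are never mutated and never grow across calls.
        w2p = set(word2postags.get(w, [0]))
        if use_noun_heuristic and w.isalnum():
            if 0 in w2p:
                w2p.remove(0)
            w2p.add(NOUN_ID)
        tids.append(list(w2p))
    return tids


print(to_tag_ids(["東京", "未知語"]))  # e.g. [[1, 2], [2]]
```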



- Fix the dash-separated `description-file` error by using `description_file` in [setup.cfg](https://github.com/taishi-i/nagisa/blob/master/setup.cfg)


```ini
[metadata]
description_file = README.md
```



- Add Python wheels (3.6, 3.7, 3.8, 3.9, 3.10, 3.11) to PyPI for Linux
- Add Python wheels (3.6, 3.7, 3.8, 3.9, 3.10) to PyPI for macOS
- Add Python wheels (3.6, 3.7, 3.8) to PyPI for Windows

0.2.8

- Fix `AttributeError` in nagisa_utils.pyx when tokenizing a text containing [Latin capital letter I with dot above 'İ'](https://unicode.scarfboy.com/?s=U%2B0130)

When tokenizing text containing 'İ', an `AttributeError` occurred. As the following example shows, lowercasing 'İ' changes its length from 1 to 2, so features were not extracted correctly.

```python
>>> text = "İ"  # U+0130
>>> print(len(text))
1
>>> text = text.lower()  # becomes U+0069 followed by U+0307
>>> text
'i̇'
>>> print(len(text))
2
```

To avoid this error, the following preprocessing step was added to the source code: [modification 1](https://github.com/taishi-i/nagisa/blob/0513fa28b5ab1e5cb82f8bbfd6078971a66b50f1/nagisa/nagisa_utils.pyx#L46), [modification 2](https://github.com/taishi-i/nagisa/blob/0513fa28b5ab1e5cb82f8bbfd6078971a66b50f1/nagisa/nagisa_utils.pyx#L54).

```python
text = text.replace('İ', 'I')
```
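
A quick check of the fix, assuming the standard `nagisa.tagging` API described in the README; the example sentence is arbitrary.

```python
import nagisa

# With this fix, tokenizing text that contains U+0130 ('İ') no longer
# raises an AttributeError during feature extraction.
words = nagisa.tagging("İstanbul を 観光 する")
print(words.words)
print(words.postags)
```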


- Add Python wheels (3.6, 3.7, 3.8, 3.9, 3.10, 3.11) to PyPI for Linux
- Add Python wheels (3.6, 3.7, 3.8, 3.9, 3.10) to PyPI for macOS
- Add Python wheels (3.6, 3.7, 3.8) to PyPI for Windows

0.2.7

- Fix `AttributeError: module 'utils'` by renaming utils.pyx to nagisa_utils.pyx (#14)
- Add wheels to PyPI for Linux and Windows users
- Increase test coverage from 92% to 96%
- Fix the problem where the `min_count` (`threshold=hp['THRESHOLD']`) parameter was not used in train.py (see the training sketch below)
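
For context, a minimal sketch of training and loading a model, assuming the `nagisa.fit` / `nagisa.Tagger` interface described in the project README; the file names are placeholders, and the `min_count` threshold itself is applied inside train.py rather than passed here.

```python
import nagisa

# Train a model from annotated corpora (file names are placeholders).
# Hyperparameters such as the min_count threshold are handled inside
# train.py during this step.
nagisa.fit(
    train_file="train.txt",
    dev_file="dev.txt",
    test_file="test.txt",
    model_name="sample",
)

# Load the trained model files and tag a sentence with them.
tagger = nagisa.Tagger(
    vocabs="sample.vocabs",
    params="sample.params",
    hp="sample.hp",
)
print(tagger.tagging("Pythonで簡単に使えるツールです"))
```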

0.2.6

- Increase test coverage from 88% to 92%
- Fix `readFile(filename)` in mecab_system_eval.py for Windows users
- Add Python 3.7 to .travis.yml
- Add a DOI with the data archiving tool Zenodo to README.md
- Add `nagisa-0.2.6-cp36-cp36m-win_amd64.whl` and `nagisa-0.2.6-cp37-cp37m-win_amd64.whl` to PyPI so that Windows users can install nagisa without Build Tools (#23)
- Add `nagisa-0.2.6-*-manylinux1_i686.whl` and `nagisa-0.2.6-*-manylinux1_x86_64.whl` to PyPI to install nagisa for Linux users

0.2.5

- Fix a whitespace bug in nagisa.decode. This fix resolves an error that occurred when words passed to nagisa.decode contain whitespace (see the sketch after this list)
- Add `__version__` to `__init__.py`
- Add slides link at PyCon JP 2019 to README.md
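
A small usage sketch related to these changes, assuming the standard nagisa API; the example text is arbitrary.

```python
import nagisa

# __version__ was added to __init__.py in this release.
print(nagisa.__version__)

# Basic tagging on text that contains whitespace; the 0.2.5 fix addresses
# the related whitespace handling in nagisa.decode.
words = nagisa.tagging("Python で 簡単に 使える ツール です")
print(words.words)
print(words.postags)
```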
