Fugashi

Latest version: v1.3.2

1.0

This release does not include any major changes to the code. The main purpose of this release is to make it clear that the API has reached a point where it can remain stable moving forward. While there will surely be more patches to clean things up or add minor features, I don't have any major changes planned.

This release does include one small change: previously, `__repr__` marked UNKs (unknown words). This behavior is useful in some situations, but it's easier to add it on top of the generic behavior than to take it out, so I removed it. Now you can (mostly) reconstruct the input with `''.join([str(nn) for nn in nodes])`.
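To see why the join works, here is a minimal sketch that runs without MeCab. `StandInNode` and `reconstruct` are hypothetical names, not part of fugashi; the class only mimics the behavior described above, where a node's representation is just its surface with no UNK marker. The input is tokenized by hand. The "(mostly)" caveat shows up with spaces: whitespace between tokens is not part of any surface, so it is lost.

```python
from dataclasses import dataclass
from typing import List


@dataclass(repr=False)
class StandInNode:
    """Hypothetical stand-in for a tokenizer node; not fugashi's actual class."""
    surface: str
    is_unk: bool = False

    def __repr__(self) -> str:
        # Plain surface form; unknown words are no longer marked
        return self.surface


def reconstruct(nodes: List[StandInNode]) -> str:
    # str() falls back to __repr__, so this joins the surfaces
    return ''.join([str(nn) for nn in nodes])


# Hand-tokenized input; the first token is flagged unknown to show the flag is ignored
nodes = [StandInNode("麩菓子", is_unk=True), StandInNode("を"),
         StandInNode("食べ"), StandInNode("た")]
print(reconstruct(nodes))  # 麩菓子を食べた

# Whitespace between tokens belongs to no surface, so it disappears
spaced = [StandInNode("hello"), StandInNode("world")]
print(reconstruct(spaced))  # helloworld
```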

Thanks for using fugashi, and if there's anything you'd like to see in it please feel free to open an issue.

1.0.0

0.2.0

This isn't a drastic release, but since I've been dragging out the patch numbers it seemed like a good time to bump the minor version. This is v0.2.0! :tada:

The first feature in this release is the addition of **command line scripts.** Since it's possible to install fugashi without MeCab, you might not have the `mecab` command-line binary. This release fixes that: you can use **fugashi** as a replacement for `mecab`. There's also the **fugashi-info** script, which is similar to `mecab -D` in that it prints dictionary information. I hope it will be useful when dealing with bugs and installation issues.

The other feature is that **Tagger instances are now callable.** One of the best features of fugashi is that it makes it much easier to work with MeCab nodes, but the function associated with that, `parseToNodeList`, had an unfortunately long name. I didn't want to call it `parse`, since that name already has a meaning in MeCab, but giving it some other name felt odd... so I realized the easiest thing is to make the Tagger instance itself callable. Here's an example of the change this makes possible:

```python
from fugashi import Tagger

tagger = Tagger()
text = "麩菓子は、麩を主材料とした日本の菓子。"  # sample input

# before
for word in tagger.parseToNodeList(text):
    print(word.surface)

# after
for word in tagger(text):
    print(word.surface)
```

Feels better, doesn't it? I imagine this will be particularly helpful for compact expressions like list comprehensions. And `parseToNodeList` is still there, so existing code can be used unmodified.
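In Python, an instance becomes callable by defining `__call__`. The sketch below shows that pattern with a hypothetical `ToyTagger` that splits on whitespace instead of running MeCab. It is not fugashi's implementation, only an illustration of keeping the long method name for existing code while making the instance itself a shortcut for it.

```python
from typing import List


class ToyTagger:
    """Hypothetical tagger that splits on whitespace; stands in for MeCab."""

    def parseToNodeList(self, text: str) -> List[str]:
        # The original, longer method name keeps working for existing code
        return text.split()

    def __call__(self, text: str) -> List[str]:
        # Calling the instance is just a shortcut for parseToNodeList
        return self.parseToNodeList(text)


tagger = ToyTagger()
text = "a compact list comprehension"

# Both spellings give the same result
assert tagger(text) == tagger.parseToNodeList(text)

# The short form reads naturally inside a comprehension
lengths = [len(word) for word in tagger(text)]
print(lengths)  # [1, 7, 4, 13]
```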

Lately I've been working more on optimizing SudachiPy than fugashi, but there are still ease-of-use improvements to be made here, and if it works here it can be useful in other tokenizers too. If there's anything you'd like to see let me know.

0.1.12

This release adds support for installing UniDic from PyPI, whether the easy-to-install `unidic-lite` or the full-fledged `unidic` package. Special thanks to chezou for helping with testing on Windows, which had quoting issues due to backslashes in paths.

This release greatly simplifies installing and using fugashi. Assuming no major issues are found, the next release should be 1.0.0.

0.1.11

This release includes a fix for builds on OSX. See #16 for details; thanks to HiromuHota for the report and help with the fix.

0.1.10

This release includes a number of small fixes since 0.1.9 and two more significant changes.

Unidic 26 Field Format Support

Unidic has a surprising variety of formats, and the 26-field variety wasn't previously supported. This format includes kana accent information and is notably used in the binary distribution of Unidic 2.1.2.

Support for Python 3.5, 3.6

Support for these versions was initially removed due to their short remaining lifespan and the lack of a `defaults` option in their `namedtuple` constructor (it was added in Python 3.7). tamuhey made the necessary changes to get them working, so they're supported for now; thanks!
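For context: since Python 3.7, `collections.namedtuple` accepts a `defaults` argument, which lets a record be built even when some trailing fields are missing. On 3.5 and 3.6 the same effect needs a workaround, such as attaching defaults to the generated `__new__`. The sketch below shows both branches; the field names are hypothetical placeholders rather than UniDic's real field list, and this is not necessarily how tamuhey's change was implemented.

```python
import collections
import sys

# Hypothetical subset of dictionary feature fields (not the real UniDic layout)
FIELDS = ["pos1", "pos2", "lemma", "kana", "accent"]

if sys.version_info >= (3, 7):
    # Python 3.7+: defaults apply to the rightmost fields
    Features = collections.namedtuple("Features", FIELDS,
                                      defaults=(None,) * len(FIELDS))
else:
    # Python 3.5/3.6 workaround: attach defaults to the generated __new__
    Features = collections.namedtuple("Features", FIELDS)
    Features.__new__.__defaults__ = (None,) * len(FIELDS)

# A shorter row still fits; missing trailing fields become None
short_row = ["名詞", "普通名詞", "菓子"]
feat = Features(*short_row)
print(feat.lemma, feat.accent)  # 菓子 None
```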

Other Changes

- dummy mecabrc specification for bundled Unidic support (still a work in progress)
- test fixes and documentation
- handle comma-separated values inside fields
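MeCab features arrive as one comma-separated string, so a field value that itself contains commas breaks a naive `split(',')`. Assuming such values are wrapped in double quotes, as in CSV, one way to handle them is to let Python's standard `csv` module parse the string. This is a sketch for illustration, not necessarily what fugashi does; `split_features` is a hypothetical helper and the feature string is made up.

```python
import csv
from typing import List


def split_features(feature: str) -> List[str]:
    # csv.reader honors double-quoted fields, so "1,2" stays one value
    return next(csv.reader([feature]))


# Made-up feature string: the fourth field contains a comma inside quotes
sample = '名詞,普通名詞,一般,"1,2",*'

print(sample.split(","))       # naive split breaks the quoted value in two
print(split_features(sample))  # ['名詞', '普通名詞', '一般', '1,2', '*']
```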

Upcoming Changes

I'm working on creating a bundled version of Unidic. Modern versions of Unidic are too large to distribute via PyPI, so I'm figuring out the best way to distribute the data.
