Sentencepiece

Latest version: v0.2.0


1.0.0

Releases a new version of Sentencepiece with major refactorings:

* Builds with Bazel
* Re-uses existing open source libraries whenever possible
* Refactors internal dependencies
* New sets of features for configuring tokenizers
* Separation from Tensorflow

0.13.1

0.2.0

Major changes
N/A

New features
- [ALL] Added the SentencePieceNormalizer class in C++/Python. It provides nearly the same functionality as spm_normalize. [Python Sample](https://github.com/google/sentencepiece/blob/master/python/test/sentencepiece_test.py#L794) [C++ Sample](https://github.com/google/sentencepiece/blob/master/src/sentencepiece_trainer_test.cc#L394)
- [ALL] Added the SentencePieceProcessor::Normalize method in C++/Python. [Python Sample](https://github.com/google/sentencepiece/blob/master/python/test/sentencepiece_test.py#L771) [C++ Sample](https://github.com/google/sentencepiece/blob/master/src/sentencepiece_trainer_test.cc#L382)
- [ALL] Added the ability to override the normalization spec before processing. [Python Sample](https://github.com/google/sentencepiece/blob/master/python/test/sentencepiece_test.py#L860)
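
For readers unfamiliar with what text normalization does before tokenization, here is a rough stdlib-only sketch of Unicode NFKC normalization, with and without case folding. It does not call sentencepiece and is not its implementation; the helper names `nfkc` and `nfkc_casefold` are hypothetical, and the library's own rules may differ in details such as whitespace handling.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Apply Unicode NFKC normalization (stdlib only, not sentencepiece)."""
    return unicodedata.normalize("NFKC", text)


def nfkc_casefold(text: str) -> str:
    """NFKC followed by case folding, as a rough stand-in for a case-folding rule."""
    return nfkc(text).casefold()


# Full-width Latin letters are mapped to their ASCII counterparts.
sample = "ＫＡＤＯＫＡＷＡABC"
print(nfkc(sample))           # KADOKAWAABC
print(nfkc_casefold(sample))  # kadokawaabc
```

The linked Python samples show the actual sentencepiece calls.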

Bug fixes & minor changes
- Improve support for using external abseil and protobuf. https://github.com/google/sentencepiece/issues/869
- Build a universal binary for the OSX release package. https://github.com/google/sentencepiece/issues/892
- Add the set_min_log_level function to the Python wrapper to change the log level. https://github.com/google/sentencepiece/issues/893
- Use the log-sum-exp technique when computing marginal probabilities of n-best tokenizations to avoid underflow.
- Support Python 3.12 https://github.com/google/sentencepiece/issues/932
- Improve thread utilization in batch encoding/decoding.
- Fix nasty bug in BPE position encoding.
- Fix bugs in the handling of duplicated bigrams
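
The log-sum-exp item above can be illustrated with a minimal sketch. This is not sentencepiece's code; the function names `logsumexp` and `marginals` and the example scores are assumptions chosen for illustration. Subtracting the maximum log score before exponentiating keeps the sum from underflowing to zero.

```python
import math


def logsumexp(log_scores):
    """Return log(sum(exp(x) for x in log_scores)) without underflow."""
    if not log_scores:
        return -math.inf
    m = max(log_scores)
    if m == -math.inf:
        return -math.inf
    # After shifting by the maximum, the largest term is exp(0) == 1.
    return m + math.log(sum(math.exp(x - m) for x in log_scores))


def marginals(log_scores):
    """Normalize log scores of candidate segmentations into probabilities."""
    total = logsumexp(log_scores)
    return [math.exp(x - total) for x in log_scores]


# Hypothetical log scores for three n-best candidates.
scores = [-1000.0, -1001.0, -1002.0]

# The naive computation underflows: every exp(x) is 0.0, so log(0.0) would fail.
naive_sum = sum(math.exp(x) for x in scores)

probs = marginals(scores)
```

Here `naive_sum` is `0.0`, while `probs` is a valid distribution that sums to 1.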

0.2.0pre1

Major changes
N/A

New features
- [ALL] Added the SentencePieceNormalizer class in C++/Python. It provides nearly the same functionality as spm_normalize. [Python Sample](https://github.com/google/sentencepiece/blob/master/python/test/sentencepiece_test.py#L794) [C++ Sample](https://github.com/google/sentencepiece/blob/master/src/sentencepiece_trainer_test.cc#L394)
- [ALL] Added the SentencePieceProcessor::Normalize method in C++/Python. [Python Sample](https://github.com/google/sentencepiece/blob/master/python/test/sentencepiece_test.py#L771) [C++ Sample](https://github.com/google/sentencepiece/blob/master/src/sentencepiece_trainer_test.cc#L382)
- [ALL] Added the ability to override the normalization spec before processing. [Python Sample](https://github.com/google/sentencepiece/blob/master/python/test/sentencepiece_test.py#L860)

Bug fixes & minor changes
- Improve support for using external abseil and protobuf. https://github.com/google/sentencepiece/issues/869
- Build a universal binary for the OSX release package. https://github.com/google/sentencepiece/issues/892
- Add the set_min_log_level function to the Python wrapper to change the log level. https://github.com/google/sentencepiece/issues/893
- Use the log-sum-exp technique when computing marginal probabilities of n-best tokenizations to avoid underflow.
- Support Python 3.12 https://github.com/google/sentencepiece/issues/932
- Improve thread utilization in batch encoding/decoding.
- Fix nasty bug in BPE position encoding.
- Fix bugs in the handling of duplicated bigrams

0.1.99

Major changes
N/A

New features
N/A

Bug fixes & minor changes
- [ALL] Fixes the NaN issues in unigram model training: https://github.com/google/sentencepiece/issues/851
- [ALL] Fixes the bug in unigram loss computation: https://github.com/google/sentencepiece/issues/628
- [ALL] Fixes the minor bug in BPE token extraction algorithm: https://github.com/google/sentencepiece/issues/318
- [ALL] Increase the maximum number of threads from 128 to 1024. https://github.com/google/sentencepiece/issues/857

0.1.99pre1

Pre-release of v0.1.99 for testing.
