Tokenizers

Latest version: v0.15.2

0.15.2

What's Changed
Big shoutout to rlrs for the [fast replace normalizers](https://github.com/huggingface/tokenizers/pull/1413) PR, which boosts the performance of the tokenizers:
![image](https://github.com/huggingface/tokenizers/assets/48595927/d8ee81b1-6d92-43d4-b74c-8775727763e3)

* chore: Update dependencies to latest supported versions by bryantbiggs in https://github.com/huggingface/tokenizers/pull/1441
* Convert word counts to u64 by stephenroller in https://github.com/huggingface/tokenizers/pull/1433
* Efficient Replace normalizer by rlrs in https://github.com/huggingface/tokenizers/pull/1413

New Contributors
* bryantbiggs made their first contribution in https://github.com/huggingface/tokenizers/pull/1441
* stephenroller made their first contribution in https://github.com/huggingface/tokenizers/pull/1433
* rlrs made their first contribution in https://github.com/huggingface/tokenizers/pull/1413

**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.15.1...v0.15.2rc1

0.15.1

What's Changed
* update to version = "0.15.1-dev0" by ArthurZucker in https://github.com/huggingface/tokenizers/pull/1390
* Derive `Clone` on `Tokenizer`, add `Encoding.into_tokens()` method by epwalsh in https://github.com/huggingface/tokenizers/pull/1381
* Stale bot. by Narsil in https://github.com/huggingface/tokenizers/pull/1404
* Fix doc links in readme by Pierrci in https://github.com/huggingface/tokenizers/pull/1367
* Faster HF dataset iteration in docs by mariosasko in https://github.com/huggingface/tokenizers/pull/1414
* Add quick doc to byte_level.rs by steventrouble in https://github.com/huggingface/tokenizers/pull/1420
* Fix make bench. by Narsil in https://github.com/huggingface/tokenizers/pull/1428
* Bump follow-redirects from 1.15.1 to 1.15.4 in /tokenizers/examples/unstable_wasm/www by dependabot in https://github.com/huggingface/tokenizers/pull/1430
* pyo3: update to 0.20 by mikelui in https://github.com/huggingface/tokenizers/pull/1386
* Encode special tokens by ArthurZucker in https://github.com/huggingface/tokenizers/pull/1437
* Update release for python3.12 windows by ArthurZucker in https://github.com/huggingface/tokenizers/pull/1438

New Contributors
* steventrouble made their first contribution in https://github.com/huggingface/tokenizers/pull/1420

**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.15.0...v0.15.1

0.15.1.rc0

What's Changed
* pyo3: update to 0.19 by mikelui in https://github.com/huggingface/tokenizers/pull/1322
* Add `expect()` for disabling truncation by boyleconnor in https://github.com/huggingface/tokenizers/pull/1316
* Re-using scripts from safetensors. by Narsil in https://github.com/huggingface/tokenizers/pull/1328
* Reduce number of different revisions by 1 by Narsil in https://github.com/huggingface/tokenizers/pull/1329
* Python 38 arm by Narsil in https://github.com/huggingface/tokenizers/pull/1330
* Move to maturin mimicking move for `safetensors`. + Rewritten node bindings. by Narsil in https://github.com/huggingface/tokenizers/pull/1331
* Updating the docs with the new command. by Narsil in https://github.com/huggingface/tokenizers/pull/1333
* Update added tokens by ArthurZucker in https://github.com/huggingface/tokenizers/pull/1335
* update package version for dev by ArthurZucker in https://github.com/huggingface/tokenizers/pull/1339
* Added ability to inspect a 'Sequence' pre-tokenizer. by eaplatanios in https://github.com/huggingface/tokenizers/pull/1341
* Let's allow hf_hub < 1.0 by ArthurZucker in https://github.com/huggingface/tokenizers/pull/1344
* Fixing the progressbar. by Narsil in https://github.com/huggingface/tokenizers/pull/1353
* Preparing release. by Narsil in https://github.com/huggingface/tokenizers/pull/1355
* fix a clerical error in the comment by tiandiweizun in https://github.com/huggingface/tokenizers/pull/1356
* fix: remove useless token by rtrompier in https://github.com/huggingface/tokenizers/pull/1371
* Bump babel/traverse from 7.22.11 to 7.23.2 in /bindings/node by dependabot in https://github.com/huggingface/tokenizers/pull/1370
* Allow hf_hub 0.18 by mariosasko in https://github.com/huggingface/tokenizers/pull/1383
* Allow `huggingface_hub<1.0` by Wauplin in https://github.com/huggingface/tokenizers/pull/1385
* [`pre_tokenizers`] Fix sentencepiece based Metaspace by ArthurZucker in https://github.com/huggingface/tokenizers/pull/1357
* update to version = "0.15.1-dev0" by ArthurZucker in https://github.com/huggingface/tokenizers/pull/1390
* Derive `Clone` on `Tokenizer`, add `Encoding.into_tokens()` method by epwalsh in https://github.com/huggingface/tokenizers/pull/1381
* Stale bot. by Narsil in https://github.com/huggingface/tokenizers/pull/1404
* Fix doc links in readme by Pierrci in https://github.com/huggingface/tokenizers/pull/1367
* Faster HF dataset iteration in docs by mariosasko in https://github.com/huggingface/tokenizers/pull/1414
* Add quick doc to byte_level.rs by steventrouble in https://github.com/huggingface/tokenizers/pull/1420
* Fix make bench. by Narsil in https://github.com/huggingface/tokenizers/pull/1428
* Bump follow-redirects from 1.15.1 to 1.15.4 in /tokenizers/examples/unstable_wasm/www by dependabot in https://github.com/huggingface/tokenizers/pull/1430
* pyo3: update to 0.20 by mikelui in https://github.com/huggingface/tokenizers/pull/1386

New Contributors
* mikelui made their first contribution in https://github.com/huggingface/tokenizers/pull/1322
* eaplatanios made their first contribution in https://github.com/huggingface/tokenizers/pull/1341
* tiandiweizun made their first contribution in https://github.com/huggingface/tokenizers/pull/1356
* rtrompier made their first contribution in https://github.com/huggingface/tokenizers/pull/1371
* mariosasko made their first contribution in https://github.com/huggingface/tokenizers/pull/1383
* Wauplin made their first contribution in https://github.com/huggingface/tokenizers/pull/1385
* steventrouble made their first contribution in https://github.com/huggingface/tokenizers/pull/1420

**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.13.4.rc2...v0.15.1.rc0

0.15.0

What's Changed
* fix a clerical error in the comment by tiandiweizun in https://github.com/huggingface/tokenizers/pull/1356
* fix: remove useless token by rtrompier in https://github.com/huggingface/tokenizers/pull/1371
* Bump babel/traverse from 7.22.11 to 7.23.2 in /bindings/node by dependabot in https://github.com/huggingface/tokenizers/pull/1370
* Allow hf_hub 0.18 by mariosasko in https://github.com/huggingface/tokenizers/pull/1383
* Allow `huggingface_hub<1.0` by Wauplin in https://github.com/huggingface/tokenizers/pull/1385
* [`pre_tokenizers`] Fix sentencepiece based Metaspace by ArthurZucker in https://github.com/huggingface/tokenizers/pull/1357

New Contributors
* tiandiweizun made their first contribution in https://github.com/huggingface/tokenizers/pull/1356
* rtrompier made their first contribution in https://github.com/huggingface/tokenizers/pull/1371
* mariosasko made their first contribution in https://github.com/huggingface/tokenizers/pull/1383
* Wauplin made their first contribution in https://github.com/huggingface/tokenizers/pull/1385

**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.14.1...v0.15.0

0.14.1

What's Changed
* Fix conda release by ArthurZucker in https://github.com/huggingface/tokenizers/pull/1211
* Fix node release by ArthurZucker in https://github.com/huggingface/tokenizers/pull/1212
* Printing warning to stderr. by Narsil in https://github.com/huggingface/tokenizers/pull/1222
* Fixing padding_left sequence_ids. by Narsil in https://github.com/huggingface/tokenizers/pull/1233
* Use LTO for release and benchmark builds by csko in https://github.com/huggingface/tokenizers/pull/1157
* fix unigram.rs test_sample() by chris-ha458 in https://github.com/huggingface/tokenizers/pull/1244
* implement a simple max_sentencepiece_length into BPE by chris-ha458 in https://github.com/huggingface/tokenizers/pull/1228
* Makes `decode` and `decode_batch` work on borrowed content. by mfuntowicz in https://github.com/huggingface/tokenizers/pull/1251
* Update all GH Actions with dependency on actions/checkout by mfuntowicz in https://github.com/huggingface/tokenizers/pull/1256
* Parallelize unigram trainer by mishig25 in https://github.com/huggingface/tokenizers/pull/976
* Update unigram/trainer.rs by chris-ha458 in https://github.com/huggingface/tokenizers/pull/1257
* Fixing broken link. by Narsil in https://github.com/huggingface/tokenizers/pull/1268
* fix documentation regarding regex by chris-ha458 in https://github.com/huggingface/tokenizers/pull/1264
* Update Cargo.toml by chris-ha458 in https://github.com/huggingface/tokenizers/pull/1266
* Update README.md - Broken link by sbhavani in https://github.com/huggingface/tokenizers/pull/1272
* [doc build] Use secrets by mishig25 in https://github.com/huggingface/tokenizers/pull/1273
* Improve error for truncation with too high stride by boyleconnor in https://github.com/huggingface/tokenizers/pull/1275
* Add unigram bytefallback by ArthurZucker in https://github.com/huggingface/tokenizers/pull/1217
* revise type specification by hiroshi-matsuda-rit in https://github.com/huggingface/tokenizers/pull/1289
* Bump tough-cookie from 4.0.0 to 4.1.3 in /bindings/node by dependabot in https://github.com/huggingface/tokenizers/pull/1291
* Update path name: master -> main by bact in https://github.com/huggingface/tokenizers/pull/1292
* import Tuple from typing by kellymarchisio in https://github.com/huggingface/tokenizers/pull/1295
* Fixing clippy warnings on 1.71. by Narsil in https://github.com/huggingface/tokenizers/pull/1296
* Bump word-wrap from 1.2.3 to 1.2.4 in /bindings/node by dependabot in https://github.com/huggingface/tokenizers/pull/1299
* feat: Added CITATION.cff. by SamuelLarkin in https://github.com/huggingface/tokenizers/pull/1302
* Single warning for holes. by Narsil in https://github.com/huggingface/tokenizers/pull/1303
* Give error when initializing tokenizer with too high stride by boyleconnor in https://github.com/huggingface/tokenizers/pull/1306
* Handle when precompiled charsmap is empty by kellymarchisio in https://github.com/huggingface/tokenizers/pull/1308
* Derive clone for TrainerWrapper by jonatanklosko in https://github.com/huggingface/tokenizers/pull/1317
* CD backports by chris-ha458 in https://github.com/huggingface/tokenizers/pull/1318
* 0.13.4.rc1 by Narsil in https://github.com/huggingface/tokenizers/pull/1319
* Release all at once for simplicity. by Narsil in https://github.com/huggingface/tokenizers/pull/1320
* Fix stride condition. by Narsil in https://github.com/huggingface/tokenizers/pull/1321
* pyo3: update to 0.19 by mikelui in https://github.com/huggingface/tokenizers/pull/1322
* Add `expect()` for disabling truncation by boyleconnor in https://github.com/huggingface/tokenizers/pull/1316
* Re-using scripts from safetensors. by Narsil in https://github.com/huggingface/tokenizers/pull/1328
* Reduce number of different revisions by 1 by Narsil in https://github.com/huggingface/tokenizers/pull/1329
* Python 38 arm by Narsil in https://github.com/huggingface/tokenizers/pull/1330
* Move to maturin mimicking move for `safetensors`. + Rewritten node bindings. by Narsil in https://github.com/huggingface/tokenizers/pull/1331
* Updating the docs with the new command. by Narsil in https://github.com/huggingface/tokenizers/pull/1333
* Update added tokens by ArthurZucker in https://github.com/huggingface/tokenizers/pull/1335
* update package version for dev by ArthurZucker in https://github.com/huggingface/tokenizers/pull/1339
* Added ability to inspect a 'Sequence' pre-tokenizer. by eaplatanios in https://github.com/huggingface/tokenizers/pull/1341
* Let's allow hf_hub < 1.0 by ArthurZucker in https://github.com/huggingface/tokenizers/pull/1344
* Fixing the progressbar. by Narsil in https://github.com/huggingface/tokenizers/pull/1353
* Preparing release. by Narsil in https://github.com/huggingface/tokenizers/pull/1355
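
Several entries in this release concern the truncation stride ("Improve error for truncation with too high stride", "Give error when initializing tokenizer with too high stride", "Fix stride condition"). The sketch below illustrates why such a check is needed. It is a simplified pure-Python model, not the library's implementation: the function name `split_with_stride` and the exact condition `stride >= max_length` are assumptions made for illustration, and the real library may also account for special tokens added by the post-processor.

```python
from typing import List


def split_with_stride(ids: List[int], max_length: int, stride: int = 0) -> List[List[int]]:
    """Split `ids` into windows of at most `max_length` tokens.

    Consecutive windows overlap by `stride` tokens. Hypothetical helper
    for illustration only; it is not part of the tokenizers API.
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    # Each new window starts `max_length - stride` tokens after the previous one.
    # If stride >= max_length the start would never advance, so fail early.
    if stride < 0 or stride >= max_length:
        raise ValueError(
            f"stride ({stride}) must be non-negative and strictly less than "
            f"max_length ({max_length})"
        )

    if len(ids) <= max_length:
        return [list(ids)]

    step = max_length - stride
    windows = []
    for start in range(0, len(ids), step):
        windows.append(list(ids[start:start + max_length]))
        # Stop once a window reaches the end of the sequence.
        if start + max_length >= len(ids):
            break
    return windows


print(split_with_stride(list(range(10)), max_length=4, stride=2))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

Rejecting an oversized stride at configuration time, rather than during encoding, surfaces the mistake where it is made instead of deep inside a batch job.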

New Contributors
* csko made their first contribution in https://github.com/huggingface/tokenizers/pull/1157
* chris-ha458 made their first contribution in https://github.com/huggingface/tokenizers/pull/1244
* sbhavani made their first contribution in https://github.com/huggingface/tokenizers/pull/1272
* boyleconnor made their first contribution in https://github.com/huggingface/tokenizers/pull/1275
* hiroshi-matsuda-rit made their first contribution in https://github.com/huggingface/tokenizers/pull/1289
* bact made their first contribution in https://github.com/huggingface/tokenizers/pull/1292
* kellymarchisio made their first contribution in https://github.com/huggingface/tokenizers/pull/1295
* SamuelLarkin made their first contribution in https://github.com/huggingface/tokenizers/pull/1302
* jonatanklosko made their first contribution in https://github.com/huggingface/tokenizers/pull/1317
* mikelui made their first contribution in https://github.com/huggingface/tokenizers/pull/1322
* eaplatanios made their first contribution in https://github.com/huggingface/tokenizers/pull/1341

**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.13.3...v0.14.1

0.14.1rc1

What's Changed
* pyo3: update to 0.19 by mikelui in https://github.com/huggingface/tokenizers/pull/1322
* Add `expect()` for disabling truncation by boyleconnor in https://github.com/huggingface/tokenizers/pull/1316
* Re-using scripts from safetensors. by Narsil in https://github.com/huggingface/tokenizers/pull/1328
* Reduce number of different revisions by 1 by Narsil in https://github.com/huggingface/tokenizers/pull/1329
* Python 38 arm by Narsil in https://github.com/huggingface/tokenizers/pull/1330
* Move to maturin mimicking move for `safetensors`. + Rewritten node bindings. by Narsil in https://github.com/huggingface/tokenizers/pull/1331
* Updating the docs with the new command. by Narsil in https://github.com/huggingface/tokenizers/pull/1333
* Update added tokens by ArthurZucker in https://github.com/huggingface/tokenizers/pull/1335
* update package version for dev by ArthurZucker in https://github.com/huggingface/tokenizers/pull/1339
* Added ability to inspect a 'Sequence' pre-tokenizer. by eaplatanios in https://github.com/huggingface/tokenizers/pull/1341
* Let's allow hf_hub < 1.0 by ArthurZucker in https://github.com/huggingface/tokenizers/pull/1344
* Fixing the progressbar. by Narsil in https://github.com/huggingface/tokenizers/pull/1353

New Contributors
* mikelui made their first contribution in https://github.com/huggingface/tokenizers/pull/1322
* eaplatanios made their first contribution in https://github.com/huggingface/tokenizers/pull/1341

**Full Changelog**: https://github.com/huggingface/tokenizers/compare/v0.13.4.rc2...v0.14.1rc1
