Transformers

Latest version: v4.41.0

4.9.2

Not secure

- Tpu tie weights 13030 (sgugger)
- ONNX fixes & examples: 13048, 13049, 13028, 13014, 12911 (mfuntowicz, michaelbenayoun, LysandreJik)
- Fix push_to_hub for TPUs 12895 (sgugger)

4.9.1

Not secure

Fix barrier for SM distributed 12853 (sgugger)

4.9.0

Not secure

v4.9.0: TensorFlow examples, CANINE, tokenizer training, ONNX rework

ONNX rework

This version introduces a new package, `transformers.onnx`, which can be used to export models to ONNX. Unlike the previous implementation, this approach is designed as an easily extensible package in which users can define their own ONNX configurations and export the models they need.

```bash
python -m transformers.onnx --model=bert-base-cased onnx/bert-base-cased/
```

The exporter then validates the exported model against the reference model:

```
Validating ONNX model...
        -[✓] ONNX model outputs' name match reference model ({'pooler_output', 'last_hidden_state'}
        - Validating ONNX Model output "last_hidden_state":
                -[✓] (2, 8, 768) matchs (2, 8, 768)
                -[✓] all values close (atol: 0.0001)
        - Validating ONNX Model output "pooler_output":
                -[✓] (2, 768) matchs (2, 768)
                -[✓] all values close (atol: 0.0001)
All good, model saved at: onnx/bert-base-cased/model.onnx
```

- [RFC] Laying down building stone for more flexible ONNX export capabilities 11786 (mfuntowicz)
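The three checks in the validation log (output names, output shapes, values within `atol`) can be sketched with NumPy. This is a minimal illustration, not the `transformers.onnx` implementation: `validate_outputs` is a hypothetical helper, and it assumes the reference and ONNX outputs have already been collected into dictionaries of NumPy arrays keyed by output name.

```python
from typing import Dict, List

import numpy as np


def validate_outputs(
    reference: Dict[str, np.ndarray],
    exported: Dict[str, np.ndarray],
    atol: float = 1e-4,
) -> List[str]:
    """Compare exported-model outputs with reference outputs, name by name.

    Returns a list of human-readable problems; an empty list means every check passed.
    """
    problems: List[str] = []
    # 1. Both models must expose the same set of output names.
    if set(reference) != set(exported):
        problems.append(f"output names differ: {sorted(reference)} vs {sorted(exported)}")
        return problems
    for name, ref in reference.items():
        out = exported[name]
        # 2. Shapes must match exactly, e.g. (2, 8, 768) for last_hidden_state.
        if ref.shape != out.shape:
            problems.append(f"{name}: shape {out.shape} != {ref.shape}")
        # 3. Values must be close; np.allclose also applies its default rtol=1e-5.
        elif not np.allclose(ref, out, atol=atol):
            max_diff = float(np.max(np.abs(ref - out)))
            problems.append(f"{name}: max abs diff {max_diff:.2e} exceeds atol {atol}")
    return problems


rng = np.random.default_rng(0)
ref_outputs = {
    "last_hidden_state": rng.normal(size=(2, 8, 768)),
    "pooler_output": rng.normal(size=(2, 768)),
}
onnx_outputs = {k: v + 1e-6 for k, v in ref_outputs.items()}  # tiny numerical drift
print(validate_outputs(ref_outputs, onnx_outputs))  # []
```

Returning a list of problems instead of raising on the first failure lets a caller report every mismatching output at once.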

CANINE model

Four new models are released as part of the CANINE implementation: `CanineForSequenceClassification`, `CanineForMultipleChoice`, `CanineForTokenClassification` and `CanineForQuestionAnswering`, in PyTorch.

The CANINE model was proposed in [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting. It’s among the first papers that train a Transformer without using an explicit tokenization step (such as Byte Pair Encoding (BPE), WordPiece, or SentencePiece). Instead, the model is trained directly at a Unicode character level. Training at a character level inevitably comes with a longer sequence length, which CANINE solves with an efficient downsampling strategy, before applying a deep Transformer encoder.
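A toy sketch of the two ideas above: character-level input ids (one Unicode code point per position, no vocabulary) and shortening the resulting sequence before the deep encoder. The averaging in `downsample` is a stand-in assumed for illustration only; CANINE itself uses a learned, convolution-based downsampling. The default `rate=4` follows the downsampling rate reported for CANINE. Both helper names are hypothetical.

```python
from typing import List

import numpy as np


def encode_chars(text: str) -> List[int]:
    # Tokenization-free input: each Unicode code point becomes one id,
    # so there is no vocabulary file or subword merge table.
    return [ord(ch) for ch in text]


def downsample(hidden: np.ndarray, rate: int = 4) -> np.ndarray:
    """Shorten a (seq_len, dim) sequence by a factor of `rate`.

    Toy stand-in for CANINE's learned downsampling: zero-pad the end so the
    length is a multiple of `rate`, then average each non-overlapping window.
    """
    seq_len, dim = hidden.shape
    pad = (-seq_len) % rate  # positions needed to reach a multiple of `rate`
    if pad:
        hidden = np.concatenate([hidden, np.zeros((pad, dim), dtype=hidden.dtype)])
    return hidden.reshape(-1, rate, dim).mean(axis=1)


ids = encode_chars("héllo wörld")  # 11 code points, including non-ASCII ones
embeddings = np.random.default_rng(0).normal(size=(len(ids), 8))  # toy embeddings
print(len(ids), downsample(embeddings).shape)  # 11 (3, 8)
```

The encoder after downsampling sees roughly a quarter of the character positions, which is what keeps attention over long character sequences affordable.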

- Add CANINE 12024 (NielsRogge)

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=canine

Tokenizer training

This version introduces a new method, `train_new_from_iterator()`, to train a new fast tokenizer from scratch, reusing the configuration of an existing tokenizer.

```py
from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset("wikitext", name="wikitext-2-raw-v1", split="train")

# We train on batches of texts, 1000 at a time here.
batch_size = 1000
corpus = (dataset[i : i + batch_size]["text"] for i in range(0, len(dataset), batch_size))

tokenizer = AutoTokenizer.from_pretrained("gpt2")
new_tokenizer = tokenizer.train_new_from_iterator(corpus, vocab_size=20000)
```

- Easily train a new fast tokenizer from a given one - tackle the special tokens format (str or AddedToken) 12420 (SaulLu)
- Easily train a new fast tokenizer from a given one 12361 (sgugger)
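As in the example above, `train_new_from_iterator()` consumes an iterator over batches of texts (lists of strings). When the corpus is a plain in-memory list rather than a `datasets` object, the same batching can be sketched with a small generator; `batch_iterator` is a hypothetical helper, not part of the library.

```python
from typing import Iterator, List, Sequence


def batch_iterator(texts: Sequence[str], batch_size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive batches holding at most `batch_size` texts each."""
    for start in range(0, len(texts), batch_size):
        yield list(texts[start : start + batch_size])


texts = ["first line", "second line", "third line"]
print(list(batch_iterator(texts, batch_size=2)))
# [['first line', 'second line'], ['third line']]
```

A generator keeps only one batch in flight, but it is exhausted after a single pass, so call `batch_iterator(texts)` again for every training run (for example `tokenizer.train_new_from_iterator(batch_iterator(texts), vocab_size=20000)`).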

TensorFlow examples

`TFTrainer` is now entering deprecation and is being replaced by Keras. Version v4.9.0 also completes a long rework of the TensorFlow examples, making them more Keras-idiomatic, clearer, and more robust.

- NER example for Tensorflow 12469 (Rocketknight1)
- TF summarization example 12617 (Rocketknight1)
- Adding TF translation example 12667 (Rocketknight1)
- Deprecate TFTrainer 12706 (Rocketknight1)

TensorFlow implementations

HuBERT is now implemented in TensorFlow:

- Add TFHubertModel 12206 (will-rice)

Breaking changes

When `load_best_model_at_end` was set to `True` in `TrainingArguments`, a `save_strategy` different from the `evaluation_strategy` was accepted, but `save_strategy` was then overwritten by `evaluation_strategy`, because keeping track of the best model requires an evaluation at every save. This confused many users, whose scripts did not do what they were told. This situation now raises an error asking you to set `save_strategy` and `evaluation_strategy` to the same value; when that value is `"steps"`, `save_steps` must also be a round multiple of `eval_steps`.
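The rule can be sketched as a standalone check. This is a hypothetical helper that mirrors the behavior described above, not the code inside `TrainingArguments`; its parameter names follow the `TrainingArguments` fields (`evaluation_strategy`, `save_strategy`, `eval_steps`, `save_steps`), and strategies are plain strings here for simplicity.

```python
def check_best_model_strategies(
    load_best_model_at_end: bool,
    evaluation_strategy: str,
    save_strategy: str,
    eval_steps: int = 500,
    save_steps: int = 500,
) -> None:
    """Raise ValueError for setting combinations the rule above rejects."""
    if not load_best_model_at_end:
        return  # the constraint only matters when tracking the best model
    if evaluation_strategy != save_strategy:
        raise ValueError(
            "load_best_model_at_end requires matching strategies, got "
            f"evaluation_strategy={evaluation_strategy!r}, save_strategy={save_strategy!r}"
        )
    # With "steps", every save must coincide with an evaluation.
    if evaluation_strategy == "steps" and save_steps % eval_steps != 0:
        raise ValueError(
            f"save_steps ({save_steps}) must be a round multiple of eval_steps ({eval_steps})"
        )


check_best_model_strategies(True, "steps", "steps", eval_steps=500, save_steps=1000)  # accepted
try:
    check_best_model_strategies(True, "epoch", "steps")
except ValueError as err:
    print(err)
```

Saving every 1000 steps while evaluating every 500 is accepted because each checkpoint lands on an evaluation step; saving every 750 steps is not.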

General improvements and bugfixes

- UpdateDescription of TrainingArgs param save_strategy 12328 (sam-qordoba)
- [Deepspeed] new docs 12077 (stas00)
- [ray] try fixing import error 12338 (richardliaw)
- [examples/Flax] move the examples table up 12341 (patil-suraj)
- Fix torchscript tests 12336 (LysandreJik)
- Add flax/jax quickstart 12342 (marcvanzee)
- Fixed a typo in readme 12356 (MichalPitr)
- Fix exception in prediction loop occurring for certain batch sizes 12350 (jglaser)
- Add FlaxBigBird QuestionAnswering script 12233 (vasudevgupta7)
- Replace NotebookProgressReporter by ProgressReporter in Ray Tune run 12357 (krfricke)
- [examples] remove extra white space from log format 12360 (stas00)
- fixed multiplechoice tokenization 12362 (cronoik)
- [trainer] add main_process_first context manager 12351 (stas00)
- [Examples] Replicates the new --log_level feature to all trainer-based pytorch 12359 (bhadreshpsavani)
- [Examples] Update Example Template for `--log_level` feature 12365 (bhadreshpsavani)
- [Examples] Replace `print` statement with `logger.info` in QA example utils 12368 (bhadreshpsavani)
- Onnx export v2 fixes 12388 (LysandreJik)
- [Documentation] Warn that DataCollatorForWholeWordMask is limited to BertTokenizer-like tokenizers 12371 (ionicsolutions)
- Update run_mlm.py 12344 (TahaAslani)
- Add possibility to maintain full copies of files 12312 (sgugger)
- [CI] add dependency table sync verification 12364 (stas00)
- [Examples] Added context manager to datasets map 12367 (bhadreshpsavani)
- [Flax community event] Add more description to readme 12398 (patrickvonplaten)
- Remove the need for `einsum` in Albert's attention computation 12394 (mfuntowicz)
- [Flax] Adapt flax examples to include `push_to_hub` 12391 (patrickvonplaten)
- Tensorflow LM examples 12358 (Rocketknight1)
- [Deepspeed] match the trainer log level 12401 (stas00)
- [Flax] Add T5 pretraining script 12355 (patrickvonplaten)
- [models] respect dtype of the model when instantiating it 12316 (stas00)
- Rename detr targets to labels 12280 (NielsRogge)
- Add out of vocabulary error to ASR models 12288 (will-rice)
- Fix TFWav2Vec2 SpecAugment 12289 (will-rice)
- [example/flax] add summarization readme 12393 (patil-suraj)
- [Flax] Example scripts - correct weight decay 12409 (patrickvonplaten)
- fix ids_to_tokens naming error in tokenizer of deberta v2 12412 (hjptriplebee)
- Minor fixes in original RAG training script 12395 (shamanez)
- Added talks 12415 (suzana-ilic)
- [modelcard] fix 12422 (stas00)
- Add option to save on each training node 12421 (sgugger)
- Added to talks section 12433 (suzana-ilic)
- Fix default bool in argparser 12424 (sgugger)
- Add default bos_token and eos_token for tokenizer of deberta_v2 12429 (hjptriplebee)
- fix typo in mt5 configuration docstring 12432 (fcakyon)
- Add to talks section 12442 (suzana-ilic)
- [JAX/Flax readme] add philosophy doc 12419 (patil-suraj)
- [Flax] Add wav2vec2 12271 (patrickvonplaten)
- Add test for a WordLevel tokenizer model 12437 (SaulLu)
- [Flax community event] How to use hub during training 12447 (patrickvonplaten)
- [Wav2Vec2, Hubert] Fix ctc loss test 12458 (patrickvonplaten)
- Comment fast GPU TF tests 12452 (LysandreJik)
- Fix training_args.py barrier for torch_xla 12464 (jysohn23)
- Added talk details 12465 (suzana-ilic)
- Add TPU README 12463 (patrickvonplaten)
- Import check_inits handling of duplicate definitions. 12467 (Iwontbecreative)
- Validation split added: custom data files sgugger, patil-suraj 12407 (Souvic)
- Fixing bug with param count without embeddings 12461 (TevenLeScao)
- [roberta] fix lm_head.decoder.weight ignore_key handling 12446 (stas00)
- Rework notebooks and move them to the Notebooks repo 12471 (sgugger)
- fixed typo in flax-projects readme 12466 (mplemay)
- Fix TAPAS test uncovered by 12446 12480 (LysandreJik)
- Add guide on how to build demos for the Flax sprint 12468 (osanseviero)
- Add `Repository` import to the FLAX example script 12501 (LysandreJik)
- [examples/flax] clip style image-text training example 12491 (patil-suraj)
- [Flax] Fix wav2vec2 pretrain arguments 12498 (Wikidepia)
- [Flax] ViT training example 12300 (patil-suraj)
- Fix order of state and input in Flax Quickstart README 12510 (navjotts)
- [Flax] Dataset streaming example 12470 (patrickvonplaten)
- [Flax] Correct flax training scripts 12514 (patrickvonplaten)
- [Flax] Correct logging steps flax 12515 (patrickvonplaten)
- [Flax] Fix another bug in logging steps 12516 (patrickvonplaten)
- [Wav2Vec2] Flax - Adapt wav2vec2 script 12520 (patrickvonplaten)
- [Flax] Fix hybrid clip 12519 (patil-suraj)
- [RoFormer] Fix some issues 12397 (JunnYu)
- FlaxGPTNeo 12493 (patil-suraj)
- Updated README 12540 (suzana-ilic)
- Edit readme 12541 (SaulLu)
- implementing tflxmertmodel integration test 12497 (sadakmed)
- [Flax] Adapt examples to be able to use eval_steps and save_steps 12543 (patrickvonplaten)
- [examples/flax] add adafactor optimizer 12544 (patil-suraj)
- [Flax] Add FlaxMBart 12236 (stancld)
- Add a warning for broken ProphetNet fine-tuning 12511 (JetRunner)
- [trainer] add option to ignore keys for the train function too (11719) 12551 (shabie)
- MLM training fails with no validation file(same as 12406 for pytorch now) 12517 (Souvic)
- [Flax] Allow retraining from save checkpoint 12559 (patrickvonplaten)
- Adding prepare_decoder_input_ids_from_labels methods to all TF ConditionalGeneration models 12560 (Rocketknight1)
- Remove tf.roll wherever not needed 12512 (szutenberg)
- Double check for attribute num_examples 12562 (sgugger)
- [examples/hybrid_clip] fix loading clip vision model 12566 (patil-suraj)
- Remove logging of GPU count etc from run_t5_mlm_flax.py 12569 (ibraheem-moosa)
- raise exception when arguments to pipeline are incomplete 12548 (hwijeen)
- Init pickle 12567 (sgugger)
- Fix group_lengths for short datasets 12558 (sgugger)
- Don't stop at num_epochs when using IterableDataset 12561 (sgugger)
- Fixing the pipeline optimization by reindexing targets (V2) 12330 (Narsil)
- Fix MT5 init 12591 (sgugger)
- [model.from_pretrained] raise exception early on failed load 12574 (stas00)
- [doc] fix broken ref 12597 (stas00)
- Add Flax sprint project evaluation section 12592 (osanseviero)
- This will reduce "Already borrowed error": 12550 (Narsil)
- [Flax] Add flax marian 12595 (patrickvonplaten)
- [Flax] Fix cur step flax examples 12608 (patrickvonplaten)
- Simplify unk token 12582 (sgugger)
- Fix arg count for partial functions 12609 (sgugger)
- Pass `model_kwargs` when loading a model in `pipeline()` 12449 (aphedges)
- [Flax] Fix mt5 auto 12612 (patrickvonplaten)
- [Flax Marian] Add marian flax example 12614 (patrickvonplaten)
- [FLax] Fix marian docs 2 12615 (patrickvonplaten)
- [debugging utils] minor doc improvements 12525 (stas00)
- [doc] DP/PP/TP/etc parallelism 12524 (stas00)
- [doc] fix anchor 12620 (stas00)
- [Examples][Flax] added test file in summarization example 12630 (bhadreshpsavani)
- Point to the right file for hybrid CLIP 12599 (edugp)
- [flax]fix jax array type check 12638 (patil-suraj)
- Add tokenizer_file parameter to PreTrainedTokenizerFast docstring 12624 (lewisbails)
- Skip TestMarian_MT_EN 12649 (LysandreJik)
- The extended trainer tests should require torch 12650 (LysandreJik)
- Pickle auto models 12654 (sgugger)
- Pipeline should be agnostic 12656 (LysandreJik)
- Fix transfo xl integration test 12652 (LysandreJik)
- Remove SageMaker documentation 12657 (philschmid)
- Fixed docs 12646 (KickItLikeShika)
- fix typo in modeling_t5.py docstring 12640 (PhilipMay)
- Translate README.md to Simplified Chinese 12596 (JetRunner)
- Fix typo in README_zh-hans.md 12663 (JetRunner)
- Updates timeline for project evaluation 12660 (osanseviero)
- [WIP] Patch BigBird tokenization test 12653 (LysandreJik)
- **encode_plus() shouldn't run for W2V2CTC 12655 (LysandreJik)
- Add ByT5 option to example run_t5_mlm_flax.py 12634 (mapmeld)
- Wrong model is used in example, should be character instead of subword model 12676 (jsteggink)
- [Blenderbot] Fix docs 12227 (patrickvonplaten)
- Add option to load a pretrained model with mismatched shapes 12664 (sgugger)
- Fix minor docstring typos. 12682 (qqaatw)
- [tokenizer.prepare_seq2seq_batch] change deprecation to be easily actionable 12669 (stas00)
- [Flax Generation] Correct inconsistencies PyTorch/Flax 12662 (patrickvonplaten)
- [Deepspeed] adapt multiple models, add zero_to_fp32 tests 12477 (stas00)
- Add timeout to CI. 12684 (LysandreJik)
- Fix Tensorflow Bart-like positional encoding 11897 (JunnYu)
- [Deepspeed] non-native optimizers are mostly ok with zero-offload 12690 (stas00)
- Fix multiple choice doc examples 12679 (sgugger)
- Provide mask_time_indices to `_mask_hidden_states` to avoid double masking 12692 (mfuntowicz)
- Update TF examples README 12703 (Rocketknight1)
- Fix uninitialized variables when `config.mask_feature_prob > 0` 12705 (mfuntowicz)
- Only test the files impacted by changes in the diff 12644 (sgugger)
- flax model parallel training 12590 (patil-suraj)
- [test] split test into 4 sub-tests to avoid timeout 12710 (stas00)
- [trainer] release tmp memory in checkpoint load 12718 (stas00)
- [Flax] Correct shift labels for seq2seq models in Flax 12720 (patrickvonplaten)
- Fix typo in Speech2TextForConditionalGeneration example 12716 (will-rice)
- Init adds its own files as impacted 12709 (sgugger)
- LXMERT integration test typo 12736 (LysandreJik)
- Fix AutoModel tests 12733 (LysandreJik)
- Skip test while the model is not available 12739 (LysandreJik)
- Skip test while the model is not available 12740 (LysandreJik)
- Translate README.md to Traditional Chinese 12701 (qqaatw)
- Fix MBart failing test 12737 (LysandreJik)
- Patch T5 device test 12742 (LysandreJik)
- Fix DETR integration test 12734 (LysandreJik)
- Fix led torchscript 12735 (LysandreJik)
- Remove framework mention 12731 (LysandreJik)
- [doc] parallelism: Which Strategy To Use When 12712 (stas00)
- [doc] performance: batch sizes 12725 (stas00)
- Replace specific tokenizer in log message by AutoTokenizer 12745 (SaulLu)
- [Wav2Vec2] Correctly pad mask indices for PreTraining 12748 (patrickvonplaten)
- [doc] testing: how to trigger a self-push workflow 12724 (stas00)
- add intel-tensorflow-avx512 to the candidates 12751 (zzhou612)
- [flax/model_parallel] fix typos 12757 (patil-suraj)
- Turn on eval mode when exporting to ONNX 12758 (mfuntowicz)
- Preserve `list` type of `additional_special_tokens` in `special_token_map` 12759 (SaulLu)
- [Wav2Vec2] Padded vectors should not allowed to be sampled 12764 (patrickvonplaten)
- Add tokenizers class mismatch detection between `cls` and checkpoint 12619 (europeanplaice)
- Fix push_to_hub docstring and make it appear in doc 12770 (sgugger)
- [ray] Fix `datasets_modules` ImportError with Ray Tune 12749 (Yard1)
- Longer timeout for slow tests 12779 (LysandreJik)
- Enforce eval and save strategies are compatible when --load_best_model_at_end 12786 (sgugger)
- [CIs] add troubleshooting docs 12791 (stas00)
- Fix Padded Batch Error 12282 12487 (will-rice)
- Flax MLM: Allow validation split when loading dataset from local file 12689 (fgaim)
- [Longformer] Correct longformer docs 12809 (patrickvonplaten)
- [CLIP/docs] add and fix examples 12810 (patil-suraj)
- [trainer] sanity checks for `save_steps=0|None` and `logging_steps=0` 12796 (stas00)
- Expose get_config() on ModelTesters 12812 (LysandreJik)
- Refactor slow sentencepiece tokenizers. 11716 (PhilipMay)
- Refer warmup_ratio when setting warmup_num_steps. 12818 (tsuchm)
- Add versioning system to fast tokenizer files 12713 (sgugger)
- Add _CHECKPOINT_FOR_DOC to all models 12811 (LysandreJik)

4.8.2

Not secure

- Rename detr targets to labels 12280 (NielsRogge)
- fix ids_to_tokens naming error in tokenizer of deberta v2 12412 (hjptriplebee)
- Add option to save on each training node 12421 (sgugger)

4.8.1

Not secure

- Fix default for TensorBoard folder
- Ray Tune install 12338
- Tests fixes for Torch FX 12336

4.8.0

Not secure

Integration with the Hub

Our example scripts and `Trainer` are now optimized for publishing your model on the [Hugging Face Hub](https://huggingface.co/models), along with TensorBoard training metrics and an automatically generated model card containing all the relevant metadata, including evaluation results.

Trainer Hub integration

Use `--push_to_hub` to create a model repo for your training run; the model is saved to it, with all relevant metadata, at the end of training.

Other flags are:
- `push_to_hub_model_id` to control the repo name
- `push_to_hub_organization` to specify an organization

Visualizing Training metrics on huggingface.co (based on TensorBoard)

By default, if `tensorboard` is installed, the training scripts use it for logging, and the logging traces folder is located inside your model output directory, so the traces are pushed to your model repo along with the model.
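A minimal sketch of why the traces end up in the repo: the log folder is a subdirectory of the output directory that gets pushed. The exact layout used here, `<output_dir>/runs/<MonDD_HH-MM-SS>_<hostname>`, is an assumption for illustration; check `TrainingArguments.logging_dir` in your installed version. `sketch_logging_dir` is a hypothetical helper.

```python
import os
import socket
from datetime import datetime
from typing import Optional


def sketch_logging_dir(
    output_dir: str,
    now: Optional[datetime] = None,
    hostname: Optional[str] = None,
) -> str:
    """Build a TensorBoard log folder nested inside `output_dir` (assumed layout)."""
    now = now or datetime.now()
    hostname = hostname or socket.gethostname()
    # Timestamp plus hostname keeps runs from different machines or restarts apart.
    return os.path.join(output_dir, "runs", f"{now.strftime('%b%d_%H-%M-%S')}_{hostname}")


print(sketch_logging_dir("my-model", datetime(2021, 6, 23, 14, 5, 9), "node1"))
# my-model/runs/Jun23_14-05-09_node1  (POSIX path separator, default C locale)
```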

Any model repo that contains TensorBoard traces will spawn a TensorBoard server:

![image](https://user-images.githubusercontent.com/35901082/123144141-5c2af980-d429-11eb-8438-16b374d2fe73.png)

which makes it very convenient to see how the training went! This Hub feature is in Beta so let us know if anything looks weird :)

See this [model repo](https://huggingface.co/julien-c/reactiongif-roberta/tensorboard) for an example.

Model card generation

![image](https://user-images.githubusercontent.com/35901082/123144222-749b1400-d429-11eb-97f6-9834dcc97c6d.png)

The model card contains info about the datasets used, the eval results, ...

Many users were already adding their eval results to their model cards in markdown format, but this is a more structured way of adding them, which makes them easier to parse and, for example, to display on leaderboards such as the ones on Papers With Code!

We use a format specified in collaboration with [Papers With Code](https://github.com/huggingface/huggingface_hub/blame/main/modelcard.md); see also [this repo](https://github.com/paperswithcode/model-index).

Model, tokenizer and configurations

All models, tokenizers and configurations now have a revamped `push_to_hub()` method, as well as a `push_to_hub` argument in their `save_pretrained()` method. The workflow of this method has changed to be more git-like: it keeps a local clone of the repo in a folder of the working directory, which makes it easier to apply patches. Use `use_temp_dir=True` to clone into a temporary folder instead, matching the behavior of the experimental API.

- Clean push to hub API 12187 (sgugger)

Flax/JAX support

Flax/JAX is becoming a fully supported backend of the Transformers library, with more models getting a Flax implementation. BART, CLIP and T5 join the already existing models; the full list is available [here](https://huggingface.co/transformers/#supported-frameworks).

- [Flax] FlaxAutoModelForSeq2SeqLM 12228 (patil-suraj)
- [FlaxBart] few small fixes 12247 (patil-suraj)
- [FlaxClip] fix test from/save pretrained test 12284 (patil-suraj)
- [Flax] [WIP] allow loading head model with base model weights 12255 (patil-suraj)
- [Flax] Fix flax test save pretrained 12256 (patrickvonplaten)
- [Flax] Add jax flax to env command 12251 (patrickvonplaten)
- add FlaxAutoModelForImageClassification in main init 12298 (patil-suraj)
- Flax T5 12150 (vasudevgupta7)
- [Flax T5] Fix weight initialization and fix docs 12327 (patrickvonplaten)
- Flax summarization script 12230 (patil-suraj)
- FlaxBartPretrainedModel -> FlaxBartPreTrainedModel 12313 (sgugger)

General improvements and bug fixes

- AutoTokenizer: infer the class from the tokenizer config if possible 12208 (sgugger)
- update desc for map in all examples 12226 (bhavitvyamalik)
- Depreciate pythonic Mish and support PyTorch 1.9 version of Mish 12240 (digantamisra98)
- [t5 doc] make the example work out of the box 12239 (stas00)
- Better CI feedback 12279 (LysandreJik)
- Fix for making student ProphetNet for Seq2Seq Distillation 12130 (vishal-burman)
- [DeepSpeed] don't ignore --adafactor 12257 (stas00)
- Tensorflow QA example 12252 (Rocketknight1)
- [tests] reset report_to to none, avoid deprecation warning 12293 (stas00)
- [trainer + examples] set log level from CLI 12276 (stas00)
- [tests] multiple improvements 12294 (stas00)
- Trainer: adjust wandb installation example 12291 (stefan-it)
- Fix and improve documentation for LEDForConditionalGeneration 12303 (ionicsolutions)
- [Flax] Main doc for event orga 12305 (patrickvonplaten)
- [trainer] 2 bug fixes and a rename 12309 (stas00)
- [docs] performance 12258 (stas00)
- Add CodeCarbon Integration 12304 (JetRunner)
- Optimizing away the `fill-mask` pipeline. 12113 (Narsil)
- Add output in a dictionary for TF `generate` method 12139 (stancld)
- Rewrite ProphetNet to adapt converting ONNX friendly 11981 (jiafatom)
- Add mention of the huggingface_hub methods for offline mode 12320 (LysandreJik)
- [Flax/JAX] Add how to propose projects markdown 12311 (patrickvonplaten)
- [TFWav2Vec2] Fix docs 12283 (chenht2010)
- Add all XxxPreTrainedModel to the main init 12314 (sgugger)
- Conda build 12323 (LysandreJik)
- Changed modeling_fx_utils.py to utils/fx.py for clarity 12326 (michaelbenayoun)
