Transformers

Latest version: v4.41.0


Page 14 of 26

4.11.1

Not secure

Patch release with a few bug fixes:
- [Wav2Vec2] Better error message (13777)
- Fix LayoutLM ONNX test error (13710)
- Fix warning for gradient_checkpointing (13767)
- Implement len in IterableDatasetShard (13780)
- Fix length of IterableDatasetShard and add test (13792)

4.11.0

Not secure

GPT-J

Three new models are released as part of the GPT-J implementation: `GPTJModel`, `GPTJForCausalLM`, `GPTJForSequenceClassification`, in PyTorch.

The GPT-J model was released in the [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax) repository by Ben Wang and Aran Komatsuzaki. It is a GPT-2-like causal language model trained on the Pile dataset.

It was contributed by StellaAthena, kurumuz, EricHallahan, and leogao2.

- GPT-J-6B 13022 (StellaAthena)

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=gptj

SpeechEncoderDecoder & Speech2Text2

One new model is released as part of the Speech2Text2 implementation: `Speech2Text2ForCausalLM`, in PyTorch.

The Speech2Text2 model is used together with Wav2Vec2 for Speech Translation models proposed in [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.

Speech2Text2 is a decoder-only transformer model that can be used with any speech encoder-only model, such as Wav2Vec2 or HuBERT, for speech-to-text tasks. Please refer to the [SpeechEncoderDecoder](https://huggingface.co/transformers/master/model_doc/speechencoderdecoder.html) class for how to combine Speech2Text2 with any speech encoder-only model.

- Add SpeechEncoderDecoder & Speech2Text2 13186 (patrickvonplaten)

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?other=speech2text2

FNet

Eight new models are released as part of the FNet implementation: `FNetModel`, `FNetForPreTraining`, `FNetForMaskedLM`, `FNetForNextSentencePrediction`, `FNetForSequenceClassification`, `FNetForMultipleChoice`, `FNetForTokenClassification`, `FNetForQuestionAnswering`, in PyTorch.

The FNet model was proposed in [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon. The model replaces the self-attention layer in a BERT model with a Fourier transform, of which only the real part is kept. The model is significantly faster than the BERT model because it has fewer parameters and is more memory efficient. It achieves about 92-97% of the accuracy of its BERT counterparts on the GLUE benchmark and trains much faster than the BERT model.

- Add FNet 13045 (gchhablani)

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?other=fnet

TensorFlow improvements

Several bug fixes and UX improvements for TensorFlow:

- Users should notice far fewer unnecessary warnings and less 'console spam' in general while using Transformers with TensorFlow.
- TensorFlow models should be less picky about the specific integer dtypes (int32/int64) that are passed as input.

Changes to compile() and train_step()

- You can now compile our TensorFlow models without passing a loss argument! If you do, the model will compute loss internally during the forward pass and then use this value to fit() on. This makes it much more convenient to get the right loss, particularly since many models have unique losses for certain tasks that are easy to overlook and annoying to reimplement. Remember to pass your labels as the "labels" key of your input dict when doing this, so that they're accessible to the model during the forward pass. There is no change to the behavior if you pass a loss argument, so all old code should remain unaffected by this change.
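As a rough illustration of the loss-free `compile()` described above, the sketch below trains a tiny, randomly initialized BERT classifier on a toy batch. The model sizes, the `build_inputs` helper, and the `fit_without_loss_argument` function are assumptions made for this example, not part of the release, and the exact behavior may vary with the installed TensorFlow and Transformers versions.

```python
import random


def build_inputs(batch_size=8, seq_len=16, vocab_size=100, num_labels=2, seed=0):
    """Build a toy batch as plain lists; the labels live under the "labels" key."""
    rng = random.Random(seed)
    return {
        "input_ids": [[rng.randrange(vocab_size) for _ in range(seq_len)] for _ in range(batch_size)],
        "attention_mask": [[1] * seq_len for _ in range(batch_size)],
        "labels": [rng.randrange(num_labels) for _ in range(batch_size)],
    }


def fit_without_loss_argument():
    """Compile a TensorFlow model without a loss and fit it on the toy batch."""
    # Imported lazily so the helper above can be used without TensorFlow installed.
    import tensorflow as tf
    from transformers import BertConfig, TFBertForSequenceClassification

    # Tiny, randomly initialized model (hypothetical sizes) so nothing is downloaded.
    config = BertConfig(
        vocab_size=100,
        hidden_size=32,
        num_hidden_layers=1,
        num_attention_heads=2,
        intermediate_size=64,
        num_labels=2,
    )
    model = TFBertForSequenceClassification(config)

    # No `loss` argument: the loss computed inside the forward pass is used by fit().
    model.compile(optimizer="adam")

    # The labels travel inside the input dict, next to the other model inputs.
    batch = {name: tf.constant(values, dtype=tf.int32) for name, values in build_inputs().items()}
    return model.fit(batch, batch_size=4, epochs=1)
```

Calling `fit_without_loss_argument()` requires TensorFlow and a Transformers release that ships TensorFlow models. Passing an explicit `loss` to `compile()` keeps the previous behavior.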

Associated PRs:

- Modified TF train_step 13678 (Rocketknight1)
- Fix Tensorflow T5 with int64 input 13479 (Rocketknight1)
- MarianMT int dtype fix 13496 (Rocketknight1)
- Removed console spam from misfiring warnings 13625 (Rocketknight1)

Pipelines

Pipeline refactor

The pipelines underwent a large refactor that should make contributing pipelines much simpler, and much less error-prone. As part of this refactor, PyTorch-based pipelines are now optimized for GPU performance based on PyTorch's `Dataset`s and `DataLoader`s.

See below for an example leveraging the `superb` dataset.
```py
import datasets
import tqdm
from transformers import pipeline
# Import path assumed for v4.11.0; later releases also expose it in transformers.pipelines.pt_utils.
from transformers.pipelines.base import KeyDataset

pipe = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
dataset = datasets.load_dataset("superb", name="asr", split="test")

# KeyDataset (only `pt`) will simply return the item in the dict returned by the dataset item
# as we're not interested in the `target` part of the dataset.
for out in tqdm.tqdm(pipe(KeyDataset(dataset, "file"))):
    print(out)
    # {"text": "NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND"}
    # {"text": ....}
    # ....
```

- [Large PR] Entire rework of pipelines. 13308 (Narsil)

Audio classification pipeline

A new pipeline is also available for audio classification.

- Add the `AudioClassificationPipeline` 13342 (anton-l)
- Enabling automatic loading of tokenizer with `pipeline` for `audio-classification`. 13376 (Narsil)

Setters for common properties

Version v4.11.0 introduces setters for common configuration properties. Different configurations expose different property names because they come from different implementations.

For example, `BertConfig` has the `hidden_size` attribute while `GPT2Config` has the `n_embd` attribute, and the two are essentially the same.

The newly introduced setters allow setting such properties through a standardized naming scheme, even on configuration objects that do not have them by default.

See the following code sample for an example:

```py
from transformers import GPT2Config

config = GPT2Config()

config.hidden_size = 4  # Failed previously
config = GPT2Config(hidden_size=4)  # Failed previously

config.n_embd  # returns 4
config.hidden_size  # returns 4
```

- Update model configs - Allow setters for common properties 13026 (nreimers)
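
The aliasing idea behind these setters can be sketched in plain Python. This is a minimal illustration under assumptions, not the library's implementation: `AliasedConfig`, `ToyGPT2Config`, and the `attribute_map` dictionary are hypothetical names chosen for the example.

```python
class AliasedConfig:
    """Toy configuration whose standardized names are aliases of native attributes."""

    # Standardized name -> native attribute name (hypothetical mapping for illustration).
    attribute_map = {}

    def __init__(self, **kwargs):
        for name, value in kwargs.items():
            setattr(self, name, value)

    def __setattr__(self, name, value):
        # Redirect a write on an alias to the native attribute.
        name = type(self).attribute_map.get(name, name)
        super().__setattr__(name, value)

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for aliases or missing attributes.
        native = type(self).attribute_map.get(name)
        if native is None:
            raise AttributeError(name)
        return getattr(self, native)


class ToyGPT2Config(AliasedConfig):
    # `hidden_size` is the standardized name, `n_embd` the native one.
    attribute_map = {"hidden_size": "n_embd"}

    def __init__(self, n_embd=768, **kwargs):
        self.n_embd = n_embd
        super().__init__(**kwargs)


config = ToyGPT2Config()
config.hidden_size = 4
print(config.n_embd, config.hidden_size)  # 4 4

config = ToyGPT2Config(hidden_size=8)
print(config.n_embd, config.hidden_size)  # 8 8
```

Because both names resolve to a single stored value, the alias and the native attribute cannot drift apart.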

Dynamic model code loading

This release adds an experimental feature: support for model code files hosted on the Hub. A walkthrough is available in the [PR description](https://github.com/huggingface/transformers/pull/13467).

:warning: This means that code files will be fetched from the Hub and executed locally. An additional argument, `trust_remote_code`, is required when instantiating the model from the Hub. We strongly encourage you to also specify a `revision` if using code from another user's or organization's repository.

- Dynamically load model code from the Hub 13467 (sgugger)
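
The sketch below shows one way to follow the advice above and pin the remote code to a specific commit. Requiring a full 40-character commit hash is a stricter choice made for this example, not something the release mandates, and `is_pinned_revision` and `load_remote_code_model` are hypothetical helper names.

```python
import re

# A full Git commit hash: 40 lowercase hexadecimal characters.
_COMMIT_SHA = re.compile(r"[0-9a-f]{40}")


def is_pinned_revision(revision):
    """Return True when `revision` looks like a full 40-character Git commit hash."""
    return isinstance(revision, str) and _COMMIT_SHA.fullmatch(revision) is not None


def load_remote_code_model(repo_id, revision):
    """Load a model whose modeling code lives in a Hub repository, pinned to one commit."""
    if not is_pinned_revision(revision):
        raise ValueError("Pin `revision` to a full commit hash before trusting remote code.")

    # Imported lazily so the check above works without the library installed.
    from transformers import AutoModel

    # `trust_remote_code=True` allows the repository's code files to run locally.
    return AutoModel.from_pretrained(repo_id, trust_remote_code=True, revision=revision)
```

A branch name such as `main` can move to new commits, so the helper rejects it. A commit hash always refers to the same code that was reviewed.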

Trainer

The `Trainer` has received several new features, the main one being that models are uploaded to the Hub each time you save them locally (you can specify another strategy). This push is asynchronous, so training continues normally without interruption.

Also:
- The SigOpt optimization framework is now integrated in the `Trainer` API as an opt-in component.
- The `Trainer` API now supports fine-tuning on distributed CPUs.

Associated PRs:

- Push to hub when saving checkpoints 13503 (sgugger)
- Add SigOpt HPO to transformers trainer api 13572 (kding1)
- Add cpu distributed fine-tuning support for transformers Trainer API 13574 (kding1)

Model size CPU memory usage reduction

Loading a model with PyTorch's `torch.load` requires twice as much CPU memory as the model itself occupies. Version v4.11.0 adds an experimental feature that loads a model while using only about the model's own size in memory.

Enable it by passing the `low_cpu_mem_usage=True` argument when loading PyTorch pretrained models.

- 1x model size CPU memory usage for `from_pretrained` 13466 (stas00)
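
To make the saving concrete, here is a back-of-the-envelope sketch of peak CPU memory while loading. It counts only the weights and ignores buffers and other overhead. The parameter count is a round, assumed figure and both function names are hypothetical.

```python
def peak_load_memory_gib(num_parameters, bytes_per_parameter=4, copies=2):
    """Rough peak CPU memory in GiB while loading: `copies` full sets of weights."""
    return copies * num_parameters * bytes_per_parameter / 2**30


def load_with_low_cpu_memory(model_name):
    """Load a PyTorch checkpoint with the experimental low-memory path."""
    # Imported lazily so the arithmetic above runs without the library installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(model_name, low_cpu_mem_usage=True)


# Round, assumed parameter count for a 6-billion-parameter model in float32.
num_parameters = 6_000_000_000
default_peak = peak_load_memory_gib(num_parameters, copies=2)
low_mem_peak = peak_load_memory_gib(num_parameters, copies=1)
print(f"default: {default_peak:.1f} GiB, low_cpu_mem_usage: {low_mem_peak:.1f} GiB")
```

Under these assumptions the low-memory path halves the estimate, which can decide whether a large checkpoint fits in RAM at all.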

GPT-Neo: simplified local attention

The GPT-Neo local attention was greatly simplified with no loss of performance.

- [GPT-Neo] Simplify local attention 13491 (finetuneanon, patil-suraj)

Breaking changes

*We strive for no breaking changes between releases - however, some bugs are not discovered for long periods of time, and users may eventually rely on such bugs. We document here such changes that may affect users when updating to a recent version.*

Order of overflowing tokens

The overflowing tokens returned by the slow tokenizers were returned in the wrong order. This is changed in the PR below.

- Correct order of overflowing_tokens for slow tokenizer 13179 (Apoorvgarg-creator)

Non-prefixed tokens for token classification pipeline

Updates the behavior of `aggregation_strategy` to more closely mimic the deprecated `grouped_entities` pipeline argument.

- Fixing backward compatiblity for non prefixed tokens (B-, I-). 13493 (Narsil)

Inputs normalization for Wav2Vec2 feature extractor

The changes in v4.10 (12804) introduced a bug in inputs normalization for non-padded tensors that affected Wav2Vec2 fine-tuning.
This is fixed in the PR below.

- [Wav2Vec2] Fix normalization for non-padded tensors 13512 (patrickvonplaten)

General bug fixes and improvements

- Fixes for the documentation 13361 (sgugger)
- fix wrong 'cls' masking for bigbird qa model output 13143 (donggyukimc)
- Improve T5 docs 13240 (NielsRogge)
- Fix tokenizer saving during training with `Trainer` 12806 (SaulLu)
- Fix DINO 13369 (NielsRogge)
- Properly register missing submodules in main init 13372 (sgugger)
- Add `Hubert` to the `AutoFeatureExtractor` 13366 (anton-l)
- Add missing feature extractors 13374 (LysandreJik)
- Fix RemBERT tokenizer initialization 13375 (LysandreJik)
- [Flax] Fix BigBird 13380 (patrickvonplaten)
- [GPU Tests] Fix SpeechEncoderDecoder GPU tests 13383 (patrickvonplaten)
- Fix name and get_class method in AutoFeatureExtractor 13385 (sgugger)
- [Flax/run_hybrid_clip] Fix duplicating images when captions_per_image exceeds the number of captions, enable truncation 12752 (edugp)
- Move Flax self-push to test machine 13364 (patrickvonplaten)
- Torchscript test 13350 (LysandreJik)
- Torchscript test for DistilBERT 13351 (LysandreJik)
- Torchscript test for ConvBERT 13352 (LysandreJik)
- Torchscript test for Flaubert 13353 (LysandreJik)
- Fix GPT-J _CHECKPOINT_FOR_DOC typo 13368 (LysandreJik)
- Update clip loss calculation 13217 (sachinruk)
- Add LayoutXLM tokenizer docs 13373 (NielsRogge)
- [doc] fix mBART example 13387 (patil-suraj)
- [docs] Update perplexity.rst to use negative log likelihood 13386 (madaan)
- [Tests] Fix SpeechEncoderDecoder tests 13395 (patrickvonplaten)
- [SpeechEncoderDecoder] Fix final test 13396 (patrickvonplaten)
- ✨ Add PyTorch image classification example 13134 (nateraw)
- Fix tests without any real effect in EncoderDecoderMixin 13406 (ydshieh)
- Fix scheduled tests for `SpeechEncoderDecoderModel` 13422 (anton-l)
- add torchvision in example test requirements 13438 (patil-suraj)
- [EncoderDecoder] Fix torch device in tests 13448 (patrickvonplaten)
- Adding a test for multibytes unicode. 13447 (Narsil)
- skip image classification example test 13451 (patil-suraj)
- Add TAPAS MLM-only models 13408 (NielsRogge)
- Fix scheduled TF Speech tests 13403 (anton-l)
- Update version of `packaging` package 13454 (shivdhar)
- Update setup.py 13421 (anukaal)
- Fix img classification tests 13456 (nateraw)
- Making it raise real errors on ByT5. 13449 (Narsil)
- Optimized bad word ids 13433 (guillaume-be)
- Use powers of 2 in download size calculations 13468 (anton-l)
- [docs] update dead quickstart link on resuing past for GPT2 13455 (shabie)
- fix CLIP conversion script. 13474 (patil-suraj)
- Deprecate Mirror 13470 (JetRunner)
- [CLIP] fix logit_scale init 13436 (patil-suraj)
- Don't modify labels inplace in `LabelSmoother` 13464 (sgugger)
- Enable automated model list copying for localized READMEs 13465 (qqaatw)
- Better error raised when cloned without lfs 13401 (LysandreJik)
- Throw ValueError for mirror downloads 13478 (JetRunner)
- Fix Tensorflow T5 with int64 input 13479 (Rocketknight1)
- Object detection pipeline 12886 (mishig25)
- Typo in "end_of_word_suffix" 13477 (KoichiYasuoka)
- Fixed the MultilabelTrainer document, which would cause a potential bug when executing the code originally documented. 13414 (Mohan-Zhang-u)
- Fix integration tests for `TFWav2Vec2` and `TFHubert` 13480 (anton-l)
- Fix typo in deepspeed documentation 13482 (apohllo)
- flax ner example 13365 (kamalkraj)
- Fix typo in documentation 13494 (apohllo)
- MarianMT int dtype fix 13496 (Rocketknight1)
- [Tentative] Moving slow tokenizer to the Trie world. 13220 (Narsil)
- Refactor internals for Trainer push_to_hub 13486 (sgugger)
- examples: minor fixes in flax example readme 13502 (stefan-it)
- [Wav2Vec2] Fix normalization for non-padded tensors 13512 (patrickvonplaten)
- TF multiple choice loss fix 13513 (Rocketknight1)
- [Wav2Vec2] Fix dtype 64 bug 13517 (patrickvonplaten)
- fix PhophetNet 'use_cache' assignment of no effect 13532 (holazzer)
- Ignore `past_key_values` during GPT-Neo inference 13521 (aphedges)
- Fix attention mask size checking for CLIP 13535 (Renovamen)
- [Speech2Text2] Skip newly added tokenizer test 13536 (patrickvonplaten)
- [Speech2Text] Give feature extraction higher tolerance 13538 (patrickvonplaten)
- [tokenizer] use use_auth_token for config 13523 (stas00)
- Small changes in `perplexity.rst`to make the notebook executable on google collaboratory 13541 (SaulLu)
- [Feature Extractors] Return attention mask always in int32 13543 (patrickvonplaten)
- Nightly torch ci 13550 (LysandreJik)
- Add long overdue link to the Google TRC project 13501 (avital)
- Fixing 13381 13400 (Narsil)
- fixing BC in `fill-mask` (wasn't tested in theses test suites apparently). 13540 (Narsil)
- add flax mbart in auto seq2seq lm 13560 (patil-suraj)
- [Flax] Addition of FlaxPegasus 13420 (bhadreshpsavani)
- Add checks to build cleaner model cards 13542 (sgugger)
- separate model card git push from the rest 13514 (elishowk)
- Fix test_fetcher when setup is updated 13566 (sgugger)
- [Flax] Fixes typo in Bart based Flax Models 13565 (bhadreshpsavani)
- Fix GPTNeo onnx export 13524 (patil-suraj)
- upgrade sentencepiece version 13564 (elishowk)
- [Pretrained Model] Add resize_position_embeddings 13559 (patrickvonplaten)
- [ci] nightly: add deepspeed master 13589 (stas00)
- [Tests] Disable flaky s2t test 13585 (patrickvonplaten)
- Correct device when resizing position embeddings 13593 (patrickvonplaten)
- Fix DataCollatorForSeq2Seq when labels are supplied as Numpy array instead of list 13582 (Rocketknight1)
- Fix a pipeline test with the newly updated weights 13608 (LysandreJik)
- Fix make fix-copies with type annotations 13586 (sgugger)
- DataCollatorForTokenClassification numpy fix 13609 (Rocketknight1)
- Feature Extractor: Wav2Vec2 & Speech2Text - Allow truncation + padding=longest 13600 (patrickvonplaten)
- [deepspeed] replaced deprecated init arg 13587 (stas00)
- Properly use test_fetcher for examples 13604 (sgugger)
- XLMR tokenizer is fully picklable 13577 (ben-davidson-6)
- Optimize Token Classification models for TPU 13096 (ibraheem-moosa)
- [Trainer] Add nan/inf logging filter 13619 (patrickvonplaten)
- Fix special tokens not correctly tokenized 13489 (qqaatw)
- Removed console spam from misfiring warnings 13625 (Rocketknight1)
- Use `config_dict_or_path` for deepspeed.zero.Init 13614 (aphedges)
- Fixes issues with backward pass in LED/Longformer Self-attention 13613 (aleSuglia)
- fix some docstring in encoder-decoder models 13611 (ydshieh)
- Updated tiny distilbert models 13631 (LysandreJik)
- Fix GPT2Config parameters in GPT2ModelTester 13630 (calpt)
- [run_summarization] fix typo 13647 (patil-suraj)
- [Fix]Make sure the args tb_writer passed to the TensorBoardCallback works 13636 (iamlockelightning)
- Fix mT5 documentation 13639 (ayaka14732)
- Update modeling_tf_deberta.py 13654 (kamalkraj)
- [megatron_gpt2] checkpoint v3 13508 (stas00)
- Change https:/ to https:// to dataset GitHub repo #13644 (flozi00)
- fix research_projects/mlm_wwm readme.md examples 13646 (LowinLi)
- Fix typo distilbert doc to code link 13643 (flozi00)
- Add Speech AutoModels 13655 (patrickvonplaten)
- beit-flax 13515 (kamalkraj)
- [FLAX] Question Answering Example 13649 (kamalkraj)
- Typo "UNKWOWN" -> "UNKNOWN" 13675 (kamalkraj)
- [SequenceFeatureExtractor] Rewrite padding logic from pure python to numpy 13650 (anton-l)
- [SinusoidalPositionalEmbedding] incorrect dtype when resizing in `forward` 13665 (stas00)
- Add push_to_hub to no_trainer examples 13659 (sgugger)
- Layoutlm onnx support (Issue 13300) 13562 (nishprabhu)
- Update modeling_flax_wav2vec2.py 13680 (kamalkraj)
- [FlaxWav2Vec2] Revive Test 13688 (patrickvonplaten)
- [AutoTokenizer] Allow creation of tokenizers by tokenizer type 13668 (patrickvonplaten)
- [Wav2Vec2FeatureExtractor] Fix `extractor.pad()` dtype backwards compatibility 13693 (anton-l)
- Make gradient_checkpointing a training argument 13657 (sgugger)
- Assertions to exceptions 13692 (MocktaiLEngineer)
- Fix non-negligible difference between GPT2 and TFGP2 13679 (ydshieh)
- Allow only textual inputs to VisualBert 13687 (gchhablani)
- Patch training arguments issue 13699 (LysandreJik)
- Patch training arguments issue 13700 (LysandreJik)
- [GPT-J] Use the `float16` checkpoints in integration tests 13676 (anton-l)
- [docs/gpt-j] add a note about tokenizer 13696 (patil-suraj)
- Fix FNet reference to tpu short seq length 13686 (gchhablani)
- Add BlenderBot small tokenizer to the init 13367 (LysandreJik)
- Fix typo in torchscript tests 13701 (LysandreJik)
- Handle `UnicodeDecodeError` when loading config file 13717 (qqaatw)
- Add FSNER example in research_projects 13712 (sayef)
- Replace torch.set_grad_enabled by torch.no_grad 13703 (LysandreJik)
- [ASR] Add official ASR CTC example to `examples/pytorch/speech-recognition` 13620 (patrickvonplaten)
- Make assertions only if actually chunking forward 13598 (joshdevins)
- Use torch.unique_consecutive to check elements are same 13637 (oToToT)
- Fixing zero-shot backward compatiblity 13725 (Narsil)
- [Tests] FNetTokenizer 13729 (patrickvonplaten)
- Warn for unexpected argument combinations 13509 (shirayu)
- Add model card creation snippet to example scripts 13730 (gchhablani)
- [Examples] speech recognition - remove gradient checkpointing 13733 (patrickvonplaten)
- Update test dependence for torch examples 13738 (sgugger)
- [Tests] Add decorator to FlaxBeit 13743 (patrickvonplaten)
- Update requirements for speech example 13745 (sgugger)
- [Trainer] Make sure shown loss in distributed training is correctly averaged over all workers 13681 (patrickvonplaten)
- [megatron gpt checkpoint conversion] causal mask requires pos_embed dimension 13735 (stas00)
- [Tests] Cast Hubert model tests to fp16 13755 (anton-l)
- Fix type annotations for `distributed_concat()` 13746 (Renovamen)
- Fix loss computation in Trainer 13760 (sgugger)
- Silence warning in gradient checkpointing when it's False 13734 (sgugger)

4.10.3

Not secure

Patches an issue with the serialization of the `TrainingArguments`

4.10.2

Not secure

- [Wav2Vec2] Fix dtype 64 bug 13517 (patrickvonplaten)

4.10.1

Not secure

- [Wav2Vec2] Fix normalization for non-padded tensors 13512 (patrickvonplaten)
- Fixing backward compatiblity for non prefixed tokens (B-, I-). 13493 (Narsil)
- Fixing 13381 13400 (Narsil)

4.10.0

Not secure

LayoutLM-v2 and LayoutXLM

Four new models are released as part of the LayoutLM-v2 implementation: `LayoutLMv2ForSequenceClassification`, `LayoutLMv2Model`, `LayoutLMv2ForTokenClassification` and `LayoutLMv2ForQuestionAnswering`, in PyTorch.

The LayoutLMv2 model was proposed in [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou. LayoutLMv2 improves [LayoutLM](https://huggingface.co/transformers/model_doc/layoutlm.html) to obtain state-of-the-art results across several document image understanding benchmarks.

- Add LayoutLMv2 + LayoutXLM 12604 (NielsRogge)

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=layoutlmv2

BEiT

Three new models are released as part of the BEiT implementation: `BeitModel`, `BeitForMaskedImageModeling`, and `BeitForImageClassification`, in PyTorch.

The BEiT model was proposed in [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong and Furu Wei. Inspired by BERT, BEiT is the first paper that makes self-supervised pre-training of Vision Transformers (ViTs) outperform supervised pre-training. Rather than pre-training the model to predict the class of an image (as done in the [original ViT paper](https://arxiv.org/abs/2010.11929)), BEiT models are pre-trained to predict visual tokens from the codebook of OpenAI’s [DALL-E model](https://arxiv.org/abs/2102.12092) given masked patches.

- Add BEiT 12994 (NielsRogge)

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=beit

Speech improvements

The Wav2Vec2 and HuBERT models now have a sequence classification head available.

- Add Wav2Vec2 & Hubert ForSequenceClassification 13153 (anton-l)

DeBERTa in TensorFlow (kamalkraj)

The DeBERTa and DeBERTa-v2 models have been converted from PyTorch to TensorFlow.

- Deberta tf 12972 (kamalkraj)
- Deberta_v2 tf 13120 (kamalkraj)

Flax model additions

EncoderDecoder, DistilBERT, and ALBERT now have support in Flax!

- FlaxEncoderDecoder allowing Bert2Bert and Bert2GPT2 in Flax 13008 (ydshieh)
- FlaxDistilBERT 13324 (kamalkraj)
- FlaxAlBERT 13294 (kamalkraj)

TensorFlow examples

A new example has been added in TensorFlow: multiple choice!
Data collators have become framework agnostic and can now produce TensorFlow and NumPy outputs in addition to PyTorch.

- Add TF multiple choice example 12865 (Rocketknight1)
- TF/Numpy variants for all DataCollator classes 13105 (Rocketknight1)

Auto API refactor

The Auto APIs have been disentangled from all the other model modules of the Transformers library, so you can now safely import the Auto classes without importing all the models (and maybe getting errors if your setup is not compatible with one specific model). The actual model classes are only imported when needed.

- Disentangle auto modules from other modeling files 13023 (sgugger)
- Fix AutoTokenizer when no fast tokenizer is available 13336 (sgugger)
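
The general idea of importing a class only when it is requested can be sketched with the standard library. This is an illustration under assumptions, not the library's code: `LazyClassMapping` is a hypothetical name, and standard-library classes stand in for model classes.

```python
import importlib


class LazyClassMapping:
    """Map a key to a class, importing the defining module only on first access."""

    def __init__(self, locations):
        # key -> (module name, class name); nothing is imported here.
        self._locations = dict(locations)
        self._loaded = {}

    def __getitem__(self, key):
        if key not in self._loaded:
            module_name, class_name = self._locations[key]
            module = importlib.import_module(module_name)
            self._loaded[key] = getattr(module, class_name)
        return self._loaded[key]

    def loaded_keys(self):
        """Return the keys whose classes have been imported so far."""
        return sorted(self._loaded)


# Hypothetical registry: the "broken" entry points at a module that does not exist.
registry = LazyClassMapping(
    {
        "fraction": ("fractions", "Fraction"),
        "decimal": ("decimal", "Decimal"),
        "broken": ("module_that_is_not_installed", "Missing"),
    }
)

print(registry.loaded_keys())  # []
fraction_cls = registry["fraction"]
print(fraction_cls(1, 2))  # 1/2
print(registry.loaded_keys())  # ['fraction']
```

Building the registry succeeds even though one entry cannot be imported. The `ImportError` surfaces only if `registry["broken"]` is requested, which mirrors how an incompatible model no longer breaks importing the Auto classes.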

Slight breaking change

When loading some kinds of corrupted state dictionaries of models, the `PreTrainedModel.from_pretrained` method was sometimes silently ignoring weights. This has now become a real error.

- Fix from_pretrained with corrupted state_dict 12939 (sgugger)

General improvements and bugfixes

- Improving pipeline tests 12784 (Narsil)
- Pin git python to <3.1.19 12858 (patrickvonplaten)
- [tests] fix logging_steps requirements 12860 (stas00)
- [Sequence Feature Extraction] Add truncation 12804 (patrickvonplaten)
- add `classifier_dropout` to classification heads 12794 (PhilipMay)
- Fix barrier for SM distributed 12853 (sgugger)
- Add possibility to ignore imports in test_fecther 12801 (sgugger)
- Add accelerate to examples requirements 12888 (sgugger)
- Fix documentation of BigBird tokenizer 12889 (sgugger)
- Better heuristic for token-classification pipeline. 12611 (Narsil)
- Fix push_to_hub for TPUs 12895 (sgugger)
- `Seq2SeqTrainer` set max_length and num_beams only when non None 12899 (cchen-dialpad)
- [FLAX] Minor fixes in CLM example 12914 (stefan-it)
- Correct validation_split_percentage argument from int (ex:5) to float (0.05) 12897 (Elysium1436)
- Fix typo in the example of MobileBertForPreTraining 12919 (buddhics)
- Add option to set max_len in run_ner 12929 (sgugger)
- Fix QA examples for roberta tokenizer 12928 (sgugger)
- Print defaults when using --help for scripts 12930 (sgugger)
- Fix StoppingCriteria ABC signature 12918 (willfrey)
- Add missing classmethod decorators 12927 (willfrey)
- fix distiller.py 12910 (chutaklee)
- Update generation_logits_process.py 12901 (willfrey)
- Update generation_logits_process.py 12900 (willfrey)
- Update tokenization_auto.py 12896 (willfrey)
- Fix docstring typo in tokenization_auto.py 12891 (willfrey)
- [Flax] Correctly Add MT5 12988 (patrickvonplaten)
- ONNX v2 raises an Exception when using PyTorch < 1.8.0 12933 (mfuntowicz)
- Moving feature-extraction pipeline to new testing scheme 12843 (Narsil)
- Add CpmTokenizerFast 12938 (JetRunner)
- fix typo in gradient_checkpointing arg 12855 (21jun)
- Log Azure ML metrics only for rank 0 12766 (harshithapv)
- Add substep end callback method 12951 (wulu473)
- Add multilingual documentation support 12952 (JetRunner)
- Fix division by zero in NotebookProgressPar 12953 (sgugger)
- [FLAX] Minor fixes in LM example 12947 (stefan-it)
- Prevent `Trainer.evaluate()` crash when using only tensorboardX 12963 (aphedges)
- Fix typo in example of DPRReader 12954 (tadejsv)
- Place BigBirdTokenizer in sentencepiece-only objects 12975 (sgugger)
- fix typo in example/text-classification README 12974 (fullyz)
- Fix template for inputs docstrings 12976 (sgugger)
- fix `Trainer.train(resume_from_checkpoint=False)` is causing an exception 12981 (PhilipMay)
- Cast logits from bf16 to fp32 at the end of TF_T5 12332 (szutenberg)
- Update CANINE test 12453 (NielsRogge)
- pad_to_multiple_of added to DataCollatorForWholeWordMask 12999 (Aktsvigun)
- [Flax] Align jax flax device name 12987 (patrickvonplaten)
- [Flax] Correct flax docs 12782 (patrickvonplaten)
- T5: Create position related tensors directly on device instead of CPU 12846 (armancohan)
- Skip ProphetNet test 12462 (LysandreJik)
- Create perplexity.rst 13004 (sashavor)
- GPT-Neo ONNX export 12911 (michaelbenayoun)
- Update generate method - Fix floor_divide warning 13013 (nreimers)
- [Flax] Correct pt to flax conversion if from base to head 13006 (patrickvonplaten)
- [Flax T5] Speed up t5 training 13012 (patrickvonplaten)
- FX submodule naming fix 13016 (michaelbenayoun)
- T5 with past ONNX export 13014 (michaelbenayoun)
- Fix ONNX test: Put smaller ALBERT model 13028 (LysandreJik)
- Tpu tie weights 13030 (sgugger)
- Use min version for huggingface-hub dependency 12961 (lewtun)
- tfhub.de -> tfhub.dev 12565 (abhishekkrthakur)
- [Flax] Refactor gpt2 & bert example docs 13024 (patrickvonplaten)
- Add MBART to models exportable with ONNX 13049 (LysandreJik)
- Add to ONNX docs 13048 (LysandreJik)
- Fix small typo in M2M100 doc 13061 (SaulLu)
- Add try-except for torch_scatter 13040 (JetRunner)
- docs: add HuggingArtists to community notebooks 13050 (AlekseyKorshuk)
- Fix ModelOutput instantiation form dictionaries 13067 (sgugger)
- Roll out the test fetcher on push tests 13055 (sgugger)
- Fix fallback of test_fetcher 13071 (sgugger)
- Revert to all tests whil we debug what's wrong 13072 (sgugger)
- Use original key for label in DataCollatorForTokenClassification 13057 (ibraheem-moosa)
- [Doctest] Setup, quicktour and task_summary 13078 (sgugger)
- Add VisualBERT demo notebook 12263 (gchhablani)
- Install git 13091 (LysandreJik)
- Fix classifier dropout in AlbertForMultipleChoice 13087 (ibraheem-moosa)
- Doctests job 13088 (LysandreJik)
- Fix VisualBert Embeddings 13017 (gchhablani)
- Proper import for unittest.mock.patch 13085 (sgugger)
- Reactive test fecthers on scheduled test with proper git install 13097 (sgugger)
- Change a parameter name in FlaxBartForConditionalGeneration.decode() 13074 (ydshieh)
- [Flax/JAX] Run jitted tests at every commit 13090 (patrickvonplaten)
- Rely on huggingface_hub for common tools 13100 (sgugger)
- [FlaxCLIP] allow passing params to image and text feature methods 13099 (patil-suraj)
- Ci last fix 13103 (sgugger)
- Improve type checker performance 13094 (bschnurr)
- Fix VisualBERT docs 13106 (gchhablani)
- Fix CircleCI nightly tests 13113 (sgugger)
- Create py.typed 12893 (willfrey)
- Fix flax gpt2 hidden states 13109 (ydshieh)
- Moving fill-mask pipeline to new testing scheme 12943 (Narsil)
- Fix omitted lazy import for xlm-prophetnet 13052 (minwhoo)
- Fix classifier dropout in bertForMultipleChoice 13129 (mandelbrot-walker)
- Fix frameworks table so it's alphabetical 13118 (osanseviero)
- [Feature Processing Sequence] Remove duplicated code 13051 (patrickvonplaten)
- Ci continue through smi failure 13140 (LysandreJik)
- Fix missing `seq_len` in `electra` model when `inputs_embeds` is used. 13128 (sararb)
- Optimizes ByT5 tokenizer 13119 (Narsil)
- Add splinter 12955 (oriram)
- [AutoFeatureExtractor] Fix loading of local folders if config.json exists 13166 (patrickvonplaten)
- Fix generation docstrings regarding input_ids=None 12823 (jvamvas)
- Update namespaces inside torch.utils.data to the latest. 13167 (qqaatw)
- Fix the loss calculation of ProphetNet 13132 (StevenTang1998)
- Fix LUKE tests 13183 (NielsRogge)
- Add min and max question length options to TapasTokenizer 12803 (NielsRogge)
- SageMaker: Fix sagemaker DDP & metric logs 13181 (philschmid)
- correcting group beam search function output score bug 13211 (sourabh112)
- Change how "additional_special_tokens" argument in the ".from_pretrained" method of the tokenizer is taken into account 13056 (SaulLu)
- remove unwanted control-flow code from DeBERTa-V2 13145 (kamalkraj)
- Fix load_tf_weights alias. 13159 (qqaatw)
- Add RemBert to AutoTokenizer 13224 (LysandreJik)
- Allow local_files_only for fast pretrained tokenizers 13225 (BramVanroy)
- fix `AutoModel.from_pretrained(..., torch_dtype=...)` 13209 (stas00)
- Fix broken links in Splinter documentation 13237 (oriram)
- Custom errors and BatchSizeError 13184 (AmbiTyga)
- Bump notebook from 6.1.5 to 6.4.1 in /examples/research_projects/lxmert 13226 (dependabot[bot])
- Update generation_logits_process.py 12671 (willfrey)
- Remove side effects of disabling gradient computaiton 13257 (LysandreJik)
- Replace assert statement with if condition and raise ValueError 13263 (nishprabhu)
- Better notification service 13267 (LysandreJik)
- Fix failing Hubert test 13261 (LysandreJik)
- Add CLIP tokenizer to AutoTokenizer 13258 (LysandreJik)
- Some `model_type`s cannot be in the mapping 13259 (LysandreJik)
- Add require flax to MT5 Flax test 13260 (LysandreJik)
- Migrating conversational pipeline tests to new testing format 13114 (Narsil)
- fix `tokenizer_class_from_name` for models with `-` in the name 13251 (stas00)
- Add error message concerning revision 13266 (BramVanroy)
- Move `image-classification` pipeline to new testing 13272 (Narsil)
- [Hotfix] Fixing the test (warnings was incorrect.) 13278 (Narsil)
- Moving question_answering tests to the new testing scheme. Had to tweak a little some ModelTesterConfig for pipelines. 13277 (Narsil)
- Moving `summarization` pipeline to new testing format. 13279 (Narsil)
- Moving `table-question-answering` pipeline to new testing. 13280 (Narsil)
- Moving `table-question-answering` pipeline to new testing 13281 (Narsil)
- Hotfixing master tests. 13282 (Narsil)
- Moving `text2text-generation` to new pipeline testing mecanism 13283 (Narsil)
- Add DINO conversion script 13265 (NielsRogge)
- Moving `text-generation` pipeline to new testing framework. 13285 (Narsil)
- Moving `token-classification` pipeline to new testing. 13286 (Narsil)
- examples: add keep_linebreaks option to CLM examples 13150 (stefan-it)
- Moving `translation` pipeline to new testing scheme. 13297 (Narsil)
- Fix BeitForMaskedImageModeling 13275 (NielsRogge)
- Moving `zero-shot-classification` pipeline to new testing. 13299 (Narsil)
- Fixing mbart50 with `return_tensors` argument too. 13301 (Narsil)
- [Flax] Correct all return tensors to numpy 13307 (patrickvonplaten)
- examples: only use keep_linebreaks when reading TXT files 13320 (stefan-it)
- Slow tests - run rag token in half precision 13304 (patrickvonplaten)
- [Slow tests] Disable Wav2Vec2 pretraining test for now 13303 (patrickvonplaten)
- Announcing the default model used by the pipeline (with a link). 13276 (Narsil)
- use float 16 in causal mask and masked bias 13194 (hwijeen)
- ✨ add citation file 13214 (flaxel)
- Improve documentation of pooler_output in ModelOutput 13228 (navjotts)
- fix: typo spelling grammar 13212 (slowy07)
- Check None before going through iteration 13250 (qqaatw)
- Use existing functionality for 13251 13333 (sgugger)
- neptune.ai logger: add ability to connect to a neptune.ai run 13319 (fcakyon)
- Update label2id in the model config for run_glue 13334 (sgugger)
- :bug: fix small model card bugs 13310 (nateraw)
- Fall back to `observed_batch_size` when the `dataloader` does not know the `batch_size`. 13188 (mbforbes)
- Fixes 12941 where use_auth_token not been set up early enough 13205 (bennimmo)
- Correct wrong function signatures on the docs website 13198 (qqaatw)
- Fix release utils 13337 (sgugger)
- Add missing module __spec__ 13321 (laurahanu)
- Use DS callable API to allow hf_scheduler + ds_optimizer 13216 (tjruwase)
- Tests fetcher tests 13340 (sgugger)
- [Testing] Add Flax Tests on GPU, Add Speech and Vision to Flax & TF tests 13313 (patrickvonplaten)
- Fixing a typo in the data_collator documentation 13309 (Serhiy-Shekhovtsov)
- Add GPT2ForTokenClassification 13290 (tucan9389)
- Doc mismatch fixed 13345 (Apoorvgarg-creator)
- Handle nested dict/lists of tensors as inputs in the Trainer 13338 (sgugger)
- [doc] correct TP implementation resources 13248 (stas00)
- Fix minor typo in parallelism doc 13289 (jaketae)
- Set missing seq_length variable when using inputs_embeds with ALBERT & Remove code duplication 13152 (olenmg)
- TF CLM example fix typo 13002 (Rocketknight1)
- Add generate kwargs to Seq2SeqTrainingArguments 13339 (sgugger)
