Transformers

Latest version: v4.41.0

4.27.4

This patch fixes a regression with FlauBERT and XLM models.

* Revert "Error (also in original) model, scaling only q matrix not qk.T dot product (qk.T/sqrt(dim_per_head))" (21627) in 22444 by sgugger
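The reverted commit concerned where the `1/sqrt(dim_per_head)` factor is applied. Because that factor is a scalar, scaling `q` before the dot product yields the same attention scores as scaling `qk.T` afterwards, up to floating-point rounding. The pure-Python check below illustrates this on random toy matrices; it is a sketch of the arithmetic, not the FlauBERT/XLM attention code.

```python
import math
import random


def matmul_t(a, b):
    """Return a @ b.T for matrices given as lists of row vectors."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def scores_scaling_q(q, k, dim_per_head):
    # Scale the queries first, then take dot products with the keys.
    scale = 1.0 / math.sqrt(dim_per_head)
    return matmul_t([[x * scale for x in row] for row in q], k)


def scores_scaling_product(q, k, dim_per_head):
    # Take dot products first, then scale the resulting score matrix.
    scale = 1.0 / math.sqrt(dim_per_head)
    return [[s * scale for s in row] for row in matmul_t(q, k)]


random.seed(0)
dim_per_head = 8
q = [[random.gauss(0.0, 1.0) for _ in range(dim_per_head)] for _ in range(4)]
k = [[random.gauss(0.0, 1.0) for _ in range(dim_per_head)] for _ in range(5)]

a = scores_scaling_q(q, k, dim_per_head)
b = scores_scaling_product(q, k, dim_per_head)
max_diff = max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
print(f"max |difference| = {max_diff:.1e}")  # tiny: rounding error only
```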

4.27.3

* Enforce `max_memory` for device_map strategies by sgugger in 22311

4.27.2

* Fix balanced and auto device_map by sgugger in 22271

4.27.1

Patches two unwanted breaking changes that were released in v4.27.0:
- Revert 22152 MaskedImageCompletionOutput changes (22187)
- Regression pipeline device (22190)

4.27.0

BridgeTower

The goal of this model is to build a bridge between each uni-modal encoder and the cross-modal encoder to enable comprehensive and detailed interaction at each layer of the cross-modal encoder, thus achieving remarkable performance on various downstream tasks with almost negligible additional parameters and computational costs.

* Add BridgeTower model by abhiwand in 20775
* Add loss for BridgeTowerForMaskedLM and BridgeTowerForImageAndTextRetrieval by abhiwand in 21684
* [WIP] Add BridgeTowerForContrastiveLearning by abhiwand in 21964

Whisper speedup

The Whisper model was integrated a few releases ago. This release offers significant performance optimizations when generating with timestamps. This was made possible by rewriting Whisper's `generate()` function, which now uses the `generation_config`, and by implementing batched timestamp prediction. The `language` and `task` can now also be set when calling `generate()`. For more details about this refactoring, check out [this colab](https://colab.research.google.com/drive/1rS1L4YSJqKUH_3YxIQHBI982zso23wor#scrollTo=Ca4YYdtATxzo).
Notably, Whisper is now also supported in `Flax` 🚀 thanks to andyehrenberg! More Whisper-related commits:

* [Whisper] Refactor whisper by ArthurZucker in 21252
* [WHISPER] Small patch by ArthurZucker in 21307
* [Whisper] another patch by ArthurZucker in 21324
* add flax whisper implementation by andyehrenberg in 20479
* Add WhisperTokenizerFast by jonatanklosko in 21222
* Remove CLI spams with Whisper FeatureExtractor by qmeeus in 21267
* Update document of WhisperDecoderLayer by ling0322 in 21621
* [WhisperModel] fix bug in reshaping labels by jonatasgrosman in 21653
* [Whisper] Add SpecAugment by bofenghuang in 21298
* Fix-ci-whisper by ArthurZucker in 21767
* Fix `WhisperModelTest` by ydshieh in 21883
* [Whisper] Add rescaling function with `do_normalize` by ArthurZucker in 21263
* Refactor whisper asr pipeline to include language too. by Narsil in 21427
* Update `model_split_percents` for `WhisperModelTest` by ydshieh in 21922
* [Whisper] Fix feature normalization in `WhisperFeatureExtractor` by bofenghuang in 21938
* [Whisper] Add model for audio classification by sanchit-gandhi in 21754
* fixes the gradient checkpointing of whisper by soma2000-lang in 22019
* Skip 3 tests for `WhisperEncoderModelTest` by ydshieh in 22060
* [Whisper] Remove embed_tokens from encoder docstring by sanchit-gandhi in 21996
* [`Whisper`] add `get_input_embeddings` to `WhisperForAudioClassification` by younesbelkada in 22133
* [🛠️] Fix-whisper-breaking-changes by ArthurZucker in 21965
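The `language` and `task` options select special tokens in the prompt that Whisper's decoder is conditioned on: `<|startoftranscript|>`, a language token such as `<|fr|>`, a task token (`<|transcribe|>` or `<|translate|>`), and `<|notimestamps|>` when timestamps are not predicted. The sketch below builds that prefix as strings with a hypothetical helper, `whisper_prompt_tokens`; the library handles this internally with token ids, which is not reproduced here.

```python
def whisper_prompt_tokens(language, task="transcribe", timestamps=False):
    """Hypothetical helper: Whisper's decoder prompt prefix as token strings."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    # Start token, then the language token (a short code such as "fr"), then the task token.
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # Without timestamps, the prompt ends with <|notimestamps|>.
        tokens.append("<|notimestamps|>")
    return tokens


print(whisper_prompt_tokens("fr", task="translate"))
# ['<|startoftranscript|>', '<|fr|>', '<|translate|>', '<|notimestamps|>']
```

With `timestamps=True` the `<|notimestamps|>` token is left out, which is what lets the model interleave timestamp tokens with the text.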

DETA

DETA (short for Detection Transformers with Assignment) improves [Deformable DETR](https://huggingface.co/docs/transformers/v4.27.0/en/model_doc/deformable_detr) by replacing the one-to-one bipartite Hungarian matching loss with one-to-many label assignments used in traditional detectors with non-maximum suppression (NMS). This leads to significant gains of up to 2.5 mAP.

* Add DETA by NielsRogge in 20983
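To make the difference from one-to-one Hungarian matching concrete, the sketch below assigns each predicted box to the ground-truth box it overlaps most, as long as the IoU clears a threshold, so several predictions can share one target; duplicates are then removed at inference with greedy NMS. The function names and the 0.5 thresholds are illustrative assumptions, not DETA's actual configuration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def one_to_many_assign(pred_boxes, gt_boxes, iou_threshold=0.5):
    """Index of the best-overlapping ground truth per prediction, or None (background).

    Unlike Hungarian matching, several predictions may be assigned the same target.
    """
    assignment = []
    for p in pred_boxes:
        ious = [iou(p, g) for g in gt_boxes]
        best = max(range(len(ious)), key=ious.__getitem__, default=None)
        if best is not None and ious[best] >= iou_threshold:
            assignment.append(best)
        else:
            assignment.append(None)
    return assignment


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box, drop its overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return keep


gt_boxes = [(0, 0, 10, 10)]
pred_boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(one_to_many_assign(pred_boxes, gt_boxes))  # [0, 0, None]: two predictions share the target
print(nms(pred_boxes, scores))                   # [0, 2]: the overlapping duplicate is suppressed
```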

SpeechT5

The SpeechT5 framework consists of a shared encoder-decoder network and six modal-specific (speech/text) pre/post-nets. After preprocessing the input speech/text through the pre-nets, the shared encoder-decoder network models the sequence-to-sequence transformation, and then the post-nets generate the output in the speech/text modality based on the output of the decoder.

* add SpeechT5 model by hollance in 18922

XLM-V

XLM-V is a multilingual language model with a one-million-token vocabulary trained on 2.5TB of data from Common Crawl (same as XLM-R).

* Add XLM-V to Model Doc by stefan-it in 21498

BLIP-2

BLIP-2 leverages frozen pre-trained image encoders and large language models (LLMs) by training a lightweight, 12-layer Transformer encoder in between them, achieving state-of-the-art performance on various vision-language tasks. Most notably, BLIP-2 improves upon [Flamingo](https://arxiv.org/abs/2204.14198), an 80 billion parameter model, by 8.7% on zero-shot VQAv2 with 54x fewer trainable parameters.

* Add BLIP-2 by NielsRogge in 21441

X-MOD

X-MOD extends multilingual masked language models like [XLM-R](https://huggingface.co/docs/transformers/v4.27.0/en/model_doc/xlm-roberta) to include language-specific modular components (language adapters) during pre-training. For fine-tuning, the language adapters in each transformer layer are frozen.

* Add X-MOD by jvamvas in 20939

Ernie-M

ERNIE-M is a new training method that encourages the model to align the representation of multiple languages with monolingual corpora, to overcome the constraint that the parallel corpus size places on the model performance.

* Add Ernie-M Model to huggingface by susnato in 21349

TVLT

The Textless Vision-Language Transformer (TVLT) is a model that uses raw visual and audio inputs for vision-and-language representation learning, without using text-specific modules such as tokenization or automatic speech recognition (ASR). It can perform various audiovisual and vision-language tasks like retrieval, question answering, etc.

* Add TVLT by zinengtang in 20725

CLAP

CLAP (Contrastive Language-Audio Pretraining) is a neural network trained on a variety of (audio, text) pairs. It can be instructed in natural language to predict the most relevant text snippet for a given audio clip, without directly optimizing for the task. The CLAP model uses a Swin Transformer to get audio features from a log-Mel spectrogram input, and a RoBERTa model to get text features. Both the text and audio features are then projected to a latent space with identical dimension. The dot product between the projected audio and text features is then used as a similarity score.

* [CLAP] Add CLAP to the library by ArthurZucker in 21370
* [`CLAP`] Fix few broken things by younesbelkada in 21670
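The retrieval step described above can be sketched with toy vectors standing in for the projected audio and text features. The embeddings are L2-normalized before the dot product here, as is common in CLIP-style models; treat that and the hypothetical `rank_texts` helper as assumptions rather than the library's code.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def rank_texts(audio_embedding, text_embeddings):
    """Return text indices from most to least similar, plus the similarity scores."""
    a = l2_normalize(audio_embedding)
    scores = [sum(x * y for x, y in zip(a, l2_normalize(t))) for t in text_embeddings]
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return order, scores


# Toy 3-d vectors standing in for the projected audio and text features.
audio = [0.9, 0.1, 0.0]
texts = {
    "a dog barking": [1.0, 0.0, 0.1],
    "rain on a roof": [0.0, 1.0, 0.2],
    "a piano chord": [0.1, 0.2, 1.0],
}
labels = list(texts)
order, scores = rank_texts(audio, list(texts.values()))
print(labels[order[0]])  # a dog barking
```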

GPTSAN

GPTSAN is a Japanese language model using Switch Transformer. It has the same structure as the model introduced as Prefix LM in the T5 paper, and supports both Text Generation and Masked Language Modeling tasks. The model can similarly be fine-tuned for translation or summarization.

* add GPTSAN model (reopen) by tanreinama in 21291
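A Prefix LM differs from a purely causal language model in its attention mask: positions in the prefix attend to the whole prefix bidirectionally, while the remaining positions attend causally. The hypothetical `prefix_lm_mask` helper below builds that mask as nested lists; it illustrates the idea and is not GPTSAN's implementation.

```python
def prefix_lm_mask(seq_len, prefix_len):
    """mask[i][j] is True when position i may attend to position j."""
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            if j < prefix_len:
                row.append(True)    # the prefix is visible to every position
            else:
                row.append(j <= i)  # after the prefix: current and earlier tokens only
        mask.append(row)
    return mask


for row in prefix_lm_mask(seq_len=4, prefix_len=2):
    print("".join("1" if allowed else "0" for allowed in row))
# 1100
# 1100
# 1110
# 1111
```

With `prefix_len=0` the same function returns an ordinary causal mask.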

EfficientNet

EfficientNets are a family of image classification models that achieve state-of-the-art accuracy while being an order of magnitude smaller and faster than previous models.

* Add EfficientNet by alaradirik in 21563

ALIGN

ALIGN is a multi-modal vision and language model. It can be used for image-text similarity and for zero-shot image classification. ALIGN features a dual-encoder architecture with [EfficientNet](https://huggingface.co/docs/transformers/v4.27.0/en/model_doc/efficientnet) as its vision encoder and [BERT](https://huggingface.co/docs/transformers/v4.27.0/en/model_doc/bert) as its text encoder, and learns to align visual and text representations with contrastive learning. Unlike previous work, ALIGN leverages a massive noisy dataset and shows that the scale of the corpus can be used to achieve SOTA representations with a simple recipe.

* Add ALIGN to transformers by alaradirik in 21741
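The contrastive objective can be sketched as a symmetric softmax (InfoNCE-style) loss over a batch in which the i-th image and the i-th text form the positive pair. The sketch assumes already L2-normalized embeddings and a fixed temperature of 0.1 chosen for illustration; ALIGN's actual temperature handling and batch construction are not reproduced.

```python
import math


def log_softmax(row):
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def contrastive_loss(image_embs, text_embs, temperature=0.1):
    """Symmetric softmax contrastive loss; pair i is (image_embs[i], text_embs[i]).

    Assumes both inputs are already L2-normalized, so dot products are cosine similarities.
    """
    n = len(image_embs)
    logits = [[sum(a * b for a, b in zip(img, txt)) / temperature for txt in text_embs]
              for img in image_embs]
    # Image-to-text: each row should put its probability mass on the diagonal entry.
    image_to_text = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image: the same, applied to the columns of the logit matrix.
    columns = [list(col) for col in zip(*logits)]
    text_to_image = -sum(log_softmax(columns[i])[i] for i in range(n)) / n
    return (image_to_text + text_to_image) / 2


images = [[1.0, 0.0], [0.0, 1.0]]
matched = contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]])
print(f"matched: {matched:.4f}, swapped: {swapped:.4f}")
```

A correctly paired batch gives a loss close to zero, while swapping the texts gives a large one; minimizing this loss pulls matching image and text embeddings together.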

Informer

Informer is a method for long-sequence time-series forecasting. It introduces a Probabilistic Attention mechanism that selects the “active” queries rather than the “lazy” ones, yielding a sparse Transformer that mitigates the quadratic compute and memory requirements of vanilla attention.

* [Time-Series] informer model by elisim in 21099
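The query selection can be sketched as follows: each query gets a sparsity score, the maximum minus the mean of its scaled dot products with the keys, and only the top-`u` ("active") queries receive full softmax attention, while the "lazy" ones fall back to the mean of the values. This sketch computes the score exactly over all keys, whereas the Informer paper approximates it by sampling keys, so treat it as a simplified illustration with hypothetical function names.

```python
import math


def scaled_scores(q, keys):
    d = len(q)
    return [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]


def sparsity(q, keys):
    """Max minus mean of a query's scaled dot products with all keys."""
    s = scaled_scores(q, keys)
    return max(s) - sum(s) / len(s)


def probsparse_attention(queries, keys, values, top_u):
    """Full attention for the top_u most 'active' queries; 'lazy' queries get mean(values)."""
    ranked = sorted(range(len(queries)), key=lambda i: sparsity(queries[i], keys), reverse=True)
    active = set(ranked[:top_u])
    mean_value = [sum(col) / len(values) for col in zip(*values)]
    outputs = []
    for i, q in enumerate(queries):
        if i not in active:
            outputs.append(list(mean_value))
            continue
        s = scaled_scores(q, keys)
        m = max(s)
        weights = [math.exp(x - m) for x in s]  # numerically stable softmax
        total = sum(weights)
        outputs.append([sum(w * v[c] for w, v in zip(weights, values)) / total
                        for c in range(len(values[0]))])
    return outputs, sorted(active)


keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
queries = [[4.0, 0.0], [0.1, 0.1]]  # a peaked query and a nearly uniform one
outputs, active = probsparse_attention(queries, keys, values, top_u=1)
print(active)  # [0]: only the peaked query gets full attention
```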

API updates and improvements

Safetensors

`safetensors` is a safe serialization format for tensors, which `transformers` has supported as a first-class citizen for the past few versions.

This change enables explicitly forcing the `from_pretrained` method to use or not to use `safetensors`. This unlocks a few use-cases, notably the possibility to enforce loading *only* from this format, limiting security risks.

Example of usage:

```python
from transformers import AutoModel

# As of version v4.27.0, this loads the `pytorch_model.bin` by default if `safetensors` is not installed.
# It loads the `model.safetensors` file if `safetensors` is installed.
model = AutoModel.from_pretrained('bert-base-cased')

# This forces the load from the `model.safetensors` file.
model = AutoModel.from_pretrained('bert-base-cased', use_safetensors=True)

# This forces the load from the `pytorch_model.bin` file.
model = AutoModel.from_pretrained('bert-base-cased', use_safetensors=False)
```

* [Safetensors] Add explicit flag to from pretrained by patrickvonplaten in 22083

Variant

This PR adds a `variant` keyword argument to the PyTorch `from_pretrained` and `save_pretrained` methods so that multiple weight variants can be saved in the same model repo.

Example of usage with the model hosted in [this folder on the Hub](https://huggingface.co/huggingface/the-no-branch-repo/tree/main/text_encoder):

```python
from transformers import CLIPTextModel

path = "huggingface/the-no-branch-repo"  # or ./text_encoder if local

# Loads the `fp16` variant. This loads the `pytorch_model.fp16.bin` file from this folder.
model = CLIPTextModel.from_pretrained(path, subfolder="text_encoder", variant="fp16")

# This loads the no-variant checkpoint, loading the `pytorch_model.bin` file from this folder.
model = CLIPTextModel.from_pretrained(path, subfolder="text_encoder")
```

* Add variant to transformers by patrickvonplaten in 21332
* [Variant] Make sure variant files are not incorrectly deleted by patrickvonplaten in 21562
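In the example above, the variant name is inserted before the file extension (`pytorch_model.bin` becomes `pytorch_model.fp16.bin`). A hypothetical helper reproducing that naming pattern is sketched below; applying the same pattern to other weight files (for example `model.safetensors`) is an assumption here.

```python
def add_variant(weights_name, variant=None):
    """Hypothetical helper: insert a variant tag before the file extension."""
    if variant is None:
        return weights_name
    stem, dot, extension = weights_name.rpartition(".")
    if not dot:  # no extension: append the variant instead
        return f"{weights_name}.{variant}"
    return f"{stem}.{variant}.{extension}"


print(add_variant("pytorch_model.bin", "fp16"))  # pytorch_model.fp16.bin
print(add_variant("pytorch_model.bin"))          # pytorch_model.bin
```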

bitsandbytes

The `bitsandbytes` integration has been overhauled and now offers a new configuration class: `BitsAndBytesConfig`.

Read more about it in the [documentation](https://huggingface.co/docs/transformers/v4.27.0/en/main_classes/quantization#bitsandbytes-integration).

* [`bnb`] Introducing `BitsAndBytesConfig` by younesbelkada in 21579
* [`bnb`] fix `bnb` decoders bug by younesbelkada in 21688

FSDP

This PR enables users to use the [PyTorch/XLA implementation of FSDP](https://github.com/pytorch/xla/tree/master/torch_xla/distributed/fsdp), including the newly added [auto-wrap feature](https://github.com/pytorch/xla/pull/4318). Four arguments have been added to `training_args.py` to support this functionality:

- `xla_fsdp`: this flag is a string containing the location of a `.json` file which specifies the FSDP arguments the user wants to use when wrapping their model.
- `xla_fsdp_min_num_params`: this flag is an int that sets a size-based automatic wrapping policy, which automatically FSDP-wraps any module with at least `xla_fsdp_min_num_params` parameters.
- `xla_fsdp_transformer_layer_cls_to_wrap`: this flag is a list of (case-sensitive) strings that sets a layer-class-based automatic wrapping policy, which automatically FSDP-wraps any module whose class name matches one of the listed strings.
- `xla_fsdp_grad_ckpt`: this flag is a bool which determines whether gradient checkpointing is enabled for the automatically wrapped layers.

* Enable PyTorch/XLA Fully Sharded Data Parallel (FSDP) by AlexWertheim in 21406
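The two automatic wrapping policies set by `xla_fsdp_min_num_params` and `xla_fsdp_transformer_layer_cls_to_wrap` can be illustrated on a toy module tree. The `Module` class and both functions are hypothetical and simplified (the size-based version wraps bottom-up and only counts parameters not already inside a wrapped descendant); they are not the PyTorch/XLA implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Toy stand-in for a model component; not a PyTorch class."""
    name: str
    cls_name: str
    own_params: int = 0
    children: list = field(default_factory=list)


def size_based_wrap(module, min_num_params):
    """Bottom-up: wrap a module once its parameters that are not already inside a
    wrapped descendant reach min_num_params.
    Returns (names of wrapped modules, count of parameters left unwrapped)."""
    wrapped, remaining = [], module.own_params
    for child in module.children:
        child_wrapped, child_remaining = size_based_wrap(child, min_num_params)
        wrapped.extend(child_wrapped)
        remaining += child_remaining
    if remaining >= min_num_params:
        wrapped.append(module.name)
        remaining = 0
    return wrapped, remaining


def class_based_wrap(module, layer_cls_names):
    """Names of modules whose class name exactly matches one of layer_cls_names."""
    wrapped = [module.name] if module.cls_name in layer_cls_names else []
    for child in module.children:
        wrapped.extend(class_based_wrap(child, layer_cls_names))
    return wrapped


model = Module("model", "ToyLM", own_params=100, children=[
    Module("h.0", "Block", own_params=1000),
    Module("h.1", "Block", own_params=1000),
    Module("ln_f", "LayerNorm", own_params=10),
])
print(size_based_wrap(model, min_num_params=500))  # (['h.0', 'h.1'], 110)
print(class_based_wrap(model, ["Block"]))          # ['h.0', 'h.1']
print(class_based_wrap(model, ["block"]))          # []: matching is case-sensitive
```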

Breaking changes

*Generate*

This PR standardizes beam search behavior across all three frameworks through `early_stopping`. PyTorch is unchanged, but TensorFlow and Flax users will see a significant speedup if they keep the default generation parameters.

There are, however, minor differences in the outputs of the `.generate` method with beam search on TensorFlow and Flax. These differences should be very small and come with significant speedups, but if they break your workflow, we recommend downgrading to a previous version and letting us know in a GitHub issue so that we can investigate.

* 🚨🚨 Generate: standardize beam search behavior across frameworks by gante in 21368

*Single model initialization*

Model initialization had problems that made it inconsistent across models and across initialization techniques. This is technically a bugfix, but as it may result in your models being initialized with different values, we think it best to highlight it here.

* 🚨🚨🚨 Enforce single model initialization by sgugger in 21431

Deprecations

This PR deprecates the `parallelize` API, which was replaced by `accelerate` months ago. We recommend loading the model with the `device_map` argument set to `"balanced"` to obtain the previous behavior.

Setting your own `device_map` is still permitted, but it needs to be a dictionary from module name to device, for example:

```py
# Map each module name to a device index (continue the pattern for the remaining modules).
device_map = {'h.0': 0, 'h.1': 1}  # ...
```

* Deprecate parallelize API by sgugger in 21448
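A hand-written map like the one above can also be generated. The hypothetical `even_device_map` helper below assigns consecutive `h.<i>` layers to devices in equal-sized blocks; a map for a real model must also cover modules such as embeddings and final layer norms, whose names depend on the architecture.

```python
def even_device_map(num_layers, num_devices, prefix="h"):
    """Hypothetical helper: put consecutive '<prefix>.<i>' layers on devices in equal blocks."""
    per_device = -(-num_layers // num_devices)  # ceiling division
    return {f"{prefix}.{i}": i // per_device for i in range(num_layers)}


print(even_device_map(num_layers=4, num_devices=2))
# {'h.0': 0, 'h.1': 0, 'h.2': 1, 'h.3': 1}
```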

Pipelines

A new pipeline focused on zero-shot audio classification is added to the repository.

* [Pipeline] Add zero shot audio classification pipeline by ArthurZucker in 21600

Documentation

The task and model summaries have been refactored to take into account the larger number of tasks and models we now have.

* Update task summary by stevhliu in 21067
* Refactor model summary by stevhliu in 21408

Bugfixes and improvements

* [`t5`] Fix T5 inference in `float16` + `bnb` error by younesbelkada in 21281
* [examples/deepspeed] fix renamed api by stas00 in 21283
* [GenerationConfig] add additional kwargs handling by ArthurZucker in 21269
* [W2V2 with LM] Fix decoder test with params by sanchit-gandhi in 21277
* Fix `TrainingArguments.label_names` docs to reflect the correct default value behaviour by fredtcaroli in 21288
* Update expected values for doctest by stevhliu in 21284
* [GIT] Add test for batched generation by NielsRogge in 21282
* Supporting `ImageProcessor` in place of `FeatureExtractor` for pipelines by Narsil in 20851
* [Mask2Former] Add doc tests by NielsRogge in 21232
* Moving to cleaner tokenizer version or `oneformer`. by Narsil in 21292
* Fix `EfficientFormer` by ydshieh in 21294
* [Hubert] Fix Hubert processing auto by younesbelkada in 21299
* Update `OneFormerModelIntegrationTest` expected values by ydshieh in 21295
* [Doctest] Fix `Blenderbot` doctest by younesbelkada in 21297
* Documentation code sample fixes by MKhalusova in 21302
* [CI-Daily] replace `past` in prepare inputs for generation by ArthurZucker in 21296
* Small fix to ExponentialDecayLengthPenalty docstring by njhill in 21308
* Accept batched tensor of images as input to image processor by amyeroberts in 21144
* Use `model_class.__name__` and compare against `XXX_MAPPING_NAMES` by ydshieh in 21304
* Fix 2 paths in the doctest list by ydshieh in 21314
* [i18n-KO] Translated quicktour page to Korean by wonhyeongseo in 20946
* Small QoL for qa. by Narsil in 21316
* check paths in `utils/documentation_tests.txt` by ydshieh in 21315
* Fix `TFEncoderDecoder` tests by ydshieh in 21301
* Generate: better `compute_transition_scores` examples by gante in 21323
* [Doctest] Fix `Perceiver` doctest by younesbelkada in 21318
* Update Hebrew language code to he per IANA registry by altryne in 21310
* Fix M2M100 positional embedding creation for ONNX by michaelbenayoun in 21328
* Fix `RobertaPreLayerNorm` doctest by ydshieh in 21337
* Little cleanup: let huggingface_hub manage token retrieval by Wauplin in 21333
* Automated compatible models list for task guides by MKhalusova in 21338
* Fix `GitModelIntegrationTest.test_batched_generation` device issue by ydshieh in 21362
* Pipeline testing - using tiny models on Hub by ydshieh in 20426
* fix the issue that the output dict of jit model could not get [0] by sywangyi in 21354
* Corrected by HsiangNianian in 21350
* Remove duplicate declarations in dummy inputs for TFLongformer by peakji in 21352
* Fix DETR tests after 21144 by amyeroberts in 21365
* Add cPython files in build by sgugger in 21372
* Generate: Relaxed `max_length` and `max_new_tokens` coexistence by gante in 21347
* Fixes path for Graphormer checkpoint by clefourrier in 21367
* Adding resource section to GPT-J docs by adit299 in 21270
* translate index to zh by bfss in 20095
* [`run_(clm|mlm).py` examples] add streaming dataset support by stas00 in 21343
* Template for framework-agnostic tests by gante in 21348
* Cleanup the usage of `layer_norm_eps` in some models by ydshieh in 21336
* Do not log the generation config for each prediction step in TrainerSeq2Seq by regisss in 21385
* [Docs] Minor fixes by NielsRogge in 21383
* Simplify column_names in run_clm/mlm by lhoestq in 21382
* Add support of backward_prefetch and forward_prefetch by raghavanone in 21237
* Remove more unused attributes in config classes by ydshieh in 21327
* Generate: fix TF XLA tests on models with `max_position_embeddings` or `max_target_positions` by gante in 21389
* Update `Graphormer` and fix its `torchscript` test failures by ydshieh in 21380
* Moved LiLT under multimodal models in TOC by MKhalusova in 21393
* Fix the issue of using only inputs_embeds in convbert model by raghavanone in 21398
* Skip batches fast with accelerate by sgugger in 21390
* Added DagshubCallback by jinensetpal in 21404
* Add TF image classification example script by amyeroberts in 19956
* Generate: decoder-only models can generate with `inputs_embeds` by gante in 21405
* Use torch `1.13.1` in push/schedule CI by ydshieh in 21421
* Fix image_processor_class bug by shikhartuli in 21410
* Add distinct section names for PyTorch and TF by Rocketknight1 in 21422
* Add the GeLU activation from pytorch with the tanh approximation by jlamypoirier in 21345
* Fix Graphormer test suite by clefourrier in 21419
* [`bnb`] Fine-tuning HF 8-bit models by younesbelkada in 21290
* Allow to add more information in `is_flaky` by ydshieh in 21426
* Fix some pipeline tests by ydshieh in 21401
* Fix task guide formatting by stevhliu in 21409
* Fixes bug in the creation of ExponentialDecayLengthPenalty by jorgemcgomes in 21423
* Add `inputs_embeds` support for `.generate()` with BLOOM models by akreal in 21430
* Remove more unused attributes in config classes by ydshieh in 21392
* Added model resources for LayoutLM Issue19848 by avisinghal6 in 21377
* Fix device issue in a `ConvBertModelTest` test by ydshieh in 21438
* do not scale gradient in bf16 mode by kashif in 21428
* exclude deleted files in the fixup script by dtuit in 21436
* Add tutorial doc for TF + TPU by Rocketknight1 in 21429
* For IterableDataset, return DataLoader using self._train_batch_size. … by agossard in 21447
* Avoid flaky generation sampling tests by ydshieh in 21445
* Fix `SpeechT5ForSpeechToSpeechIntegrationTests` device issue by ydshieh in 21460
* Add perf numbers for perf_train_cpu by jianan-gu in 20974
* Added documentation for DagsHubCallback by jinensetpal in 21452
* Fix `PushToHubCallback` import in Share a model docs by ireneisdoomed in 21457
* Add VQGAN-CLIP research project by ErwannMillon in 21329
* Fixed RAG script which was failing on dummy example by kaustubhdhole in 21416
* make SpeechT5 doc examples deterministic by hollance in 21470
* Generate: TF can now accept custom logits processors by gante in 21454
* Removing `more_itertools` dependency. by Narsil in 21473
* [examples] improve block_size warning message by stas00 in 21463
* [i18n-fr] Translate index page to French by NoB0 in 21458
* OPT: BLIP2-ready `prepare_inputs_for_generation` by gante in 21477
* Add tips for generation with Int8 models by lewtun in 21424
* Update quality tooling for formatting by sgugger in 21480
* Fix epoch number when resuming training by sgugger in 21478
* [CI] Remove `past` in favor of `past_key_values` by ArthurZucker in 21443
* Generate: TF can now generate from embeddings in encoder-decoder models by gante in 21475
* [`Doc`] Fix int8 docs by younesbelkada in 21487
* changed "ot" to "to" by Iulian277 in 21488
* :pen: fix typo in pytorch semantic segmentation readme by jvdd in 21492
* Typos/fixes to link syntax by Rocketknight1 in 21450
* Sanity check the type of id2label and label2id arguments of from_pretrained for TokenClassification models by raghavanone in 21490
* [OPT] Adds `GPT2TokenizerFast` to the list of tokenizer to use for OPT. by ArthurZucker in 20823
* A new test to check config attributes being used by ydshieh in 21453
* Add limit_all_gathers option to fsdp_config and fix forward_prefetch bug by raghavanone in 21489
* Cleanup quality by sgugger in 21493
* [tokenizer] sanitize saved config by stas00 in 21483
* Add inverse sqrt learning rate scheduler by Sager611 in 21495
* Check for mapping/dict in distributed_concat function by prajwal967 in 21500
* Fix import in Accelerate for find_exec_bs by sgugger in 21501
* Wrap RemBert integration test forward passes with torch.no_grad() by katiele47 in 21503
* Exclude the madeup words from M2M100Tokenizer.vocab_size by guillaumekln in 20976
* [Doc] Minor URL fixes in PyTorch Text Classification Readme by stefan-it in 21511
* Generate: TF `compute_transition_scores` by gante in 21341
* no more dummies for speech processors by hollance in 21517
* Update OPT conversion script to work for OPT-IML by thomasw21 in 21519
* [tests] add missing `report_to none` by stas00 in 21505
* Fixing backward compatibility `image_processor` in pipeline. by Narsil in 21513
* Fix multiple `eos_token_id`s in model.generate(...) by tokestermw in 21461
* Add `__len__` method to `_LazyAutoMapping` by ydshieh in 21522
* Generate: make TF `.generate()` signature == PT `.generate()` signature by gante in 21525
* Generate: TF `.generate()` can now be exported with dynamic length by gante in 21474
* Fix missing unfinished_sequences by tokestermw in 21529
* Fix ClearML Integration to run in ClearML pipelines and external Tasks. by thepycoder in 21531
* Tag tests as slow ⌛ by gante in 21537
* fix typo in run_speech_recognition_ctc.py by 21jun in 21528
* Fix inclusion of non py files in package by sgugger in 21546
* Fix from_pretrained API with config and state_dict by sgugger in 21542
* Added with torch.no_grad() to XLM-Roberta integration test by katiele47 in 21547
* [`pipeline`] A simple fix for half-precision & 8bit models by younesbelkada in 21479
* Added with torch.no_grad() to Camembert integration test by katiele47 in 21544
* adding a tip for deepspeed integration in multi-node environment by izapolsk in 21459
* Fix stuff related to the causal_mask in CodeGen. by GeneZC in 21527
* Replace inefficient torch.sqrt taking scalar input with numpy.sqrt by FindHao in 21496
* Add _mp_fn to run_mae.py for XLA testing by steventk-g in 21551
* [Tests] Improve flax test_attention_outputs by Shubhamai in 21486
* [from_pretrained] extend `torch_dtype="auto"` to look up `config.torch_dtype` first, expand docs by stas00 in 21524
* [Tasks] Adds image captioning by sayakpaul in 21512
* Goodbye to Blip-2 doctests by ydshieh in 21566
* [deepspeed] deal with models w/o `config.hidden_size` by stas00 in 21504
* improving contributing tests section by Shubhamai in 21569
* Replace input_values_processing with unpack_inputs by amyeroberts in 21502
* Added timesformer configuration by AdiaWu in 21446
* Remove more unused attributes in config classes by ydshieh in 21543
* [`Blip2`] Add int8 support for `blip2-flan-t5-xxl` by younesbelkada in 21574
* Generate: TF supports multiple eos tokens by gante in 21571
* Add: document question answering task guide by MKhalusova in 21518
* CI: skip failing TF hubert test by gante in 21601
* Remove trailing 'extractive' word from en documentation by tpaviot in 21594
* [MINOR] Fix link in timeseries transformer docs by cakiki in 21602
* Add `inputs_embeds` support when generating with GPT-J by dimitry12 in 21575
* Generate: Fix flaky indexing error in `test_constrained_beam_search_generate_dict_output` by gante in 21561
* [`bnb`] Let's make the daily CI green 🍏 by younesbelkada in 21597
* annotated TFvisionEncoderDecoder input type hints by miyu386 in 21432
* Correct Markdown bullets indentation by wangkuiyi in 21583
* Add missing argument to run_clip.py by WarrenGreen in 21588
* Fix Blip-2 CI by ydshieh in 21595
* Generate: correct default model input creation for decoder-only models by gante in 21580
* [i18n-fr] Translate quicktour page to French by NoB0 in 21589
* Update setup.py by stas00 in 21584
* [deepspeed] performance docs by stas00 in 21573
* Clarify available pipelines in quicktour by stevhliu in 21607
* Fix env. variable type issue in testing by ydshieh in 21609
* Fix TF CTC tests by gante in 21606
* Add in big model inference to issue template by muellerzr in 21611
* Enable `requires_grad` on input embedding to train on top of frozen layers by younesbelkada in 21598
* Generate: filter encoder inputs when its signature does not accept wildcards by gante in 21603
* Generate: input expansion for any model input by gante in 21624
* Final cleanup of TOKENIZER_FOR_DOC by sgugger in 21565
* Remove Niels from templates by sgugger in 21564
* Fix generation config for empty state dict by sgugger in 21630
* Removes duplicate computations in DETR post processing by eclique in 21592
* Fix typo in documentation. by mmcdermott in 21632
* Error (also in original) model, scaling only q matrix not qk.T dot product (qk.T/sqrt(dim_per_head)) by BenoitDalFerro in 21627
* Fix typo in QA task guide by stevhliu in 21608
* fix: Race Condition when using Sagemaker Checkpointing and Model Repository by DougTrajano in 21614
* Remove extra "`max_length` is reached." from InfNaNLogitsProcessor documentation by mmcdermott in 21634
* Fix Blip-2 CI again by ydshieh in 21637
* Skip wav2vec2 hubert high mem tests by amyeroberts in 21643
* Fix passing kwargs to TFBertTokenizer by balvisio in 21619
* Skipping more high mem tests - Wav2Vec2 Hubert by amyeroberts in 21647
* Pass parent exception as context exception to provide clearer stack trace by balvisio in 21636
* Generate: PT Dynamo without graph breaks in the main greedy/sample loop by gante in 21648
* Update deprecated load_module by sgugger in 21651
* Fix typos in contrastive-image-text example README by regisss in 21665
* [WIP] Move X-MOD models to facebook organization by jvamvas in 21640
* refactor: Make direct_transformers_import util by connor-henderson in 21652
* [bloom] gradient_checkpointing fix by stas00 in 21655
* Add OPT resources to the transformers documentation by alissadb in 21625
* Adapt PerceiverIO Multimodal class to work with arbitrary modalities by stevenmanton in 20054
* Fix multi-gpu training error for LayoutLMv2 by akkikiki in 21675
* Generate: eta sampling numerical stability by gante in 21676
* [`ImageProcessor`] Refactor default `mean` & `std` to `OPENAI_CLIP_MEAN` & `OPENAI_CLIP_STD` by younesbelkada in 21425
* [`BLIP`] update blip path on slow tests by younesbelkada in 21476
* Fix dynamic module import error by ydshieh in 21646
* Fix for non-contiguous label tensors in VisionEncoderDecoder by morganmcg1 in 21582
* Fix-rag-finetune-project-requirement by ArthurZucker in 21697
* Pass along revision in dynamic code fetch by sgugger in 21698
* Fix axial positional encoding calculations for reformer.mdx by ijindal in 21649
* remove position ids and token type ids from forward args in docstring by ArthurZucker in 21701
* Fix typo in `PROCESSOR_MAPPING_NAMES` and add tests by ydshieh in 21703
* Fix `get_class_in_module` by ydshieh in 21709
* Fix TVLT (torch device issue) by ydshieh in 21710
* Adding task guides to resources by MKhalusova in 21704
* Adding type hints to call() functions in this file by mollerup23 in 21548
* Time series transformer: input projection and Std scaler by kashif in 21020
* Apply ruff flake8-comprehensions by Skylion007 in 21694
* [`MBart`] Fix cross attention mask check by younesbelkada in 21730
* Respect documentation on passive log level by sgugger in 21700
* Remove `gptsan_japanese` from doctest list to avoid GPU OOM by ydshieh in 21722
* Change doc example for `BigBirdForQuestionAnswering` by ydshieh in 21723
* Fix `ErnieMEmbeddings` device issue by ydshieh in 21726
* Fix `GPTSanJapaneseModel` by ydshieh in 21731
* [SpeechT5HifiGan] Handle batched inputs by sanchit-gandhi in 21702
* Fix to KerasMetricCallback when the model returns unstructured output by Rocketknight1 in 21727
* Added "Open in Colab" to task guides by MKhalusova in 21729
* typos in french documentation by tpaviot in 21750
* Make ImageProcessorMixin compatible with subfolder kwarg by Abhinay1997 in 21725
* Update doctest GH workflow file by ydshieh in 21744
* Fix 2 quicktour file doctest by ydshieh in 21742
* [`GPTNeo`] Fix gradient checkpointing bug by younesbelkada in 21733
* Generate: Fix GIT batched captioning by gante in 21738
* Added Type Hints for modeling_tf_encoder_decoder.py by Batese2001 in 21673
* Auto api Value Error addition to Troubleshoot by MKhalusova in 21708
* [deepspeed tests] fix issues introduced by 21700 by stas00 in 21769
* Graphormer fix by clefourrier in 21699
* fix: Change is_last chunk calc and add conditional break in chunk_iter by connor-henderson in 21612
* [Flax] adding support for batch norm layers by Shubhamai in 21581
* [Examples] Generalise run audio classification for log-mel models by sanchit-gandhi in 21756
* Different behavior in DistilBERT when using "inputs_embeds" by ArthurZucker in 21752
* [Flax] Fix erroneous kwargs being passed to generate config by sanchit-gandhi in 21765
* Generate - update cookie cutters to not initialize cache with training and gradient checkpointing by gante in 21759
* [time series] updated expected values for integration test. by kashif in 21762
* [GPT2, ProphetNet] Fix gradient checkpointing bug by yhl48 in 21772
* [SpeechT5] Fix HiFiGAN tests by sanchit-gandhi in 21788
* Fix resume_from_checkpoint for deepspeed by mosheber in 21735
* [examples/summarization] deal with `max_length` and `num_beams` by bofenghuang in 21740
* Fix type in gpt2 config docstring by WeberJulian in 21782
* Fix en documentation typos by tpaviot in 21799
* [FX tracer] Make `concrete_args` from outside available by lygztq in 21775
* [torch] remove deprecated uint8 in favor of bool by ArthurZucker in 21384
* [`tests`] add `accelerate` marker by younesbelkada in 21743
* Fix PyTorch Perceiver `PerceiverFourierPositionEncoding` with fp16 by fxmarty in 21787
* Fix nn.init.trunc_normal_ call on torch.float16 data by fxmarty in 21789
* Fix gradient checkpointing bug in gptneox by KMFODA in 21815
* Inheritance-based framework detection by gante in 21784
* Fix quality with `ruff==0.0.253` by ydshieh in 21828
* introduce `logger.warning_once` and use it for grad checkpointing code by stas00 in 21804
* Rename `MobileViTModelTest` to `TFMobileViTModelTest` by ydshieh in 21825
* Fix gradient checkpointing bug BioGpt by saswatmeher in 21844
* check for None forced tokens by andyehrenberg in 21793
* Fix gradient checkpointing bug in git by KMFODA in 21818
* Fix gradient checkpointing imagegpt by KMFODA in 21816
* Fix tf random token masking probability in data collator by anruijian in 21834
* [`T5`] Fix torchquant issue by younesbelkada in 21843
* [`Blip2`] Add `Blip2Model` by younesbelkada in 21817
* Fix the issue of blip model returning loss even when the label is not provided. by raghavanone in 21811
* [GPTJ] Fix gradient checkpointing bug by krypticmouse in 21794
* Add: task guide for zero shot object detection by MKhalusova in 21829
* Make Slack CI reporting stronger by ydshieh in 21823
* [`Blip2`] Fix Blip-2 multi gpu by younesbelkada in 21707
* 🔥Rework pipeline testing by removing `PipelineTestCaseMeta` 🚀 by ydshieh in 21516
* Improve TF weight loading, especially PT crossloading by Rocketknight1 in 21792
* Fix flaky test for log level by sgugger in 21776
* prepare for "__floordiv__ is deprecated and its behavior will change in a future version of pytorch" by ArthurZucker in 20211
* [ConvBert] Fix 21523 by ArthurZucker in 21849
* Flax beam search fix by andyehrenberg in 21857
* Fix gradient checkpointing bug Bart by saswatmeher in 21866
* [deepspeed] check whether model is NLP one instead of counting on input type by izapolsk in 21800
* Change the way tensor is reshaped in BartAttention (from .view to .reshape) by raghavanone in 21860
* Italian translation of community.mdx by lorenzobalzani in 21871
* [`Blip`] Fix blip doctest by younesbelkada in 21868
* Removed BLIP mention from the troubleshooting guide by MKhalusova in 21872
* update FSDP and add XLA-FSDP documentation by pacman100 in 21812
* [doc] deepspeed tests by stas00 in 21859
* Add a utility file to get information from test files by ydshieh in 21856
* Add check for different embedding types in examples by Rocketknight1 in 21881
* Make loading of pretrained gpt2 faster by avoiding initialization of Conv1D's weights by twaka in 21879
* Fix Gradient checkpointing bug BigBird by saswatmeher in 21882
* Fix `test_load_default_pipelines_pt` for `ClapModel` by ydshieh in 21886
* fix checkpoint by ArthurZucker in 21874
* [Refactor] Relative imports wherever we can by ArthurZucker in 21880
* [ZAC] fix ci daily by ArthurZucker in 21893
* Use PyAV instead of Decord in examples by amyeroberts in 21572
* Add `inputs_embeds` functionality when generating with BioGPT by sidkiblawi in 21889
* [T5 doc] Fix confusing documentation about `d_kv` by ArthurZucker in 21896
* fix typo in Bart's attention by kashif in 21898
* [GPT-J] add deprecation warning by ArthurZucker in 21869
* fsdp bf16 enable autocast by pacman100 in 21847
* Fix gradient checkpointing bug LED by KMFODA in 21840
* Fix gradient checkpointing bug M2M 100 by KMFODA in 21841
* Fix gradient checkpointing bug marian by KMFODA in 21842
* Mark pipeline tests to skip them easily by sgugger in 21887
* Clean up auto mapping names by ydshieh in 21903
* Prophetnet batch dimension inversion fix by kiansierra in 21870
* Make schedulers picklable by making lr_lambda fns global by connor-henderson in 21768
* Add Blip and Blip2 for pipeline tests by ydshieh in 21904
* Temporarily skip 3 tests in `BridgeTowerModelTest` by ydshieh in 21908
* Faster zero shot image by Narsil in 21897
* [time series] Add Time series inputs tests by kashif in 21846
* Avoid modeling tests run in pipeline CI jobs by ydshieh in 21911
* Fix doctests for TFVisionTextDualEncoder by Rocketknight1 in 21910
* faster forward following what is done for images by ArthurZucker in 21906
* Fix gradient checkpointing bug in MBart by KMFODA in 21918
* Fix gradient checkpointing bug in mvp by KMFODA in 21920
* Fix gradient checkpointing megatron bert by KMFODA in 21921
* Use large VM for `repo_utils_job` by ydshieh in 21928
* Cleanup more auto mapping names by ydshieh in 21909
* feat: filter try/except when looking at custom code by zanussbaum in 21914
* Fix `AlignModelTest` tests by ydshieh in 21923
* Avoid failure in `check_repo.py` due to missing backends by ydshieh in 21930
* Fix wrong documentation about DataCollator padding defaults by substanc3-dev in 21919
* [Flan-UL2] Add-flan-ul2 by ArthurZucker in 21929
* Update README logo by gary149 in 21933
* [CLAP] Support batched inputs for CLAP. Fixes pipeline issues by ArthurZucker in 21931
* Fix gradient checkpointing bug in OPT by KMFODA in 21943
* Fix gradient checkpointing bug in Pegasus by KMFODA in 21944
* Fix gradient checkpointing bug in Rembert by KMFODA in 21945
* Fix gradient checkpointing bug in Roformer by KMFODA in 21946
* Fixed gradient_checkpointing/use_cache bug in blenderbot by Batese2001 in 21833
* Update expected values in `XLMProphetNetModelIntegrationTest` by ydshieh in 21957
* [CI] Fix ci by ArthurZucker in 21940
* Disable DDP for neuron by aws-sangeetha in 21953
* Fix bert issue by saswatmeher in 21963
* [Generate] Fix gradient_checkpointing and use_cache bug for BLOOM by asrimanth in 21956
* Add missing parameter definition in layoutlm config by Atomnp in 21960
* Use larger atol in `torch.allclose` for some tests by ydshieh in 21966
* Add TF contrastive image text finetuning example by Rocketknight1 in 21939
* Update expected values for `test_xglm_sample` by ydshieh in 21975
* Fix gradient checkpointing bug in BigBird Pegasus by KMFODA in 21976
* Fix gradient checkpointing bug in Blenderbot Small by KMFODA in 21977
* Fix gradient checkpointing bug in BlipText by KMFODA in 21978
* Fix gradient checkpointing bug in Codegen by KMFODA in 21979
* Fix gradient checkpointing bug in ESM by KMFODA in 21980
* docs: improve clarity for language modeling by pdhall99 in 21952
* Update `Jukebox` tests by ydshieh in 21984
* Add check before int casting for PIL conversion by amyeroberts in 21969
* Fix MinNewTokensLengthLogitsProcessor when used with a list of eos tokens by eladsegal in 21959
* [DETR, YOLOS] Fix device bug by NielsRogge in 21974
* Remove unneeded casts to bool by regisss in 21983
* Update `notification_service.py` by ydshieh in 21992
* Skip `test_multi_gpu_data_parallel_forward` for some model tests by ydshieh in 21991
* Stop requiring Torch for our TF examples! by Rocketknight1 in 21997
* [TF] Fix creating a PR while pushing in TF framework by ArthurZucker in 21968
* [DETR and friends] Remove is_timm_available by NielsRogge in 21814
* Update tiny model creation script and some others files by ydshieh in 22006
* Generate - add 1 to cur_len to make up the new beam length by jimmieliu in 21993
* VideoMAE doctest - use valid dummy pixel values by amyeroberts in 22022
* update: bertology paper by QiushiSun in 22012
* Update `AudioClassificationPipelineTests::test_small_model_pt` for PT 2.0.0 by ydshieh in 22023
* [`bnb`] Fix bnb error message by younesbelkada in 22026
* Fix test for torchneuroncore in Trainer by sgugger in 22028
* Add tokenize_kwargs parameter definition in the FeatureExtractionPipeline by anruijian in 22031
* [examples/speech-recognition] Add SpecAugment to run_speech_recognition_seq2seq.py by bofenghuang in 21942
* Avoid `text_config_dict` and `vision_config_dict` being saved for CLIP-like models by ydshieh in 22035
* Mark all `BridgeTower` tests slow for now by ydshieh in 22039
* Bug fix: token classification pipeline while passing offset_mapping by cceyda in 22034
* Update ALIGN docs by alaradirik in 22025
* [21737][T5]: Fix gradient checkpoint bug by nipunjindal in 22036
* Docs Improvement - In ZSH, not using ' ' around pip install fails, fix it by shaun-scale in 22045
* Can't install tf2 on M1 Chip by default by shaun-scale in 22046
* Remove set_access_token usage + fail tests if FutureWarning by Wauplin in 22051
* Show the number of `huggingface_hub` warnings in CI report by ydshieh in 22054
* Return analysis for hyperparameter_search with Ray backend by anruijian in 22040
* pt-to-tf model architecture override by Rocketknight1 in 22055
* rm $ symbol from code block from contributing.md by kamalkraj in 22057
* [deepspeed] offload + non-cpuadam optimizer exception by stas00 in 22043
* Edit the docstring of `image_processing_donut` to match code by vermouthmjl in 22033
* Add setters by type of args to TrainingArguments by sgugger in 21570
* Update tiny model creation script by ydshieh in 22058
* Fix case when using --gradient_accumulation_steps with DDP disabled. by aws-sangeetha in 22007
* Add a progress bar for the total download of shards by sgugger in 22062
* Fix gradient checkpointing bug in Speech2Text by KMFODA in 22079
* Fix gradient checkpointing bug in switch transformer by KMFODA in 22081
* [GPT2] Propose fix for 21080 by ArthurZucker in 21853
* Fix small typo in flan-ul2.mdx by kevin51jiang in 22068
* Generate - Fix broken documentation links by gante in 22078
* Fix gradient checkpointing bug in Speecht5 by KMFODA in 22080
* Fix hint in src/transformers/modeling_utils.py by J-shang in 22074
* handle numpy inputs in whole word mask data collator by dwyatte in 22032
* GPT-J specific half precision on CPU note by MKhalusova in 22086
* Fix imports of TF MobileViT by sgugger in 22065
* Revert "[GPT2] Propose fix for 21080" by ydshieh in 22093
* Add AutoModelForZeroShotImageClassification by alaradirik in 22087
* add new model of MGP-STR by wdp-007 in 21418
* Add pr_checks.mdx Italian translation by alexcalabrese in 17459
* Fix gradient checkpointing bug in xglm by KMFODA in 22127
* Add TFVisionTextDualEncoder by Rocketknight1 in 21873
* Fix gradient checkpointing bug in Trajectory Transformer by KMFODA in 22125
* Fix gradient checkpointing bug in xlm_roberta_xl by KMFODA in 22128
* Added big_models.mdx italian translation 17600 by nickprock in 22115
* [`Blip2`] skip accelerate test by younesbelkada in 22124
* Fix gradient checkpointing bug in xmod by KMFODA in 22129
* Fix gradient checkpointing bug in LongT5 by KMFODA in 22130
* Fix gradient checkpointing bug in trocr by KMFODA in 22126
* Zero-shot image classification task guide by MKhalusova in 22132
* Fix doc link for MGP-STR by sgugger in 22138
* Adding Type Hints to TF_Pegasus model by mollerup23 in 21941
* Add a new script to check model testers' config by ydshieh in 22063
* Update configuration_align.py (projected_dim=640) by bishmdl76 in 22139
* Trainer: let generate pick its inputs by gante in 22108
* Enforce same behavior as PyTorch 2.0 for older versions by sgugger in 22136
* [trainer] fix bug in grad accum with multiple epochs by stas00 in 22098
* [deepspeed docs] Activation Checkpointing by stas00 in 22099
* Remove backend check for torch.compile by sgugger in 22140
* Prepare daily CI for torch 2.0.0 by ydshieh in 22135
* docs: New terms and updates to glossary by MichaelRipa in 21982
* Move `is_pipeline_test_to_skip` to specific model test classes by ydshieh in 21999
* Add ConvNeXT V2 by alaradirik in 21679
* Update 2 doctest expected values for torch 2.0.0 by ydshieh in 22148
* Translation Italian: perf_train_cpu and perf_train_cpu_many by nickprock in 22151
* Fix big model inference for T5 models in float16 by sgugger in 22095
* Create MaskedImageCompletionOutput and fix ViT docs by alaradirik in 22152
* to_pil - don't rescale if int and in range 0-255 by amyeroberts in 22158
* [trainer] add `--optim adamw_torch_fused` for pt-2.0+ by stas00 in 22144
* Revert "Enforce same behavior as PyTorch 2.0 for older versions" by sgugger in 22163
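The "introduce `logger.warning_once`" entry (21804) deduplicates the warning that the many gradient-checkpointing fixes above emit on every forward pass. A minimal sketch of the idea, assuming an `functools.lru_cache`-based approach: the free function `warning_once` and the `_ListHandler` class here are hypothetical illustrations, not the method the library attaches to its logger, and the message text is only illustrative.

```python
import functools
import logging


@functools.lru_cache(None)
def warning_once(logger: logging.Logger, msg: str) -> None:
    """Hypothetical helper: emit `msg` at WARNING level only the first time
    this (logger, msg) pair is seen; later identical calls hit the cache."""
    logger.warning(msg)


class _ListHandler(logging.Handler):
    """Collects emitted records so the deduplication can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


logger = logging.getLogger("demo.grad_ckpt")
handler = _ListHandler()
logger.addHandler(handler)
logger.propagate = False  # keep the demo output off the root logger

# Simulate a training loop that hits the same warning on every step.
for _ in range(3):
    warning_once(logger, "use_cache=True is incompatible with gradient checkpointing; setting use_cache=False")

print(len(handler.records))  # 1
```

Because the cache key is the full argument tuple, a different message string is treated as a new warning and is emitted once as well.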

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* abhiwand
  * Add BridgeTower model (20775)
  * Add loss for BridgeTowerForMaskedLM and BridgeTowerForImageAndTextRetrieval (21684)
  * [WIP] Add BridgeTowerForContrastiveLearning (21964)
* wonhyeongseo
  * [i18n-KO] Translated quicktour page to Korean (20946)
* ErwannMillon
  * Add VQGAN-CLIP research project (21329)
* NoB0
  * [i18n-fr] Translate index page to French (21458)
  * [i18n-fr] Translate quicktour page to French (21589)
* jvamvas
  * Add X-MOD (20939)
  * [WIP] Move X-MOD models to facebook organization (21640)
* susnato
  * Add Ernie-M Model to huggingface (21349)
* zinengtang
  * Add TVLT (20725)
* andyehrenberg
  * add flax whisper implementation (20479)
  * check for None forced tokens (21793)
  * Flax beam search fix (21857)
* tanreinama
  * add GPTSAN model (reopen) (21291)
* jonatanklosko
  * Add WhisperTokenizerFast (21222)
* Skylion007
  * Apply ruff flake8-comprehensions (21694)
* kiansierra
  * Prophetnet batch dimension inversion fix (21870)
* elisim
  * [Time-Series] informer model (21099)
* wdp-007
  * add new model of MGP-STR (21418)

4.26.1

Not secure

* ESM openfold_utils type hints by ringohoffman in 20544
* Add cPython files in build by sgugger in 21372
* Fix T5 inference in float16 + bnb error by younesbelkada in 21281
* Fix import in Accelerate for find_exec_bs by sgugger in 21501
* Fix inclusion of non py files in package by sgugger in 21546
