Transformers

Latest version: v4.41.0

4.41.0

New models

Phi3

The Phi-3 model was proposed in [Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone](https://arxiv.org/abs/2404.14219) by Microsoft.

TL;DR: Phi-3 introduces new RoPE scaling methods, which seem to scale fairly well. Phi-3-mini is available in two context-length variants, 4K and 128K tokens. It is the first model in its class to support a context window of up to 128K tokens, with little impact on quality.

<img width="1599" alt="image" src="https://github.com/huggingface/transformers/assets/48595927/0f37c6b0-b118-453c-ac64-6e45aa291d0a">


* Phi-3 by gugarosa in https://github.com/huggingface/transformers/pull/30423
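
As a quick orientation, here is a minimal usage sketch; the `microsoft/Phi-3-mini-4k-instruct` checkpoint id is an assumption (see the Hub for the released variants). Phi-3 loads through the standard auto classes:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

inputs = tokenizer("Summarize rotary position embeddings in one sentence:", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```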

JetMoE

JetMoe-8B is an 8B Mixture-of-Experts (MoE) language model developed by [Yikang Shen](https://scholar.google.com.hk/citations?user=qff5rRYAAAAJ) and [MyShell](https://myshell.ai/). The JetMoe project aims to provide LLaMA2-level performance with an efficient language model trained on a limited budget. To achieve this goal, JetMoe uses a sparsely activated architecture inspired by [ModuleFormer](https://arxiv.org/abs/2306.04640). Each JetMoe block consists of two MoE layers: Mixture of Attention Heads and Mixture of MLP Experts. Given the input tokens, it activates a subset of its experts to process them. This sparse activation schema enables JetMoe to achieve much better training throughput than similarly sized dense models. The training throughput of JetMoe-8B is around 100B tokens per day on a cluster of 96 H100 GPUs with a straightforward 3-way pipeline parallelism strategy.

<img width="1559" alt="image" src="https://github.com/huggingface/transformers/assets/48595927/cc83ce99-7a61-4d04-a234-3f68e6c0fafd">


* Add JetMoE model by yikangshen in https://github.com/huggingface/transformers/pull/30005

PaliGemma

PaliGemma is a lightweight open vision-language model (VLM) inspired by [PaLI-3](https://arxiv.org/abs/2310.09199), and based on open components like the [SigLIP vision model](https://arxiv.org/abs/2303.15343) and the [Gemma language model](https://arxiv.org/abs/2403.08295). PaliGemma takes both images and text as inputs and can answer questions about images with detail and context, meaning that PaliGemma can perform deeper analysis of images and provide useful insights, such as captioning for images and short videos, object detection, and reading text embedded within images.

More than 120 checkpoints have been released; see the collection [here](https://huggingface.co/collections/google/paligemma-release-6643a9ffbf57de2ae0448dda).

<img width="1064" alt="image" src="https://github.com/huggingface/transformers/assets/48595927/23584b9a-6c36-46f5-8700-32f402c0f674">


* Add PaliGemma by molbap in https://github.com/huggingface/transformers/pull/30814
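
A rough usage sketch, assuming the `google/paligemma-3b-mix-224` checkpoint and the `"caption en"` prompt convention from the model cards; the collection linked above lists all released variants:

```python
import requests
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-mix-224"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text="caption en", images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
# drop the prompt tokens before decoding the generated caption
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```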

VideoLlava

Video-LLaVA exhibits remarkable interactive capabilities between images and videos, despite the absence of image-video pairs in the dataset.

💡 Simple baseline, learning united visual representation by alignment before projection
With the binding of unified visual representations to the language feature space, we enable an LLM to perform visual reasoning on both images and videos simultaneously.
🔥 High performance, complementary learning with video and image
Extensive experiments demonstrate the complementarity of modalities, showcasing significant superiority when compared to models specifically designed for either images or videos.

<img width="532" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/62441d1d9fdefb55a0b7d12c/cLniWc__KECBBesliHKhd.png">

* Add Video Llava by zucchini-nlp in https://github.com/huggingface/transformers/pull/29733
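
A rough sketch of the expected API, assuming the `LanguageBind/Video-LLaVA-7B-hf` checkpoint and the `USER:`/`ASSISTANT:` prompt template from the model card; a dummy array of frames stands in for a decoded video clip:

```python
import numpy as np
from transformers import VideoLlavaProcessor, VideoLlavaForConditionalGeneration

model_id = "LanguageBind/Video-LLaVA-7B-hf"  # assumed checkpoint
processor = VideoLlavaProcessor.from_pretrained(model_id)
model = VideoLlavaForConditionalGeneration.from_pretrained(model_id)

# 8 RGB frames of 224x224 stand in for a clip decoded with e.g. PyAV
clip = np.random.randint(0, 255, size=(8, 224, 224, 3), dtype=np.uint8)
prompt = "USER: <video>\nWhat is happening in this video? ASSISTANT:"

inputs = processor(text=prompt, videos=clip, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```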

Falcon 2 and FalconVLM:

<img width="1024" alt="image" src="https://falconllm.tii.ae/assets/images/table-1___.jpeg">

Two new models from TII-UAE! They published a [blog post](https://falconllm.tii.ae/falcon-2.html) with more details. Falcon2 introduces a parallel MLP, and Falcon VLM uses the `Llava` framework.
* Support for Falcon2-11B by Nilabhra in https://github.com/huggingface/transformers/pull/30771
* Support arbitrary processor by ArthurZucker in https://github.com/huggingface/transformers/pull/30875

GGUF `from_pretrained` support

<img width="1064" alt="image" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/gguf-spec.png">

You can now load most GGUF quants directly with transformers' `from_pretrained`, converting them to classic PyTorch models. The API is simple:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
filename = "tinyllama-1.1b-chat-v1.0.Q6_K.gguf"

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)
```


We plan closer integrations with the llama.cpp / GGML ecosystem in the future; see https://github.com/huggingface/transformers/issues/27712 for more details.

* Loading GGUF files support by LysandreJik in https://github.com/huggingface/transformers/pull/30391

Quantization

New quant methods

This release adds support for two new quantization methods contributed by the community: HQQ and EETQ. Read more about how to quantize any transformers model using HQQ and EETQ in the [dedicated documentation section](https://huggingface.co/docs/transformers/quantization). A minimal sketch follows the PR links below.

* Add HQQ quantization support by mobicham in https://github.com/huggingface/transformers/pull/29637
* [FEAT]: EETQ quantizer support by dtlzhuangz in https://github.com/huggingface/transformers/pull/30262
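
A minimal sketch of quantizing on load with HQQ (`EetqConfig` is used the same way for EETQ); the exact config fields follow the quantization docs, and `facebook/opt-125m` is just an example checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, HqqConfig

# 4-bit HQQ quantization applied at load time; requires the `hqq` package and a GPU
quant_config = HqqConfig(nbits=4, group_size=64)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    torch_dtype=torch.float16,
    device_map="cuda",
    quantization_config=quant_config,
)
```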

`dequantize` API for bitsandbytes models

Models loaded with bitsandbytes can now be dequantized through the `dequantize` API (e.g. to merge adapter weights).

* FEAT / Bitsandbytes: Add `dequantize` API for bitsandbytes quantized models by younesbelkada in https://github.com/huggingface/transformers/pull/30806

API-wise, you can achieve that with the following:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, AutoTokenizer

model_id = "facebook/opt-125m"

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=BitsAndBytesConfig(load_in_4bit=True))
tokenizer = AutoTokenizer.from_pretrained(model_id)

model.dequantize()

text = tokenizer("Hello my name is", return_tensors="pt").to(0)

out = model.generate(**text)
print(tokenizer.decode(out[0]))
```


Generation updates

* Add Watermarking LogitsProcessor and WatermarkDetector by zucchini-nlp in https://github.com/huggingface/transformers/pull/29676
* Cache: Static cache as a standalone object by gante in https://github.com/huggingface/transformers/pull/30476
* Make `Gemma` work with `torch.compile` by ydshieh in https://github.com/huggingface/transformers/pull/30775
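
As a rough sketch of the static-cache workflow these PRs enable (roughly following the LLM optimization docs; the `google/gemma-2b` checkpoint and the compile settings are assumptions):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"  # assumed checkpoint; any model with static-cache support works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# pre-allocate the KV cache to a fixed shape so the decoding forward pass can be compiled
model.generation_config.cache_implementation = "static"
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

inputs = tokenizer("The theory of relativity states", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```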

SDPA support

* [`BERT`] Add support for sdpa by hackyon in https://github.com/huggingface/transformers/pull/28802
* Add sdpa and fa2 the Wav2vec2 family. by kamilakesbi in https://github.com/huggingface/transformers/pull/30121
* add sdpa to ViT [follow up of 29325] by hyenal in https://github.com/huggingface/transformers/pull/30555
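
With these PRs, the SDPA backend can be requested explicitly on the newly covered architectures; a minimal sketch using BERT as an example:

```python
from transformers import AutoModel

# explicitly request the PyTorch scaled_dot_product_attention backend
# (it is also selected by default when available)
model = AutoModel.from_pretrained("bert-base-uncased", attn_implementation="sdpa")
```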

Improved Object Detection

Addition of a fine-tuning script for object detection models.

* Fix YOLOS image processor resizing by qubvel in https://github.com/huggingface/transformers/pull/30436
* Add examples for detection models finetuning by qubvel in https://github.com/huggingface/transformers/pull/30422
* Add installation of examples requirements in CI by qubvel in https://github.com/huggingface/transformers/pull/30708
* Update object detection guide by qubvel in https://github.com/huggingface/transformers/pull/30683

Interpolation of embeddings for vision models

This release adds interpolation of position embeddings, enabling predictions from pretrained models on input images of sizes different from those the model was originally trained on. Simply pass `interpolate_pos_encoding=True` when calling the model.

Added for: BLIP, BLIP 2, InstructBLIP, SigLIP, ViViT

```python
import requests
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

image = Image.open(requests.get("https://huggingface.co/hf-internal-testing/blip-test-image/resolve/main/demo.jpg", stream=True).raw)
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b",
    torch_dtype=torch.float16
).to("cuda")
# an input size different from the pretraining resolution triggers interpolation
inputs = processor(images=image, size={"height": 500, "width": 500}, return_tensors="pt").to("cuda")

predictions = model.generate(**inputs, interpolate_pos_encoding=True)
# Generated text: "a woman and dog on the beach"
generated_text = processor.batch_decode(predictions, skip_special_tokens=True)[0].strip()
```


* Blip dynamic input resolution by zafstojano in https://github.com/huggingface/transformers/pull/30722
* Add dynamic resolution input/interpolate position embedding to SigLIP by davidgxue in https://github.com/huggingface/transformers/pull/30719
* Enable dynamic resolution for vivit by jla524 in https://github.com/huggingface/transformers/pull/30630


🚨 might be breaking
* 🚨🚨🚨Deprecate `evaluation_strategy` to `eval_strategy`🚨🚨🚨 by muellerzr in https://github.com/huggingface/transformers/pull/30190 (see the migration sketch after this list)
* 🚨 Add training compatibility for Musicgen-like models by ylacombe in https://github.com/huggingface/transformers/pull/29802
* 🚨 Update image_processing_vitmatte.py by rb-synth in https://github.com/huggingface/transformers/pull/30566
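
For the `evaluation_strategy` → `eval_strategy` deprecation above, existing training scripts only need a rename; a minimal sketch:

```python
from transformers import TrainingArguments

# `evaluation_strategy` still works for now but emits a deprecation warning;
# `eval_strategy` is the forward-compatible spelling
args = TrainingArguments(
    output_dir="out",
    eval_strategy="steps",  # previously: evaluation_strategy="steps"
    eval_steps=500,
)
```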

Cleanups
* Remove task guides auto-update in favor of links towards task pages by LysandreJik in https://github.com/huggingface/transformers/pull/30429
* Remove add-new-model in favor of add-new-model-like by LysandreJik in https://github.com/huggingface/transformers/pull/30424
* Remove mentions of models in the READMEs and link to the documentation page in which they are featured. by LysandreJik in https://github.com/huggingface/transformers/pull/30420

Not breaking but important for Llama tokenizers
* [`LlamaTokenizerFast`] Refactor default llama by ArthurZucker in https://github.com/huggingface/transformers/pull/28881


Fixes

* Fix missing `prev_ci_results` by ydshieh in https://github.com/huggingface/transformers/pull/30313
* Fix: remove `pad token id` in pipeline forward arguments by zucchini-nlp in https://github.com/huggingface/transformers/pull/30285
* fix Parameter dtype in audio models by ylacombe in https://github.com/huggingface/transformers/pull/30310
* disable use_cache if using gradient checkpointing by chenzizhao in https://github.com/huggingface/transformers/pull/30320
* Fix test transposing image with EXIF Orientation tag by albertvillanova in https://github.com/huggingface/transformers/pull/30319
* Avoid `jnp` import in `utils/generic.py` by ydshieh in https://github.com/huggingface/transformers/pull/30322
* Fix `AssertionError` in clip conversion script by ydshieh in https://github.com/huggingface/transformers/pull/30321
* [UDOP] Add special tokens to tokenizer by NielsRogge in https://github.com/huggingface/transformers/pull/29594
* Enable multi-device for some models by jla524 in https://github.com/huggingface/transformers/pull/30207
* feat: Upgrade Weights & Biases callback by parambharat in https://github.com/huggingface/transformers/pull/30135
* [Feature Extractors] Fix kwargs to pre-trained by sanchit-gandhi in https://github.com/huggingface/transformers/pull/30260
* Pipeline: fix `pad_token_id` again by zucchini-nlp in https://github.com/huggingface/transformers/pull/30338
* [Whisper] Fix slow tests by sanchit-gandhi in https://github.com/huggingface/transformers/pull/30152
* parallel job limit for doctest by ydshieh in https://github.com/huggingface/transformers/pull/30342
* Transformers Metadata by LysandreJik in https://github.com/huggingface/transformers/pull/30344
* Deprecate default chat templates by Rocketknight1 in https://github.com/huggingface/transformers/pull/30346
* Restore casting of masked_spec_embed by ylacombe in https://github.com/huggingface/transformers/pull/30336
* Update unwrap from accelerate by SunMarc in https://github.com/huggingface/transformers/pull/29933
* Do not remove half seq length in generation tests by zucchini-nlp in https://github.com/huggingface/transformers/pull/30016
* Fix config + attn_implementation in AutoModelForCausalLM.from_pretrained by hiyouga in https://github.com/huggingface/transformers/pull/30299
* Add TF swiftformer by joaocmd in https://github.com/huggingface/transformers/pull/23342
* [Grounding DINO] Add resources by NielsRogge in https://github.com/huggingface/transformers/pull/30232
* Nits for model docs by merveenoyan in https://github.com/huggingface/transformers/pull/29795
* Enable multi-device for more models by jla524 in https://github.com/huggingface/transformers/pull/30379
* GenerationConfig: warn if pad token is negative by zucchini-nlp in https://github.com/huggingface/transformers/pull/30187
* Add FSDP config for CPU RAM efficient loading through accelerate by helloworld1 in https://github.com/huggingface/transformers/pull/30002
* `Llama` family, fix `use_cache=False` generation by ArthurZucker in https://github.com/huggingface/transformers/pull/30380
* Update docstrings for text generation pipeline by Rocketknight1 in https://github.com/huggingface/transformers/pull/30343
* Terminator strings for generate() by Rocketknight1 in https://github.com/huggingface/transformers/pull/28932
* Fix layerwise GaLore optimizer hard to converge with warmup scheduler by hiyouga in https://github.com/huggingface/transformers/pull/30372
* Jamba: fix left-padding test by gante in https://github.com/huggingface/transformers/pull/30389
* Fix DETA save_pretrained by qubvel in https://github.com/huggingface/transformers/pull/30326
* FIX / PEFT: Pass device correctly to peft by younesbelkada in https://github.com/huggingface/transformers/pull/30397
* [docs] LLM inference by stevhliu in https://github.com/huggingface/transformers/pull/29791
* show `-rs` to show skip reasons by ArthurZucker in https://github.com/huggingface/transformers/pull/30318
* Add inputs embeds in generation by zucchini-nlp in https://github.com/huggingface/transformers/pull/30269
* [Grounding DINO] Add support for cross-attention in GroundingDinoMultiHeadAttention by EduardoPach in https://github.com/huggingface/transformers/pull/30364
* remove redundant logging from longformer by riklopfer in https://github.com/huggingface/transformers/pull/30365
* fix: link to HF repo/tree/revision when a file is missing by mapmeld in https://github.com/huggingface/transformers/pull/30406
* [tests] add `require_torch_sdpa` for test that needs sdpa support by faaany in https://github.com/huggingface/transformers/pull/30408
* Jax: scipy version pin by gante in https://github.com/huggingface/transformers/pull/30402
* Fix on "cache position" for assisted generation by zucchini-nlp in https://github.com/huggingface/transformers/pull/30068
* fix for itemsize => element_size() for torch backwards compat by winglian in https://github.com/huggingface/transformers/pull/30133
* Make EosTokenCriteria compatible with mps by pcuenca in https://github.com/huggingface/transformers/pull/30376
* FIX: re-add bnb on docker image by younesbelkada in https://github.com/huggingface/transformers/pull/30427
* Fix LayoutLMv2 init issue and doctest by ydshieh in https://github.com/huggingface/transformers/pull/30278
* Remove old TF port docs by Rocketknight1 in https://github.com/huggingface/transformers/pull/30426
* Rename torch.run to torchrun by steven-basart in https://github.com/huggingface/transformers/pull/30405
* Fix use_cache for xla fsdp by alanwaketan in https://github.com/huggingface/transformers/pull/30353
* [`LlamaTokenizerFast`] Refactor default llama by ArthurZucker in https://github.com/huggingface/transformers/pull/28881
* New model PR needs green (slow tests) CI by ydshieh in https://github.com/huggingface/transformers/pull/30341
* Add llama3 by ArthurZucker in https://github.com/huggingface/transformers/pull/30334
* [`Llava`] + CIs fix red cis and llava integration tests by ArthurZucker in https://github.com/huggingface/transformers/pull/30440
* [tests] make test device-agnostic by faaany in https://github.com/huggingface/transformers/pull/30444
* fix uncaught init of linear layer in clip's/siglip's for image classification models by vasqu in https://github.com/huggingface/transformers/pull/30435
* fix jamba slow foward for multi-gpu by SunMarc in https://github.com/huggingface/transformers/pull/30418
* [SegGPT] Fix loss calculation by EduardoPach in https://github.com/huggingface/transformers/pull/30421
* Add `paths` filter to avoid the chance of being triggered by ydshieh in https://github.com/huggingface/transformers/pull/30453
* Fix wrong indent in `utils/check_if_new_model_added.py` by ydshieh in https://github.com/huggingface/transformers/pull/30456
* [`research_project`] Most of the security issues come from this requirement.txt by ArthurZucker in https://github.com/huggingface/transformers/pull/29977
* Neuron: When save_safetensor=False, no need to move model to CPU by jeffhataws in https://github.com/huggingface/transformers/pull/29703
* Enable fp16 on CPU by muellerzr in https://github.com/huggingface/transformers/pull/30459
* Non blocking support to torch DL's by muellerzr in https://github.com/huggingface/transformers/pull/30465
* consistent job / pytest report / artifact name correspondence by ydshieh in https://github.com/huggingface/transformers/pull/30392
* Workflow / ENH: Add SSH into our runners workflow by younesbelkada in https://github.com/huggingface/transformers/pull/30425
* FIX / Workflow: Change tailscale trigger condition by younesbelkada in https://github.com/huggingface/transformers/pull/30471
* FIX / Workflow: Fix SSH workflow bug by younesbelkada in https://github.com/huggingface/transformers/pull/30474
* [fix codellama conversion] by ArthurZucker in https://github.com/huggingface/transformers/pull/30472
* Script for finding candidate models for deprecation by amyeroberts in https://github.com/huggingface/transformers/pull/29686
* Fix SigLip classification doctest by amyeroberts in https://github.com/huggingface/transformers/pull/30475
* Don't run fp16 MusicGen tests on CPU by amyeroberts in https://github.com/huggingface/transformers/pull/30466
* Prevent crash with `WandbCallback` with third parties by tomaarsen in https://github.com/huggingface/transformers/pull/30477
* Add WSD scheduler by visheratin in https://github.com/huggingface/transformers/pull/30231
* Fix Issue 29817 Video Classification Task Guide Using Undeclared Variables by manju-rangam in https://github.com/huggingface/transformers/pull/30457
* Make accelerate install non-torch dependent by muellerzr in https://github.com/huggingface/transformers/pull/30463
* Introduce Stateful Callbacks by muellerzr in https://github.com/huggingface/transformers/pull/29666
* Fix Llava for 0-embeddings by zucchini-nlp in https://github.com/huggingface/transformers/pull/30473
* Do not use deprecated `SourceFileLoader.load_module()` in dynamic module loading by XuehaiPan in https://github.com/huggingface/transformers/pull/30370
* Add sidebar tutorial for chat models by Rocketknight1 in https://github.com/huggingface/transformers/pull/30401
* Quantization: `HfQuantizer` quant method update by younesbelkada in https://github.com/huggingface/transformers/pull/30484
* [docs] Spanish translation of pipeline_tutorial.md by aaronjimv in https://github.com/huggingface/transformers/pull/30252
* FEAT: PEFT support for EETQ by younesbelkada in https://github.com/huggingface/transformers/pull/30449
* Fix the `bitsandbytes` error formatting ("Some modules are dispatched on ...") by kyo-takano in https://github.com/huggingface/transformers/pull/30494
* Update `dtype_byte_size` to handle torch.float8_e4m3fn/float8_e5m2 types by mgoin in https://github.com/huggingface/transformers/pull/30488
* Use the Keras set_random_seed in tests by Rocketknight1 in https://github.com/huggingface/transformers/pull/30504
* Remove skipping logic now that set_epoch exists by muellerzr in https://github.com/huggingface/transformers/pull/30501
* [`DETR`] Remove timm hardcoded logic in modeling files by amyeroberts in https://github.com/huggingface/transformers/pull/29038
* [examples] update whisper fine-tuning by sanchit-gandhi in https://github.com/huggingface/transformers/pull/29938
* Fix GroundingDINO, DPR after BERT SDPA update by amyeroberts in https://github.com/huggingface/transformers/pull/30506
* load_image - decode b64encode and encodebytes strings by amyeroberts in https://github.com/huggingface/transformers/pull/30192
* [SegGPT] Fix seggpt image processor by EduardoPach in https://github.com/huggingface/transformers/pull/29550
* Fix link in dbrx.md by eitanturok in https://github.com/huggingface/transformers/pull/30509
* Allow boolean FSDP options in fsdp_config by helloworld1 in https://github.com/huggingface/transformers/pull/30439
* Pass attn_implementation when using AutoXXX.from_config by amyeroberts in https://github.com/huggingface/transformers/pull/30507
* Fix broken link to Transformers notebooks by clinty in https://github.com/huggingface/transformers/pull/30512
* Update runner tag for PR slow CI by ydshieh in https://github.com/huggingface/transformers/pull/30535
* Fix repo. fetch/checkout in PR slow CI job by ydshieh in https://github.com/huggingface/transformers/pull/30537
* Reenable SDPA's FA2 During Training with torch.compile by warner-benjamin in https://github.com/huggingface/transformers/pull/30442
* Include safetensors as part of `_load_best_model` by muellerzr in https://github.com/huggingface/transformers/pull/30553
* Pass `use_cache` in kwargs for GPTNeoX by zucchini-nlp in https://github.com/huggingface/transformers/pull/30538
* Enable multi-device for more models by jla524 in https://github.com/huggingface/transformers/pull/30409
* Generate: update links on LLM tutorial doc by gante in https://github.com/huggingface/transformers/pull/30550
* DBRX: make fixup by gante in https://github.com/huggingface/transformers/pull/30578
* Fix seq2seq collator padding by vasqu in https://github.com/huggingface/transformers/pull/30556
* BlipModel: get_multimodal_features method by XavierSpycy in https://github.com/huggingface/transformers/pull/30438
* Add chat templating support for KeyDataset in text-generation pipeline by DarshanDeshpande in https://github.com/huggingface/transformers/pull/30558
* Fix generation doctests by zucchini-nlp in https://github.com/huggingface/transformers/pull/30263
* General PR slow CI by ydshieh in https://github.com/huggingface/transformers/pull/30540
* Remove `use_square_size` after loading by ydshieh in https://github.com/huggingface/transformers/pull/30567
* Use text config's vocab size in testing models by zucchini-nlp in https://github.com/huggingface/transformers/pull/30568
* Encoder-decoder models: move embedding scale to nn.Module by zucchini-nlp in https://github.com/huggingface/transformers/pull/30410
* Fix Marian model conversion by zucchini-nlp in https://github.com/huggingface/transformers/pull/30173
* Refactor default chat template warnings by Rocketknight1 in https://github.com/huggingface/transformers/pull/30551
* Fix QA example by Rocketknight1 in https://github.com/huggingface/transformers/pull/30580
* remove jax example by ArthurZucker in https://github.com/huggingface/transformers/pull/30498
* Fix canonical model --model_type in examples by amyeroberts in https://github.com/huggingface/transformers/pull/30480
* Gemma: update activation warning by pcuenca in https://github.com/huggingface/transformers/pull/29995
* Bump gitpython from 3.1.32 to 3.1.41 in /examples/research_projects/decision_transformer by dependabot in https://github.com/huggingface/transformers/pull/30587
* Fix image segmentation example - don't reopen image by amyeroberts in https://github.com/huggingface/transformers/pull/30481
* Improve object detection task guideline by NielsRogge in https://github.com/huggingface/transformers/pull/29967
* Generate: remove deprecated public decoding functions and streamline logic 🧼 by gante in https://github.com/huggingface/transformers/pull/29956
* Fix llava half precision and autocast issues by frasermince in https://github.com/huggingface/transformers/pull/29721
* Fix: failing CI after 30568 by zucchini-nlp in https://github.com/huggingface/transformers/pull/30599
* Fix for Neuron by michaelbenayoun in https://github.com/huggingface/transformers/pull/30259
* Fix memory leak with CTC training script on Chinese languages by lucky-bai in https://github.com/huggingface/transformers/pull/30358
* Fix copies for DBRX - neuron fix by amyeroberts in https://github.com/huggingface/transformers/pull/30610
* fix:missing `output_router_logits` in SwitchTransformers by lausannel in https://github.com/huggingface/transformers/pull/30573
* Use `contiguous()` in clip checkpoint conversion script by ydshieh in https://github.com/huggingface/transformers/pull/30613
* phi3 chat_template does not support system role by amitportnoy in https://github.com/huggingface/transformers/pull/30606
* Docs: fix `generate`-related rendering issues by gante in https://github.com/huggingface/transformers/pull/30600
* Docs: add missing `StoppingCriteria` autodocs by gante in https://github.com/huggingface/transformers/pull/30617
* Generate: fix `SinkCache` on Llama models by gante in https://github.com/huggingface/transformers/pull/30581
* Fix FX tracing issues for Llama by michaelbenayoun in https://github.com/huggingface/transformers/pull/30619
* Output `None` as attention when layer is skipped by jonghwanhyeon in https://github.com/huggingface/transformers/pull/30597
* Fix CI after 30410 by zucchini-nlp in https://github.com/huggingface/transformers/pull/30612
* add mlp bias for llama models by mayank31398 in https://github.com/huggingface/transformers/pull/30031
* Fix W&B run name by qubvel in https://github.com/huggingface/transformers/pull/30462
* HQQ: PEFT support for HQQ by younesbelkada in https://github.com/huggingface/transformers/pull/30632
* Prevent `TextGenerationPipeline._sanitize_parameters` from overriding previously provided parameters by yting27 in https://github.com/huggingface/transformers/pull/30362
* Avoid duplication in PR slow CI model list by ydshieh in https://github.com/huggingface/transformers/pull/30634
* [`CI update`] Try to use dockers and no cache by ArthurZucker in https://github.com/huggingface/transformers/pull/29202
* Check if the current compiled version of pytorch supports MPS by jiaqianjing in https://github.com/huggingface/transformers/pull/30664
* Hotfix-change-ci by ArthurZucker in https://github.com/huggingface/transformers/pull/30669
* Quantization / HQQ: Fix HQQ tests on our runner by younesbelkada in https://github.com/huggingface/transformers/pull/30668
* Fix llava next tie_word_embeddings config by SunMarc in https://github.com/huggingface/transformers/pull/30640
* Trainer._load_from_checkpoint - support loading multiple Peft adapters by claralp in https://github.com/huggingface/transformers/pull/30505
* Trainer - add cache clearing and the option for batched eval metrics computation by FoamoftheSea in https://github.com/huggingface/transformers/pull/28769
* Fix typo: llama3.md by mimbres in https://github.com/huggingface/transformers/pull/30653
* Respect `resume_download` deprecation by Wauplin in https://github.com/huggingface/transformers/pull/30620
* top-k instead of top-p in MixtralConfig docstring by sorgfresser in https://github.com/huggingface/transformers/pull/30687
* Bump jinja2 from 3.1.3 to 3.1.4 in /examples/research_projects/decision_transformer by dependabot in https://github.com/huggingface/transformers/pull/30680
* Bump werkzeug from 3.0.1 to 3.0.3 in /examples/research_projects/decision_transformer by dependabot in https://github.com/huggingface/transformers/pull/30679
* Adding _tie_weights() to prediction heads to support low_cpu_mem_usage=True by hackyon in https://github.com/huggingface/transformers/pull/29024
* Fix `cache_position` initialisation for generation with `use_cache=False` by nurlanov-zh in https://github.com/huggingface/transformers/pull/30485
* Word-level timestamps broken for short-form audio by kamilakesbi in https://github.com/huggingface/transformers/pull/30325
* Updated docs of `forward` in `Idefics2ForConditionalGeneration` with correct `ignore_index` value by zafstojano in https://github.com/huggingface/transformers/pull/30678
* Bump tqdm from 4.63.0 to 4.66.3 in /examples/research_projects/decision_transformer by dependabot in https://github.com/huggingface/transformers/pull/30646
* Bump tqdm from 4.48.2 to 4.66.3 in /examples/research_projects/visual_bert by dependabot in https://github.com/huggingface/transformers/pull/30645
* Reboot Agents by aymeric-roucher in https://github.com/huggingface/transformers/pull/30387
* Bump tqdm from 4.48.2 to 4.66.3 in /examples/research_projects/lxmert by dependabot in https://github.com/huggingface/transformers/pull/30644
* Separate tokenizer tests by ArthurZucker in https://github.com/huggingface/transformers/pull/30675
* Update `workflow_id` in `utils/get_previous_daily_ci.py` by ydshieh in https://github.com/huggingface/transformers/pull/30695
* Rename artifact name `prev_ci_results` to `ci_results` by ydshieh in https://github.com/huggingface/transformers/pull/30697
* Add safetensors to model not found error msg for default use_safetensors value by davidgxue in https://github.com/huggingface/transformers/pull/30602
* Pin deepspeed by muellerzr in https://github.com/huggingface/transformers/pull/30701
* Patch CLIP image preprocessor by rootonchair in https://github.com/huggingface/transformers/pull/30698
* [BitsandBytes] Verify if GPU is available by NielsRogge in https://github.com/huggingface/transformers/pull/30533
* Llava: remove dummy labels by zucchini-nlp in https://github.com/huggingface/transformers/pull/30706
* Immutability for data collators by vasqu in https://github.com/huggingface/transformers/pull/30603
* Cache: models return input cache type by gante in https://github.com/huggingface/transformers/pull/30716
* Removal of deprecated maps by LysandreJik in https://github.com/huggingface/transformers/pull/30576
* Generate: add `min_p` sampling by gante in https://github.com/huggingface/transformers/pull/30639
* Fix image post-processing for OWLv2 by jla524 in https://github.com/huggingface/transformers/pull/30686
* KV cache is no longer a model attribute by zucchini-nlp in https://github.com/huggingface/transformers/pull/30730
* Generate: consistently handle special tokens as tensors by gante in https://github.com/huggingface/transformers/pull/30624
* Update CodeLlama references by osanseviero in https://github.com/huggingface/transformers/pull/30218
* [docs] Update es/pipeline_tutorial.md by aaronjimv in https://github.com/huggingface/transformers/pull/30684
* Update llama3.md, fix typo by mimbres in https://github.com/huggingface/transformers/pull/30739
* mlp_only_layers is more flexible than decoder_sparse_step by eigen2017 in https://github.com/huggingface/transformers/pull/30552
* PEFT / Trainer: Make use of `model.active_adapters()` instead of deprecated `model.active_adapter` whenever possible by younesbelkada in https://github.com/huggingface/transformers/pull/30738
* [docs] Update link in es/pipeline_webserver.md by aaronjimv in https://github.com/huggingface/transformers/pull/30745
* hqq - fix weight check in check_quantized_param by mobicham in https://github.com/huggingface/transformers/pull/30748
* [awq] replace scale when we have GELU by SunMarc in https://github.com/huggingface/transformers/pull/30074
* Workflow: Replace `actions/post-slack` with centrally defined workflow by younesbelkada in https://github.com/huggingface/transformers/pull/30737
* [GroundingDino] Adding ms_deform_attn kernels by EduardoPach in https://github.com/huggingface/transformers/pull/30768
* Llama: fix custom 4D masks, v2 by poedator in https://github.com/huggingface/transformers/pull/30348
* Generation / FIX: Fix multi-device generation by younesbelkada in https://github.com/huggingface/transformers/pull/30746
* Qwen: incorrect setup flag by gante in https://github.com/huggingface/transformers/pull/30776
* enable Pipeline to get device from model by faaany in https://github.com/huggingface/transformers/pull/30534
* [Object detection pipeline] Lower threshold by NielsRogge in https://github.com/huggingface/transformers/pull/30710
* Generate: remove near-duplicate sample/greedy copy by gante in https://github.com/huggingface/transformers/pull/30773
* Port IDEFICS to tensorflow by a8nova in https://github.com/huggingface/transformers/pull/26870
* Generate: assistant should be greedy in assisted decoding by gante in https://github.com/huggingface/transformers/pull/30778
* Save other CI jobs' result (torch/tf pipeline, example, deepspeed etc) by ydshieh in https://github.com/huggingface/transformers/pull/30699
* Deprecate models script by amyeroberts in https://github.com/huggingface/transformers/pull/30184
* skip low_cpu_mem_usage tests by SunMarc in https://github.com/huggingface/transformers/pull/30782
* CI: update to ROCm 6.0.2 and test MI300 by fxmarty in https://github.com/huggingface/transformers/pull/30266
* Fix OWLv2 Doc by jla524 in https://github.com/huggingface/transformers/pull/30794
* Fix cache type in Idefics2 by zucchini-nlp in https://github.com/huggingface/transformers/pull/30729
* PEFT: Access active_adapters as a property in Trainer by pashminacameron in https://github.com/huggingface/transformers/pull/30790
* CI: more models wo cache support by gante in https://github.com/huggingface/transformers/pull/30780
* Deprecate TF weight conversion since we have full Safetensors support now by Rocketknight1 in https://github.com/huggingface/transformers/pull/30786
* [T5] Adding `model_parallel = False` to `T5ForTokenClassification` and `MT5ForTokenClassification` by retarfi in https://github.com/huggingface/transformers/pull/30763
* Added the necessay import of module by ankur0904 in https://github.com/huggingface/transformers/pull/30804
* Add support for custom checkpoints in MusicGen by jla524 in https://github.com/huggingface/transformers/pull/30011
* Add missing dependencies in image classification example by jla524 in https://github.com/huggingface/transformers/pull/30820
* Support mixed-language batches in `WhisperGenerationMixin` by cifkao in https://github.com/huggingface/transformers/pull/29688
* Remove unused module DETR based models by conditionedstimulus in https://github.com/huggingface/transformers/pull/30823
* Jamba - Skip 4d custom attention mask test by amyeroberts in https://github.com/huggingface/transformers/pull/30826
* Missing `Optional` in typing. by xkszltl in https://github.com/huggingface/transformers/pull/30821
* Update ds_config_zero3.json by pacman100 in https://github.com/huggingface/transformers/pull/30829
* Better llava next. by nxphi47 in https://github.com/huggingface/transformers/pull/29850
* Deprecate models script - correctly set the model name for the doc file by amyeroberts in https://github.com/huggingface/transformers/pull/30785
* Use `torch 2.3` for CI by ydshieh in https://github.com/huggingface/transformers/pull/30837
* Fix llama model sdpa attention forward function masking bug when output_attentions=True by Aladoro in https://github.com/huggingface/transformers/pull/30652
* [LLaVa-NeXT] Small fixes by NielsRogge in https://github.com/huggingface/transformers/pull/30841
* [Idefics2] Improve docs, add resources by NielsRogge in https://github.com/huggingface/transformers/pull/30717
* Cache: add new flag to distinguish models that `Cache` but not static cache by gante in https://github.com/huggingface/transformers/pull/30800
* Disable the FA backend for SDPA on AMD GPUs by mht-sharma in https://github.com/huggingface/transformers/pull/30850
* Video-LLaVa: Fix docs by zucchini-nlp in https://github.com/huggingface/transformers/pull/30855
* Docs: update example with assisted generation + sample by gante in https://github.com/huggingface/transformers/pull/30853
* TST / Quantization: Reverting to torch==2.2.1 by younesbelkada in https://github.com/huggingface/transformers/pull/30866
* Fix VideoLlava imports by amyeroberts in https://github.com/huggingface/transformers/pull/30867
* TEST: Add llama logits tests by younesbelkada in https://github.com/huggingface/transformers/pull/30835
* Remove deprecated logic and warnings by amyeroberts in https://github.com/huggingface/transformers/pull/30743
* Enable device map by darshana1406 in https://github.com/huggingface/transformers/pull/30870
* Fix dependencies for image classification example by jla524 in https://github.com/huggingface/transformers/pull/30842
* [whisper] fix multilingual fine-tuning by sanchit-gandhi in https://github.com/huggingface/transformers/pull/30865
* update release script by ArthurZucker in https://github.com/huggingface/transformers/pull/30880

New Contributors
* joaocmd made their first contribution in https://github.com/huggingface/transformers/pull/23342
* kamilakesbi made their first contribution in https://github.com/huggingface/transformers/pull/30121
* dtlzhuangz made their first contribution in https://github.com/huggingface/transformers/pull/30262
* steven-basart made their first contribution in https://github.com/huggingface/transformers/pull/30405
* manju-rangam made their first contribution in https://github.com/huggingface/transformers/pull/30457
* kyo-takano made their first contribution in https://github.com/huggingface/transformers/pull/30494
* mgoin made their first contribution in https://github.com/huggingface/transformers/pull/30488
* eitanturok made their first contribution in https://github.com/huggingface/transformers/pull/30509
* clinty made their first contribution in https://github.com/huggingface/transformers/pull/30512
* warner-benjamin made their first contribution in https://github.com/huggingface/transformers/pull/30442
* XavierSpycy made their first contribution in https://github.com/huggingface/transformers/pull/30438
* DarshanDeshpande made their first contribution in https://github.com/huggingface/transformers/pull/30558
* frasermince made their first contribution in https://github.com/huggingface/transformers/pull/29721
* lucky-bai made their first contribution in https://github.com/huggingface/transformers/pull/30358
* rb-synth made their first contribution in https://github.com/huggingface/transformers/pull/30566
* lausannel made their first contribution in https://github.com/huggingface/transformers/pull/30573
* jonghwanhyeon made their first contribution in https://github.com/huggingface/transformers/pull/30597
* mobicham made their first contribution in https://github.com/huggingface/transformers/pull/29637
* yting27 made their first contribution in https://github.com/huggingface/transformers/pull/30362
* jiaqianjing made their first contribution in https://github.com/huggingface/transformers/pull/30664
* claralp made their first contribution in https://github.com/huggingface/transformers/pull/30505
* mimbres made their first contribution in https://github.com/huggingface/transformers/pull/30653
* sorgfresser made their first contribution in https://github.com/huggingface/transformers/pull/30687
* nurlanov-zh made their first contribution in https://github.com/huggingface/transformers/pull/30485
* zafstojano made their first contribution in https://github.com/huggingface/transformers/pull/30678
* davidgxue made their first contribution in https://github.com/huggingface/transformers/pull/30602
* rootonchair made their first contribution in https://github.com/huggingface/transformers/pull/30698
* eigen2017 made their first contribution in https://github.com/huggingface/transformers/pull/30552
* Nilabhra made their first contribution in https://github.com/huggingface/transformers/pull/30771
* a8nova made their first contribution in https://github.com/huggingface/transformers/pull/26870
* pashminacameron made their first contribution in https://github.com/huggingface/transformers/pull/30790
* retarfi made their first contribution in https://github.com/huggingface/transformers/pull/30763
* yikangshen made their first contribution in https://github.com/huggingface/transformers/pull/30005
* ankur0904 made their first contribution in https://github.com/huggingface/transformers/pull/30804
* conditionedstimulus made their first contribution in https://github.com/huggingface/transformers/pull/30823
* nxphi47 made their first contribution in https://github.com/huggingface/transformers/pull/29850
* Aladoro made their first contribution in https://github.com/huggingface/transformers/pull/30652
* hyenal made their first contribution in https://github.com/huggingface/transformers/pull/30555
* darshana1406 made their first contribution in https://github.com/huggingface/transformers/pull/30870

**Full Changelog**: https://github.com/huggingface/transformers/compare/v4.40.2...v4.41.0

4.40.2

Fix torch fx for Llama model
- Fix for Neuron (30259)
- Fix copies for DBRX - neuron fix (30610)

Thanks michaelbenayoun!

4.40.1

Kudos to pcuenca for the prompt fix in:

- Make EosTokenCriteria compatible with mps 30376

This supports `EosTokenCriteria` on MPS while `pytorch` adds this functionality natively.

4.40.0

New model additions

Llama 3

Llama 3 is supported in this release through the Llama 2 architecture and some fixes in the `tokenizers` library.

Idefics2

<img src="https://huggingface.co/HuggingFaceM4/idefics-80b/resolve/main/assets/IDEFICS.png"
alt="drawing" width="300"/>

The Idefics2 model was created by the Hugging Face M4 team and authored by Léo Tronchon, Hugo Laurencon, Victor Sanh. The accompanying blog post can be found here.

Idefics2 is an open multimodal model that accepts arbitrary sequences of image and text inputs and produces text outputs. The model can answer questions about images, describe visual content, create stories grounded on multiple images, or simply behave as a pure language model without visual inputs. It improves upon IDEFICS-1, notably on document understanding, OCR, and visual reasoning. Idefics2 is lightweight (8 billion parameters) and treats images in their native aspect ratio and resolution, which allows for varying inference efficiency.

* Add Idefics2 by amyeroberts in 30253
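
A rough usage sketch, assuming the `HuggingFaceM4/idefics2-8b` checkpoint and the chat-template prompt format from the model card:

```python
import requests
from PIL import Image
from transformers import AutoProcessor, Idefics2ForConditionalGeneration

model_id = "HuggingFaceM4/idefics2-8b"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = Idefics2ForConditionalGeneration.from_pretrained(model_id)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# one user turn containing an image placeholder plus a question
messages = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "Describe this image."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

inputs = processor(text=prompt, images=[image], return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=50)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```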

Recurrent Gemma

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/recurrent-gemma.png"
alt="drawing" width="600"/>

<small> Recurrent Gemma architecture. Taken from the <a href="https://arxiv.org/pdf/2402.19427.pdf">original paper.</a> </small>

The Recurrent Gemma model was proposed in RecurrentGemma: Moving Past Transformers for Efficient Open Language Models by the Griffin, RLHF and Gemma Teams of Google.

The abstract from the paper is the following:

We introduce RecurrentGemma, an open language model which uses Google’s novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide a pre-trained model with 2B non-embedding parameters, and an instruction tuned variant. Both models achieve comparable performance to Gemma-2B despite being trained on fewer tokens.

* Add recurrent gemma by ArthurZucker in 30143

Jamba

Jamba is a pretrained, mixture-of-experts (MoE) generative text model, with 12B active parameters and an overall of 52B parameters across all experts. It supports a 256K context length, and can fit up to 140K tokens on a single 80GB GPU.

As depicted in the diagram below, Jamba’s architecture features a blocks-and-layers approach that allows Jamba to successfully integrate the Transformer and Mamba architectures. Each Jamba block contains either an attention or a Mamba layer, followed by a multi-layer perceptron (MLP), producing an overall ratio of one Transformer layer out of every eight total layers.

![image](https://github.com/huggingface/transformers/assets/48595927/d78bb917-7a8a-4959-8206-e493c6c75f3d)

Jamba introduces the first `HybridCache` object that allows it to natively support assisted generation, contrastive search, speculative decoding, beam search and all of the awesome features from the `generate` API!

* Add jamba by tomeras91 in 29943

DBRX

DBRX is a [transformer-based](https://www.isattentionallyouneed.com/) decoder-only large language model (LLM) that was trained using next-token prediction. It uses a *fine-grained* mixture-of-experts (MoE) architecture with 132B total parameters of which 36B parameters are active on any input.

It was pre-trained on 12T tokens of text and code data. Compared to other open MoE models like Mixtral-8x7B and Grok-1, DBRX is fine-grained, meaning it uses a larger number of smaller experts. DBRX has 16 experts and chooses 4, while Mixtral-8x7B and Grok-1 have 8 experts and choose 2.

This provides 65x more possible combinations of experts and the authors found that this improves model quality. DBRX uses rotary position encodings (RoPE), gated linear units (GLU), and grouped query attention (GQA).

* Add DBRX Model by abhi-mosaic in 29921

OLMo

The OLMo model was proposed in OLMo: Accelerating the Science of Language Models by Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi.

OLMo is a series of Open Language Models designed to enable the science of language models. The OLMo models are trained on the Dolma dataset. We release all code, checkpoints, logs (coming soon), and details involved in training these models.

* Add OLMo model family by 2015aroras in 29890

Qwen2MoE

Qwen2MoE is the new model series of large language models from the Qwen team. Previously, we released the Qwen series, including Qwen-72B, Qwen-1.8B, Qwen-VL, Qwen-Audio, etc.

Model Details
Qwen2MoE is a language model series including decoder language models of different model sizes. For each size, we release the base language model and the aligned chat model. Qwen2MoE has the following architectural choices:

* Qwen2MoE is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, a mixture of sliding-window and full attention, etc. Additionally, it has an improved tokenizer adaptive to multiple natural languages and code.
* Qwen2MoE employs a Mixture of Experts (MoE) architecture, where the models are upcycled from dense language models. For instance, Qwen1.5-MoE-A2.7B is upcycled from Qwen-1.8B. It has 14.3B parameters in total and 2.7B activated parameters at runtime, while achieving performance comparable to Qwen1.5-7B with only 25% of the training resources.

* Add Qwen2MoE by bozheng-hit in 29377

Grounding Dino

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/grouding_dino_architecture.png"
alt="drawing" width="600"/>

<small> Taken from the <a href="https://arxiv.org/pdf/2303.05499.pdf">original paper.</a> </small>

The Grounding DINO model was proposed in Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection by Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang. Grounding DINO extends a closed-set object detection model with a text encoder, enabling open-set object detection. The model achieves remarkable results, such as 52.5 AP on COCO zero-shot.

* Adding grounding dino by EduardoPach in 26087
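
A rough sketch of open-set detection, assuming the `IDEA-Research/grounding-dino-tiny` checkpoint and the period-separated text-query convention from the model card:

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

model_id = "IDEA-Research/grounding-dino-tiny"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForZeroShotObjectDetection.from_pretrained(model_id)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# queries are lowercase phrases, each terminated by a period
inputs = processor(images=image, text="a cat. a remote control.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

results = processor.post_process_grounded_object_detection(
    outputs, inputs.input_ids, box_threshold=0.4, text_threshold=0.3, target_sizes=[image.size[::-1]]
)
print(results[0])
```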

Static pretrained maps

Static pretrained maps have been removed from the library's internals and are currently deprecated. These used to reflect all the available checkpoints for a given architecture on the Hugging Face Hub, but their presence no longer makes sense in light of the huge growth of checkpoints shared by the community.

With the objective of lowering the bar for model contributions and reviews, we start by removing legacy objects such as this one, which no longer serve a purpose.

* Remove static pretrained maps from the library's internals by LysandreJik in 29112

Notable improvements

Processors improvements

Processors are undergoing changes in order to make them more uniform and clearer to use.

* Separate out kwargs in processor by amyeroberts in 30193
* [Processor classes] Update docs by NielsRogge in 29698

SDPA
* re-introduced the fast path for sdpa by fxmarty in 30070


Push to Hub for pipelines

Pipelines can now be pushed to the Hub using a convenient `push_to_hub` method; a short sketch follows the PR link below.

* add `push_to_hub` to pipeline by not-lain in 29172
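
A minimal sketch (the repo id is a placeholder):

```python
from transformers import pipeline

# build any pipeline, then push its model, tokenizer and pipeline metadata to the Hub;
# "my-username/sst2-sentiment-pipeline" is a placeholder repo id
pipe = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
pipe.push_to_hub("my-username/sst2-sentiment-pipeline")
```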

Flash Attention 2 for more models (M2M100, NLLB, GPT2, MusicGen)!

Thanks to community contributions, Flash Attention 2 has been integrated for more architectures:

* Adding Flash Attention 2 Support for GPT2 by EduardoPach in 29226
* Add Flash Attention 2 support to Musicgen and Musicgen Melody by ylacombe in 29939
* Add Flash Attention 2 to M2M100 model by visheratin in 30256
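
A minimal sketch of opting into Flash Attention 2 on one of the newly supported architectures (requires the `flash-attn` package and a CUDA device):

```python
import torch
from transformers import AutoModelForCausalLM

# Flash Attention 2 needs a half-precision dtype and runs on GPU
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
    device_map="cuda",
)
```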

Improvements and bugfixes

* [docs] Remove redundant `-` and `the` from custom_tools.md by windsonsea in 29767
* Fixed typo in quantization_config.py by kurokiasahi222 in 29766
* OWL-ViT box_predictor inefficiency issue by RVV-karma in 29712
* Allow `-OO` mode for `docstring_decorator` by matthid in 29689
* fix issue with logit processor during beam search in Flax by giganttheo in 29636
* Fix docker image build for `Latest PyTorch + TensorFlow [dev]` by ydshieh in 29764
* [`LlavaNext`] Fix llava next unsafe imports by ArthurZucker in 29773
* Cast bfloat16 to float32 for Numpy conversions by Rocketknight1 in 29755
* Silence deprecations and use the DataLoaderConfig by muellerzr in 29779
* Add deterministic config to `set_seed` by muellerzr in 29778
* Add support for `torch_dtype` in the run_mlm example by jla524 in 29776
* Generate: remove legacy generation mixin imports by gante in 29782
* Llama: always convert the causal mask in the SDPA code path by gante in 29663
* Prepend `bos token` to Blip generations by zucchini-nlp in 29642
* Change in-place operations to out-of-place in LogitsProcessors by zucchini-nlp in 29680
* [`quality`] update quality check to make sure we check imports 😈 by ArthurZucker in 29771
* Fix type hint for train_dataset param of Trainer.__init__() to allow IterableDataset. Issue 29678 by stevemadere in 29738
* Enable AMD docker build CI by IlyasMoutawwakil in 29803
* Correct llava mask & fix missing setter for `vocab_size` by fxmarty in 29389
* rm input dtype change in CPU by jiqing-feng in 28631
* Generate: remove unused attributes in `AssistedCandidateGenerator` by gante in 29787
* replaced concatenation to f-strings to improve readability and unify … by igeni in 29785
* [`cleanup`] vestiges of causal mask by ArthurZucker in 29806
* Complete security policy with mentions of remote code by LysandreJik in 29707
* [`SuperPoint`] Fix doc example by amyeroberts in 29816
* [DOCS] Fix typo for llava next docs by aliencaocao in 29829
* model_summary.md - Restore link to Harvard's Annotated Transformer. by gamepad-coder in 29702
* Fix the behavior of collecting 'num_input_tokens_seen' by YouliangHUANG in 29099
* Populate torch_dtype from model to pipeline by B-Step62 in 28940
* remove quotes in code example by johko in 29812
* Add warnings if training args differ from checkpoint trainer state by jonflynng in 29255
* Replace 'decord' with 'av' in VideoClassificationPipeline by Tyx-main in 29747
* Fix header in IFE task guide by merveenoyan in 29859
* [docs] Indent ordered list in add_new_model.md by windsonsea in 29796
* Allow `bos_token_id is None` during the generation with `inputs_embeds` by LZHgrla in 29772
* Add `cosine_with_min_lr` scheduler in Trainer by liuyanyi in 29341
* Disable AMD memory benchmarks by IlyasMoutawwakil in 29871
* Set custom_container in build docs workflows by Wauplin in 29855
* Support `num_attention_heads` != `num_key_value_heads` in Flax Llama Implementation by bminixhofer in 29557
* Mamba `slow_forward` gradient fix by vasqu in 29563
* Fix 29807, sinusoidal positional encodings overwritten by post_init() by hovnatan in 29813
* Reimplement "Automatic safetensors conversion when lacking these files" by LysandreJik in 29846
* fix fuyu device_map compatibility by SunMarc in 29880
* Move `eos_token_id` to stopping criteria by zucchini-nlp in 29459
* add Cambricon MLUs support by huismiling in 29627
* MixtralSparseMoeBlock: add gate jitter by lorenzoverardo in 29865
* Fix typo in T5Block error message by Mingosnake in 29881
* [`make fix-copies`] update and help by ArthurZucker in 29924
* [`GptNeox`] don't gather on pkv when using the trainer by ArthurZucker in 29892
* [`pipeline`]. Zero shot add doc warning by ArthurZucker in 29845
* [doc] fix some typos and add `xpu` to the testing documentation by faaany in 29894
* Tests: replace `torch.testing.assert_allclose` by `torch.testing.assert_close` by gante in 29915
* Add beam search visualizer to the doc by aymeric-roucher in 29876
* Safe import of LRScheduler by amyeroberts in 29919
* add functions to inspect model and optimizer status to trainer.py by CKeibel in 29838
* RoPE models: add numerical sanity-check test for RoPE scaling by gante in 29808
* [`Mamba`] from pretrained issue with `self.embeddings` by ArthurZucker in 29851
* [ `TokenizationLlama`] fix the way we convert tokens to strings to keep leading spaces 🚨 breaking fix by ArthurZucker in 29453
* Allow GradientAccumulationPlugin to be configured from AcceleratorConfig by fabianlim in 29589
* [`BC`] Fix BC for other libraries by ArthurZucker in 29934
* Fix doc issue 29758 in DebertaV2Config class by vinayakkgarg in 29842
* [`LlamaSlowConverter`] Slow to Fast better support by ArthurZucker in 29797
* Update installs in image classification doc by MariaHei in 29947
* [`StableLm`] Add QK normalization and Parallel Residual Support by jon-tow in 29745
* Mark `test_eager_matches_sdpa_generate` flaky for some models by ydshieh in 29479
* Super tiny fix 12 typos about "with with" by fzyzcjy in 29926
* Fix rope theta for OpenLlama by jla524 in 29893
* Add warning message for `run_qa.py` by jla524 in 29867
* fix: get mlflow version from mlflow-skinny by clumsy in 29918
* Reset alarm signal when the function is ended by coldnight in 29706
* Update model card and link of blog post. by bozheng-hit in 29928
* [`BC`] Fix BC for AWQ quant by TechxGenus in 29965
* Rework tests to compare trainer checkpoint args by muellerzr in 29883
* Fix FA2 tests by ylacombe in 29909
* Fix copies main ci by ArthurZucker in 29979
* [tests] fix the wrong output in `ImageToTextPipelineTests.test_conditional_generation_llava` by faaany in 29975
* Generate: move misplaced test by gante in 29902
* [docs] Big model loading by stevhliu in 29920
* [`generate`] fix breaking change for patch by ArthurZucker in 29976
* Fix 29807 sinusoidal positional encodings in Flaubert, Informer and XLM by hovnatan in 29904
* [bnb] Fix bug in `_replace_with_bnb_linear` by SunMarc in 29958
* Adding FlaxNoRepeatNGramLogitsProcessor by giganttheo in 29677
* [Docs] Make an ordered list prettier in add_tensorflow_model.md by windsonsea in 29949
* Fix `skip_special_tokens` for `Wav2Vec2CTCTokenizer._decode` by msublee in 29311
* Hard error when ignoring tensors. by Narsil in 27484
* Generate: fix logits processors doctests by gante in 29718
* Fix `remove_columns` in `text-classification` example by mariosasko in 29351
* Update `tests/utils/tiny_model_summary.json` by ydshieh in 29941
* Make EncodecModel.decode ONNX exportable by fxmarty in 29913
* Fix Swinv2ForImageClassification NaN output by miguelm-almeida in 29981
* Fix Qwen2Tokenizer by jklj077 in 29929
* Fix `kwargs` handling in `generate_with_fallback` by cifkao in 29225
* Fix probability computation in `WhisperNoSpeechDetection` when recomputing scores by cifkao in 29248
* Fix vipllava for generation by zucchini-nlp in 29874
* [docs] Fix audio file by stevhliu in 30006
* Superpoint imports fix by zucchini-nlp in 29898
* [`Main CIs`] Fix the red cis by ArthurZucker in 30022
* Make clearer about zero_init requirements by muellerzr in 29879
* Enable multi-device for efficientnet by jla524 in 29989
* Add a converter from mamba_ssm -> huggingface mamba by byi8220 in 29705
* [`ProcessingIdefics`] Attention mask bug with padding by byi8220 in 29449
* Add `whisper` to `IMPORTANT_MODELS` by ydshieh in 30046
* skip `test_encode_decode_fast_slow_all_tokens` for now by ydshieh in 30044
* if output is tuple like facebook/hf-seamless-m4t-medium, waveform is … by sywangyi in 29722
* Fix mixtral ONNX Exporter Issue. by AdamLouly in 29858
* [Trainer] Allow passing image processor by NielsRogge in 29896
* [bnb] Fix offload test by SunMarc in 30039
* Update quantizer_bnb_4bit.py: correct the flag name in the ValueError message to `llm_int8_enable_fp32_cpu_offload=True` (was `load_in_8bit_fp32_cpu_offload=True`) by miRx923 in 30013
* [test fetcher] Always include the directly related test files by ydshieh in 30050
* Fix `torch.fx` symbolic tracing for LLama by michaelbenayoun in 30047
* Refactor daily CI workflow by ydshieh in 30012
* Add docstrings and types for MambaCache by koayon in 30023
* Fix auto tests by ydshieh in 30067
* Fix whisper kwargs and generation config by zucchini-nlp in 30018
* doc: Correct spelling mistake by caiyili in 30107
* [Whisper] Computing features on GPU in batch mode for whisper feature extractor. by vaibhavagg303 in 29900
* Change log level to warning for num_train_epochs override by xu-song in 30014
* Make MLFlow version detection more robust and handles mlflow-skinny by helloworld1 in 29957
* updated examples/pytorch/language-modeling scripts and requirements.txt to require datasets>=2.14.0 by Patchwork53 in 30120
* [tests] add `require_bitsandbytes` marker by faaany in 30116
* fixing issue 30034 - adding data format for run_ner.py by JINO-ROHIT in 30088
* Patch fix - don't use safetensors for TF models by amyeroberts in 30118
* [29174] Fix ImportError: Trainer with PyTorch requires accelerate>=0.20.1 by UtkarshaGupte in 29888
* Accept token in trainer.push_to_hub() by mapmeld in 30093
* fix learning rate display in trainer when using galore optimizer by vasqu in 30085
* Fix falcon with SDPA, alibi but no passed mask by fxmarty in 30123
* Trainer / Core : Do not change init signature order by younesbelkada in 30126
* Make vitdet jit trace compliant by fxmarty in 30065
* Fix typo at ImportError by DrAnaximandre in 30090
* Adding `mps` as device for `Pipeline` class by fnhirwa in 30080
* Fix failing DeepSpeed model zoo tests by pacman100 in 30112
* Add datasets.Dataset to Trainer's train_dataset and eval_dataset type hints by ringohoffman in 30077
* Fix docs Pop2Piano by zucchini-nlp in 30140
* Revert workaround for TF safetensors loading by Rocketknight1 in 30128
* [Trainer] Fix default data collator by NielsRogge in 30142
* [Trainer] Undo 29896 by NielsRogge in 30129
* Fix slow tests for important models to be compatible with A10 runners by ydshieh in 29905
* Send headers when converting safetensors by ydshieh in 30144
* Fix quantization tests by SunMarc in 29914
* [docs] Fix image segmentation guide by stevhliu in 30132
* [CI] Fix setup by SunMarc in 30147
* Fix length related warnings in speculative decoding by zucchini-nlp in 29585
* Fix and simplify semantic-segmentation example by qubvel in 30145
* [CI] Quantization workflow fix by SunMarc in 30158
* [tests] make 2 tests device-agnostic by faaany in 30008
* Add str to TrainingArguments report_to type hint by ringohoffman in 30078
* [UDOP] Fix tests by NielsRogge in 29573
* [UDOP] Improve docs, add resources by NielsRogge in 29571
* Fix accelerate kwargs for versions <0.28.0 by vasqu in 30086
* Fix typing annotation in hf_argparser by xu-song in 30156
* Fixing a bug when MlFlow try to log a torch.tensor by etiennebonnafoux in 29932
* Fix natten install in docker by ydshieh in 30161
* FIX / bnb: fix torch compatibility issue with `itemize` by younesbelkada in 30162
* Update config class check in auto factory by Rocketknight1 in 29854
* Fixed typo in comments/documentation for Pipelines documentation by DamonGuzman in 30170
* Fix Llava chat template examples by lewtun in 30130
* Guard XLA version imports by muellerzr in 30167
* chore: remove repetitive words by hugehope in 30174
* fix: Fixed `ruff` configuration to avoid deprecated configuration warning by Sai-Suraj-27 in 30179
* Refactor Cohere Model by saurabhdash2512 in 30027
* Update output of SuperPointForKeypointDetection by NielsRogge in 29809
* Falcon: make activation, ffn_hidden_size configurable by sshleifer in 30134
* Docs PR template by stevhliu in 30171
* ENH: [`CI`] Add new workflow to run slow tests of important models on push main if they are modified by younesbelkada in 29235
* Fix pipeline logger.warning_once bug by amyeroberts in 30195
* fix: Replaced deprecated `logger.warn` with `logger.warning` by Sai-Suraj-27 in 30197
* fix typo by mdeff in 30220
* fix fuyu doctest by molbap in 30215
* Fix `RecurrentGemmaIntegrationTest.test_2b_sample` by ydshieh in 30222
* Update modeling_bark.py by bes-dev in 30221
* Fix/Update for doctest by ydshieh in 30216
* Fixed config.json download to go to user-supplied cache directory by ulatekh in 30189
* Add test for parse_json_file and change typing to os.PathLike by xu-song in 30183
* fix: Replace deprecated `assertEquals` with `assertEqual` by Sai-Suraj-27 in 30241
* Set pad_token in run_glue_no_trainer.py 28534 by JINO-ROHIT in 30234
* fix: Replaced deprecated `typing.Text` with `str` by Sai-Suraj-27 in 30230
* Refactor doctest by ydshieh in 30210
* fix: Fixed `type annotation` for compatability with python 3.8 by Sai-Suraj-27 in 30243
* Fix doctest more (for `docs/source/en`) by ydshieh in 30247
* round epoch only in console by xdedss in 30237
* update github actions packages' version to suppress warnings by ydshieh in 30249
* [tests] add the missing `require_torch_multi_gpu` flag by faaany in 30250
* [Docs] Update recurrent_gemma.md for some minor nits by sayakpaul in 30238
* Remove incorrect arg in codellama doctest by Rocketknight1 in 30257
* Update `ko/_toctree.yml` by jungnerd in 30062
* More fixes for doctest by ydshieh in 30265
* FIX: Fix corner-case issue with the important models workflow by younesbelkada in 30212
* FIX: Fix 8-bit serialization tests by younesbelkada in 30051
* Allow for str versions of dicts based on typing by muellerzr in 30227
* Workflow: Update tailscale to release version by younesbelkada in 30268
* Raise relevant err when wrong type is passed in as the accelerator_config by muellerzr in 29997
* BLIP - fix pt-tf equivalence test by amyeroberts in 30258
* fix: Fixed a `raise` statement by Sai-Suraj-27 in 30275
* Fix test fetcher (doctest) + `Idefics2`'s doc example by ydshieh in 30274
* Fix SDPA sliding window compatibility by fxmarty in 30127
* Fix SpeechT5 forward docstrings by ylacombe in 30287
* FIX / AWQ: Fix failing exllama test by younesbelkada in 30288
* Configuring Translation Pipelines documents update 27753 by UtkarshaGupte in 29986
* Enable fx tracing for Mistral by zucchini-nlp in 30209
* Fix test `ExamplesTests::test_run_translation` by ydshieh in 30281
* Fix `Fatal Python error: Bus error` in `ZeroShotAudioClassificationPipelineTests` by ydshieh in 30283
* FIX: Fix push important models CI by younesbelkada in 30291
* Add token type ids to CodeGenTokenizer by st81 in 29265
* Add strategy to store results in evaluation loop by qubvel in 30267
* Upgrading to tokenizers 0.19.0 by Narsil in 30289
* Re-enable SDPA's FA2 path by fxmarty in 30070
* Fix quality Olmo + SDPA by fxmarty in 30302
* Fix donut token2json multiline by qubvel in 30300
* Fix all torch pipeline failures except one by ydshieh in 30290
* Add atol for sliding window test by fxmarty in 30303
* Fix RecurrentGemma device_map by SunMarc in 30273
* Revert "Re-enable SDPA's FA2 path by ArthurZucker in 30070)"
* Do not drop mask with SDPA for more cases by fxmarty in 30311
* FIX: Fixes unexpected behaviour for Llava / LLama & AWQ Fused modules + revert 30070 at the same time by younesbelkada in 30317

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* bozheng-hit
    * Add Qwen2MoE (29377)
    * Update model card and link of blog post. (29928)
* EduardoPach
    * Adding Flash Attention 2 Support for GPT2 (29226)
    * Adding grounding dino (26087)
* 2015aroras
    * Add OLMo model family (29890)
* tomeras91
    * Add jamba (29943)
* abhi-mosaic
    * Add DBRX Model (29921)

4.39.3

Not secure
The `AWQ` issue persisted, and there was a regression reported with beam search and input embeddings.
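
For context, the regression affected `generate` calls that combine beam search with `inputs_embeds` rather than `input_ids`. The sketch below shows that call pattern only; the model name and generation settings are illustrative and not taken from the original report.

```python
# Minimal sketch of the affected call pattern: beam search driven by
# `inputs_embeds` instead of `input_ids`. Model and settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # any causal LM; chosen only to keep the sketch small
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

enc = tokenizer("Hello, my name is", return_tensors="pt")
# Embed the prompt manually and drive generation from the embeddings.
inputs_embeds = model.get_input_embeddings()(enc.input_ids)

outputs = model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=enc.attention_mask,
    num_beams=4,          # beam search, the other half of the reported combination
    max_new_tokens=20,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```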

Changes
- Fix BC for AWQ quant 29965
- generate fix breaking change for patch 29976

4.39.2

Not secure
A series of fixes for backwards compatibility (AutoAWQ and other quantization libraries, imports from `trainer_pt_utils`) and functionality (LLaMA tokenizer conversion).
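
To illustrate the `trainer_pt_utils` compatibility point, the sketch below shows the style of import that downstream libraries rely on; the specific helpers chosen here are examples, not necessarily the exact symbols covered by the fix.

```python
# Sketch only: external libraries commonly import Trainer helpers directly
# from `transformers.trainer_pt_utils`; the BC fix keeps such imports working.
from transformers.trainer_pt_utils import LabelSmoother, get_parameter_names

# `LabelSmoother` exposes the label id that loss computation ignores by default.
print(LabelSmoother.ignore_index)  # -100
```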

* Safe import of LRScheduler 29919
* [`BC`] Fix BC for other libraries 29934
* [`LlamaSlowConverter`] Slow to Fast better support 29797
