Transformers

Latest version: v4.41.0

4.30.0

100k

Transformers has just reached 100k stars on GitHub! To celebrate, we have created an [awesome-transformers](https://github.com/huggingface/transformers/blob/main/awesome-transformers.md) page that highlights 100 projects in the `transformers` ecosystem.

We accept PRs to add projects to the list!

* Top 100 by LysandreJik in 22912
* Add LlamaIndex to awesome-transformers.md by ravi03071991 in 23484
* add cleanlab to awesome-transformers tools list by jwmueller in 23440

4-bit quantization and QLoRA

By leveraging the `bitsandbytes` library by TimDettmers, we add 4-bit support to `transformers` models!

* 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) by TimDettmers in 23479
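In `transformers`, 4-bit loading goes through the `bitsandbytes` integration (e.g. `load_in_4bit=True`). To build some intuition for what happens to the weights, here is a minimal pure-Python sketch of blockwise absmax quantization to signed 4-bit integers. This is a deliberate simplification and an assumption of this example: the QLoRA path in `bitsandbytes` uses the non-uniform NF4 data type and double quantization, neither of which is modeled here. `quantize_4bit` and `dequantize_4bit` are hypothetical helpers, not `bitsandbytes` functions.

```python
from typing import List, Tuple

def quantize_4bit(weights: List[float], block_size: int = 4) -> Tuple[List[int], List[float]]:
    """Quantize floats to signed 4-bit codes in [-7, 7] with one absmax scale per block."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        # The largest magnitude in the block maps to the extreme code 7.
        absmax = max(abs(w) for w in block) or 1.0  # all-zero block: avoid dividing by zero
        scales.append(absmax)
        codes.extend(round(w / absmax * 7) for w in block)
    return codes, scales

def dequantize_4bit(codes: List[int], scales: List[float], block_size: int = 4) -> List[float]:
    """Map 4-bit codes back to approximate floats using each block's scale."""
    return [code / 7 * scales[i // block_size] for i, code in enumerate(codes)]

weights = [0.1, -0.5, 0.25, 0.05, 2.0, -1.0, 0.0, 0.3]
codes, scales = quantize_4bit(weights)
restored = dequantize_4bit(codes, scales)
```

Each block stores only small integer codes plus one float scale, and the rounding error per weight is at most half a quantization step (`absmax / 14`). Smaller blocks give tighter scales at the cost of storing more of them.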

Agents

The Agents framework has been improved and continues to be stabilized. Among bug fixes, here are the important new features that were added:
- Local agent capabilities, to load a generative model directly from `transformers` instead of relying on APIs.
- Prompts are now hosted on the Hub, so anyone can fork them, update them with their own, and share them for other community contributors to reuse.
- We add an `AzureOpenAiAgent` class to support Azure OpenAI agents.

* Add local agent by sgugger in 23438
* Enable prompts on the Hub by sgugger in 23662
* Add AzureOpenAiAgent by sgugger in 24058

Safetensors

The `safetensors` library is a safe serialization framework for machine learning tensors. It has been audited and will become the default serialization framework for several organizations (Hugging Face, EleutherAI, Stability AI).

It has now become a core dependency of `transformers`.

* Making `safetensors` a core dependency. by Narsil in 23254
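Part of what makes the format safe is its simple layout: an 8-byte little-endian header length, a JSON header that describes each tensor (`dtype`, `shape`, `data_offsets`), then the raw tensor bytes. Loading therefore only parses JSON and copies bytes, with no arbitrary code execution as with pickle. The sketch below illustrates that layout for 1-D float32 tensors in pure Python. `serialize` and `deserialize` are hypothetical helpers for illustration only; in practice, use the `safetensors` library (for example `safetensors.torch.save_file` / `load_file`).

```python
import json
import struct
from typing import Dict, List

def serialize(tensors: Dict[str, List[float]]) -> bytes:
    """Write 1-D float32 tensors in a safetensors-style layout."""
    header = {}
    data = bytearray()
    for name, values in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
        header[name] = {
            "dtype": "F32",
            "shape": [len(values)],
            # Offsets are relative to the start of the data section.
            "data_offsets": [len(data), len(data) + len(raw)],
        }
        data += raw
    header_bytes = json.dumps(header).encode("utf-8")
    # 8-byte unsigned header length, then the JSON header, then the tensor bytes.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + bytes(data)

def deserialize(blob: bytes) -> Dict[str, List[float]]:
    """Read tensors back by parsing the JSON header and slicing the data section."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    data = blob[8 + header_len:]
    out: Dict[str, List[float]] = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata entry
            continue
        begin, end = info["data_offsets"]
        count = (end - begin) // 4  # 4 bytes per float32
        out[name] = list(struct.unpack(f"<{count}f", data[begin:end]))
    return out
```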

New models

Swiftformer

The SwiftFormer paper introduces a novel efficient additive attention mechanism that replaces the quadratic matrix multiplications in the self-attention computation with linear element-wise multiplications. Built on this mechanism, a series of models called ‘SwiftFormer’ achieves state-of-the-art performance in terms of both accuracy and mobile inference speed. Even the small variant achieves 78.5% top-1 ImageNet-1K accuracy with only 0.8 ms latency on an iPhone 14, making it both more accurate than and 2× faster than MobileViT-v2.

* Add swiftformer by shehanmunasinghe in 22686
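A minimal pure-Python sketch of the efficient additive attention idea, to show why the cost is linear in the number of tokens: each token gets a single scalar score, the scores pool the queries into one global query, and that global query interacts with every key element-wise. Assumptions of this sketch: the query/key projections and the final linear layers are taken as identity, the scores are L2-normalized across tokens, and `efficient_additive_attention` and `w_a` are hypothetical names, not the `transformers` implementation.

```python
import math
from typing import List

Matrix = List[List[float]]  # n tokens x d features

def efficient_additive_attention(q: Matrix, k: Matrix, w_a: List[float]) -> Matrix:
    n, d = len(q), len(q[0])
    # One scalar score per token: O(n * d), no n x n attention matrix is ever built.
    scores = [sum(qi[j] * w_a[j] for j in range(d)) / math.sqrt(d) for qi in q]
    # Normalize the scores across tokens (L2 here, an assumption of this sketch).
    norm = math.sqrt(sum(s * s for s in scores)) or 1.0
    alpha = [s / norm for s in scores]
    # Global query: score-weighted sum of all query vectors.
    g = [sum(alpha[i] * q[i][j] for i in range(n)) for j in range(d)]
    # Element-wise product of the global query with each key, plus a residual query.
    return [[g[j] * k[i][j] + q[i][j] for j in range(d)] for i in range(n)]
```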

Autoformer

This model turns the Transformer into a deep decomposition architecture that progressively decomposes the trend and seasonal components during the forecasting process.

* [Time-Series] Autoformer model by elisim in 21891
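The decomposition step can be sketched as a moving average: the smoothed series is the trend and the remainder is the seasonal part. This pure-Python version assumes an odd kernel size and pads both ends by repeating the first and last values so the output keeps the input length; `series_decomp` is an illustrative name, and the real model applies this inside its encoder and decoder blocks on hidden states rather than on a raw list.

```python
from typing import List, Tuple

def series_decomp(x: List[float], kernel_size: int = 3) -> Tuple[List[float], List[float]]:
    """Split a series into (seasonal, trend) using a same-length moving average."""
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd so the output keeps the input length")
    pad = (kernel_size - 1) // 2
    # Repeat the edge values so every position has a full averaging window.
    padded = [x[0]] * pad + list(x) + [x[-1]] * pad
    trend = [sum(padded[i:i + kernel_size]) / kernel_size for i in range(len(x))]
    seasonal = [xi - ti for xi, ti in zip(x, trend)]
    return seasonal, trend
```

By construction `seasonal[i] + trend[i]` reconstructs `x[i]`, and on a purely linear series the interior seasonal values are zero.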

MobileViTv2

MobileViTV2 is the second version of MobileViT, constructed by replacing the multi-headed self-attention in MobileViT with separable self-attention.

* Add MobileViTv2 by shehanmunasinghe in 22820
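Separable self-attention avoids the n × n attention matrix by computing one context score per token, pooling the keys into a single context vector, and broadcasting that vector over the ReLU-activated values. The sketch below assumes identity key and value projections and omits the output projection; `separable_self_attention` and `w_i` (the projection that maps each token to one score) are hypothetical names, not the `transformers` API.

```python
import math
from typing import List

Matrix = List[List[float]]  # n tokens x d features

def separable_self_attention(x: Matrix, w_i: List[float]) -> Matrix:
    n, d = len(x), len(x[0])
    # Branch I: one logit per token, softmax over tokens (max-subtracted for stability).
    logits = [sum(t[j] * w_i[j] for j in range(d)) for t in x]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    scores = [e / total for e in exps]
    # Context vector: score-weighted sum of the key branch (identity projection here).
    context = [sum(scores[i] * x[i][j] for i in range(n)) for j in range(d)]
    # Value branch: ReLU, then element-wise product with the shared context vector.
    return [[max(0.0, x[i][j]) * context[j] for j in range(d)] for i in range(n)]
```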

PerSAM

PerSAM proposes a minimal modification to [SAM](https://huggingface.co/docs/transformers/model_doc/sam) that enables DreamBooth-like personalization, making it possible to segment a concept in new images from just one example.

* Add PerSAM [bis] by NielsRogge in 23659

Timm backbone

We add support for loading `timm` weights within the `AutoBackbone` API in `transformers`. `timm` models can be instantiated through the `TimmBackbone` class, and then used with any vision model that needs a backbone.

* Add TimmBackbone model by amyeroberts in 22619

Image to text pipeline conditional support

We add conditional text generation to the image-to-text pipeline, allowing the model to continue an initial text prompt based on the image.

* [image-to-text pipeline] Add conditional text support + GIT by NielsRogge in 23362

TensorFlow implementations

* Add TensorFlow implementation of EfficientFormer by D-Roberts in 22620

Accelerate Migration

A major rework of the `Trainer` internals is underway, leveraging `accelerate` instead of reimplementing the same functionality in `transformers`. This should unify both frameworks and lead to better interoperability and more efficient development.

* Smangrul/accelerate mp integrate by pacman100 in 23148
* Smangrul/accelerate ddp integrate by pacman100 in 23151
* fix trainer slow tests related to hyperparam search by pacman100 in 24011
* remove the extra `accelerator.prepare` by pacman100 in 23914
* move fsdp handling to accelerate by pacman100 in 23158
* shift torch dynamo handling to accelerate by pacman100 in 23168
* accelerate deepspeed and gradient accumulation integrate by pacman100 in 23236
* fix executable batch size issue by pacman100 in 24067
* fix accelerator prepare during eval only mode by pacman100 in 24014
* reset accelerate env variables after each test by pacman100 in 24107
* Fix translation no_trainer by muellerzr in 23407
* Update error message when Accelerate isn't installed by muellerzr in 23373
* Fix parallel mode check by muellerzr in 23409
* Muellerzr fix deepspeed by muellerzr in 23657
* Update all no_trainer with skip_first_batches by muellerzr in 23664
* Fix sagemaker DP/MP by muellerzr in 23681
* Log the right train_batch_size if using auto_find_batch_size and also log the adjusted value seperately. by muellerzr in 23800
* Up pinned accelerate version by muellerzr in 24089
* Move import check to before state reset by muellerzr in 23906
* Upgrade safetensors version by muellerzr in 23911
* Act on deprecations in Accelerate no_trainer examples by muellerzr in 24053
* Oops, missed one by muellerzr in 24054

Bugfixes and improvements

* chore: allow protobuf 3.20.3 requirement by jose-turintech in 22759
* Fix link displayed for custom tools by sgugger in 23274
* Remove missplaced test file by sgugger in 23275
* Bring back the PR `Refactor doctests + add CI` to `main` by ydshieh in 23271
* [`gpt`] Gpt2 fix half precision causal mask by younesbelkada in 23256
* Temporary tolerance fix for flaky whipser PT-TF equiv. test by amyeroberts in 23257
* Add `top_k` argument to post-process of conditional/deformable-DETR by CreatlV in 22787
* `transformers-cli` -> `huggingface-cli` by AlpinDale in 23276
* Temporarily increase tol for PT-FLAX whisper tests by amyeroberts in 23288
* Added missing " in CHAT_PROMPT_TEMPLATE by galatolofederico in 23287
* Update custom_tools.mdx: fix link by mishig25 in 23292
* Update transformers_agents.mdx by mishig25 in 23289
* Convert numpy arrays to lists before saving the evaluation metrics as json by harisankar95 in 23268
* Fix doctest files fetch issue by ydshieh in 23277
* skip `test_run_squad_no_trainer` for now by ydshieh in 23302
* Better check for packages availability by apbard in 23163
* Add gradient_checkpointing parameter to FlaxWhisperEncoder by raghavanone in 23300
* Agents extras by LysandreJik in 23301
* Fix broken links in the agent docs by sgugger in 23297
* Fix typo in gradio-tools docs by freddyaboulton in 23305
* Fix image segmentation tool test by sgugger in 23306
* unpin tf prob by ydshieh in 23293
* Revert "search buffers for dtype" by sgugger in 23308
* Remove `LanguageIdentificationTool` in `__init__.py` as we don't have it yet by ydshieh in 23326
* Fix docker image (caused by `tensorflow_text`) by ydshieh in 23321
* Compute the mask in-place, with less memory reads, and on CUDA on `XLNetLMHeadModel` by lezcano in 23332
* Only add files with modification outside doc blocks by ydshieh in 23327
* [docs] Fix Agents and Tools docstring by stevhliu in 23313
* OR am I crazy? by hwuebben in 23295
* Handle padding warning in generation when using `inputs_embeds` by zrthxn in 23131
* replaced assert with raise ValueError for t5, switch_transformers, pix2struct, mt5, longt5, gptsan_japanese. by susnato in 23273
* Use cu118 with cudnn >= 8.6 in docker file by ydshieh in 23339
* Removing one of the twice defined position_embeddings in LongFormer by GregorySenay in 23343
* Fix issue introduced in PR 23163 by ydshieh in 23363
* Typo suggestion by richardachen in 23360
* Fix some `is_xxx_available` by ydshieh in 23365
* Fix `BigBirdForMaskedLM` doctest by ydshieh in 23369
* Fix `OwlViTForObjectDetection.image_guided_detection` doc example by ydshieh in 23370
* Revert "Only add files with modification outside doc blocks" by ydshieh in 23371
* [Bugfix] `OPTDecoderLayer` does not return attentions when `gradient_checkpointing` and `training` is enabled. by gmlwns2000 in 23367
* Skip failing `AlignModelTest::test_multi_gpu_data_parallel_forward` by ydshieh in 23374
* Fix test typos - audio feature extractors by LWprogramming in 23310
* Added type hints for `Graphormer` pytorch version by dewasahu2003 in 23073
* Replace NumPy Operations with JAX NumPy Equivalents for JIT Compilation Compatibility by gojiteji in 23356
* Use `mkstemp` to replace deprecated `mktemp` by ready-research in 23372
* Fix `RwkvModel` by ydshieh in 23392
* Update `test_batched_inference_image_captioning_conditioned` by ydshieh in 23391
* OPT/BioGPT: Improved attention mask shape exception by gante in 23270
* Fix chat prompt in HFAgent by IvanSedykh in 23335
* 🌐 [i18n-KO] Translated `asr.mdx` to Korean by sim-so in 23106
* Minor fixes in transformers-tools by Wauplin in 23364
* [`Pix2Struct`] Add conditional generation on docstring example by younesbelkada in 23399
* Generate: faster `can_generate` check on TF and Flax by gante in 23398
* [AutoModel] fix `torch_dtype=auto` in `from_pretrained` by stas00 in 23379
* Docs: add link to assisted generation blog post by gante in 23397
* Build with non Python files by sgugger in 23405
* Generate: add test to check KV format by gante in 23403
* Replace appends with list comprehension. by ttsugriy in 23359
* Fix smdistributed check by sgugger in 23414
* Why crash the whole run when HFHub gives a 50x error? by ropoctl in 23320
* Run doctest (in PRs) only when some doc example(s) are modified by ydshieh in 23387
* Update `ConvNextV2ModelIntegrationTest::test_inference_image_classification_head` by ydshieh in 23402
* Fix a typo in HfAgent docstring. by ttsugriy in 23420
* Use dict.items to avoid unnecessary lookups. by ttsugriy in 23415
* Update 3 docker files to use cu118 by ydshieh in 23406
* [`SAM`] fix sam slow test by younesbelkada in 23376
* Return early once stop token is found. by ttsugriy in 23421
* [Reland] search model buffers for dtype as the last resort by cyyever in 23319
* Add Missing tokenization test [electra] by IMvision12 in 22997
* Small fixes and link in the README by LysandreJik in 23428
* TF: embeddings out of bounds check factored into function by gante in 23427
* Update Bigbird Pegasus tests by ydshieh in 23431
* Encoder-Decoder: add informative exception when the decoder is not compatible by gante in 23426
* Remove hardcoded prints in Trainer by hugoabonizio in 23432
* Fix device issue in `SwiftFormerModelIntegrationTest::test_inference_image_classification_head` by ydshieh in 23435
* Generate: skip left-padding tests on old models by gante in 23437
* remove unnecessary print in gpt neox sequence classifier by cfhammill in 23433
* 🌐 [i18n-KO] Translated `tasks/zero_shot_object_detection.mdx` to Korean by HanNayeoniee in 23430
* Fix (skip) a pipeline test for `RwkvModel` by ydshieh in 23444
* Fix DecisionTransformerConfig doctring by joaoareis in 23450
* TF: GPT2 with native embedding layers by gante in 23436
* Make `RwkvModel` accept `attention_mask` but discard it internally by ydshieh in 23442
* Less flaky `test_assisted_decoding_matches_greedy_search` by ydshieh in 23451
* Update tiny models and pipeline tests by ydshieh in 23446
* Properly guard PyTorch stuff by sgugger in 23452
* Add an option to log result from the Agent by sgugger in 23454
* Clean up CUDA kernels by sgugger in 23455
* fix bug in group_texts function, that was inserting short batches by BodaSadalla98 in 23429
* feat: Whisper prompting by connor-henderson in 22496
* README: Fix affiliation for MEGA by julien-c in 23394
* Remove .data usages in optimizations.py by alanwaketan in 23417
* TF port of the Segment Anything Model (SAM) by Rocketknight1 in 22970
* [`RWKV`] Rwkv fix for 8bit inference by younesbelkada in 23468
* Use config to set name and description if not present by sgugger in 23473
* Fix `transformers`' DeepSpeed CI job by ydshieh in 23463
* Fix PretrainedConfig `min_length` docstring by joaoareis in 23471
* Fix: Change tensors to integers for torch.dynamo and torch.compile compatibility by loevlie in 23475
* [`Blip`] Remove redundant shift right by younesbelkada in 23153
* Fix DeepSpeed stuff in the nightly CI by ydshieh in 23478
* Fix confusing `transformers` installation in CI by ydshieh in 23465
* Fix `tests/repo_utils/test_get_test_info.py` by ydshieh in 23485
* Debug example code for MegaForCausalLM by Tylersuard in 23382
* Remove erroneous `img` closing tag by xenova in 23646
* Fix tensor device while attention_mask is not None by zspo in 23538
* Fix accelerate logger bug by younesbelkada in 23650
* Bugfix: LLaMA layer norm incorrectly changes input type and consumers lots of memory by TimDettmers in 23535
* Fix wav2vec2 is_batched check to include 2-D numpy arrays by LWprogramming in 23223
* changing the requirements to a cpu torch version that works by sshahrokhi in 23483
* Fix SAM tests and use smaller checkpoints by Rocketknight1 in 23656
* Update workflow files by ydshieh in 23658
* small fix to remove unused eos in processor when it's not used. by Narsil in 23408
* Fix typo in a parameter name for open llama model by aaalexlit in 23637
* Fix PyTorch SAM tests by ydshieh in 23682
* 🌐 [i18n-KO] Translated `tasks/monocular_depth_estimation.mdx` to Korean by HanNayeoniee in 23621
* Fix a `BridgeTower` test by ydshieh in 23694
* [`SAM`] Fixes pipeline and adds a dummy pipeline test by younesbelkada in 23684
* TF version compatibility fixes by Rocketknight1 in 23663
* [`Blip`] Fix blip doctest by younesbelkada in 23698
* is_batched fix for remaining 2-D numpy arrays by LWprogramming in 23309
* Skip `TFCvtModelTest::test_keras_fit_mixed_precision` for now by ydshieh in 23699
* fix: load_best_model_at_end error when load_in_8bit is True by dkqkxx in 23443
* Fix some docs what layerdrop does by zspo in 23691
* add GPTJ/bloom/llama/opt into model list and enhance the jit support by sywangyi in 23291
* Paged Optimizer + Lion Optimizer for Trainer by TimDettmers in 23217
* Export to ONNX doc refocused on using optimum, added tflite by MKhalusova in 23434
* fix: use bool instead of uint8/byte in Deberta/DebertaV2/SEW-D to make it compatible with TensorRT by uchuhimo in 23683
* fix gptj could not jit.trace in GPU by sywangyi in 23317
* Better TF docstring types by Rocketknight1 in 23477
* Minor awesome-transformers.md fixes by pagarsky in 23453
* TF SAM memory reduction by Rocketknight1 in 23732
* fix: delete duplicate sentences in `document_question_answering.mdx` by jungnerd in 23735
* fix: Whisper generate, move text_prompt_ids trim up for max_new_tokens calculation by connor-henderson in 23724
* Overhaul TF serving signatures + dummy inputs by Rocketknight1 in 23234
* [Whisper] Reduce batch size in tests by sanchit-gandhi in 23736
* Fix the regex in `get_imports` to support multiline try blocks and excepts with specific exception types by dakinggg in 23725
* Remove the last few TF serving sigs by Rocketknight1 in 23738
* Fix `pip install --upgrade accelerate` command in modeling_utils.py by tloen in 23747
* Fix psuh_to_hub in Trainer when nothing needs pushing by sgugger in 23751
* Revamp test selection for the example tests by sgugger in 23737
* [LongFormer] code nits, removed unused parameters by ArthurZucker in 23749
* Fix is_ninja_available() by niltok in 23752
* [`Nllb-Moe`] Fix nllb moe accelerate issue by younesbelkada in 23758
* [OPT] Doc nit, using fast is fine by ArthurZucker in 23789
* Fix RWKV backward on GPU by sgugger in 23774
* Update trainer.mdx class_weights example by amitportnoy in 23787
* no_cuda does not take effect in non distributed environment by sywangyi in 23795
* Fix no such file or directory error by RissyRan in 23783
* Enable code-specific revision for code on the Hub by sgugger in 23799
* add type hint in pipeline model argument by y3sar in 23740
* TF SAM shape flexibility fixes by Rocketknight1 in 23842
* fix Whisper tests on GPU by hollance in 23753
* 🌐 [i18n-KO] Translated `fast_tokenizers.mdx` to Korean by KIHOON71 in 22956
* [i18n-KO] Translated video_classification.mdx to Korean by KIHOON71 in 23026
* 🌐 [i18n-KO] Translated `troubleshooting.mdx` to Korean by 0525hhgus in 23166
* Adds a FlyteCallback by peridotml in 23759
* Update collating_graphormer.py by clefourrier in 23862
* [LlamaTokenizerFast] nit update `post_processor` on the fly by ArthurZucker in 23855
* 23388 Issue: Update RoBERTa configuration by vijethmoudgalya in 23863
* [from_pretrained] imporve the error message when `_no_split_modules` is not defined by ArthurZucker in 23861
* Editing issue with pickle def with lambda function by Natyren in 23869
* Adds AutoProcessor.from_pretrained support for MCTCTProcessor by Ubadub in 23856
* 🌐 [i18n-KO] Translated `pad_truncation.mdx` to Korean by sim-so in 23823
* Fix bug leading to missing token in GPTSanJapaneseTokenizer by passaglia in 23883
* Fix last instances of kbit -> quantized by sgugger in 23797
* fix(configuration_llama): add `keys_to_ignore_at_inference` to `LlamaConfig` by calico-1226 in 23891
* Fix Trainer when model is loaded on a different GPU by sgugger in 23792
* Support shared tensors by thomasw21 in 23871
* ensure banned_mask and indices in same device by cauyxy in 23901
* Unpin numba by sanchit-gandhi in 23162
* [`bnb`] add warning when no linear by younesbelkada in 23894
* fix: Replace `add_prefix_space` in `get_prompt_ids` with manual space for FastTokenizer compatibility by connor-henderson in 23796
* [`RWKV`] Fix RWKV 4bit by younesbelkada in 23910
* add conditional statement for auxiliary loss calculation by harisankar95 in 23899
* Raise error if loss can't be calculated - ViT MIM by amyeroberts in 23872
* Empty circleci config by sgugger in 23913
* Bug fix - flip_channel_order for channels first images by amyeroberts in 23701
* Re-enable squad test by sgugger in 23912
* Update the update metadata job to use upload_folder by sgugger in 23917
* [PushToHub] Make it possible to upload folders by NielsRogge in 23920
* Skip device placement for past key values in decoder models by sgugger in 23919
* [Flax Whisper] Update decode docstring by sanchit-gandhi in 23908
* Effectively allow `encoder_outputs` input to be a tuple in pix2struct by fxmarty in 23932
* Fix doc string nits by sheonhan in 23929
* Pin rhoknp by sgugger in 23937
* rename DocumentQuestionAnsweringTool parameter input to match docstring by Adam-D-Lewis in 23939
* Update stale.yml to use HuggingFaceBot by LysandreJik in 23941
* Make TF ESM inv_freq non-trainable like PyTorch by Rocketknight1 in 23940
* Revert "Update stale.yml to use HuggingFaceBot" by LysandreJik in 23943
* 23675 Registering Malay language by soongbren in 23689
* Modify device_map behavior when loading a model using from_pretrained by SunMarc in 23922
* use _make_causal_mask in clip/vit models by kashif in 23942
* Fix `ReduceLROnPlateau` object has no attribute 'get_last_lr' by wasupandceacar in 23944
* [MMS] Scaling Speech Technology to 1,000+ Languages | Add attention adapter to Wav2Vec2 by patrickvonplaten in 23813
* add new mms functions to doc by patrickvonplaten in 23954
* 🌐 [i18n-KO] Translated object_detection.mdx to Korean by KIHOON71 in 23164
* Trainer: fixed evaluate raising `KeyError` for ReduceLROnPlateau by claudius-kienle in 23952
* [Whisper Tokenizer] Skip special tokens when decoding with timestamps by sanchit-gandhi in 23945
* Add an option to reduce compile() console spam by Rocketknight1 in 23938
* Added time-series blogs to the models by elisim in 23857
* Fix typo in doc comment of BitsAndBytesConfig by ledyba in 23978
* Skip `test_multi_gpu_data_parallel_forward` for `MobileViTV2ModelTest` by ydshieh in 24017
* Update README.md by ydshieh in 24022
* Auto tokenizer registration by Bearnardd in 23965
* expose safe_serialization argument in the pipeline API by yessenzhar in 23775
* Pix2Struct: fix wrong broadcast axis of attention mask in visual encoder by affjljoo3581 in 23976
* TensorBoard callback no longer adds hparams by bri25yu in 23999
* 🌐 [i18n-KO] Translated `tasks_explained.mdx` to Korean by 0525hhgus in 23844
* Fix `MobileViTV2` checkpoint name by ydshieh in 24018
* Pin `deepspeed` to `0.9.2` for now by ydshieh in 24024
* 🌐 [i18n-KO] Translated `language-modeling.mdx` by wonhyeongseo in 23969
* 🌐 [i18n-KO] Translated `bertology.mdx` to Korean by wonhyeongseo in 23968
* Add check for tied parameters by SunMarc in 24029
* Fixing single candidate_label return. by Narsil in 24023
* Use TruncatedNormal from Keras initializers by hvaara in 24036
* Prevent ZeroDivisionError on `trainer.evaluate` if model and dataset are tiny by tomaarsen in 24049
* Modification of one text example file should trigger said test by sgugger in 24051
* Tiny fix for `check_self_hosted_runner.py` by ydshieh in 24052
* Reduce memory usage in TF building by Rocketknight1 in 24046
* Move TF building to an actual build() method by Rocketknight1 in 23760
* Use new parametrization based weight norm if available by ezyang in 24030
* bring back `filtered_test_list_cross_tests.txt` by ydshieh in 24055
* Fix device placement for model-parallelism in generate for encoder/de… by sgugger in 24025
* Remote code improvements by sgugger in 23959
* Generate: increase left-padding test atol by gante in 23448
* [Wav2Vec2] Fix torch srcipt by patrickvonplaten in 24062
* Add support for non-rust implemented tokenization for `__getitem__` method. by jacklanda in 24039
* Support PEFT models when saving the model using trainer by younesbelkada in 24073
* [`Hub`] Add `safe_serialization` in push_to_hub by younesbelkada in 24074
* Fix `is_optimum_neuron_available` by michaelbenayoun in 23961
* [`bnb`] Fix bnb skip modules by younesbelkada in 24043
* Be nice to TF by ydshieh in 24076
* Make the TF dummies even smaller by Rocketknight1 in 24071
* [doc build] Use secrets by mishig25 in 24079
* Fix expected value in tests of the test fetcher by sgugger in 24077
* Update delete_doc_comment_trigger.yml by mishig25 in 24084
* Do not prepare lr scheduler as it as the right number of steps by sgugger in 24088
* Fix a tiny typo in `WhisperForConditionalGeneration::generate` docstring by sadra-barikbin in 24045
* [`Trainer`] Correct behavior of `_load_best_model` for PEFT models by younesbelkada in 24103

Significant community contributions

The following contributors have made significant changes to the library since the last release:

* shehanmunasinghe
  * Add swiftformer (22686)
  * Add MobileViTv2 (22820)
* TimDettmers
  * Bugfix: LLaMA layer norm incorrectly changes input type and consumers lots of memory (23535)
  * 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) (23479)
  * Paged Optimizer + Lion Optimizer for Trainer (23217)
* elisim
  * [Time-Series] Autoformer model (21891)
  * Added time-series blogs to the models (23857)
* KIHOON71
  * 🌐 [i18n-KO] Translated `fast_tokenizers.mdx` to Korean (22956)
  * [i18n-KO] Translated video_classification.mdx to Korean (23026)
  * 🌐 [i18n-KO] Translated object_detection.mdx to Korean (23164)
* D-Roberts
  * Add TensorFlow implementation of EfficientFormer (22620)
* soongbren
  * 23675 Registering Malay language (23689)

4.29.2

Fixes the package so non-Python files (like CUDA kernels) are properly included.

4.29.1

Reverts a regression in the FSDP integration.
Adds `pip install "transformers[agent]"` to install all the dependencies agents rely on.
Fixes the documentation about agents.

* Revert "search buffers for dtype" in 23308 by sgugger
* Fix image segmentation tool test in 23306 by sgugger
* Fix typo in gradio-tools docs in 23305 by freddyaboulton
* Fix broken links in the agent docs in 23297 by sgugger
* Agents extras in 23301 by LysandreJik
* Update transformers_agents.mdx in 23289 by mishig25
* Update custom_tools.mdx: fix link in 23292 by mishig25

4.29.0

Transformers Agents

Transformers Agent is a new API that lets you use the library and Diffusers by prompting an agent (a large language model) in natural language. The agent then outputs code using a set of predefined tools, leveraging the appropriate (and state-of-the-art) models for the task the user wants to perform. It is fully multimodal and extensible by the community. Learn more in the [docs](https://huggingface.co/docs/transformers/transformers_agents).

* Transformers Agents by LysandreJik, patrickvonplaten and sgugger in 23214

SAM

SAM (Segment Anything Model) was proposed in [Segment Anything](https://arxiv.org/pdf/2304.02643v1.pdf) by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick.

The model can be used to predict segmentation masks of any object of interest given an input image.

* Add Segment Anything Model (SAM) by ArthurZucker in 22654
* [`SAM`] Correct arxiv link by younesbelkada in 22886
* Fix SAM example in documentation by fxmarty in 22887
* [`SAM`] Change to `facebook/sam-vit-base` by younesbelkada in 22891
* Small sam patch by ArthurZucker in 22920
* [`SAM`] Add sam doc by younesbelkada in 22984
* Make sam ONNX exportable by fxmarty in 22915
* `DocumentQuestionAnsweringPipeline` only for fast ⚡ tokenizers by ydshieh in 22745
* Add `automatic-mask-generation` pipeline for Segment Anything Model (SAM) by ArthurZucker in 22840
* Expose AutoModelForMaskGeneration by fxmarty in 22910

RWKV

RWKV suggests a tweak to the traditional Transformer attention that makes it linear. This way, the model can be used as a recurrent network: passing inputs for timestep 0 and timestep 1 together is the same as passing the inputs at timestep 0, then the inputs at timestep 1 along with the state from timestep 0.

This can be more efficient than a regular Transformer and can deal with sentences of any length (even if the model uses a fixed context length for training).

* Add RWKV-4 by sgugger and younesbelkada in 22797
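The recurrence property can be checked with a simplified, single-channel version of the RWKV-4 "WKV" mixing: past tokens are accumulated into a two-number state that decays by `exp(-w)` per step, and the current token gets an extra bonus `u`. This is a sketch under stated assumptions: it omits the log-space numerical stabilization, token shift, receptance gating, and per-channel parameters of the real model, and `wkv` is a hypothetical name.

```python
import math
from typing import List, Optional, Tuple

# (decayed weighted sum of past values, decayed sum of past weights)
State = Tuple[float, float]

def wkv(keys: List[float], values: List[float], w: float, u: float,
        state: Optional[State] = None) -> Tuple[List[float], State]:
    """Simplified single-channel RWKV-style WKV recurrence (no numerical stabilization)."""
    num, den = state if state is not None else (0.0, 0.0)
    decay = math.exp(-w)  # past contributions shrink by this factor every step
    outputs: List[float] = []
    for k, v in zip(keys, values):
        # The current token gets the bonus u before being mixed with the accumulated past.
        bonus = math.exp(u + k)
        outputs.append((num + bonus * v) / (den + bonus))
        # Fold the current token into the state for the next step.
        num = decay * num + math.exp(k) * v
        den = decay * den + math.exp(k)
    return outputs, (num, den)

keys, values = [0.1, -0.3, 0.5], [1.0, 2.0, -1.0]
together, _ = wkv(keys, values, w=0.5, u=0.2)                          # all timesteps at once
first, state = wkv(keys[:1], values[:1], w=0.5, u=0.2)                 # timestep 0 only
rest, _ = wkv(keys[1:], values[1:], w=0.5, u=0.2, state=state)         # remaining steps + state
```

`together` matches `first + rest`: the state carries everything later steps need, which is why memory stays constant regardless of sequence length.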

FocalNet

The FocalNet model was proposed in [Focal Modulation Networks](https://arxiv.org/abs/2203.11926) by Jianwei Yang, Chunyuan Li, Xiyang Dai, Lu Yuan, Jianfeng Gao. FocalNets completely replace self-attention (used in models like [ViT](https://huggingface.co/docs/transformers/model_doc/vit) and [Swin](https://huggingface.co/docs/transformers/model_doc/swin)) with a focal modulation mechanism for modeling token interactions in vision. The authors claim that FocalNets outperform self-attention based models with similar computational costs on the tasks of image classification, object detection, and segmentation.

* Add FocalNet by NielsRogge in 21532
* Add focalnet backbone by alaradirik in 23104

OpenLLaMa

The Open-Llama model was proposed in [Open-Llama project](https://github.com/s-JoL/Open-Llama) by community developer s-JoL.

The model is mainly based on LLaMA with some modifications, incorporating memory-efficient attention from Xformers, stable embedding from Bloom, and shared input-output embedding from PaLM. The model is pre-trained on both Chinese and English, which gives it better performance on Chinese language tasks.

* add open-llama model with ckpt by s-JoL in 22795

Assisted Generation

Assisted generation is a new technique that speeds up generation with large language models by using a smaller model as an assistant. The assistant model performs the multiple forward passes needed to propose candidate tokens, while the LLM merely validates the tokens proposed by the assistant. This can lead to speedups of up to 10x!

* Generate: Add assisted generation by gante in 22211
* Generate: assisted generation with sample (take 2) by gante in 22949
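A minimal sketch of the greedy variant, using toy "models" that are plain functions returning the next token. The assistant drafts a few candidates; the target keeps the longest prefix it agrees with, substitutes its own token at the first disagreement, and gains one extra token when the whole draft is accepted. Because every kept token is the target's own greedy choice, the output is identical to plain greedy decoding with the target. Assumptions: `assisted_generate`, `greedy_generate`, `target`, and `assistant` are hypothetical names, and here the target is queried once per position for readability, whereas the speedup in practice comes from validating all candidates in a single forward pass.

```python
from typing import Callable, List

# A toy "model": given the token sequence so far, return its greedy next token.
NextToken = Callable[[List[int]], int]

def greedy_generate(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    seq = list(prompt)
    for _ in range(max_new_tokens):
        seq.append(model(seq))
    return seq

def assisted_generate(target: NextToken, assistant: NextToken, prompt: List[int],
                      max_new_tokens: int, num_candidates: int = 4) -> List[int]:
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new_tokens:
        # 1. The small assistant drafts candidate tokens autoregressively.
        draft: List[int] = []
        for _ in range(num_candidates):
            draft.append(assistant(seq + draft))
        # 2. The target validates the draft position by position.
        accepted: List[int] = []
        for token in draft:
            expected = target(seq + accepted)
            accepted.append(expected)
            if expected != token:
                break  # first disagreement: keep the target's token, drop the rest
        else:
            # Every candidate matched: the validation pass also yields one extra token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[:len(prompt) + max_new_tokens]

def target(seq: List[int]) -> int:
    return (sum(seq) * 7 + len(seq)) % 10

def assistant(seq: List[int]) -> int:
    # Agrees with the target most of the time, but not always.
    return target(seq) if len(seq) % 3 else 0
```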

Code on the Hub from another repo

To avoid duplicating the model code in multiple repos when using the code on the Hub feature, loading such models now saves in their config the repo that contains the code. This way, there is a single source of truth for code on the Hub models.

* Use code on the Hub from another repo by sgugger in 22698
* Use code on the Hub from another repo by sgugger in 22814
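As an illustration, here is a sketch of how such a config reference could be resolved. It assumes the `auto_map` entry uses the `repo_id--module.ClassName` form, with no `--` prefix meaning the code lives in the same repo as the weights; treat that convention as an assumption, and note that `parse_auto_map_entry` is a hypothetical helper rather than a `transformers` function.

```python
from typing import Tuple

def parse_auto_map_entry(entry: str, default_repo: str) -> Tuple[str, str, str]:
    """Split an auto_map value into (repo holding the code, module name, class name)."""
    if "--" in entry:
        # Code lives in another repo: "org/code-repo--modeling_x.ModelClass".
        repo_id, reference = entry.split("--", 1)
    else:
        # No repo prefix: the code sits next to the weights.
        repo_id, reference = default_repo, entry
    module_name, class_name = reference.rsplit(".", 1)
    return repo_id, module_name, class_name
```

Because every model repo points to the same code repo, fixing the modeling file once fixes it for all of them.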

Breaking changes

This release has three breaking changes compared to v4.28.0.

The first one focuses on fixing training issues for Pix2Struct. This slightly affects the results, but should result in the model training much better.

* 🚨🚨🚨 [`Pix2Struct`] Attempts to fix training issues 🚨🚨🚨 by younesbelkada in 23004

The second one aligns the ignore index in the LUKE model with the other models in the library. This breaks the convention that models should stick to their original implementation, but it was necessary to align LUKE with the other models in the library.

* 🚨🚨🚨 Use default ignore index in Luke by sgugger in 23014
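The library-wide default ignore index is `-100`, which is also the default `ignore_index` of PyTorch's `CrossEntropyLoss`: label positions with that value contribute nothing to the loss. A pure-Python sketch of that behavior follows. `cross_entropy` is a hypothetical helper, and returning `0.0` when every label is ignored is a choice made for this sketch (PyTorch's mean reduction yields NaN in that case).

```python
import math
from typing import List

IGNORE_INDEX = -100  # default ignore index shared across transformers models

def cross_entropy(logits: List[List[float]], labels: List[int],
                  ignore_index: int = IGNORE_INDEX) -> float:
    """Mean cross-entropy over positions whose label is not ignore_index."""
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # masked position: no contribution to the loss
        m = max(row)
        log_z = m + math.log(sum(math.exp(z - m) for z in row))  # stable log-sum-exp
        total += log_z - row[label]
        count += 1
    return total / count if count else 0.0
```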

Finally, the third breaking change aims to harmonize the training procedure for most of the recent additions to transformers. It is now the users' responsibility to mask the padding tokens of the labels with the correct value. This PR addresses the issue that was raised for other architectures such as LUKE or Pix2Struct.

* 🚨🚨🚨 [`Blip`] remove labels masking by younesbelkada in 23024
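Since the model no longer masks padding in the labels itself, that step now belongs in your data preparation. A minimal sketch, assuming labels are built from padded `input_ids` and that the ignore value is the library default `-100`; `mask_padding_labels` is a hypothetical helper, and the pad token id `0` in the example is only illustrative. With PyTorch tensors, the same operation is typically written as `input_ids.masked_fill(input_ids == pad_token_id, -100)`.

```python
from typing import List

def mask_padding_labels(input_ids: List[List[int]], pad_token_id: int,
                        ignore_index: int = -100) -> List[List[int]]:
    """Copy input_ids into labels, replacing padding with the ignore index."""
    return [[ignore_index if tok == pad_token_id else tok for tok in row] for row in input_ids]

labels = mask_padding_labels([[5, 6, 0, 0], [7, 0, 0, 0]], pad_token_id=0)
```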

Bugfixes and improvements

* Change `torch_dtype` to `str` when `saved_model=True` in `save_pretrained` for TF models by ydshieh in 22740
* 🌐 [i18n-KO] Translated `training.mdx` to Korean by gabrielwithappy in 22670
* Remove `DS_BUILD_AIO=1` by ydshieh in 22741
* [trainer] update url by stas00 in 22747
* fix(llama): fix LlamaTokenzier by rockmagma02 in 22746
* Generate: handle text conditioning with multimodal encoder-decoder models by gante in 22748
* Revert (for now) the change on `Deta` in 22437 by ydshieh in 22750
* Fix `serving_output` for TF composite models (encoder-decoder like models) by ydshieh in 22743
* 🌐 [i18n-KO] Translated `sequence_classification.mdx` to Korean by 0525hhgus in 22655
* [Examples] TPU-based training of a language model using TensorFlow by sayakpaul in 21657
* Pix2struct: doctest fix by gante in 22761
* Generate: pin number of beams in BART test by gante in 22763
* Fix a mistake in Llama weight converter log output. by aljungberg in 22764
* Fix failing torchscript tests for `CpmAnt` model by ydshieh in 22766
* [WIP]🌐 [i18n-KO] Translated `tutorial/proprecssing.mdx` to Korean by sim-so in 22578
* Tweak ESM tokenizer for Nucleotide Transformer by Rocketknight1 in 22770
* Fix word_ids hyperlink by mayankagarwals in 22765
* Seq2SeqTrainer: Evict decoder_input_ids only when it is created from labels by gante in 22772
* Indexing fix - CLIP checkpoint conversion by amyeroberts in 22776
* Move labels to the same device as logits for Whisper by oscar-garzon in 22779
* Generate: add CJK support to TextStreamer by bcol23 in 22664
* Fix `test_word_time_stamp_integration` for `Wav2Vec2ProcessorWithLMTest` by ydshieh in 22800
* 🌐 [i18n-KO] Translated `custom_models.mdx` to Korean by HanNayeoniee in 22534
* [i18n-KO] fix: docs: ko: sagemaker anchors and `_toctree.yml` by jungnerd in 22549
* improve(llama): Faster apply_rotary_pos_emb by fpgaminer in 22785
* Fix sneaky torch dependency in TF example by Rocketknight1 in 22804
* 🌐 [i18n-KO] Translated `tasks/translation.mdx` to Korean by wonhyeongseo in 22805
* Don't use `LayoutLMv2` and `LayoutLMv3` in some pipeline tests by ydshieh in 22774
* Fix squeeze into torch 1.x compatible form in llama model by DyeKuu in 22808
* Remove accelerate from tf test reqs by muellerzr in 22777
* Simplify update metadata job by sgugger in 22811
* Revert "Use code on the Hub from another repo" by sgugger in 22813
* Introduce `PartialState` as the device handler in the `Trainer` by muellerzr in 22752
* Mark auto models as important by sgugger in 22815
* TTS fine-tuning for SpeechT5 by hollance in 21824
* 🌐 [i18n-KO] Fix anchor links for docs `auto_tutorial`, `training` by gabrielwithappy in 22796
* Fix Past CI not running against the latest `main` by ydshieh in 22823
* Fix `test_eos_token_id_int_and_list_top_k_top_sampling` by ydshieh in 22826
* Update accelerate version + warning check fix by muellerzr in 22833
* Fix from_pretrained when model is instantiated on the meta device by sgugger in 22837
* Raise err if minimum Accelerate version isn't available by muellerzr in 22841
* Make ClipSeg compatible with model parallelism by youssefadr in 22844
* fix SpeechT5 doc comments by hollance in 22854
* move preprocess_logits_for_metrics before _nested_gather in trainer.e… by ChenyangLiu in 22603
* feat(model parallelism): move labels to the same device as logits for M2M100 by elabongaatuo in 22850
* use `accelerate@main` in CI by ydshieh in 22859
* Remove 'main' from doc links by amyeroberts in 22860
* Show diff between 2 CI runs on Slack reports by ydshieh in 22798
* Remove some pipeline skip cases by ydshieh in 22865
* Fixup multigpu local_rank by muellerzr in 22869
* Fix to removing ESM special tokens by Rocketknight1 in 22870
* XGLM: Fix left-padding (PT and TF) by gante in 22828
* Patching clip model to create mask tensor on the device by shanmugamr1992 in 22711
* fix: Correct small typo in docstring by oscar-defelice in 22857
* Generation: only search for eos_token if set by xloem in 22875
* Change schedule CI time by ydshieh in 22884
* fix warning function call creating logger error (max_length and max_new_tokens) by QuentinAmbard in 22889
* [Examples/TensorFlow] minor refactoring to allow compatible datasets to work by sayakpaul in 22879
* moved labels to the same device as logits for OPT, CodeGen, GPT-J and Pix2Struct models by sushmanthreddy in 22872
* Include decoder_attention_mask in T5 model inputs by aashiqmuhamed in 22835
* Fix weight tying in TF-ESM by Rocketknight1 in 22839
* Pin flax & optax version by amyeroberts in 22895
* Revert DeepSpeed stuff from accelerate integration by muellerzr in 22899
* [tensorflow] Add support for the `is_symbolic_tensor` predicate by hvaara in 22878
* moved labels to the same device as logits for LILT model by sushmanthreddy in 22898
* Skip a failing test on main for now by ydshieh in 22911
* Moved labels to enable parallelism pipeline in Luke model by sushmanthreddy in 22909
* Fix counting in Slack report for some jobs by ydshieh in 22913
* Fix Slack report for Nightly CI and Past CI by ydshieh in 22901
* fix CLAP integration tests by hollance in 22834
* Add inputs_embeds functionality when generating with GPT-Neox by TobiasLee in 22916
* Fix `FillMaskPipelineTests` by ydshieh in 22894
* Update Swin MIM output class by alaradirik in 22893
* fix bug of CLAP dataloader by lukewys in 22674
* Fix: Seq2SeqTrainingArgs overriding to_dict for GenerationConfig json support by Natooz in 22919
* fix: GPTNeoX half inference error by SeongBeomLEE in 22888
* Remove broken test_data symlink in legacy s2s examples by hvaara in 22876
* Hardcode GELU as the intermediate activation for ESM by Rocketknight1 in 22892
* [CI] clap patch fusion test values by ArthurZucker in 22922
* ddp fixes for training by winglian in 22874
* tests: Fix flaky test for NLLB-MoE by connor-henderson in 22880
* Fix a minor bug in CI slack report by ydshieh in 22906
* Feature to convert videomae huge and small finetuned on kinetics and ssv2 added to the videomae to pytorch converter by sandstorm12 in 22788
* vilt_model by sushmanthreddy in 22930
* [i18n-KO] Translated `accelerate.mdx` to Korean by 0525hhgus in 22830
* [CLAP] Doc nits by ArthurZucker in 22957
* Generate: Add exception path for Donut by gante in 22955
* Update tiny models and a few fixes by ydshieh in 22928
* 🌐 [i18n-KO] Translated `tasks/masked_language_modeling.mdx` to Korean by HanNayeoniee in 22838
* 🌐 [i18n-KO] Translated `tasks/summarization.mdx` to Korean by sim-so in 22783
* Add an attribute to disable custom kernels in deformable detr in order to make the model ONNX exportable by fxmarty in 22918
* Decorate `test_codegen_sample_max_time` as flaky by ydshieh in 22953
* Raise error if `stride` is too high in `TokenClassificationPipeline` by boyleconnor in 22942
* [Fix Bugs] Fix keys in `_load_pretrained_model` by hanrui1sensetime in 22947
* Prepare tests for hfh 0.14 by Wauplin in 22958
* 🌐 [i18n-KO] Translated `run_scripts.mdx` to Korean by HanNayeoniee in 22793
* Reverting Deta cloning mechanism by Narsil in 22656
* fix ValueError message in LlamaAttention by othertea in 22966
* Fix TF example in quicktour by Rocketknight1 in 22960
* Update feature selection in to_tf_dataset by amyeroberts in 21935
* 🌐 [i18n-KO] translate `create_a_model` doc to Korean by gabrielwithappy in 22754
* Install `accelerate@main` in PyTorch Past CI jobs by ydshieh in 22963
* Fix `DeepSpeed` CI job link in Past CI by ydshieh in 22967
* 🌐 [i18n-KO] Fixed `tasks/masked_language_modeling.mdx` by HanNayeoniee in 22965
* Neptune fix bug init run by AleksanderWWW in 22836
* fixed small typo in code example by jvanmelckebeke in 22982
* Avoid invalid escape sequences, use raw strings by Lingepumpe in 22936
* [`DocTest`] Fix correct checkpoint by younesbelkada in 22988
* 🌐 [i18n-KO] Translated `serialization.mdx` to Korean by wonhyeongseo in 22806
* Fix typo in mega.mdx by dleve123 in 22998
* 🌐 [i18n-KO] Translated `tasks/image_captioning.mdx` to Korean by sim-so in 22943
* 🌐 [i18n-KO] Translated `token_classification.mdx` to Korean by 0525hhgus in 22945
* Add TensorFlow Wav2Vec2 for sequence classification by nandwalritik in 22073
* Remove a failing ONNX test by ydshieh in 23011
* Add gradient checkpointing to Whisper Flax by versae in 22954
* [`PEFT`] Add HFTracer support for PEFT by younesbelkada in 23006
* [Llama Tokenizer] Fast llama template by ArthurZucker in 22959
* Fix None value when adding info to auto_map by sgugger in 22990
* Bring back PartialState DeepSpeed by muellerzr in 22921
* Add methods to PreTrainedModel to use PyTorch's BetterTransformer by fxmarty in 21259
* [`Pix2Struct`] Fix pix2struct doctest by younesbelkada in 23023
* 🌐 [i18n-KO] Translated `multilingual.mdx` to Korean by HanNayeoniee in 23008
* Fix the expected error in `test_offline_mode_pipeline_exception` by ydshieh in 23022
* [MEGA] nit size test by ArthurZucker in 23028
* added GPTNeoXForTokenClassification by peter-sk in 23002
* added GPTNeoForTokenClassification by peter-sk in 22908
* Update `BridgeTowerModelTester` by ydshieh in 23029
* Fix bigbird random attention by Bearnardd in 21023
* Fix CLAP link across all READMEs by ehsanmok in 23032
* Make `_test_xla_generate` less flaky by ydshieh in 22996
* Add Trainer support for ReduceLROnPlateau by pie3636 in 23010
* 🌐 [i18n-KO] Translated `model_sharing.mdx` to Korean by 0525hhgus in 22991
* [docs] Doc TOC updates by MKhalusova in 23049
* CUDA `rng_state_all` is used when saving in distributed mode, so it should also be used when loading by ShivamShrirao in 23045
* Skip pt/flax equivalence tests in pytorch `bigbird` test file by ydshieh in 23040
* Fix model parallelism for `BridgeTower` by ydshieh in 23039
* extend the test files by ydshieh in 23043
* Generate: prepare assisted generation for release by gante in 23052
* Fix grammar error in summarization pipeline by SKaplanOfficial in 23080
* Fix string syntax error in logger warning message (additional comma) by xwen99 in 23083
* Add `BioGPTForSequenceClassification` by awinml in 22253
* Fix `convnext` __init__ by IMvision12 in 23078
* Deprecate `xpu_backend` in favor of `ddp_backend` by muellerzr in 23085
* 🌐 [i18n-KO] Translated `tasks/image_classification.mdx` to Korean by 0525hhgus in 23048
* 🌐 [i18n-KO] Translated `tasks/question_answering.mdx` to Korean by jungnerd in 23012
* 🌐 [i18n-KO] Translated `tasks/zero_shot_image_classification.mdx` to Korean by HanNayeoniee in 23065
* added type hints for blip_text pytorch model by iamarunbrahma in 23071
* Save the tokenizer and image preprocessor after training a model with the contrastive image-text example by regisss in 23035
* GPT2ForQuestionAnswering by peter-sk in 23030
* 🌐 [i18n-KO] Translated `torchscript.mdx` to Korean by sim-so in 23060
* Fix check for backword_pos by winglian in 23075
* [`Flava`] Fix flava `torch.distributed.nn.functional import all_gather` issue by younesbelkada in 23108
* [ONNX] Sam fix by michaelbenayoun in 23110
* num_noise_spans should be <= num_items 22246 by alexcpn in 22938
* Fixed default config for `Pix2Struct` model to set `Pix2StructTextModel` to `is_decoder=True` by gbarello-uipath in 23051
* Pin numba for now by sgugger in 23118
* [`Doctest`] Fix pix2struct doctest by younesbelkada in 23121
* Generate: slow assisted generation test by gante in 23125
* Generate: correct beam search length on score calculation for multi batch generation by gante in 23127
* improve unclear documentation by ManuelFay in 23123
* Generate: better warnings with pipelines by gante in 23128
* Add resources for LayoutLmV2 and reformat documentation resources by y3sar in 23115
* Fix ConvNext V2 parameter naming issue by alaradirik in 23122
* Support union types `X | Y` syntax for `HfArgumentParser` for Python 3.10+ by XuehaiPan in 23126
* Add support for beam search's `num_return_sequences` flag in Flax by mayankagarwals in 23082
* docs: ko: update `_toctree.yml` by HanNayeoniee in 23112
* [doc] Try a few ≠ ways of linking to Papers, users, and org profiles by julien-c in 22611
* Enable to use custom tracer in FX `symbolic_trace` by regisss in 23105
* Remove redundant print statements by alaradirik in 23133
* Tidy Pytorch GLUE benchmark example by tlby in 23134
* GPTNeoForQuestionAnswering by peter-sk in 23057
* Add methods to update and verify out_features out_indices by amyeroberts in 23031
* fix spelling error by digger-yu in 23143
* Remove typo in perf_train_gpu_many.mdx by MrGeislinger in 23144
* fix resume fsdp by qywu in 23111
* gpt2 multi-gpu fix by peter-sk in 23149
* GPTNeoXForQuestionAnswering by peter-sk in 23059
* [`GPT-J`] Fix causal mask dtype by younesbelkada in 23147
* Add FlaxWhisperForAudioClassification model by raghavanone in 22883
* [docs] Text to speech task guide by MKhalusova in 23107
* Generate: text generation pipeline no longer emits `max_length` warning when it is not set by gante in 23139
* Revert "Add FlaxWhisperForAudioClassification model" by sgugger in 23154
* Add TrOCR resources by huangperry in 23142
* fixed whisper positional encoding by anvilarth in 23167
* 🌐 [i18n-KO] docs: ko: Translate `multiple_choice.mdx` by gabrielwithappy in 23064
* fix: Passing language as acronym to Whisper generate by connor-henderson in 23141
* Add `no_trainer` scripts to pre-train Vision Transformers by awinml in 23156
* Add FlaxWhisperForAudioClassification model by raghavanone in 23173
* search buffers for dtype by cyyever in 23159
* Update LLaMA docs with arxiv link by awinml in 23191
* fix random attention for pytorch's bigbird/pegasus_bigbird by Bearnardd in 23056
* Fix hf_argparser.parse_json_file to open file with utf-8 encoding, close file when finished by RobertBaruch in 23194
* Generate: starcoder 🤜 🤛 assisted generation by gante in 23182
* Fixing class embedding selection in owl-vit by orrzohar in 23157
* New version of Accelerate for the Trainer by sgugger in 23204
* docs: Fix broken link in 'How to add a model...' by connor-henderson in 23216
* Pin tensorflow-probability by sgugger in 23220
* [SAM] Add resources by NielsRogge in 23224
* audio_utils improvements by hollance in 21998
* make opt checkpoint dir name correct by dumpmemory in 21660
* Fix typo ; Update output.mdx by furkanakkurt1335 in 23227
* fix: Update run_qa.py to work with deepset/germanquad by sjrl in 23225
* Add Japanese translation to accelerate.mdx by rustinwelter in 23232
* Proposed fix for TF example now running on safetensors. by Narsil in 23208
* Support ratios for `logging_steps`, `eval_steps`, and `save_steps` by konstantinjdobler in 23235
* [Doctests] Refactor doctests + add CI by ArthurZucker in 22987
* Revert "[Doctests] Refactor doctests + add CI" by sgugger in 23245
* Fix `from_config` by DyeKuu in 23246
* CTC example: updated trainer parameters to save tokenizer by MKhalusova in 23243
* [docs] Audio task guides fixes by MKhalusova in 23239
* Improve Docs of Custom Tools and Agents by patrickvonplaten in 23255
* Metadata update by LysandreJik in 23259
* Update Image segmentation description by LysandreJik in 23261
* pin `tensorflow-probability` in docker files by ydshieh in 23260
* Refine documentation for Tools by sgugger in 23266
* Fix new line bug in chat mode for agents by sgugger in 23267
* Render custom tool docs a bit better by sgugger in 23269
* chore: allow protobuf 3.20.3 requirement by jose-turintech in 22759
* Fix link displayed for custom tools by sgugger in 23274
* Remove misplaced test file by sgugger in 23275
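
The ratio support for `logging_steps`, `eval_steps` and `save_steps` (23235) can be pictured with the sketch below. It assumes that a value below 1 is read as a fraction of the total number of training steps and rounded up to a whole step; `resolve_step_interval` is a hypothetical helper for illustration, not part of the `Trainer` API.

```python
import math


def resolve_step_interval(steps: float, max_steps: int) -> int:
    """Convert a logging/eval/save interval into an absolute number of steps.

    Assumption (illustrative): values in (0, 1) are a fraction of max_steps,
    rounded up; values >= 1 are already absolute step counts.
    """
    if steps <= 0:
        raise ValueError("the step interval must be positive")
    if steps < 1:
        return math.ceil(max_steps * steps)
    return int(steps)


# With 1,000 training steps, eval_steps=0.25 means "evaluate every 250 steps".
print(resolve_step_interval(0.25, 1000))  # 250
print(resolve_step_interval(500, 1000))   # 500
```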

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* gabrielwithappy
  * 🌐 [i18n-KO] Translated `training.mdx` to Korean (22670)
  * 🌐 [i18n-KO] Fix anchor links for docs `auto_tutorial`, `training` (22796)
  * 🌐 [i18n-KO] translate `create_a_model` doc to Korean (22754)
  * 🌐 [i18n-KO] docs: ko: Translate `multiple_choice.mdx` (23064)
* 0525hhgus
  * 🌐 [i18n-KO] Translated `sequence_classification.mdx` to Korean (22655)
  * [i18n-KO] Translated `accelerate.mdx` to Korean (22830)
  * 🌐 [i18n-KO] Translated `token_classification.mdx` to Korean (22945)
  * 🌐 [i18n-KO] Translated `model_sharing.mdx` to Korean (22991)
  * 🌐 [i18n-KO] Translated `tasks/image_classification.mdx` to Korean (23048)
* sim-so
  * [WIP]🌐 [i18n-KO] Translated `tutorial/proprecssing.mdx` to Korean (22578)
  * 🌐 [i18n-KO] Translated `tasks/summarization.mdx` to Korean (22783)
  * 🌐 [i18n-KO] Translated `tasks/image_captioning.mdx` to Korean (22943)
  * 🌐 [i18n-KO] Translated `torchscript.mdx` to Korean (23060)
* HanNayeoniee
  * 🌐 [i18n-KO] Translated `custom_models.mdx` to Korean (22534)
  * 🌐 [i18n-KO] Translated `tasks/masked_language_modeling.mdx` to Korean (22838)
  * 🌐 [i18n-KO] Translated `run_scripts.mdx` to Korean (22793)
  * 🌐 [i18n-KO] Fixed `tasks/masked_language_modeling.mdx` (22965)
  * 🌐 [i18n-KO] Translated `multilingual.mdx` to Korean (23008)
  * 🌐 [i18n-KO] Translated `tasks/zero_shot_image_classification.mdx` to Korean (23065)
  * docs: ko: update `_toctree.yml` (23112)
* wonhyeongseo
  * 🌐 [i18n-KO] Translated `tasks/translation.mdx` to Korean (22805)
  * 🌐 [i18n-KO] Translated `serialization.mdx` to Korean (22806)
* peter-sk
  * added GPTNeoXForTokenClassification (23002)
  * added GPTNeoForTokenClassification (22908)
  * GPT2ForQuestionAnswering (23030)
  * GPTNeoForQuestionAnswering (23057)
  * gpt2 multi-gpu fix (23149)
  * GPTNeoXForQuestionAnswering (23059)
* s-JoL
  * add open-llama model with ckpt (22795)
* awinml
  * Add `BioGPTForSequenceClassification` (22253)
  * Add `no_trainer` scripts to pre-train Vision Transformers (23156)
  * Update LLaMA docs with arxiv link (23191)
* raghavanone
  * Add FlaxWhisperForAudioClassification model (22883)
  * Add FlaxWhisperForAudioClassification model (23173)

4.28.1


Fixes a regression for DETA models

- Revert the change on Deta by ydshieh in 22750

4.28.0


LLaMA

The LLaMA model was proposed in LLaMA: Open and Efficient Foundation Language Models. It is a collection of foundation language models ranging from 7B to 65B parameters. You can request access to the weights [here](https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform?usp=send_form) and then use the conversion script to generate a checkpoint compatible with Hugging Face Transformers.

* LLaMA Implementation by zphang in 21955

Pix2Struct, MatCha, DePlot

Pix2Struct is a pretrained image-to-text model for purely visual language understanding, which can be finetuned on tasks containing visually-situated language. Pix2Struct has been fine-tuned on various tasks and datasets, ranging from image captioning and visual question answering (VQA) over different inputs (books, charts, science diagrams) to captioning UI components, and others.

* Add Pix2Struct by younesbelkada in 21400
* Add DePlot + MatCha on `transformers` by younesbelkada in 22528

Mega

MEGA proposes a new approach to self-attention in which each encoder layer has a multi-headed exponential moving average in addition to a single head of standard dot-product attention, giving the attention mechanism stronger positional biases. This allows MEGA to perform competitively with Transformers on standard benchmarks, including the Long Range Arena (LRA), while having significantly fewer parameters. MEGA's compute efficiency allows it to scale to very long sequences, making it an attractive option for long-document NLP tasks.

* Add Mega: Moving Average Equipped Gated Attention by mnaylor5 in 21766
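
MEGA's moving-average component is a damped EMA, y_t = alpha * x_t + (1 - alpha * delta) * y_{t-1}. The sketch below is a simplified scalar version in plain Python: each "head" is a hypothetical (alpha, delta) pair and the head outputs are simply averaged, whereas the real layer works on expanded multi-dimensional states with learned projections and feeds the result into gated attention. It does show where the positional bias comes from: a token's contribution decays geometrically with its distance.

```python
def damped_ema(xs, alpha, delta):
    """Damped EMA: y_t = alpha * x_t + (1 - alpha * delta) * y_{t-1}, starting from y_0 = 0.

    delta = 1 reduces this to a standard exponential moving average.
    """
    if not (0 < alpha < 1 and 0 < delta <= 1):
        raise ValueError("expected 0 < alpha < 1 and 0 < delta <= 1")
    ys, prev = [], 0.0
    for x in xs:
        prev = alpha * x + (1 - alpha * delta) * prev
        ys.append(prev)
    return ys


def multi_head_ema(xs, heads):
    """Run one damped EMA per (alpha, delta) head and average the heads position by position."""
    per_head = [damped_ema(xs, alpha, delta) for alpha, delta in heads]
    return [sum(values) / len(values) for values in zip(*per_head)]


# A single impulse at the first position: its influence halves at every later position.
print(damped_ema([1.0, 0.0, 0.0], alpha=0.5, delta=1.0))  # [0.5, 0.25, 0.125]
```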

GPTBigCode

The model is an optimized [GPT2 model](https://huggingface.co/docs/transformers/model_doc/gpt2) with support for Multi-Query Attention.

* Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) by jlamypoirier in 22575
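
Multi-Query Attention keeps a single key/value head that all query heads share, which is what shrinks the key/value cache during generation. The arithmetic below is a sketch with a hypothetical configuration (24 layers, 16 heads of size 128, 2,048 tokens, fp16 values), not GPTBigCode's actual hyperparameters; `kv_cache_bytes` is an illustrative helper.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch_size, bytes_per_value=2):
    """Size of the key/value cache: one key and one value vector per layer, KV head and position."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_value


layers, heads, head_dim = 24, 16, 128  # hypothetical model shape
# Multi-head attention: every query head has its own key/value head.
mha = kv_cache_bytes(layers, heads, head_dim, seq_len=2048, batch_size=1)
# Multi-query attention: all query heads share one key/value head.
mqa = kv_cache_bytes(layers, 1, head_dim, seq_len=2048, batch_size=1)

print(mha / 2**20, mqa / 2**20)  # 384.0 24.0  (MiB)
print(mha // mqa)                # 16
```

The saving factor equals the number of query heads, since only the key/value heads are cached.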

NLLB-MoE

The mixture of experts version of the NLLB release has been added to the library.

* `NLLB-MoE` Adds the moe model by ArthurZucker in 22024
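
In a mixture-of-experts layer, a router sends each token to a small number of expert feed-forward networks and mixes their outputs. The plain-Python sketch below shows top-2 routing, assuming a softmax router whose weights are renormalized over the selected experts; it ignores expert capacity, load-balancing losses and batching, and all names are hypothetical rather than the NLLB-MoE classes.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits, k=2):
    """Pick the k experts with the highest router probability and renormalize their weights."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


def moe_forward(x, experts, gate_logits, k=2):
    """Weighted sum of the selected experts' outputs; unselected experts are never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k).items())


# Four toy "experts" acting on a scalar input.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
gate_logits = [2.0, 0.5, 2.0, -1.0]
print(top_k_route(gate_logits))               # {0: 0.5, 2: 0.5}
print(moe_forward(3.0, experts, gate_logits))  # 6.5
```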

Serializing 8bit models

* [`bnb`] Let's make serialization of int8 models possible by younesbelkada in 22177

You can now push 8-bit models to the Hub and load them directly from it, saving memory and loading your 8-bit models faster! An example repo is available [here](https://huggingface.co/ybelkada/bloom-1b7-8bit).
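
To see why an 8-bit checkpoint is smaller, here is a simplified row-wise absmax quantization round trip in plain Python: each weight is stored as one signed byte plus one float32 scale per row, instead of four bytes per float32 weight. This is an illustration under stated assumptions, not the format bitsandbytes or `transformers` actually write; all function names are hypothetical.

```python
import struct


def quantize_absmax(row):
    """Scale a row so its largest magnitude maps to 127, then round to integers in [-127, 127]."""
    peak = max(abs(v) for v in row)
    scale = peak / 127 if peak > 0 else 1.0
    return [round(v / scale) for v in row], scale


def dequantize(q, scale):
    """Approximate the original floats from the int8 values and the row scale."""
    return [v * scale for v in q]


def serialize(q, scale):
    """Pack a float32 scale followed by one signed byte per weight (little-endian)."""
    return struct.pack(f"<f{len(q)}b", scale, *q)


def deserialize(blob):
    """Inverse of serialize: the first 4 bytes are the scale, the rest are int8 values."""
    n = len(blob) - 4
    scale, *q = struct.unpack(f"<f{n}b", blob)
    return q, scale


row = [0.5, -1.27, 0.0, 0.9]
q, scale = quantize_absmax(row)
blob = serialize(q, scale)
print(q)          # [50, -127, 0, 90]
print(len(blob))  # 8 bytes, versus 16 for four float32 weights
```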

Breaking Changes

Ordering of height and width for the BLIP image processor

_Notes from the PR:_

The BLIP image processor incorrectly passed the dimensions to resize in the order (width, height). They are now passed in the correct (height, width) order.

In most cases, this won't have an effect, as the default height and width are the same. However, this is not backwards compatible for custom configurations with different height and width settings, or for direct calls to the resize method with different height and width values.

* 🚨🚨🚨 Fix ordering of height, width for BLIP image processor by amyeroberts in 22466
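
The effect of the fix can be sketched with two hypothetical helpers that build the target shape from a `{"height": ..., "width": ...}` size dictionary: one in the corrected (height, width) order and one reproducing the old (width, height) order. Square sizes give the same result either way, which is why most users are unaffected.

```python
def resize_target(size):
    """Target shape in (height, width) order, the order the resize step expects."""
    return (size["height"], size["width"])


def legacy_resize_target(size):
    """Reproduces the old bug: the same values, but passed as (width, height)."""
    return (size["width"], size["height"])


square = {"height": 384, "width": 384}
custom = {"height": 224, "width": 448}
print(resize_target(square) == legacy_resize_target(square))  # True: square sizes are unaffected
print(resize_target(custom), legacy_resize_target(custom))    # (224, 448) (448, 224)
```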

Prefix tokens for the NLLB tokenizer

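The main problem was the placement of the `prefix` and `suffix` special tokens of the NLLB tokenizer: the language code was appended after `</s>` as a suffix instead of being added as a prefix.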

Previous behaviour:

```python
>>> from transformers import NllbTokenizer
>>> tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
>>> tokenizer("How was your day?").input_ids
[13374, 1398, 4260, 4039, 248130, 2, 256047]
>>> # 2: '</s>'
>>> # 256047: 'eng_Latn'
```

New behaviour:

```python
>>> from transformers import NllbTokenizer
>>> tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
>>> tokenizer("How was your day?").input_ids
[256047, 13374, 1398, 4260, 4039, 248130, 2]
```

In case you have pipelines that were relying on the old behavior, here is how you would enable it once again:

```python
>>> from transformers import NllbTokenizer
>>> tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M", legacy_behaviour=True)
```

* 🚨🚨🚨 `[NLLB Tokenizer]` Fix the prefix tokens 🚨🚨🚨 by ArthurZucker in 22313
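
The difference between the two layouts can be reproduced with the ids from the NLLB examples in this section (2 is `</s>`, 256047 is `eng_Latn`). `build_nllb_input_ids` is a hypothetical helper that only rearranges already-tokenized ids; it is not the tokenizer's internal code.

```python
def build_nllb_input_ids(token_ids, lang_code_id, eos_id, legacy_behaviour=False):
    """Add NLLB special tokens around already-tokenized text.

    New behaviour:    [lang_code] + tokens + [</s>]
    Legacy behaviour: tokens + [</s>, lang_code]
    """
    if legacy_behaviour:
        return token_ids + [eos_id, lang_code_id]
    return [lang_code_id] + token_ids + [eos_id]


text_ids = [13374, 1398, 4260, 4039, 248130]  # "How was your day?" without special tokens
print(build_nllb_input_ids(text_ids, 256047, 2))
# [256047, 13374, 1398, 4260, 4039, 248130, 2]
print(build_nllb_input_ids(text_ids, 256047, 2, legacy_behaviour=True))
# [13374, 1398, 4260, 4039, 248130, 2, 256047]
```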

TensorFlow ports

The BLIP model is now available in TensorFlow.

* Add TF port of BLIP by Rocketknight1 in 22090

Export TF Generate with a TF tokenizer

TF `generate` can now be exported together with a TF-native tokenizer, so the full pipeline lives in a single TF graph.

* Generate: Export TF generate with a TF tokenizer by gante in 22310

Task guides

A new task guide has been added, focusing on depth estimation.

* Depth estimation task guide by MKhalusova in 22205

Bugfixes and improvements

* Load optimizer state on CPU to avoid CUDA OOM by sgugger in 22159
* Run all tests by default by sgugger in 22162
* Fix: unfinished_sequences with correct device by Stxr in 22184
* Revert 22152 MaskedImageCompletionOutput changes by amyeroberts in 22187
* Regression pipeline device by sgugger in 22190
* Update BridgeTowerForContrastiveLearning by abhiwand in 22145
* t5 remove data dependency by prathikr in 22097
* Fix DeepSpeed CI by ydshieh in 22194
* Fix typo in Align docs by alaradirik in 22199
* Update expected values in `MgpstrModelIntegrationTest` by ydshieh in 22195
* Italian Translation of migration.mdx by Baelish03 in 22183
* Update tiny model creation script by ydshieh in 22202
* Temporarily fix ONNX model exporting error by SatyaJandhyalaAtMS in 21830
* [`XGLM`] Add `accelerate` support for XGLM by younesbelkada in 22207
* fixes a typo in WhisperFeatureExtractor docs. by susnato in 22208
* Hotfix for natten issue with torch 2.0.0 on CircleCI by ydshieh in 22218
* fix typos in llama.mdx by keturn in 22223
* fix code example in mgp-str doc by wdp-007 in 22219
* Use `dash==2.8.1` for now for daily CI by ydshieh in 22227
* LLaMA house-keeping by sgugger in 22216
* fix AutoTP in deepspeed could not work for bloom by sywangyi in 22196
* Add LlamaForSequenceClassification by lewtun in 22209
* Removed .mdx extension in two links by MKhalusova in 22230
* fix(docs): fix task guide links in model docs by Seb0 in 22226
* Fix natten by alihassanijr in 22229
* Revert "Use `dash==2.8.1` for now for daily CI" by ydshieh in 22233
* Fix Unnecessary move of tensors from CPU to GPU in LlamaRotaryEmbedding by ma787639046 in 22234
* [trainer] param count for deepspeed zero3 by stas00 in 22193
* Update training_args.py -- a nightly install is not required anymore for torch.compile by pminervini in 22266
* [Docs] fix typos in some tokenizer docs by yesinkim in 22256
* Italian translation perf_infer_cpu by nickprock in 22243
* [Trainer] Add optional communication backends for torch.distributed when using GPU by heya5 in 22247
* Fix the gradient checkpointing bug of the llama model by yqy2001 in 22270
* Fix balanced and auto device_map by sgugger in 22271
* Rework a bit the LLaMA conversion script by sgugger in 22236
* Proper map location for optimizer load by sgugger in 22273
* Fix doc links by amyeroberts in 22274
* Move torch.compile() wrapping after DDP/FSDP wrapping to ensure correct graph breaks during training by ani300 in 22279
* Example of pad_to_multiple_of for padding and truncation guide & docstring update by MKhalusova in 22278
* Update vision docstring bool masked pos by amyeroberts in 22237
* replace_8bit_linear modules_to_not_convert default value fix by BlackSamorez in 22238
* Fix error in mixed precision training of `TFCvtModel` by gcuder in 22267
* More doctests by ydshieh in 22268
* fix more doctests by ydshieh in 22292
* Add translation perf_infer_gpu_one for it by davidegazze in 22296
* Restore fp16 support on xla gpu device by ymwangg in 22300
* Correct NATTEN function signatures and force new version by alihassanijr in 22298
* [deepspeed] offload + non-cpuadam optimizer exception doc by stas00 in 22044
* Final update of doctest by ydshieh in 22299
* Add MaskedImageModelingOutput by alaradirik in 22212
* Enable traced model for text-generation task by jiqing-feng in 22265
* add low_cpu_mem_usage option in run_clm.py example which will benefit… by sywangyi in 22288
* fix: Allow only test_file in pytorch and flax summarization by connor-henderson in 22293
* Fix position embeddings for GPT-J and CodeGen by njhill in 22069
* Fixed bug to calculate correct xpath_sub_list in MarkupLMTokenizer by silentghoul-spec in 22302
* Enforce `max_memory` for device_map strategies by sgugger in 22311
* Beef up Llama tests by gante in 22314
* docs: Resolve incorrect type typo in trainer methods by tomaarsen in 22316
* Chunkable token classification pipeline by luccailliau in 21771
* Fix PipelineTests skip conditions by ydshieh in 22320
* [deepspeed zero3] need `generate(synced_gpus=True, ...)` by stas00 in 22242
* [gptj] support older pytorch version by stas00 in 22325
* Move common properties to BackboneMixin by amyeroberts in 21855
* Backbone add mixin tests by amyeroberts in 22542
* Backbone add out indices by amyeroberts in 22493
* [`MBart`] Add `accelerate` support for MBart by younesbelkada in 22309
* Fixed gradient checkpoint bug for TimeSeriesTransformer by mollerup23 in 22272
* Mention why one needs to specify max_steps in Trainer by lhoestq in 22333
* Fix various imports by sgugger in 22281
* Minor typo in pipeline FillMaskPipeline's documentation. by SamuelLarkin in 22339
* Added type hints to TFDeiTModel by Batese2001 in 22327
* Fix --bf16 option support for Neuron after PR 22300 by jeffhataws in 22307
* Generate: add test for left-padding support by gante in 22322
* Enable training Llama with model or pipeline parallelism by kooshi in 22329
* Automatically create/update tiny models by ydshieh in 22275
* [HFTracer] Make embeddings ops take on the dtype of the weight by jamesr66a in 22347
* Fix typo in Greedy Search Description by awinml in 22345
* Generate: Add GPTNeoX integration test by gante in 22346
* Update docker files to use official torch 2.0.0 by ydshieh in 22357
* Pin tensorflow-text to go with tensorflow by sgugger in 22362
* Improve error message by Mahrkeenerh in 22361
* TensorFlow: pin maximum version to 2.12 by gante in 22364
* Resnet flax by Shubhamai in 21472
* [Trainer] add disclaimer that full_determinism is slow by stas00 in 22368
* [safetensors] don't use in `torch<1.10` by stas00 in 22370
* TensorFlow: additional missing `cmake` dependencies in CI by gante in 22383
* Changed world_size() to get_world_size() bugfix by Charlie-Bell in 22381
* Translated documentation in italian by nickprock in 22388
* Adapt find_tied_parameters to handle breaking change in Accelerate by sgugger in 22360
* load_in_8bit now respects 'balanced' device maps in multi-gpu environments by kooshi in 22377
* Wav2Vec2ProcessorWithLM can return N best hypotheses now by vsokolovskii in 22235
* Seq2seq trainer generation config arg by Natooz in 22323
* Generate: support for left-padding on GPTNeoX and Llama by gante in 22382
* [`bnb`] Force `requires_grad` to be `False` by younesbelkada in 22396
* Transformers env safetensors by sgugger in 22400
* [Pix2Struct] Add support to resize embeddings by NielsRogge in 22394
* Trainer: move Seq2SeqTrainer imports under the typing guard by gante in 22401
* Trainer: missing None check by gante in 22404
* Hardware Auto-Setup for Examples by dongreenberg in 22319
* [neptune] fix checkpoint bug with relative out_dir by kshitij12345 in 22102
* Fix bug in perplexity guide calculations and update perplexity numbers. Fixes 22348 by fpgaminer in 22411
* [performance] ensure `causal_mask` is created directly on device by jeffra in 22378
* MBart: Fix docs and doctests by gante in 22422
* Add clean_up_tokenization_spaces to config by ArthurZucker in 22341
* Hyperparameter search reporting to W&B by NoB0 in 22440
* [`bnb`] fix bnb failing test by younesbelkada in 22439
* [`Generate`] Add conditional generation for multimodal models by younesbelkada in 22424
* Don't hard error when cache version can't be converted to int by sgugger in 22427
* Use real tokenizers if tiny version(s) creation has issue(s) by ydshieh in 22428
* Revert "Error (also in original) model, scaling only q matrix not qk.T dot product (qk.T/sqrt(dim_per_head))" by sgugger in 22444
* [`Pix2Struct`] Fix slow test by younesbelkada in 22448
* Revert "Fix --bf16 option support for Neuron after PR 22300" by jeffhataws in 22451
* Update Neptune docs by normandy7 in 22452
* Avoid using personal HF token in CI by ydshieh in 22453
* Update release instructions by sgugger in 22454
* Pin ruff by sgugger in 22455
* Update: ignore padding support for TransfoXL training when n_clusters==0 by StefanHeng in 22457
* Rescale image back if it was scaled during PIL conversion by amyeroberts in 22458
* Skip flaky NLLB Moe test for now by amyeroberts in 22463
* Guard imports of PreTrainedTokenizerFast on is_tokenizers_available by hvaara in 22285
* [NLLB-MoE] `model_type` update for auto mapping by ArthurZucker in 22470
* Llama: support for `max_position_embeddings` by gante in 22471
* Docs fix: Multinomial sampling decoding needs "num_beams=1", since by default it is usually not 1. by manueldeprada in 22473
* (Re-)Enable Nightly + Past CI by ydshieh in 22393
* Relax `eos_token_id < 0` checks in `generate()` from `ValueError` to warning by lewtun in 22472
* Update `Wav2Vec2ProcessorWithLM` doc example by ydshieh in 22474
* Making sure we can use safetensors to serialize all the time. by Narsil in 22437
* Update Neptune callback docstring by normandy7 in 22497
* Test fetch v2 by sgugger in 22367
* Update convert_llama_weights_to_hf.py by Ricardokevins in 22525
* [Time-Series] fix past_observed_mask type by elisim in 22076
* Fix llama tokenizer by ArthurZucker in 22402
* [WIP] docs: ko: sagemaker.mdx by jungnerd in 22509
* added biogpt token classifier by upjabir in 22447
* Generate: `TextIteratorStreamer` (streamer for gradio) by gante in 22501
* Fix convert_opt_original_pytorch_checkpoint_to_pytorch.py typo by larekrow in 22526
* llama docs: fix conversion script url by python273 in 22514
* fix LayoutLMv3TokenizerFast subword label after 'Ġ' token by thibaultdouzon in 21695
* [BLIP] fix cross attentions for BlipTextEncoder by zhbh01 in 22515
* [`Trainer`] Force `is_model_parallel` when model is loaded in multiple GPUs using `accelerate` by younesbelkada in 22532
* [`T5`] Enable naive Pipeline Parallelism training for T5 by younesbelkada in 22535
* Fix missing metrics with multiple eval datasets by hawkeoni in 22536
* [setup] drop deprecated `distutils` usage by XuehaiPan in 22531
* Generate: Enable easier TextStreamer customization by vblagoje in 22516
* [setup] migrate setup script to `pyproject.toml` by XuehaiPan in 22539
* Update test_image_processing_pix2struct.py by younesbelkada in 22543
* Fix OPTForQuestionAnswering doc string by curlup in 22481
* Generate: Add text streamer decoding options by gante in 22544
* 🔥py38 + torch 2 🔥🔥🔥🚀 by ydshieh in 22204
* Time to Say Goodbye, torch 1.7 and 1.8 by ydshieh in 22291
* [Roformer] Fixing a bug in RoFormerEncoder where it was ignoring the length of past_key_values when generating as a decoder by TheWall9 in 22416
* Implemented safetensors checkpoints save/load for Trainer by ViktorooReps in 22498
* Remove hack for dynamic modules and use Python functions instead by sgugger in 22537
* [`bnb`] Fix typo by younesbelkada in 22556
* Add id2label and label2id to model's config in run_xnli by maziyarpanahi in 22558
* Soft error whisper. by Narsil in 22475
* corrected the code comment for the output of find_pruneable_heads_and_indices by SunHaozhe in 22557
* Flax Regnet by Shubhamai in 21867
* fix `_no_split_modules` for Whisper model by pacman100 in 22486
* Fix inverted conditional in TF common test! by Rocketknight1 in 22540
* Generate: `TextIteratorStreamer` timeout by gante in 22576
* Move back doctest instructions to setup.cfg by sgugger in 22587
* Tests: disable `accelerate_tests` mark warnings by gante in 22585
* Fix PT-TF equivalence test for GPT1 by Rocketknight1 in 22586
* Add thousands separator in training summary by qmeeus in 22583
* docs: ko: complete `_toctree.yml` by wonhyeongseo in 22581
* Sync preprocesses before loading the processor at run_speech_recognition_ctc.py by mpenagar in 21926
* Fix a typo in one of the BLIP pretrained checkpoint names by Rocketknight1 in 22588
* Adding support for BPE merge creation from scores instead of ids. by Narsil in 22582
* Use native TF checkpoints for the BLIP TF tests by Rocketknight1 in 22593
* feat(model parallelism): moving the labels to the same device as the logits for gpt2 and bart by kaustubh-s1 in 22591
* Adding Llama FastTokenizer support. by Narsil in 22264
* Revert error back into warning for byte fallback conversion. by Narsil in 22607
* Seq2SeqTrainer: use unwrapped model to retrieve the generation config by gante in 22584
* Make tiny model creation + pipeline testing more robust by ydshieh in 22500
* docs: Fix broken link to generation strategies by connor-henderson in 22623
* update_pip_test_mapping by ydshieh in 22606
* A script to add/update `pipeline_model_mapping` systematically by ydshieh in 22180
* [`bnb`] 8bit models should not be converted to `DDP` by younesbelkada in 22628
* LlamaTokenizerFast Fix (.., from_slow=True). by Narsil in 22630
* [`Blip`] Fix slow tests and doctests with correct values by younesbelkada in 22632
* Update tiny model summary file for recent models by ydshieh in 22637
* fix FSDP version related issues by pacman100 in 22489
* 🌐[i18n-KO] Translate `autoclass_tutorial` to Korean and Fix the typo of `quicktour` by gabrielwithappy in 22533
* Move labels to the same device as logits for LlamaForSequenceClassification and Blip2 by xssChauhan in 22596
* Fix typo by Ronalmoo in 22650
* Fix `MegaModel` CI by ydshieh in 22652
* 🌐 [i18n-KO] Translated `pipeline_tutorial.mdx` to Korean by wonhyeongseo in 22508
* Small nit, by ArthurZucker in 22653
* [tokenization] do not push special file by ArthurZucker in 22657
* [OPT] Fix default attention mask size by ArthurZucker in 22649
* Generate: add API warning to streamers by gante in 22659
* Revert migration of setup to pyproject.toml by sgugger in 22658
* moved labels to the same device as logits for BLOOM, GPT Neo, GPT NeoX, RoBERTa and VIT models by iamarunbrahma in 22663
* Model parallelism: Moving labels to the same device as logits for BridgeTower models by shahad-mahmud in 22676
* (feat): Moving labels to same device as logits for Deit by xssChauhan in 22679
* Make dynamic code work with offline mode by sgugger in 22661
* Fix quantization docs typo by python273 in 22666
* use __func__ to check can_generate by xin3he in 22643
* add GPTNeoXForSequenceClassification by Asugawara in 22671
* Model parallelism: Moving labels to same devices as the logits are by shahad-mahmud in 22691
* Update some `MarkupLM` tests' expected values by ydshieh in 22667
* Make it easier to develop without a dev install by sgugger in 22697
* Enable naive Pipeline Parallelism training for Gpt neox japanese and san japanese by mayankagarwals in 22702
* Clarify stride option by luccailliau in 22684
* Remove 2 failing ONNX conversion tests by ydshieh in 22660
* Replace -100s in predictions by the pad token by sgugger in 22693
* Fix decorator order by ydshieh in 22708
* Update input values for docstring by amyeroberts in 22631
* remove wrong doc in readme by ArthurZucker in 22723
* Added parallel device usage for GPT-J by jprivera44 in 22713
* add model resources for CPMAnt (new) by pioliverse in 20906
* Modify pipeline_tutorial.mdx by ARKA1112 in 22726
* [tests] switch to torchrun by stas00 in 22712
* `torch.distributed` group initialization for `torch_neuron` disabled when `optimum-neuron` is installed by michaelbenayoun in 22728
* add fast support and option by ArthurZucker in 22724
* Update warning levels by NielsRogge in 22727
* Fix docstrings for TF BLIP by Rocketknight1 in 22618
* [Doctest] Add configuration_m2m_100.py by elabongaatuo in 22733
* [Doctest] Add configuration_mvp.py by elabongaatuo in 22735
* Indexing fix for gpt_bigcode by jlamypoirier in 22737
* Make vilt, switch_transformers compatible with model parallelism by Xrenya in 22703
* [Pix2struct] Simplify generation by NielsRogge in 22527
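
The entry "Replace -100s in predictions by the pad token" (22693) addresses a common decoding pitfall. Sequences are often padded with -100, the label value that PyTorch's `CrossEntropyLoss` ignores by default, and a tokenizer cannot decode that value. The sketch below is a plain-Python illustration under stated assumptions, not the library's code: `replace_ignore_index` is a hypothetical helper, nested lists stand in for arrays, and the token ids and `pad_token_id=0` are made up.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def replace_ignore_index(sequences, pad_token_id, ignore_index=IGNORE_INDEX):
    """Hypothetical helper: return a copy of `sequences` in which every
    `ignore_index` entry is replaced by `pad_token_id`, so it can be decoded."""
    return [
        [pad_token_id if token == ignore_index else token for token in seq]
        for seq in sequences
    ]


# Illustrative predictions, padded to a common length with -100.
preds = [[101, 2023, 102, -100, -100], [101, 7592, 2088, 999, 102]]
cleaned = replace_ignore_index(preds, pad_token_id=0)  # 0 is an assumed pad id
print(cleaned)  # [[101, 2023, 102, 0, 0], [101, 7592, 2088, 999, 102]]
```

With NumPy arrays, the same step is usually written as `np.where(preds != -100, preds, tokenizer.pad_token_id)` before calling `tokenizer.batch_decode`.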
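
The `TextIteratorStreamer` timeout entry (22576) concerns the iterator-style streamer, which is consumed in one thread while `generate` runs in another. A timeout keeps the consumer from blocking forever if generation stalls or crashes. The following is a minimal standard-library sketch of that pattern, assuming a queue-backed design: `QueueTextStreamer` and `fake_generate` are hypothetical stand-ins for the real streamer and for `model.generate(..., streamer=streamer)`, and the exception raised on timeout here is the standard library's `queue.Empty`.

```python
import queue
import threading


class QueueTextStreamer:
    """Hypothetical iterator-style streamer: a producer calls put()/end(),
    a consumer iterates, and each wait is bounded by `timeout` seconds."""

    _STOP = object()  # sentinel that marks the end of generation

    def __init__(self, timeout=None):
        self.timeout = timeout  # None means wait indefinitely
        self._queue = queue.Queue()

    def put(self, text):
        self._queue.put(text)

    def end(self):
        self._queue.put(self._STOP)

    def __iter__(self):
        return self

    def __next__(self):
        # Queue.get raises queue.Empty if nothing arrives within `timeout`,
        # instead of blocking forever when the producer has died.
        item = self._queue.get(timeout=self.timeout)
        if item is self._STOP:
            raise StopIteration
        return item


def fake_generate(streamer, chunks, stall=False):
    """Hypothetical producer standing in for model.generate(..., streamer=streamer)."""
    for chunk in chunks:
        streamer.put(chunk)
    if not stall:
        streamer.end()  # a stalled producer never signals completion


# Normal case: generation runs in a background thread while we consume text.
streamer = QueueTextStreamer(timeout=2.0)
threading.Thread(target=fake_generate, args=(streamer, ["Hello", ",", " world"])).start()
text = "".join(streamer)
print(text)  # Hello, world

# Stalled case: the producer exits without calling end(), so the timeout fires.
stalled = QueueTextStreamer(timeout=0.1)
producer = threading.Thread(target=fake_generate, args=(stalled, ["partial"], True))
producer.start()
producer.join()

received, timed_out = [], False
try:
    for piece in stalled:
        received.append(piece)
except queue.Empty:
    timed_out = True
print(received, timed_out)  # ['partial'] True
```

Leaving `timeout=None` reproduces the old behaviour of waiting indefinitely; a finite value trades that for an exception the caller must handle.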

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* zphang
  * LLaMA Implementation (21955)
* Seb0
  * fix(docs): fix task guide links in model docs (22226)
* mnaylor5
  * Add Mega: Moving Average Equipped Gated Attention (21766)
* Shubhamai
  * Resnet flax (21472)
  * Flax Regnet (21867)
* wonhyeongseo
  * docs: ko: complete `_toctree.yml` (22581)
  * 🌐 [i18n-KO] Translated `pipeline_tutorial.mdx` to Korean (22508)
* jlamypoirier
  * Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (22575)
  * Indexing fix for gpt_bigcode (22737)
* pioliverse
  * add model resources for CPMAnt (new) (20906)
