Transformers


4.19.0

*Disclaimer*: this is the first release that drops support for Python 3.6.

OPT

The OPT model was proposed in [Open Pre-trained Transformer Language Models](https://arxiv.org/pdf/2205.01068) by Meta AI. OPT is a series of open-sourced large causal language models whose performance is comparable to that of GPT-3.

* Add OPT by younesbelkada in 17088
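
OPT plugs into the standard causal-LM API. A minimal generation sketch, assuming the publicly released `facebook/opt-350m` checkpoint:

```python
from transformers import AutoTokenizer, OPTForCausalLM

# "facebook/opt-350m" is one of the released OPT sizes (illustrative choice).
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = OPTForCausalLM.from_pretrained("facebook/opt-350m")

inputs = tokenizer("Hello, I am conscious and", return_tensors="pt")
generated_ids = model.generate(**inputs, max_length=30)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```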

FLAVA

The FLAVA model was proposed in [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela, and was accepted at CVPR 2022.

The paper aims at creating a single unified foundation model that can work across vision, language, and vision-and-language multimodal tasks.

* [feat] Add FLAVA model by apsdehal in 16654
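
A minimal multimodal inference sketch, assuming the public `facebook/flava-full` checkpoint; the output attribute names follow the FLAVA model-output classes added in the PR:

```python
import requests
from PIL import Image
from transformers import FlavaModel, FlavaProcessor

processor = FlavaProcessor.from_pretrained("facebook/flava-full")
model = FlavaModel.from_pretrained("facebook/flava-full")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Encode an image-text pair; FLAVA returns unimodal and multimodal embeddings.
inputs = processor(text=["a photo of two cats"], images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
print(outputs.image_embeddings.shape, outputs.text_embeddings.shape)
```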

YOLOS

The YOLOS model was proposed in [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu. YOLOS proposes to just leverage the plain [Vision Transformer (ViT)](https://huggingface.co/docs/transformers/main/en/model_doc/vit) for object detection, inspired by DETR. It turns out that a base-sized encoder-only Transformer can also achieve 42 AP on COCO, similar to DETR and much more complex frameworks such as Faster R-CNN.

* Add YOLOS by NielsRogge in 16848
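
A minimal detection sketch, assuming the public `hustvl/yolos-small` checkpoint:

```python
import requests
import torch
from PIL import Image
from transformers import YolosFeatureExtractor, YolosForObjectDetection

feature_extractor = YolosFeatureExtractor.from_pretrained("hustvl/yolos-small")
model = YolosForObjectDetection.from_pretrained("hustvl/yolos-small")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = feature_extractor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# DETR-style outputs: class logits and normalized boxes, one per detection token.
print(outputs.logits.shape, outputs.pred_boxes.shape)
```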

RegNet

The RegNet model was proposed in [Designing Network Design Spaces](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.

The authors design search spaces to perform Neural Architecture Search (NAS). They start from a high-dimensional search space and iteratively reduce it by empirically applying constraints based on the best-performing models sampled from the current search space.

* RegNet by FrancescoSaverioZuppichini in 16188

TAPEX

The TAPEX model was proposed in [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. TAPEX pre-trains a BART model to solve synthetic SQL queries, after which it can be fine-tuned to answer natural language questions related to tabular data, as well as to perform table fact checking.

* Add TAPEX by NielsRogge in 16473
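
A minimal table-QA sketch, assuming the public `microsoft/tapex-base-finetuned-wtq` checkpoint (TAPEX checkpoints load into `BartForConditionalGeneration`):

```python
import pandas as pd
from transformers import BartForConditionalGeneration, TapexTokenizer

tokenizer = TapexTokenizer.from_pretrained("microsoft/tapex-base-finetuned-wtq")
model = BartForConditionalGeneration.from_pretrained("microsoft/tapex-base-finetuned-wtq")

# The tokenizer flattens the table and the question into a single input sequence.
table = pd.DataFrame({"city": ["Paris", "Berlin"], "population": ["2.1M", "3.6M"]})
inputs = tokenizer(table=table, query="which city has the larger population?", return_tensors="pt")

outputs = model.generate(**inputs)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```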

Data2Vec: vision

The Data2Vec model was proposed in [data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/pdf/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu and Michael Auli. Data2Vec proposes a unified framework for self-supervised learning across different data modalities - text, audio and images. Importantly, predicted targets for pre-training are contextualized latent representations of the inputs, rather than modality-specific, context-independent targets.

The vision model was added in v4.19.0.

* [Data2Vec] Add data2vec vision by patrickvonplaten in 16760
* Add Data2Vec for Vision in TF by sayakpaul in 17008
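
A minimal PyTorch image-classification sketch, assuming the public `facebook/data2vec-vision-base-ft1k` checkpoint (fine-tuned on ImageNet-1k):

```python
import requests
import torch
from PIL import Image
from transformers import AutoFeatureExtractor, Data2VecVisionForImageClassification

feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/data2vec-vision-base-ft1k")
model = Data2VecVisionForImageClassification.from_pretrained("facebook/data2vec-vision-base-ft1k")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = feature_extractor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```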

FSDP integration in Trainer

PyTorch recently upstreamed Fairscale's FSDP into PyTorch Distributed, with additional optimizations. This PR integrates it into the Trainer API.

FSDP enables distributed training at scale: it is a wrapper that shards module parameters across data-parallel workers, inspired by Xu et al. as well as ZeRO Stage 3 from DeepSpeed.
PyTorch FSDP will focus on production readiness and long-term support, including better integration with ecosystems and improvements to performance, usability, reliability, debuggability, and composability.

* PyTorch FSDP integration in Trainer by pacman100 in 17136
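
A minimal configuration sketch: FSDP is switched on through the `fsdp` training argument introduced by this PR (option strings may differ in later versions), and the script must run under a distributed launcher such as `torch.distributed.launch`:

```python
from transformers import TrainingArguments

# "full_shard" shards parameters, gradients and optimizer states (ZeRO Stage 3 style);
# "shard_grad_op" shards only gradients and optimizer states (ZeRO Stage 2 style).
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
    fsdp="full_shard",
)
```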

Training scripts

New example scripts were added for image classification and semantic segmentation. Both now have versions that leverage the Trainer API and Accelerate.

* Add image classification script, no trainer by NielsRogge in 16727
* Add semantic script no trainer, v2 by NielsRogge in 16788
* Add semantic script, trainer by NielsRogge in 16834

Documentation in Spanish

To continue democratizing good machine learning, we're making the Transformers documentation more accessible to non-English speakers, starting with Spanish (572M speakers worldwide).

- Added es version of language_modeling.mdx doc by jQuinRivero in 17021
- Spanish translation of the file philosophy.mdx by jkmg in 16922
- Documentation: Spanish translation of fast_tokenizers.mdx by jloayza10 in 16882
- Translate index.mdx (to ES) and add Spanish models to quicktour.mdx examples by omarespejel in 16685
- Spanish translation of the file multilingual.mdx by SimplyJuanjo in 16329
- Added spanish translation of autoclass_tutorial. by Duedme in 17069
- Fix style error in Spanish docs by osanseviero in 17197

Improvements and bugfixes

* [modeling_utils] rearrange text by stas00 in 16632
* Added Annotations for PyTorch models by anmolsjoshi in 16619
* Allow the same config in the auto mapping by sgugger in 16631
* Update no_trainer scripts with new Accelerate functionalities by muellerzr in 16617
* Fix doc example by NielsRogge in 16448
* Add inputs vector to calculate metric method by lmvasque in 16461
* [megatron-bert-uncased-345m] fix conversion by stas00 in 16639
* Remove parent/child tests in auto model tests by sgugger in 16653
* Updated _load_pretrained_model_low_mem to check if keys are in the state_dict by FrancescoSaverioZuppichini in 16643
* Update Support image on README.md by BritneyMuller in 16615
* bert: properly mention deprecation of TF2 conversion script by stefan-it in 16171
* add vit tf doctest with add_code_sample_docstrings by johko in 16636
* Fix error in doc of `DataCollatorWithPadding` by secsilm in 16662
* Fix QA sample by ydshieh in 16648
* TF generate refactor - Beam Search by gante in 16374
* Add tests for no_trainer and fix existing examples by muellerzr in 16656
* only load state dict when the checkpoint is not None by laurahanu in 16673
* [Trainer] tf32 arg doc by stas00 in 16674
* Update audio examples with MInDS-14 by stevhliu in 16633
* add a warning in `SpmConverter` for sentencepiece's model using the byte fallback feature by SaulLu in 16629
* Fix some doc examples in task summary by ydshieh in 16666
* Jia multi gpu eval by liyongsea in 16428
* Generate: min length can't be larger than max length by gante in 16668
* fixed crash when deleting older checkpoint and a file f"{checkpoint_prefix}-*" exists by sadransh in 16686
* [Doctests] Correct task summary by patrickvonplaten in 16644
* Add Doc Test for BERT by vumichien in 16523
* Fix t5 shard on TPU Pods by agemagician in 16527
* update decoder_vocab_size when resizing embeds by patil-suraj in 16700
* Fix TF_MASKED_LM_SAMPLE by ydshieh in 16698
* Rename the method test_torchscript by ydshieh in 16693
* Reduce memory leak in _create_and_check_torchscript by ydshieh in 16691
* Enable more test_torchscript by ydshieh in 16679
* Don't push checkpoints to hub in `no_trainer` scripts by muellerzr in 16703
* Private repo TrainingArgument by nbroad1881 in 16707
* Handle image_embeds in ViltModel by ydshieh in 16696
* Improve PT/TF equivalence test by ydshieh in 16557
* Fix example logs repeating themselves by muellerzr in 16669
* [Bart] correct doc test by patrickvonplaten in 16722
* Add Doc Test GPT-2 by ArEnSc in 16439
* Only call get_output_embeddings when tie_word_embeddings is set by smelm in 16667
* Update run_translation_no_trainer.py by raki-1203 in 16652
* Qdqbert example add benchmark script with ORT-TRT by shangz-ai in 16592
* Replace assertion with exception by anmolsjoshi in 16720
* Change the chunk_iter function to handle by Narsil in 16730
* Remove duplicate header by sgugger in 16732
* Moved functions to pytorch_utils.py by anmolsjoshi in 16625
* TF: remove set_tensor_by_indices_to_value by gante in 16729
* Add Doc Tests for Reformer PyTorch by hiromu166 in 16565
* [FlaxSpeechEncoderDecoder] Fix input shape bug in weights init by sanchit-gandhi in 16728
* [FlaxWav2Vec2Model] Fix bug in attention mask by sanchit-gandhi in 16725
* add Bigbird ONNX config by vumichien in 16427
* TF generate: handle case without cache in beam search by gante in 16704
* Fix decoding score comparison when using logits processors or warpers by bryant1410 in 10638
* [Doctests] Fix all T5 doc tests by patrickvonplaten in 16646
* Fix 16660 (tokenizers setters of ids of special tokens) by davidleonfdez in 16661
* [from_pretrained] refactor find_mismatched_keys by stas00 in 16706
* Add Doc Test for GPT-J by ArEnSc in 16507
* Fix and improve CTRL doctests by jeremyadamsfisher in 16573
* [modeling_utils] better explanation of ignore keys by stas00 in 16741
* CI: setup-dependent pip cache by gante in 16751
* Reduce Funnel PT/TF diff by ydshieh in 16744
* Add defensive check for config num_labels and id2label by sgugger in 16709
* Add self training code for text classification by tuvuumass in 16738
* [self-scheduled ci] explain where dependencies are by stas00 in 16757
* Fixup no_trainer examples scripts and add more tests by muellerzr in 16765
* [Doctest] added doctest changes for electra by bhadreshpsavani in 16675
* Enabling `Tapex` in table question answering pipeline. by Narsil in 16663
* [Flax `.from_pretrained`] Raise a warning if model weights are not in float32 by sanchit-gandhi in 16762
* Fix batch size in evaluation loop by sgugger in 16763
* Make nightly install dev accelerate by muellerzr in 16783
* [deepspeed / m2m_100] make deepspeed zero-3 work with layerdrop by stas00 in 16717
* Kill async pushes when calling push_to_hub with blocking=True by sgugger in 16755
* Improve image classification example by NielsRogge in 16585
* [SpeechEncoderDecoderModel] Fix bug in reshaping labels by sanchit-gandhi in 16748
* Fix issue avoid-missing-comma found at https://codereview.doctor by code-review-doctor in 16768
* [trainer / deepspeed] fix hyperparameter_search by stas00 in 16740
* [modeling utils] revamp `from_pretrained(..., low_cpu_mem_usage=True)` + tests by stas00 in 16657
* Fix PT TF ViTMAE by ydshieh in 16766
* Update README.md by NielsRogge in 16797
* Pin Jax to last working release by sgugger in 16808
* CI: non-remote GH Actions now use a python venv by gante in 16789
* TF generate refactor - XLA sample by gante in 16713
* Raise error and suggestion when using custom optimizer with Fairscale or Deepspeed by allanj in 16786
* Create empty venv on cache miss by gante in 16816
* [ViT, BEiT, DeiT, DPT] Improve code by NielsRogge in 16799
* [Quicktour Audio] Improve && remove ffmpeg dependency by patrickvonplaten in 16723
* fix megatron bert convert state dict naming by Codle in 15820
* use base_version to check torch version in torch_less_than_1_11 by nbroad1881 in 16806
* Allow passing encoder_ouputs as tuple to EncoderDecoder Models by jsnfly in 16814
* Refactor issues with yaml by LysandreJik in 16772
* fix _setup_devices in case where there is no torch.distributed package in build by dlwh in 16821
* Clean up semantic segmentation tests by NielsRogge in 16801
* Fix `LayoutLMv2` tokenization docstrings by qqaatw in 16187
* Wav2 vec2 phoneme ctc tokenizer optimisation by ArthurZucker in 16817
* [Flax] improve large model init and loading by patil-suraj in 16148
* Some tests misusing assertTrue for comparisons fix by code-review-doctor in 16771
* Type hints added for TFMobileBert by Dahlbomii in 16505
* fix `run_clm.py` seeking text column name twice by dandelin in 16624
* Add onnx export of models with a multiple choice classification head by echarlaix in 16758
* [ASR Pipeline] Correct init docs by patrickvonplaten in 16833
* Add doc about `attention_mask` on gpt2 by wiio12 in 16829
* TF: Add sigmoid activation function by gante in 16819
* Correct Logging of Eval metric to Tensorboard by Jeevesh8 in 16825
* replace `Speech2TextTokenizer` by `Speech2TextFeatureExtractor` in some docstrings by SaulLu in 16835
* Type hints added to Speech to Text by Dahlbomii in 16506
* Improve test_pt_tf_model_equivalence on PT side by ydshieh in 16731
* Add support for bitsandbytes by manuelciosici in 15622
* [Typo] Fix typo in modeling utils by patrickvonplaten in 16840
* add DebertaV2 fast tokenizer by mingboiz in 15529
* Fixing return type tensor with `num_return_sequences>1`. by Narsil in 16828
* [modeling_utils] use less cpu memory with sharded checkpoint loading by stas00 in 16844
* [docs] fix url by stas00 in 16860
* Fix custom init sorting script by sgugger in 16864
* Fix multiproc metrics in no_trainer examples by muellerzr in 16865
* Long QuestionAnsweringPipeline fix. by Narsil in 16778
* t5: add conversion script for T5X to FLAX by stefan-it in 16853
* tiny tweak to allow BatchEncoding.token_to_char when token doesn't correspond to chars by ghlai9665 in 15901
* Adding support for `array` key in raw dictionaries in ASR pipeline. by Narsil in 16827
* Return input_ids in ImageGPT feature extractor by sgugger in 16872
* Use ACT2FN to fetch ReLU activation by eldarkurtic in 16874
* Fix GPT-J onnx conversion by ChainYo in 16780
* Fix doctest list by ydshieh in 16878
* New features for CodeParrot training script by loubnabnl in 16851
* Add missing entries in mappings by ydshieh in 16857
* TF: rework XLA generate tests by gante in 16866
* Minor fixes/improvements in `convert_file_size_to_int` by mariosasko in 16891
* Add doc tests for Albert and Bigbird by vumichien in 16774
* Add OnnxConfig for ConvBERT by ChainYo in 16859
* TF: XLA repetition penalty by gante in 16879
* Changes in create_optimizer to support tensor parallelism with SMP by cavdard in 16880
* [DocTests] Fix some doc tests by patrickvonplaten in 16889
* add bigbird typo fixes by ChainYo in 16897
* Fix doc test quicktour dataset by patrickvonplaten in 16929
* Add missing ckpt in config docs by ydshieh in 16900
* Fix PyTorch RAG tests GPU OOM by ydshieh in 16881
* Fix RemBertTokenizerFast by ydshieh in 16933
* TF: XLA logits processors - minimum length, forced eos, and forced bos by gante in 16912
* TF: XLA Logits Warpers by gante in 16899
* added deit onnx config by rushic24 in 16887
* TF: XLA stable softmax by gante in 16892
* Replace deprecated logger.warn with warning by sanchit-gandhi in 16876
* Fix issue probably-meant-fstring found at https://codereview.doctor by code-review-doctor in 16913
* Limit the use of PreTrainedModel.device by sgugger in 16935
* apply torch int div to layoutlmv2 by ManuelFay in 15457
* FIx Iterations for decoder by agemagician in 16934
* Add onnx config for RoFormer by skrsna in 16861
* documentation: some minor clean up by mingboiz in 16850
* Fix RuntimeError message format by ftnext in 16906
* use original loaded keys to find mismatched keys by tricktreat in 16920
* [Research] Speed up evaluation for XTREME-S by anton-l in 16785
* Fix HubertRobustTest PT/TF equivalence test on GPU by ydshieh in 16943
* Misc. fixes for Pytorch QA examples: by searchivarius in 16958
* [HF Argparser] Fix parsing of optional boolean arguments by NielsRogge in 16946
* Fix `distributed_concat` with scalar tensor by Yard1 in 16963
* Update custom_models.mdx by mishig25 in 16964
* Fix add-new-model-like when model doesn't support all frameworks by sgugger in 16966
* Fix multiple deletions of the same files in save_pretrained by sgugger in 16947
* Fixup no_trainer save logic by muellerzr in 16968
* Fix doc notebooks links by sgugger in 16969
* Fix check_all_models_are_tested by ydshieh in 16970
* Add -e flag to some GH workflow yml files by ydshieh in 16959
* Update tokenization_bertweet.py by datquocnguyen in 16941
* Update check_models_are_tested to deal with Windows path by ydshieh in 16973
* Add parameter --config_overrides for run_mlm_wwm.py by conan1024hao in 16961
* Rename a class to reflect framework pattern AutoModelXxx -> TFAutoModelXxx by amyeroberts in 16993
* set eos_token_id to None to generate until max length by ydshieh in 16989
* Fix savedir for by epoch by muellerzr in 16996
* Update README to latest release by sgugger in 16997
* use scale=1.0 in floats_tensor called in speech model testers by ydshieh in 17007
* Update all require decorators to use skipUnless when possible by muellerzr in 16999
* TF: XLA bad words logits processor and list of processors by gante in 16974
* Make create_extended_attention_mask_for_decoder static method by pbelevich in 16893
* Update README_zh-hans.md by tarzanwill in 16977
* Updating variable names. by Narsil in 16445
* Revert "Updating variable names. by Narsil in 16445)"
* Replace dict/BatchEncoding instance checks by Mapping by sgugger in 17014
* Result of new doc style with fixes by sgugger in 17015
* Add a check on config classes docstring checkpoints by ydshieh in 17012
* Add translating guide by omarespejel in 17004
* update docs of length_penalty by manandey in 17022
* [FlaxGenerate] Fix bug in decoder_start_token_id by sanchit-gandhi in 17035
* Fx with meta by michaelbenayoun in 16836
* [Flax(Speech)EncoderDecoder] Fix bug in `decoder_module` by sanchit-gandhi in 17036
* Fix typo in RetriBERT docstring by mpoemsl in 17018
* add torch.no_grad when in eval mode by JunnYu in 17020
* Disable Flax GPU tests on push by sgugger in 17042
* Clean up vision tests by NielsRogge in 17024
* [Trainer] Move logic for checkpoint loading into separate methods for easy overriding by calpt in 17043
* Update no_trainer examples to use new logger by muellerzr in 17044
* Fix no_trainer examples to properly calculate the number of samples by muellerzr in 17046
* Allow all imports from transformers by LysandreJik in 17050
* Make the sacremoses dependency optional by LysandreJik in 17049
* Clean up setup.py by sgugger in 17045
* [T5 Tokenizer] Model has no fixed position ids - there is no hardcode… by patrickvonplaten in 16990
* [FlaxBert] Add ForCausalLM by sanchit-gandhi in 16995
* Move test model folders by ydshieh in 17034
* Make Trainer compatible with sharded checkpoints by sgugger in 17053
* Remove Python and use v2 action by sgugger in 17059
* Fix RNG reload in resume training from epoch checkpoint by sgugger in 17055
* Remove device parameter from create_extended_attention_mask_for_decoder by pbelevich in 16894
* Fix hashing for deduplication by thomasw21 in 17048
* Skip RoFormer ONNX test if rjieba not installed by lewtun in 16981
* Remove masked image modeling from BEIT ONNX export by lewtun in 16980
* Make sure telemetry arguments are not returned as unused kwargs by sgugger in 17063
* Type hint complete Albert model file. by karthikrangasai in 16682
* Deprecate model templates by sgugger in 17062
* Update to build via git for accelerate by muellerzr in 17084
* Allow saved_model export of TFCLIPModel in save_pretrained by seanmor5 in 16886
* Fix DeBERTa `token_type_ids` by deutschmn in 17082
* 📝 open fresh PR for pipeline doctests by stevhliu in 17073
* minor change on TF Data2Vec test by ydshieh in 17085
* type hints for pytorch models by robotjellyzone in 17064
* Add type hints for BERTGeneration by robsmith155 in 17047
* Fix MLflowCallback and add support for MLFLOW_EXPERIMENT_NAME by orieg in 17091
* Remove torchhub test by sgugger in 17097
* fix missing "models" in pipeline test module by ydshieh in 17090
* Fix link to example scripts by stevhliu in 17103
* Fix self-push CI report path in cat by ydshieh in 17111
* Added BigBirdPegasus onnx config by nandwalritik in 17104
* split single_gpu and multi_gpu by ydshieh in 17083
* LayoutLMv2Processor: ensure 1-to-1 mapping between images and samples in case of overflowing tokens by ghlai9665 in 17092
* Add type hints for BigBirdPegasus and Data2VecText PyTorch models by robsmith155 in 17123
* add `mobilebert` onnx configs by manandey in 17029
* [WIP] Fix Pyright static type checking by replacing if-else imports with try-except by d-miketa in 16578
* Add the auto_find_batch_size capability from Accelerate into Trainer by muellerzr in 17068
* Fix MLflowCallback end_run() and add support for tags and nested runs by orieg in 17130
* Fix all docs for accelerate install directions by muellerzr in 17145
* LogSumExp trick `question_answering` pipeline. by Narsil in 17143
* train args defaulting None marked as Optional by d-miketa in 17156
* [trainer] sharded _load_best_model by stas00 in 17150
* [Deepspeed] add many more models to the model zoo test by stas00 in 12695
* Fixing the output of code examples in the preprocessing chapter by HallerPatrick in 17162
* missing file by stas00 in 17164
* Add MLFLOW_FLATTEN_PARAMS support in MLflowCallback by orieg in 17148
* Fix template init by sgugger in 17163
* MobileBERT tokenizer tests by leondz in 16896
* [M2M100 doc] remove duplicate example by patil-suraj in 17175
* Extend Transformers Trainer Class to Enable PyTorch SGD/Adagrad Optimizers for Training by jianan-gu in 17154
* propagate "attention_mask" dtype for "use_past" in OnnxConfig.generate_dummy_inputs by arampacha in 17105
* Convert image to rgb for clip model by hengkuanwee in 17101
* Add missing RetriBERT tokenizer tests by mpoemsl in 17017
* [WIP] Enable reproducibility for distributed trainings by hasansalimkanmaz in 16907
* Remove unnecessary columns for all dataset types in `Trainer` by Yard1 in 17166
* Fix LED documentation by manuelciosici in 17181
* Ensure tensors are at least 1d for pad and concat by Yard1 in 17179
* add shift_tokens_right in FlaxMT5 by patil-suraj in 17188
* Remove columns before passing to data collator by Yard1 in 17187
* Remove duplicated os.path.join by shijie-wu in 17192
* Fix contents in index.mdx to match docs' sidebar by omarespejel in 17198
* ViT and Swin symbolic tracing with torch.fx by michaelbenayoun in 17182
* migrate azure blob for beit checkpoints by donglixp in 16902
* Update data2vec.mdx to include a Colab Notebook link (that shows fine-tuning) by sayakpaul in 17194

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* anmolsjoshi
    * Added Annotations for PyTorch models (16619)
    * Replace assertion with exception (16720)
    * Moved functions to pytorch_utils.py (16625)
* vumichien
    * Add Doc Test for BERT (16523)
    * add Bigbird ONNX config (16427)
    * Add doc tests for Albert and Bigbird (16774)
* tuvuumass
    * Add self training code for text classification (16738)
* sayakpaul
    * Add Data2Vec for Vision in TF (17008)
* robotjellyzone
    * type hints for pytorch models (17064)
* d-miketa
    * [WIP] Fix Pyright static type checking by replacing if-else imports with try-except (16578)
    * train args defaulting None marked as Optional (17156)

4.18.0

New model additions

You'll notice that we are starting to add several older vision models. This is because those models are used as backbones in recent architectures. While we could rely on existing libraries for such pretrained models, we will ultimately need some support for those backbones in PyTorch/TensorFlow and Jax, and there is currently no library that supports all three frameworks. This is why we are starting to add those models to Transformers directly (here, ResNet and VAN).

GLPN

The GLPN model was proposed in [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim. GLPN combines [SegFormer](https://huggingface.co/docs/transformers/main/en/model_doc/segformer)’s hierarchical mix-Transformer with a lightweight decoder for monocular depth estimation. The proposed decoder shows better performance than the previously proposed decoders, with considerably less computational complexity.

* Add GLPN by NielsRogge in https://github.com/huggingface/transformers/pull/16199
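
A minimal depth-estimation sketch, assuming the public `vinvino02/glpn-kitti` checkpoint:

```python
import requests
import torch
from PIL import Image
from transformers import GLPNFeatureExtractor, GLPNForDepthEstimation

feature_extractor = GLPNFeatureExtractor.from_pretrained("vinvino02/glpn-kitti")
model = GLPNForDepthEstimation.from_pretrained("vinvino02/glpn-kitti")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = feature_extractor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# predicted_depth is a (batch, height, width) map of per-pixel depth estimates.
print(outputs.predicted_depth.shape)
```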

ResNet

The ResNet model was proposed in [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun. Our implementation follows the small changes made by [Nvidia](https://catalog.ngc.nvidia.com/orgs/nvidia/resources/resnet_50_v1_5_for_pytorch): we apply stride=2 for downsampling in the bottleneck’s 3x3 conv rather than in the first 1x1. This variant is generally known as “ResNet v1.5”.

ResNet introduced residual connections, which make it possible to train networks with a previously unseen number of layers (up to 1,000). ResNet won the 2015 ILSVRC & COCO competitions, an important milestone in deep computer vision.

* Resnet by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15770
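
Since these models are primarily meant as backbones, a minimal feature-extraction sketch, assuming the public `microsoft/resnet-50` checkpoint:

```python
import requests
import torch
from PIL import Image
from transformers import AutoFeatureExtractor, ResNetModel

feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/resnet-50")
model = ResNetModel.from_pretrained("microsoft/resnet-50")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = feature_extractor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Convolutional feature maps, e.g. (1, 2048, 7, 7), usable as backbone features.
print(outputs.last_hidden_state.shape)
```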

VAN

The VAN model was proposed in [Visual Attention Network](https://arxiv.org/abs/2202.09741) by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.

This paper introduces a new attention layer based on convolution operations able to capture both local and distant relationships. This is done by combining normal and large kernel convolution layers. The latter uses a dilated convolution to capture distant correlations.

* Visual Attention Network (VAN) by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16027

VisionTextDualEncoder

The [VisionTextDualEncoderModel](https://huggingface.co/docs/transformers/main/en/model_doc/vision-text-dual-encoder#transformers.VisionTextDualEncoderModel) can be used to initialize a vision-text dual encoder model with any pretrained vision autoencoding model as the vision encoder (e.g. [ViT](https://huggingface.co/docs/transformers/main/en/model_doc/vit), [BEiT](https://huggingface.co/docs/transformers/main/en/model_doc/beit), [DeiT](https://huggingface.co/docs/transformers/main/en/model_doc/deit)) and any pretrained text autoencoding model as the text encoder (e.g. [RoBERTa](https://huggingface.co/docs/transformers/main/en/model_doc/roberta), [BERT](https://huggingface.co/docs/transformers/main/en/model_doc/bert)). Two projection layers are added on top of both the vision and text encoders to project the output embeddings to a shared latent space. The projection layers are randomly initialized, so the model should be fine-tuned on a downstream task. This model can be used to align the vision-text embeddings with CLIP-like contrastive image-text training and then be used for zero-shot vision tasks such as image classification or retrieval.

In [LiT: Zero-Shot Transfer with Locked-image Text Tuning](https://arxiv.org/abs/2111.07991) it is shown how leveraging pre-trained (locked/frozen) image and text models for contrastive learning yields significant improvements on new zero-shot vision tasks such as image classification or retrieval.

* add VisionTextDualEncoder and CLIP fine-tuning script by patil-suraj in https://github.com/huggingface/transformers/pull/15701
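
A minimal initialization sketch, assuming the public `google/vit-base-patch16-224` and `bert-base-uncased` checkpoints:

```python
from transformers import (
    AutoFeatureExtractor,
    AutoTokenizer,
    VisionTextDualEncoderModel,
    VisionTextDualEncoderProcessor,
)

# Pair any pretrained vision encoder with any pretrained text encoder; the two
# projection layers on top are freshly initialized, so fine-tuning is expected.
model = VisionTextDualEncoderModel.from_vision_text_pretrained(
    "google/vit-base-patch16-224", "bert-base-uncased"
)
processor = VisionTextDualEncoderProcessor(
    AutoFeatureExtractor.from_pretrained("google/vit-base-patch16-224"),
    AutoTokenizer.from_pretrained("bert-base-uncased"),
)
```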

DiT

DiT was proposed in [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei. DiT applies the self-supervised objective of [BEiT](https://huggingface.co/docs/transformers/main/en/model_doc/beit) (BERT pre-training of Image Transformers) to 42 million document images, allowing for state-of-the-art results on tasks including:

- document image classification: the [RVL-CDIP](https://www.cs.cmu.edu/~aharley/rvl-cdip/) dataset (a collection of 400,000 images belonging to one of 16 classes).
- document layout analysis: the [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet) dataset (a collection of more than 360,000 document images constructed by automatically parsing PubMed XML files).
- table detection: the [ICDAR 2019 cTDaR](https://github.com/cndplab-founder/ICDAR2019_cTDaR) dataset (a collection of 600 training images and 240 testing images).

* Add Document Image Transformer (DiT) by NielsRogge in https://github.com/huggingface/transformers/pull/15984
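
Since DiT reuses the BEiT architecture, the Auto classes resolve it directly. A minimal document-classification sketch, assuming the public `microsoft/dit-base-finetuned-rvlcdip` checkpoint; `document.png` is a hypothetical local scan:

```python
import torch
from PIL import Image
from transformers import AutoFeatureExtractor, AutoModelForImageClassification

feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/dit-base-finetuned-rvlcdip")
model = AutoModelForImageClassification.from_pretrained("microsoft/dit-base-finetuned-rvlcdip")

image = Image.open("document.png").convert("RGB")  # hypothetical local document scan
inputs = feature_extractor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# One of the 16 RVL-CDIP document classes (letter, invoice, resume, ...).
print(model.config.id2label[logits.argmax(-1).item()])
```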

DPT

The DPT model was proposed in [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun. DPT is a model that leverages the [Vision Transformer (ViT)](https://huggingface.co/docs/transformers/main/en/model_doc/vit) as backbone for dense prediction tasks like semantic segmentation and depth estimation.

* Add DPT by NielsRogge in https://github.com/huggingface/transformers/pull/15991
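
A minimal semantic-segmentation sketch, assuming the public `Intel/dpt-large-ade` checkpoint (fine-tuned on ADE20k):

```python
import requests
import torch
from PIL import Image
from transformers import DPTFeatureExtractor, DPTForSemanticSegmentation

feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-large-ade")
model = DPTForSemanticSegmentation.from_pretrained("Intel/dpt-large-ade")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = feature_extractor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Per-class segmentation logits of shape (batch, num_labels, height, width).
print(logits.shape)
```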

Checkpoint sharding

Large models are becoming more and more the norm, and having a checkpoint in a single file is challenging for several reasons:
- it's tougher to upload/download files bigger than 20/30 GB efficiently
- the whole checkpoint might not fit into RAM even if you have enough GPU memory

That's why the `save_pretrained` method will now automatically shard a checkpoint into several files once a PyTorch model goes above a 10GB threshold. `from_pretrained` handles such sharded checkpoints as if there were only one file.

* Checkpoint sharding by sgugger in https://github.com/huggingface/transformers/pull/16343
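
A minimal sketch of the behavior; `max_shard_size` is lowered from the 10GB default here just to force sharding on a small model:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-cased")

# Shards the checkpoint into several weight files plus an index file.
model.save_pretrained("local-bert", max_shard_size="200MB")

# Reloading is transparent: from_pretrained stitches the shards back together.
reloaded = AutoModel.from_pretrained("local-bert")
```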

TensorFlow implementations

GPT-J and ViTMAE are now available in TensorFlow.

* Add TF implementation of GPT-J by stancld in https://github.com/huggingface/transformers/pull/15623
* Add TF ViT MAE by sayakpaul in https://github.com/huggingface/transformers/pull/16255

Documentation guides

The docs information architecture (IA) migration is wrapped up, with a new conceptual guide section now available.

* Create concept guide section by stevhliu in https://github.com/huggingface/transformers/pull/16369

Improvements and bugfixes

* Fix doc links in release utils by sgugger in https://github.com/huggingface/transformers/pull/15903
* Fix a TF Vision Encoder Decoder test by ydshieh in https://github.com/huggingface/transformers/pull/15896
* [Fix link in pipeline doc] by patrickvonplaten in https://github.com/huggingface/transformers/pull/15906
* Fix and improve REALM fine-tuning by qqaatw in https://github.com/huggingface/transformers/pull/15297
* Freeze FlaxWav2Vec2 Feature Encoder by sanchit-gandhi in https://github.com/huggingface/transformers/pull/15873
* The tests were not updated after the addition of `torch.diag` by Narsil in https://github.com/huggingface/transformers/pull/15890
* [Doctests] Fix ignore bug and add more doc tests by patrickvonplaten in https://github.com/huggingface/transformers/pull/15911
* Enabling MaskFormer in pipelines by Narsil in https://github.com/huggingface/transformers/pull/15917
* Minor fixes for MaskFormer by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15916
* Add vision models to doc tests by NielsRogge in https://github.com/huggingface/transformers/pull/15905
* Fix 15898 by davidleonfdez in https://github.com/huggingface/transformers/pull/15928
* Update doc test readme by patrickvonplaten in https://github.com/huggingface/transformers/pull/15926
* Re-enabling all fast pipeline tests. by Narsil in https://github.com/huggingface/transformers/pull/15924
* Support CLIPTokenizerFast for CLIPProcessor by cosmoquester in https://github.com/huggingface/transformers/pull/15913
* Updating the slow tests: by Narsil in https://github.com/huggingface/transformers/pull/15893
* Adding `MODEL_FOR_INSTANCE_SEGMENTATION_MAPPING` by Narsil in https://github.com/huggingface/transformers/pull/15934
* Add missing support for Flax XLM-RoBERTa by versae in https://github.com/huggingface/transformers/pull/15900
* [FlaxT5 Example] Fix flax t5 example pretraining by patrickvonplaten in https://github.com/huggingface/transformers/pull/15835
* Do not change the output from tuple to list - to match PT's version by ydshieh in https://github.com/huggingface/transformers/pull/15918
* Tests for MaskFormerFeatureExtractor's post_process*** methods by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15929
* Constrained Beam Search [*With* Disjunctive Decoding] by cwkeam in https://github.com/huggingface/transformers/pull/15761
* [LayoutLMv2] Update requires_backends of feature extractor by NielsRogge in https://github.com/huggingface/transformers/pull/15941
* Made MaskFormerModelTest faster by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15942
* [Bug Fix] Beam search example in docs fails & a fix (integrating `max_length` in `BeamScorer.finalize()`) by cwkeam in https://github.com/huggingface/transformers/pull/15555
* remove re-definition of FlaxWav2Vec2ForCTCModule by patil-suraj in https://github.com/huggingface/transformers/pull/15965
* Support modern list type hints in HfArgumentParser by konstantinjdobler in https://github.com/huggingface/transformers/pull/15951
* Backprop Test for Freeze FlaxWav2Vec2 Feature Encoder by sanchit-gandhi in https://github.com/huggingface/transformers/pull/15938
* Fix Embedding Module Bug in Flax Models by sanchit-gandhi in https://github.com/huggingface/transformers/pull/15920
* Make is_thing_map in Feature Extractor post_process_panoptic_segmentation defaults to all instances by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15954
* Update training scripts docs by stevhliu in https://github.com/huggingface/transformers/pull/15931
* Set scale_embedding to False in some TF tests by ydshieh in https://github.com/huggingface/transformers/pull/15952
* Fix LayoutLMv2 test by NielsRogge in https://github.com/huggingface/transformers/pull/15939
* [Tests] Fix ViTMAE integration test by NielsRogge in https://github.com/huggingface/transformers/pull/15949
* Returning outputs only when asked for for MaskFormer. by Narsil in https://github.com/huggingface/transformers/pull/15936
* Speedup T5 Flax training by using Numpy instead of JAX for batch shuffling by yhavinga in https://github.com/huggingface/transformers/pull/15963
* Do a pull in case docs were updated during build by sgugger in https://github.com/huggingface/transformers/pull/15922
* Fix TFEncDecModelTest - Pytorch device by ydshieh in https://github.com/huggingface/transformers/pull/15979
* [Env Command] Add hf hub to env version command by patrickvonplaten in https://github.com/huggingface/transformers/pull/15981
* TF: Update multiple choice example by gante in https://github.com/huggingface/transformers/pull/15868
* TF generate refactor - past without encoder outputs by gante in https://github.com/huggingface/transformers/pull/15944
* Seed _get_train_sampler's generator with arg seed to improve reproducibility by dlwh in https://github.com/huggingface/transformers/pull/15961
* Add `ForInstanceSegmentation` models to `image-segmentation` pipelines by Narsil in https://github.com/huggingface/transformers/pull/15937
* [Doctests] Move doctests to new GPU & Fix bugs by patrickvonplaten in https://github.com/huggingface/transformers/pull/15969
* Removed an outdated check about hdf5_version by ydshieh in https://github.com/huggingface/transformers/pull/16011
* Swag example: Update doc format by gante in https://github.com/huggingface/transformers/pull/16014
* Fix github actions comment by LysandreJik in https://github.com/huggingface/transformers/pull/16009
* Simplify release utils by sgugger in https://github.com/huggingface/transformers/pull/15921
* Make `pos` optional in `PerceiverAudioPreprocessor` to avoid crashing `PerceiverModel` operation by basilevh in https://github.com/huggingface/transformers/pull/15972
* Fix MaskFormer failing test on master by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16012
* Fix broken code blocks in README.md by upura in https://github.com/huggingface/transformers/pull/15967
* Use tiny models for get_pretrained_model in TFEncoderDecoderModelTest by ydshieh in https://github.com/huggingface/transformers/pull/15989
* Add ONNX export for ViT by lewtun in https://github.com/huggingface/transformers/pull/15658
* Add FlaxBartForCausalLM by sanchit-gandhi in https://github.com/huggingface/transformers/pull/15995
* add doctests for bart like seq2seq models by patil-suraj in https://github.com/huggingface/transformers/pull/15987
* Fix warning message in ElectraForCausalLM by pbelevich in https://github.com/huggingface/transformers/pull/16023
* Freeze Feature Encoder in FlaxSpeechEncoderDecoder by sanchit-gandhi in https://github.com/huggingface/transformers/pull/15997
* Fix dependency error message in ServeCommand by andstor in https://github.com/huggingface/transformers/pull/16033
* [Docs] Improve PyTorch, Flax generate API by patrickvonplaten in https://github.com/huggingface/transformers/pull/15988
* [Tests] Add attentions_option to ModelTesterMixin by NielsRogge in https://github.com/huggingface/transformers/pull/15909
* [README] fix url for Preprocessing tutorial by patil-suraj in https://github.com/huggingface/transformers/pull/16042
* Fix Bug in Flax-Speech-Encoder-Decoder Test by sanchit-gandhi in https://github.com/huggingface/transformers/pull/16041
* Fix TFDebertaV2ConvLayer in TFDebertaV2Model by ydshieh in https://github.com/huggingface/transformers/pull/16031
* Build the doc in a separate folder then move it by sgugger in https://github.com/huggingface/transformers/pull/16020
* Don't compute metrics in LM examples on TPU by sgugger in https://github.com/huggingface/transformers/pull/16029
* TF: Unpack model inputs through a decorator by gante in https://github.com/huggingface/transformers/pull/15907
* Fix Bug in Flax Seq2Seq Models by sanchit-gandhi in https://github.com/huggingface/transformers/pull/16021
* DeBERTa/DeBERTa-v2/SEW Support for torch 1.11 by LysandreJik in https://github.com/huggingface/transformers/pull/16043
* support new marian models by patil-suraj in https://github.com/huggingface/transformers/pull/15831
* Fix duplicate arguments passed to dummy inputs in ONNX export by lewtun in https://github.com/huggingface/transformers/pull/16045
* FIX: updating doc/example for fine-tune for downstream Token Classification by davidsbatista in https://github.com/huggingface/transformers/pull/16063
* Fix a TF test name (LayoutLMModelTest) by ydshieh in https://github.com/huggingface/transformers/pull/16061
* Move QDQBert in just PyTorch block by sgugger in https://github.com/huggingface/transformers/pull/16062
* Remove assertion over possible activation functions in DistilBERT by mfuntowicz in https://github.com/huggingface/transformers/pull/16066
* Fix torch-scatter version by LysandreJik in https://github.com/huggingface/transformers/pull/16072
* Add type annotations for BERT and copies by Rocketknight1 in https://github.com/huggingface/transformers/pull/16074
* Adding type hints for TFRoBERTa by Rocketknight1 in https://github.com/huggingface/transformers/pull/16057
* Make sure `'torch.dtype'` has str-type value in config and all nested dicts for JSON serializability by feifang24 in https://github.com/huggingface/transformers/pull/16065
* Run daily doctests without time-out at least once by patrickvonplaten in https://github.com/huggingface/transformers/pull/16077
* Add soft length regulation for sequence generation by kevinpl07 in https://github.com/huggingface/transformers/pull/15245
* Update troubleshoot guide by stevhliu in https://github.com/huggingface/transformers/pull/16001
* Add type annotations for ImageGPT by johnnv1 in https://github.com/huggingface/transformers/pull/16088
* Rebuild deepspeed by LysandreJik in https://github.com/huggingface/transformers/pull/16081
* Add missing type hints for all flavors of RoBERTa PyTorch models. by ChainYo in https://github.com/huggingface/transformers/pull/16086
* [Fix doc example] FSMT by ydshieh in https://github.com/huggingface/transformers/pull/16085
* Audio/vision task guides by stevhliu in https://github.com/huggingface/transformers/pull/15808
* [ZeRO] Fixes issue with embedding resize by jeffra in https://github.com/huggingface/transformers/pull/16093
* [Deepspeed] add support for bf16 mode by stas00 in https://github.com/huggingface/transformers/pull/14569
* Change unpacking of TF Bart inputs to use decorator by osanseviero in https://github.com/huggingface/transformers/pull/16094
* add unpack_inputs decorator to mbart tf by Abdelrhman-Hosny in https://github.com/huggingface/transformers/pull/16097
* Add type annotations for segformer pytorch by p-mishra1 in https://github.com/huggingface/transformers/pull/16099
* Add unpack_input decorator to ViT model by johnnv1 in https://github.com/huggingface/transformers/pull/16102
* Add type hints to XLM model (PyTorch) by jbrry in https://github.com/huggingface/transformers/pull/16108
* Add missing type hints for all flavors of LayoutLMv2 PyTorch models. by ChainYo in https://github.com/huggingface/transformers/pull/16089
* Add TFCamembertForCausalLM and ONNX integration test by lewtun in https://github.com/huggingface/transformers/pull/16073
* Fix and document Zero Shot Image Classification by osanseviero in https://github.com/huggingface/transformers/pull/16079
* Fix Loading of Flax(Speech)EncoderDecoderModel kwargs from PreTrained Encoder-Decoder Checkpoints by sanchit-gandhi in https://github.com/huggingface/transformers/pull/16056
* Update convert_marian_to_pytorch.py by jorgtied in https://github.com/huggingface/transformers/pull/16124
* Make TF pt-tf equivalence test more aggressive by ydshieh in https://github.com/huggingface/transformers/pull/15839
* Fix ProphetNetTokenizer by ydshieh in https://github.com/huggingface/transformers/pull/16082
* Change unpacking of TF mobilebert inputs to use decorator by vumichien in https://github.com/huggingface/transformers/pull/16110
* Steps strategy fix for PushtoHubCallback and changed docstring by merveenoyan in https://github.com/huggingface/transformers/pull/16138
* [ViTMAE] Add copied from statements and fix prefix by NielsRogge in https://github.com/huggingface/transformers/pull/16119
* Spanish translation of the file training.mdx by yharyarias in https://github.com/huggingface/transformers/pull/16047
* Added missing type hints - ELECTRA PyTorch by kamalkraj in https://github.com/huggingface/transformers/pull/16103
* Added missing type hints - Deberta V1 and V2 by kamalkraj in https://github.com/huggingface/transformers/pull/16105
* [Fix doc example] Fix checkpoint name in docstring example by ydshieh in https://github.com/huggingface/transformers/pull/16083
* Better input variable naming for OpenAI (TF) by bhavika in https://github.com/huggingface/transformers/pull/16129
* Improve model variable naming - CLIP [TF] by bhavika in https://github.com/huggingface/transformers/pull/16128
* Add type hints for TFDistilBert by PepijnBoers in https://github.com/huggingface/transformers/pull/16107
* Choose framework for ONNX export by michaelbenayoun in https://github.com/huggingface/transformers/pull/16018
* Add type hints for Luke in PyTorch by bhavika in https://github.com/huggingface/transformers/pull/16111
* Add type hints for PoolFormer in Pytorch by soomiles in https://github.com/huggingface/transformers/pull/16121
* Add type hints for SqueezeBert PyTorch by Tegzes in https://github.com/huggingface/transformers/pull/16126
* Added missing type hints - ELECTRA TF by kamalkraj in https://github.com/huggingface/transformers/pull/16104
* Docker images runtime -> devel by LysandreJik in https://github.com/huggingface/transformers/pull/16141
* Add type annotations for CLIP (torch) (16059) by jacobdineen in https://github.com/huggingface/transformers/pull/16106
* Add type hints for FNet PyTorch by wpan03 in https://github.com/huggingface/transformers/pull/16123
* Use `HF_ENDPOINT` for custom endpoints by sgugger in https://github.com/huggingface/transformers/pull/16139
* update albert with tf decorator by infinite-Joy in https://github.com/huggingface/transformers/pull/16147
* clearer model variable naming: ELECTRA by kamalkraj in https://github.com/huggingface/transformers/pull/16143
* Add type hints for GPTNeo PyTorch by Tegzes in https://github.com/huggingface/transformers/pull/16127
* Improve Swin for VisionEncoderDecoder by NielsRogge in https://github.com/huggingface/transformers/pull/16070
* Make transformers.utils.fx._SUPPORTED_MODELS unique by pbelevich in https://github.com/huggingface/transformers/pull/16015
* Shift responsibilities a bit for issues by patrickvonplaten in https://github.com/huggingface/transformers/pull/16154
* typo "conaining" -> "containing" by marxav in https://github.com/huggingface/transformers/pull/16132
* Configurable Relative Position Max. Distance by agemagician in https://github.com/huggingface/transformers/pull/16155
* Added spanish translation of quicktour.mdx by Duedme in https://github.com/huggingface/transformers/pull/16158
* Use templates by sgugger in https://github.com/huggingface/transformers/pull/16142
* [Fix doc example] Fix first example for the custom_datasets tutorial by MarkusSagen in https://github.com/huggingface/transformers/pull/16087
* [Fix doc example] Fix 2 PyTorch Vilt docstring examples by ydshieh in https://github.com/huggingface/transformers/pull/16076
* TF XLA greedy generation by Rocketknight1 in https://github.com/huggingface/transformers/pull/15786
* clearer model variable naming: pegasus by kamalkraj in https://github.com/huggingface/transformers/pull/16152
* Change unpacking of TF layoutlm inputs to use decorator by vumichien in https://github.com/huggingface/transformers/pull/16112
* update transformer XL with tf decorator by infinite-Joy in https://github.com/huggingface/transformers/pull/16166
* added type hints to yoso by mowafess in https://github.com/huggingface/transformers/pull/16163
* Framework split by sgugger in https://github.com/huggingface/transformers/pull/16030
* [MT5Config] add relative_attention_max_distance in config by patil-suraj in https://github.com/huggingface/transformers/pull/16170
* clearer model variable naming: Tapas by kamalkraj in https://github.com/huggingface/transformers/pull/16145
* clearer model variable naming: Deberta by kamalkraj in https://github.com/huggingface/transformers/pull/16146
* Add flaubert types by ChainYo in https://github.com/huggingface/transformers/pull/16118
* clearer model variable naming: xlnet by kamalkraj in https://github.com/huggingface/transformers/pull/16150
* Add type hints for Perceiver Pytorch by jcmc00 in https://github.com/huggingface/transformers/pull/16174
* Add type hints for Reformer PyTorch by Tegzes in https://github.com/huggingface/transformers/pull/16175
* Fix some Flax models' `hidden_states` by ydshieh in https://github.com/huggingface/transformers/pull/16167
* Add the XTREME-S fine-tuning example by anton-l in https://github.com/huggingface/transformers/pull/15985
* [Xtreme-S] fix some namings by patrickvonplaten in https://github.com/huggingface/transformers/pull/16183
* Replace all deprecated `jax.ops` operations with jnp's `at` by sanchit-gandhi in https://github.com/huggingface/transformers/pull/16078
* clearer model variable naming: funnel by utkusaglm in https://github.com/huggingface/transformers/pull/16178
* clearer model variable naming: blenderbot by utkusaglm in https://github.com/huggingface/transformers/pull/16192
* Minor fixes to XTREME-S by anton-l in https://github.com/huggingface/transformers/pull/16193
* unpack_input decorator for tf_convnext by johko in https://github.com/huggingface/transformers/pull/16181
* clearer model variable naming: blenderbot_small by utkusaglm in https://github.com/huggingface/transformers/pull/16194
* Adding type hints for Distilbert by johnryan465 in https://github.com/huggingface/transformers/pull/16090
* ResNet: update modules names by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16196
* Update a CI job step name by ydshieh in https://github.com/huggingface/transformers/pull/16189
* Fix loading CLIPVisionConfig and CLIPTextConfig by patil-suraj in https://github.com/huggingface/transformers/pull/16198
* TF: add beam search tests by gante in https://github.com/huggingface/transformers/pull/16202
* Swin support for any input size by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15986
* Fix generation min length by patrickvonplaten in https://github.com/huggingface/transformers/pull/16206
* Add/type annotations/model vision by johnnv1 in https://github.com/huggingface/transformers/pull/16151
* VAN: update modules names by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16201
* Fixes Loss for TransfoXL when using Trainer API v2 by LysandreJik in https://github.com/huggingface/transformers/pull/16140
* [Tests] Fix DiT test by NielsRogge in https://github.com/huggingface/transformers/pull/16218
* Fix FlaxRoFormerClassificationHead activation by ydshieh in https://github.com/huggingface/transformers/pull/16168
* Fix typos in docstrings of data_collator.py by daysm in https://github.com/huggingface/transformers/pull/16208
* Fix reproducibility in Training for PyTorch 1.11 by sgugger in https://github.com/huggingface/transformers/pull/16209
* Fix readmes by qqaatw in https://github.com/huggingface/transformers/pull/16217
* MaskFormer: fix device on test by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16219
* Adding Unpack Decorator For DPR model by forsc in https://github.com/huggingface/transformers/pull/16212
* Skip equivalence test for TransfoXL by LysandreJik in https://github.com/huggingface/transformers/pull/16224
* Fix Type Hint of Nan/Inf Logging Filter Arg by Sophylax in https://github.com/huggingface/transformers/pull/16227
* [Flax] remove jax.ops.index by patil-suraj in https://github.com/huggingface/transformers/pull/16220
* Support PEP 563 for HfArgumentParser by function2-llx in https://github.com/huggingface/transformers/pull/15795
* add unpack_inputs decorator for marian by johko in https://github.com/huggingface/transformers/pull/16226
* fix(flax): generate with logits processor/warper by borisdayma in https://github.com/huggingface/transformers/pull/16231
* [FlaxSpeechEncoderDecoderModel] Skip from_encoder_decoder_pretrained by patil-suraj in https://github.com/huggingface/transformers/pull/16236
* [Generate Docs] Correct docs by patrickvonplaten in https://github.com/huggingface/transformers/pull/16133
* [Deepspeed] non-HF Trainer doc update by stas00 in https://github.com/huggingface/transformers/pull/16238
* integrations: mlflow: skip start_run() if a run is already active and sanity check on enabling integration by ktzsh in https://github.com/huggingface/transformers/pull/16131
* Update expected slices for pillow > 9 by NielsRogge in https://github.com/huggingface/transformers/pull/16117
* Attention mask is important in the case of batching... by Narsil in https://github.com/huggingface/transformers/pull/16222
* Change assertion to warning when passing past_key_value to T5 encoder by ZhaofengWu in https://github.com/huggingface/transformers/pull/16153
* Override _pad in LEDTokenizer to deal with global_attention_mask by ydshieh in https://github.com/huggingface/transformers/pull/15940
* Update XLM with TF decorator by louisowen6 in https://github.com/huggingface/transformers/pull/16247
* Add unpack_inputs decorator for ctrl by johko in https://github.com/huggingface/transformers/pull/16242
* update jax version and re-enable some tests by patil-suraj in https://github.com/huggingface/transformers/pull/16254
* [Constrained Beam Search] Adding Notebook Example & Minor Typo Fix by cwkeam in https://github.com/huggingface/transformers/pull/16246
* value check for typical sampling by cimeister in https://github.com/huggingface/transformers/pull/16165
* Make Flax pt-flax equivalence test more aggressive by ydshieh in https://github.com/huggingface/transformers/pull/15841
* Aggressive PT/TF equivalence test on PT side by ydshieh in https://github.com/huggingface/transformers/pull/16250
* Update flaubert with TF decorator by Tegzes in https://github.com/huggingface/transformers/pull/16258
* Fix links in guides by stevhliu in https://github.com/huggingface/transformers/pull/16182
* Small fixes to the documentation by sgugger in https://github.com/huggingface/transformers/pull/16180
* [WIP] add `has_attentions` as done in PyTorch side by ydshieh in https://github.com/huggingface/transformers/pull/16259
* Make `add-new-model-like` work in an env without all frameworks by sgugger in https://github.com/huggingface/transformers/pull/16239
* Deberta v2 code simplification by guillaume-be in https://github.com/huggingface/transformers/pull/15732
* Add Slack notification support for doc tests by patrickvonplaten in https://github.com/huggingface/transformers/pull/16253
* Framework split for Spanish version of doc quicktour.mdx by omarespejel in https://github.com/huggingface/transformers/pull/16215
* Removed the 'optional' string (in DETR post_process) by dinesh-GDK in https://github.com/huggingface/transformers/pull/16266
* Draft a guide with our code quirks for new models by sgugger in https://github.com/huggingface/transformers/pull/16237
* Fixed Error Raised Due to Wrongly Accessing Training Sample by aflah02 in https://github.com/huggingface/transformers/pull/16115
* Fix XGLM cross attention by patil-suraj in https://github.com/huggingface/transformers/pull/16290
* Fix a typo (add a comma) by PolarisRisingWar in https://github.com/huggingface/transformers/pull/16291
* Add type hints to xlnet by mowafess in https://github.com/huggingface/transformers/pull/16214
* Remove disclaimer from Longformer docs by gchhablani in https://github.com/huggingface/transformers/pull/16296
* Add argument "cache_dir" for transformers.onnx by happyXia in https://github.com/huggingface/transformers/pull/16284
* Add type hints transfoxl by jcmc00 in https://github.com/huggingface/transformers/pull/16267
* added type hints for BART model by robotjellyzone in https://github.com/huggingface/transformers/pull/16270
* ResNet & VAN: Fixed code sample tests by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16294
* GPT2 TensorFlow Type Hints by cakiki in https://github.com/huggingface/transformers/pull/16261
* Added type hints for PyTorch T5 model by yhl48 in https://github.com/huggingface/transformers/pull/16257
* Fix Marian conversion script by patil-suraj in https://github.com/huggingface/transformers/pull/16300
* [SegFormer] Remove unused attributes by NielsRogge in https://github.com/huggingface/transformers/pull/16285
* Update troubleshoot with more content by stevhliu in https://github.com/huggingface/transformers/pull/16243
* fix last element in hidden_states for XGLM by ydshieh in https://github.com/huggingface/transformers/pull/16301
* [FlaxGPTJ] Fix bug in rotary embeddings by patil-suraj in https://github.com/huggingface/transformers/pull/16298
* Add missing type hints for PyTorch Longformer models by johnnygreco in https://github.com/huggingface/transformers/pull/16244
* Fix Seq2SeqTrainingArguments docs by gchhablani in https://github.com/huggingface/transformers/pull/16295
* [xtreme-s] Update Minds14 results by anton-l in https://github.com/huggingface/transformers/pull/16241
* added type hints for blenderbot and blenderbot_small (v2) by IvanLauLinTiong in https://github.com/huggingface/transformers/pull/16307
* Update Makefile Phonies by gchhablani in https://github.com/huggingface/transformers/pull/16306
* TF - update (vision_)encoder_decoder past variable by gante in https://github.com/huggingface/transformers/pull/16260
* Add Flaubert OnnxConfig to Transformers by ChainYo in https://github.com/huggingface/transformers/pull/16279
* TFLongformer: Add missing type hints and unpack inputs decorator by johnnygreco in https://github.com/huggingface/transformers/pull/16228
* add xglm conversion script by patil-suraj in https://github.com/huggingface/transformers/pull/16305
* Fix bugs of s2t fairseq model converting by beomseok-lee in https://github.com/huggingface/transformers/pull/15593
* Add type hints for Pegasus model (PyTorch) by Tegzes in https://github.com/huggingface/transformers/pull/16324
* Funnel type hints by AMontgomerie in https://github.com/huggingface/transformers/pull/16323
* Add type hints for ProphetNet PyTorch by Tegzes in https://github.com/huggingface/transformers/pull/16272
* [GLPN] Improve docs by NielsRogge in https://github.com/huggingface/transformers/pull/16331
* Added type hints for Pytorch Marian calls by clefourrier in https://github.com/huggingface/transformers/pull/16200
* VAN: Code sample tests by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16340
* Add type annotations for Rembert/Splinter and copies by jacobdineen in https://github.com/huggingface/transformers/pull/16338
* [Bug template] Shift responsibilities for long-range by patrickvonplaten in https://github.com/huggingface/transformers/pull/16344
* Fix code repetition in serialization guide by osanseviero in https://github.com/huggingface/transformers/pull/16346
* Adopt framework-specific blocks for content by stevhliu in https://github.com/huggingface/transformers/pull/16342
* Updates the default branch from master to main by LysandreJik in https://github.com/huggingface/transformers/pull/16326
* [T5] Add t5 download script by patrickvonplaten in https://github.com/huggingface/transformers/pull/16328
* Reorganize file utils by sgugger in https://github.com/huggingface/transformers/pull/16264
* [FlaxBart] make sure no grads are computed on bias by patrickvonplaten in https://github.com/huggingface/transformers/pull/16345
* Trainer evaluation delay by OllieBroadhurst in https://github.com/huggingface/transformers/pull/16356
* Adding missing type hints for mBART model (TF) by reichenbch in https://github.com/huggingface/transformers/pull/16281
* Add type annotations of config for vision models by johnnv1 in https://github.com/huggingface/transformers/pull/16263
* TF - Fix interchangeable past/past_key_values and revert output variable name in GPT2 by gante in https://github.com/huggingface/transformers/pull/16332
* Swap inequalities by OllieBroadhurst in https://github.com/huggingface/transformers/pull/16368
* Make Transformers use cache files when hf.co is down by sgugger in https://github.com/huggingface/transformers/pull/16362
* Decision transformer gym by edbeeching in https://github.com/huggingface/transformers/pull/15845
* add GPT-J ONNX config to Transformers by ChainYo in https://github.com/huggingface/transformers/pull/16274
* Update docs/README.md by ydshieh in https://github.com/huggingface/transformers/pull/16333
* Make BigBird model compatible with fp16 dtype by xuzhao9 in https://github.com/huggingface/transformers/pull/16034
* [Doctests] Make roberta-like meaningful by patrickvonplaten in https://github.com/huggingface/transformers/pull/16363
* [Doctests] Make TFRoberta-like meaningful by ydshieh in https://github.com/huggingface/transformers/pull/16370
* Update readme with how to train offline and fix BPE command by ncoop57 in https://github.com/huggingface/transformers/pull/15897
* Fix BigBirdModelTester by ydshieh in https://github.com/huggingface/transformers/pull/16310
* Type hints and decorator for TF T5 by Dahlbomii in https://github.com/huggingface/transformers/pull/16376
* Add type hints for ConvBert model by simonzli in https://github.com/huggingface/transformers/pull/16377
* Update pt flax equivalence tests in pt by ydshieh in https://github.com/huggingface/transformers/pull/16280
* Bump cookiecutter version by ydshieh in https://github.com/huggingface/transformers/pull/16387
* Fix style by LysandreJik in https://github.com/huggingface/transformers/pull/16391
* Fix readme links and add CI check by sgugger in https://github.com/huggingface/transformers/pull/16392
* variable naming for Distilbert model by robotjellyzone in https://github.com/huggingface/transformers/pull/16384
* Added type hints by yhl48 in https://github.com/huggingface/transformers/pull/16389
* Rename semantic segmentation outputs by NielsRogge in https://github.com/huggingface/transformers/pull/15849
* Make FeaturesManager.get_model_from_feature a static method by michaelbenayoun in https://github.com/huggingface/transformers/pull/16357
* Big file_utils cleanup by sgugger in https://github.com/huggingface/transformers/pull/16396
* fixed typo from enable to disable in disable_progress_bar function by Gladiator07 in https://github.com/huggingface/transformers/pull/16406
* Rename master to main for notebooks links and leftovers by sgugger in https://github.com/huggingface/transformers/pull/16397
* TF PushToHubCallback fixes and updates by Rocketknight1 in https://github.com/huggingface/transformers/pull/16409
* Add ONNX support for Blenderbot and BlenderbotSmall by lewtun in https://github.com/huggingface/transformers/pull/15875
* [FlaxSpeechEncoderDecoder] Fix feature extractor gradient test by sanchit-gandhi in https://github.com/huggingface/transformers/pull/16407
* Fix Typo in Argument of FlaxWav2Vec2ForPreTrainingModule by sanchit-gandhi in https://github.com/huggingface/transformers/pull/16084
* Removed inputs_processing and replaced with decorator for lxmert by silvererudite in https://github.com/huggingface/transformers/pull/16414
* remove references to PDF reading via PIL by garfieldnate in https://github.com/huggingface/transformers/pull/15293
* Update comments in class BatchEncoding by basicv8vc in https://github.com/huggingface/transformers/pull/15932
* Fix broken links by kurianbenoy in https://github.com/huggingface/transformers/pull/16113
* `cached_download ∘ hf_hub_url` is `hf_hub_download` by julien-c in https://github.com/huggingface/transformers/pull/16375
* QDQBert example update by shangz-ai in https://github.com/huggingface/transformers/pull/16395
* [Flax] Improve Robustness of Back-Prop Tests by sanchit-gandhi in https://github.com/huggingface/transformers/pull/16418
* Fix typo in language modeling example comment by dreamgonfly in https://github.com/huggingface/transformers/pull/16421
* Use doc builder styler by sgugger in https://github.com/huggingface/transformers/pull/16412
* Fix PerceiverMLP and test by jaesuny in https://github.com/huggingface/transformers/pull/16405
* [FlaxSpeechEncoderDecoderModel] Ensure Input and Output Word Embeddings Are **Not** Tied by sanchit-gandhi in https://github.com/huggingface/transformers/pull/16444
* Translation from english to spanish of file pipeline_tutorial.mdx by FernandoLpz in https://github.com/huggingface/transformers/pull/16149
* Remove kwargs argument from IBERT MLM forward pass by lewtun in https://github.com/huggingface/transformers/pull/16449
* Fix blenderbot conversion script by patil-suraj in https://github.com/huggingface/transformers/pull/16472
* Adding DocTest to TrOCR by arnaudstiegler in https://github.com/huggingface/transformers/pull/16398
* [MNLI example] Prevent overwriting matched with mismatched metrics by eldarkurtic in https://github.com/huggingface/transformers/pull/16475
* Remove duplicate mLuke by stevhliu in https://github.com/huggingface/transformers/pull/16460
* Fix missing output_attentions in PT/Flax equivalence test by ydshieh in https://github.com/huggingface/transformers/pull/16271
* Fix some TF GPT-J CI testings by ydshieh in https://github.com/huggingface/transformers/pull/16454
* Fix example test and test_fetcher for examples by sgugger in https://github.com/huggingface/transformers/pull/16478
* fix wrong variable name by wesleyacheng in https://github.com/huggingface/transformers/pull/16467
* Add TF vision model code samples by ydshieh in https://github.com/huggingface/transformers/pull/16477
* missing trainer import by wesleyacheng in https://github.com/huggingface/transformers/pull/16469
* Add type hints for UniSpeech by Tegzes in https://github.com/huggingface/transformers/pull/16399
* TF: properly handle kwargs in encoder_decoder architectures by gante in https://github.com/huggingface/transformers/pull/16465
* added typehints for RAG pytorch models by akashe in https://github.com/huggingface/transformers/pull/16416
* Avoid accessing .dataset of a DataLoader in Trainer by sanderland in https://github.com/huggingface/transformers/pull/16451
* TF GPT2: clearer model variable naming with unpack_inputs by cakiki in https://github.com/huggingface/transformers/pull/16311
* Raise diff tolerance value for TFViTMAEModelTest by ydshieh in https://github.com/huggingface/transformers/pull/16483
* Do not initialize `torch.distributed` process group if one is already initialized by Yard1 in https://github.com/huggingface/transformers/pull/16487
* TF GPT-J Type hints and TF decorator by Dahlbomii in https://github.com/huggingface/transformers/pull/16488
* Nit: MCSCOCO -> MSCOCO by AdityaKane2001 in https://github.com/huggingface/transformers/pull/16481
* Add length to PreTrainedTokenizer train_new_from_iterator by dctelus in https://github.com/huggingface/transformers/pull/16493
* Add support for exporting GPT-J to ONNX-TRT by tomerip in https://github.com/huggingface/transformers/pull/16492
* TF: unpack inputs on Convbert, GPTJ, LED, and templates by gante in https://github.com/huggingface/transformers/pull/16491
* Feature Extractor accepts `segmentation_maps` by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15964
* [examples] max samples can't be bigger than the len of dataset by stas00 in https://github.com/huggingface/transformers/pull/16501
* update smddp api to v1.4.0 by roywei in https://github.com/huggingface/transformers/pull/16371
* Support reduce_bucket_size="auto" for deepspeed stages <3 by manuelciosici in https://github.com/huggingface/transformers/pull/16496
* Modeling Outputs by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16341
* make tuple annotation more specific to avoid failures during symbolic_trace by chenbohua3 in https://github.com/huggingface/transformers/pull/16490
* Spanish translation of the file multilingual.mdx by SimplyJuanjo in https://github.com/huggingface/transformers/pull/16329
* Translate installation.mdx to Spanish by lilianabs in https://github.com/huggingface/transformers/pull/16229
* Translate accelerate.mdx from english to spanish by Sangohe in https://github.com/huggingface/transformers/pull/16176
* [Typo][Example] Fixed a typo in `run_qa_no_trainer.py` by bhadreshpsavani in https://github.com/huggingface/transformers/pull/16508
* added type hints to xglm pytorch by mowafess in https://github.com/huggingface/transformers/pull/16500
* Fix syntax error in generate docstrings by sgugger in https://github.com/huggingface/transformers/pull/16516
* [research] link to the XTREME-S paper by anton-l in https://github.com/huggingface/transformers/pull/16519
* Fixed a typo in seq2seq_trainer.py by Agoniii in https://github.com/huggingface/transformers/pull/16531
* Add ONNX export for BeiT by akuma12 in https://github.com/huggingface/transformers/pull/16498
* call on_train_end when optuna trial is pruned by fschlatt in https://github.com/huggingface/transformers/pull/16536
* Type hints added to OpenAIGPT by Dahlbomii in https://github.com/huggingface/transformers/pull/16529
* Fix Bart type hints by gchhablani in https://github.com/huggingface/transformers/pull/16297
* Add VisualBert type hints by gchhablani in https://github.com/huggingface/transformers/pull/16544
* Adding missing type hints for mBART model (PyTorch) by reichenbch in https://github.com/huggingface/transformers/pull/16429
* Remove MBart subclass of XLMRoberta in tokenzier docs by gchhablani in https://github.com/huggingface/transformers/pull/16546
* Use random_attention_mask for TF tests by ydshieh in https://github.com/huggingface/transformers/pull/16517
* [GLPN] Improve code example by NielsRogge in https://github.com/huggingface/transformers/pull/16450
* Pin tokenizers version <0.13 by LysandreJik in https://github.com/huggingface/transformers/pull/16539
* add code samples for TF speech models by ydshieh in https://github.com/huggingface/transformers/pull/16494
* [FlaxSpeechEncoderDecoder] Fix dtype bug by patrickvonplaten in https://github.com/huggingface/transformers/pull/16581
* Making the impossible to connect error actually report the right URL. by Narsil in https://github.com/huggingface/transformers/pull/16446
* Fix flax import in `__init__.py`: `modeling_xglm -> modeling_flax_xglm` by stancld in https://github.com/huggingface/transformers/pull/16556
* Add utility to find model labels by sgugger in https://github.com/huggingface/transformers/pull/16526
* Enable doc in Spanish by sgugger in https://github.com/huggingface/transformers/pull/16518
* Add use_auth to load_datasets for private datasets to PT and TF examples by KMFODA in https://github.com/huggingface/transformers/pull/16521
* add a test checking the format of `convert_tokens_to_string`'s output by SaulLu in https://github.com/huggingface/transformers/pull/16540
* TF: Finalize `unpack_inputs`-related changes by gante in https://github.com/huggingface/transformers/pull/16499
* [SpeechEncoderDecoderModel] Correct Encoder Last Hidden State Output by sanchit-gandhi in https://github.com/huggingface/transformers/pull/16586
* initialize the default rank set on TrainerState by andrescodas in https://github.com/huggingface/transformers/pull/16530
* Fix CI: test_inference_for_pretraining in ViTMAEModelTest by ydshieh in https://github.com/huggingface/transformers/pull/16591
* add a template to add missing tokenization test by SaulLu in https://github.com/huggingface/transformers/pull/16553
* PretrainedModel: made `_load_pretrained_model_low_mem` static + bug fix by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16548
* handle torch_dtype in low cpu mem usage by patil-suraj in https://github.com/huggingface/transformers/pull/16580
* [Doctests] Correct filenaming by patrickvonplaten in https://github.com/huggingface/transformers/pull/16599
* Adding new train_step logic to make things less confusing for users by Rocketknight1 in https://github.com/huggingface/transformers/pull/15994
* Adding missing type hints for BigBird model by reichenbch in https://github.com/huggingface/transformers/pull/16555
* [deepspeed] fix typo, adjust config name by stas00 in https://github.com/huggingface/transformers/pull/16597
* Add global_attention_mask to gen_kwargs in Seq2SeqTrainer.prediction_step by JohnGiorgi in https://github.com/huggingface/transformers/pull/16485
* [benchmark tool] trainer-benchmark.py by stas00 in https://github.com/huggingface/transformers/pull/14934
* Update summary of the tasks by stevhliu in https://github.com/huggingface/transformers/pull/16528
* added type hints to CTRL pytorch by anmolsjoshi in https://github.com/huggingface/transformers/pull/16593
* fix default num_attention_heads in segformer doc by JunMa11 in https://github.com/huggingface/transformers/pull/16612
* [Docs] Correct quicktour minds14 dataset by patrickvonplaten in https://github.com/huggingface/transformers/pull/16626
* Fix seq2seq doc tests by patil-suraj in https://github.com/huggingface/transformers/pull/16606
* don't load state_dict twice when using low_cpu_mem_usage in from_pretrained by patil-suraj in https://github.com/huggingface/transformers/pull/16602
* Use CLIP model config to set some kwargs for components by ydshieh in https://github.com/huggingface/transformers/pull/16609
* [modeling_utils] typo by stas00 in https://github.com/huggingface/transformers/pull/16621
* [Speech2Text Doc] Fix docs by patrickvonplaten in https://github.com/huggingface/transformers/pull/16611
* [FlaxSpeechEncoderDecoderModel] More Rigorous PT-Flax Equivalence Tests by sanchit-gandhi in https://github.com/huggingface/transformers/pull/16589
* Fix TFTransfoXLLMHeadModel outputs by ydshieh in https://github.com/huggingface/transformers/pull/16590

Impressive community contributors

The community contributors below have significantly contributed to the v4.18.0 release. Thank you!

* sayakpaul, for contributing the TensorFlow version of ViTMAE
* stancld, for contributing the TensorFlow version of GPT-J

New Contributors
* Soonhwan-Kwon made their first contribution in https://github.com/huggingface/transformers/pull/13727
* jonatasgrosman made their first contribution in https://github.com/huggingface/transformers/pull/15428
* ToluClassics made their first contribution in https://github.com/huggingface/transformers/pull/15432
* peregilk made their first contribution in https://github.com/huggingface/transformers/pull/15423
* bugface made their first contribution in https://github.com/huggingface/transformers/pull/15480
* AyushExel made their first contribution in https://github.com/huggingface/transformers/pull/14582
* thinksoso made their first contribution in https://github.com/huggingface/transformers/pull/15403
* davidleonfdez made their first contribution in https://github.com/huggingface/transformers/pull/15473
* sanchit-gandhi made their first contribution in https://github.com/huggingface/transformers/pull/15519
* arron1227 made their first contribution in https://github.com/huggingface/transformers/pull/15084
* cimeister made their first contribution in https://github.com/huggingface/transformers/pull/15504
* cwkeam made their first contribution in https://github.com/huggingface/transformers/pull/15416
* Albertobegue made their first contribution in https://github.com/huggingface/transformers/pull/13831
* derenrich made their first contribution in https://github.com/huggingface/transformers/pull/15614
* tkukurin made their first contribution in https://github.com/huggingface/transformers/pull/15636
* muzhi1991 made their first contribution in https://github.com/huggingface/transformers/pull/15638
* versae made their first contribution in https://github.com/huggingface/transformers/pull/15590
* jonrbates made their first contribution in https://github.com/huggingface/transformers/pull/15617
* arampacha made their first contribution in https://github.com/huggingface/transformers/pull/15413
* FrancescoSaverioZuppichini made their first contribution in https://github.com/huggingface/transformers/pull/15657
* coyotte508 made their first contribution in https://github.com/huggingface/transformers/pull/15680
* heytanay made their first contribution in https://github.com/huggingface/transformers/pull/15531
* gautierdag made their first contribution in https://github.com/huggingface/transformers/pull/15702
* SSardorf made their first contribution in https://github.com/huggingface/transformers/pull/15741
* Crabzmatic made their first contribution in https://github.com/huggingface/transformers/pull/15740
* dreamgonfly made their first contribution in https://github.com/huggingface/transformers/pull/15644
* lsb made their first contribution in https://github.com/huggingface/transformers/pull/15468
* pbelevich made their first contribution in https://github.com/huggingface/transformers/pull/15776
* sayakpaul made their first contribution in https://github.com/huggingface/transformers/pull/15750
* rahul003 made their first contribution in https://github.com/huggingface/transformers/pull/15877
* rhjohnstone made their first contribution in https://github.com/huggingface/transformers/pull/15884
* cosmoquester made their first contribution in https://github.com/huggingface/transformers/pull/15913
* konstantinjdobler made their first contribution in https://github.com/huggingface/transformers/pull/15951
* yhavinga made their first contribution in https://github.com/huggingface/transformers/pull/15963
* dlwh made their first contribution in https://github.com/huggingface/transformers/pull/15961
* basilevh made their first contribution in https://github.com/huggingface/transformers/pull/15972
* andstor made their first contribution in https://github.com/huggingface/transformers/pull/16033
* davidsbatista made their first contribution in https://github.com/huggingface/transformers/pull/16063
* feifang24 made their first contribution in https://github.com/huggingface/transformers/pull/16065
* kevinpl07 made their first contribution in https://github.com/huggingface/transformers/pull/15245
* johnnv1 made their first contribution in https://github.com/huggingface/transformers/pull/16088
* Abdelrhman-Hosny made their first contribution in https://github.com/huggingface/transformers/pull/16097
* p-mishra1 made their first contribution in https://github.com/huggingface/transformers/pull/16099
* jbrry made their first contribution in https://github.com/huggingface/transformers/pull/16108
* jorgtied made their first contribution in https://github.com/huggingface/transformers/pull/16124
* vumichien made their first contribution in https://github.com/huggingface/transformers/pull/16110
* merveenoyan made their first contribution in https://github.com/huggingface/transformers/pull/16138
* yharyarias made their first contribution in https://github.com/huggingface/transformers/pull/16047
* bhavika made their first contribution in https://github.com/huggingface/transformers/pull/16129
* PepijnBoers made their first contribution in https://github.com/huggingface/transformers/pull/16107
* soomiles made their first contribution in https://github.com/huggingface/transformers/pull/16121
* Tegzes made their first contribution in https://github.com/huggingface/transformers/pull/16126
* jacobdineen made their first contribution in https://github.com/huggingface/transformers/pull/16106
* wpan03 made their first contribution in https://github.com/huggingface/transformers/pull/16123
* infinite-Joy made their first contribution in https://github.com/huggingface/transformers/pull/16147
* marxav made their first contribution in https://github.com/huggingface/transformers/pull/16132
* Duedme made their first contribution in https://github.com/huggingface/transformers/pull/16158
* MarkusSagen made their first contribution in https://github.com/huggingface/transformers/pull/16087
* mowafess made their first contribution in https://github.com/huggingface/transformers/pull/16163
* jcmc00 made their first contribution in https://github.com/huggingface/transformers/pull/16174
* utkusaglm made their first contribution in https://github.com/huggingface/transformers/pull/16178
* johko made their first contribution in https://github.com/huggingface/transformers/pull/16181
* johnryan465 made their first contribution in https://github.com/huggingface/transformers/pull/16090
* daysm made their first contribution in https://github.com/huggingface/transformers/pull/16208
* forsc made their first contribution in https://github.com/huggingface/transformers/pull/16212
* Sophylax made their first contribution in https://github.com/huggingface/transformers/pull/16227
* function2-llx made their first contribution in https://github.com/huggingface/transformers/pull/15795
* ktzsh made their first contribution in https://github.com/huggingface/transformers/pull/16131
* louisowen6 made their first contribution in https://github.com/huggingface/transformers/pull/16247
* omarespejel made their first contribution in https://github.com/huggingface/transformers/pull/16215
* dinesh-GDK made their first contribution in https://github.com/huggingface/transformers/pull/16266
* aflah02 made their first contribution in https://github.com/huggingface/transformers/pull/16115
* PolarisRisingWar made their first contribution in https://github.com/huggingface/transformers/pull/16291
* happyXia made their first contribution in https://github.com/huggingface/transformers/pull/16284
* robotjellyzone made their first contribution in https://github.com/huggingface/transformers/pull/16270
* yhl48 made their first contribution in https://github.com/huggingface/transformers/pull/16257
* johnnygreco made their first contribution in https://github.com/huggingface/transformers/pull/16244
* IvanLauLinTiong made their first contribution in https://github.com/huggingface/transformers/pull/16307
* beomseok-lee made their first contribution in https://github.com/huggingface/transformers/pull/15593
* clefourrier made their first contribution in https://github.com/huggingface/transformers/pull/16200
* OllieBroadhurst made their first contribution in https://github.com/huggingface/transformers/pull/16356
* reichenbch made their first contribution in https://github.com/huggingface/transformers/pull/16281
* edbeeching made their first contribution in https://github.com/huggingface/transformers/pull/15845
* xuzhao9 made their first contribution in https://github.com/huggingface/transformers/pull/16034
* Dahlbomii made their first contribution in https://github.com/huggingface/transformers/pull/16376
* simonzli made their first contribution in https://github.com/huggingface/transformers/pull/16377
* Gladiator07 made their first contribution in https://github.com/huggingface/transformers/pull/16406
* silvererudite made their first contribution in https://github.com/huggingface/transformers/pull/16414
* garfieldnate made their first contribution in https://github.com/huggingface/transformers/pull/15293
* basicv8vc made their first contribution in https://github.com/huggingface/transformers/pull/15932
* kurianbenoy made their first contribution in https://github.com/huggingface/transformers/pull/16113
* jaesuny made their first contribution in https://github.com/huggingface/transformers/pull/16405
* FernandoLpz made their first contribution in https://github.com/huggingface/transformers/pull/16149
* arnaudstiegler made their first contribution in https://github.com/huggingface/transformers/pull/16398
* wesleyacheng made their first contribution in https://github.com/huggingface/transformers/pull/16467
* akashe made their first contribution in https://github.com/huggingface/transformers/pull/16416
* sanderland made their first contribution in https://github.com/huggingface/transformers/pull/16451
* AdityaKane2001 made their first contribution in https://github.com/huggingface/transformers/pull/16481
* dctelus made their first contribution in https://github.com/huggingface/transformers/pull/16493
* tomerip made their first contribution in https://github.com/huggingface/transformers/pull/16492
* roywei made their first contribution in https://github.com/huggingface/transformers/pull/16371
* chenbohua3 made their first contribution in https://github.com/huggingface/transformers/pull/16490
* SimplyJuanjo made their first contribution in https://github.com/huggingface/transformers/pull/16329
* lilianabs made their first contribution in https://github.com/huggingface/transformers/pull/16229
* Sangohe made their first contribution in https://github.com/huggingface/transformers/pull/16176
* Agoniii made their first contribution in https://github.com/huggingface/transformers/pull/16531
* akuma12 made their first contribution in https://github.com/huggingface/transformers/pull/16498
* fschlatt made their first contribution in https://github.com/huggingface/transformers/pull/16536
* KMFODA made their first contribution in https://github.com/huggingface/transformers/pull/16521
* andrescodas made their first contribution in https://github.com/huggingface/transformers/pull/16530
* JohnGiorgi made their first contribution in https://github.com/huggingface/transformers/pull/16485
* JunMa11 made their first contribution in https://github.com/huggingface/transformers/pull/16612

**Full Changelog**: https://github.com/huggingface/transformers/compare/v4.17.0...v4.18.0

4.17.0

Not secure
New models

XGLM

The XGLM model was proposed in [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O’Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.

XGLM is a GPT3-like multilingual model trained on a balanced corpus covering a diverse set of languages.

* Add XGLM models by patil-suraj in https://github.com/huggingface/transformers/pull/14876
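
As a quick illustration, here is a minimal generation sketch, assuming the smallest released checkpoint `facebook/xglm-564M` (any XGLM checkpoint should work the same way):

```python
from transformers import XGLMTokenizer, XGLMForCausalLM

# Checkpoint name assumed from the XGLM release; swap in any XGLM checkpoint
tokenizer = XGLMTokenizer.from_pretrained("facebook/xglm-564M")
model = XGLMForCausalLM.from_pretrained("facebook/xglm-564M")

# XGLM is multilingual, so non-English prompts work out of the box
inputs = tokenizer("La capitale de la France est", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```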

ConvNeXT

The ConvNeXT model was proposed in [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.

ConvNeXT is a pure convolutional model (ConvNet), inspired by the design of Vision Transformers, that claims to outperform them.

* Add ConvNeXT by NielsRogge in https://github.com/huggingface/transformers/pull/15277
* Add TFConvNextModel by sayakpaul in https://github.com/huggingface/transformers/pull/15750
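
A short image-classification sketch, assuming the `facebook/convnext-tiny-224` checkpoint (fine-tuned on ImageNet-1k) and a sample image:

```python
import requests
from PIL import Image
from transformers import ConvNextFeatureExtractor, ConvNextForImageClassification

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = ConvNextFeatureExtractor.from_pretrained("facebook/convnext-tiny-224")
model = ConvNextForImageClassification.from_pretrained("facebook/convnext-tiny-224")

inputs = feature_extractor(images=image, return_tensors="pt")
logits = model(**inputs).logits
# id2label maps the winning logit to an ImageNet class name
print(model.config.id2label[logits.argmax(-1).item()])
```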

PoolFormer

The PoolFormer model was proposed in [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Sea AI Labs.

* Add PoolFormer by heytanay in https://github.com/huggingface/transformers/pull/15531

PLBart

The PLBART model was proposed in [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.

This is a BART-like model which can be used to perform code summarization, code generation, and code translation tasks. The pre-trained model `plbart-base` has been trained on a multilingual denoising task over Java, Python and English.

* Add PLBart by gchhablani in https://github.com/huggingface/transformers/pull/13269
* Add missing PLBart entry in README by gchhablani in https://github.com/huggingface/transformers/pull/15721
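
A hedged code-summarization sketch; the checkpoint name `uclanlp/plbart-python-en_XX` and the language codes are assumptions based on the PLBART release and may differ:

```python
from transformers import PLBartForConditionalGeneration, PLBartTokenizer

# Assumed Python-to-English checkpoint and language codes
tokenizer = PLBartTokenizer.from_pretrained(
    "uclanlp/plbart-python-en_XX", src_lang="python", tgt_lang="en_XX"
)
model = PLBartForConditionalGeneration.from_pretrained("uclanlp/plbart-python-en_XX")

code = "def maximum(a, b, c): return max([a, b, c])"
inputs = tokenizer(code, return_tensors="pt")
# Start decoding with the target-language code to obtain an English summary
summary_ids = model.generate(**inputs, decoder_start_token_id=tokenizer.lang_code_to_id["en_XX"])
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])
```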

Data2Vec

The Data2Vec model was proposed in [data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/pdf/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu and Michael Auli.

Data2Vec proposes a unified framework for self-supervised learning across different data modalities - text, audio and images. Importantly, predicted targets for pre-training are contextualized latent representations of the inputs, rather than modality-specific, context-independent targets.

* Add Data2Vec by edugp in https://github.com/huggingface/transformers/pull/15507
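
A minimal feature-extraction sketch for the text variant, assuming the `facebook/data2vec-text-base` checkpoint:

```python
import torch
from transformers import AutoTokenizer, Data2VecTextModel

# Checkpoint name assumed from the Data2Vec release
tokenizer = AutoTokenizer.from_pretrained("facebook/data2vec-text-base")
model = Data2VecTextModel.from_pretrained("facebook/data2vec-text-base")

inputs = tokenizer("Hello, world!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```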

MaskFormer

The MaskFormer model was proposed in [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.

MaskFormer addresses semantic segmentation with a mask classification paradigm instead of performing classic pixel-level classification.

* Maskformer by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15682
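
A short sketch of the mask-classification outputs, assuming the `facebook/maskformer-swin-base-ade` checkpoint:

```python
import requests
import torch
from PIL import Image
from transformers import MaskFormerFeatureExtractor, MaskFormerForInstanceSegmentation

# Checkpoint name assumed from the MaskFormer release
feature_extractor = MaskFormerFeatureExtractor.from_pretrained("facebook/maskformer-swin-base-ade")
model = MaskFormerForInstanceSegmentation.from_pretrained("facebook/maskformer-swin-base-ade")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One class prediction and one mask prediction per query, instead of per-pixel classification
print(outputs.class_queries_logits.shape)  # (batch_size, num_queries, num_labels + 1)
print(outputs.masks_queries_logits.shape)  # (batch_size, num_queries, height, width)
```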

Code in the Hub

This is a new experimental feature added to the library. It allows you to share a custom model (with configuration, tokenizer, feature extractor, processor) with anyone through the [Model Hub](https://huggingface.co/models) while still using the Auto-classes API of the Transformers library.

See the [documentation](https://huggingface.co/docs/transformers/custom_models) for more information!

* Allow relative imports in dynamic code by sgugger in https://github.com/huggingface/transformers/pull/15352
* Save code of registered custom models by sgugger in https://github.com/huggingface/transformers/pull/15379
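
On the loading side, a minimal sketch (the repo name `username/my-custom-model` is a placeholder; executing code fetched from the Hub requires an explicit opt-in):

```python
from transformers import AutoModel

# Placeholder repo that ships its own configuration and modeling code with the weights
model = AutoModel.from_pretrained(
    "username/my-custom-model",
    trust_remote_code=True,  # required to run the custom code stored in the repo
)
```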

Documentation

We are working on updating the existing guides in the documentation, and writing more!

* Update model share tutorial by stevhliu in https://github.com/huggingface/transformers/pull/15288
* Get started docs by stevhliu in https://github.com/huggingface/transformers/pull/15098
* Update fine-tune docs by stevhliu in https://github.com/huggingface/transformers/pull/15259
* Update tutorial docs by stevhliu in https://github.com/huggingface/transformers/pull/15165
* Create a custom model guide by stevhliu in https://github.com/huggingface/transformers/pull/15489
* 🧼 NLP task guides by stevhliu in https://github.com/huggingface/transformers/pull/15731
* Inference for multilingual models by stevhliu in https://github.com/huggingface/transformers/pull/15836

Time Stamps for Speech models

Speech models that have been trained with the CTC loss (Wav2Vec2, XLS-R, HuBERT, WavLM, ...) can now output timestamps
in addition to the transcription of the input audio. E.g., one can retrieve the start and end time of every transcribed word
via the `Wav2Vec2CTCTokenizer.decode` method or the `Wav2Vec2ProcessorWithLM.decode` method. See the documentation [here](https://huggingface.co/docs/transformers/model_doc/wav2vec2#transformers.Wav2Vec2CTCTokenizer.batch_decode) and [here](https://huggingface.co/docs/transformers/model_doc/wav2vec2#transformers.Wav2Vec2CTCTokenizer.batch_decode) respectively.

This feature can also be used directly via the ASR pipeline; see [here](https://huggingface.co/docs/transformers/v4.17.0/en/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline.__call__.return_timestamps) and [this example](https://huggingface.co/docs/transformers/model_doc/wav2vec2#transformers.Wav2Vec2CTCTokenizer.batch_decode), as well as the sketch after the PR links below.

* Add time stamps for wav2vec2 with lm by patrickvonplaten in https://github.com/huggingface/transformers/pull/15854
* Adding timestamps for CTC with LM in ASR pipeline. by Narsil in https://github.com/huggingface/transformers/pull/15863
* Adding the option to return_timestamps on pure CTC ASR models. by Narsil in https://github.com/huggingface/transformers/pull/15792
* Time stamps for CTC models by patrickvonplaten in https://github.com/huggingface/transformers/pull/15687
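
A sketch of word-level timestamps through the ASR pipeline; the checkpoint name and the audio file are placeholders, and any CTC-trained checkpoint should behave the same way:

```python
from transformers import pipeline

# Any CTC-trained checkpoint should work here; "sample.flac" is a placeholder audio file
asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
output = asr("sample.flac", return_timestamps="word")

print(output["text"])
for chunk in output["chunks"]:
    # Each chunk carries a word and its (start, end) time in seconds
    print(chunk["text"], chunk["timestamp"])
```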

Breaking change

Unfortunately, some bugs had crept into `CLIPTokenizerFast`: the tokenizations produced by `CLIPTokenizer` and `CLIPTokenizerFast` were not equal. `CLIPTokenizerFast` has been corrected to encode the text with the same strategy as `CLIPTokenizer`.

What does this mean for you? You need to use the tokenizer that was used to train the CLIP checkpoint you are using. For example:
- Case 1: you use `openai/clip-vit-base-patch32`, `openai/clip-vit-base-patch16` or `openai/clip-vit-large-patch14`. Before v4.17.0, the correct version of the tokenizer was `CLIPTokenizer`; from v4.17.0, you can use both `CLIPTokenizer` and `CLIPTokenizerFast`.
- Case 2: you have trained your own CLIP model using `CLIPTokenizerFast`. Your tokenizer is no longer a `CLIPTokenizerFast`, and we recommend loading your `tokenizer.json` into a `PreTrainedTokenizerFast` directly, or continuing to use a version prior to v4.17.0.
- Case 3: you have trained your own CLIP model using `CLIPTokenizer`. Now you can produce a fast equivalent of your tokenizer by calling `CLIPTokenizerFast.from_pretrained("Path to local folder or Hub repo with slow tokenizer files", from_slow=True)`, as sketched below.

To make `CLIPTokenizerFast` identical to `CLIPTokenizer`, the tokenization template for a sentence pair `(A, B)` has been modified. The previous template was `<|startoftext|> A B <|endoftext|>` and the new one is `<|startoftext|> A <|endoftext|> <|endoftext|> B <|endoftext|>`.
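
A sketch of Case 3, converting a slow tokenizer and checking that the two now agree (`path/to/my-clip` is a placeholder for a local folder or Hub repo with slow tokenizer files):

```python
from transformers import CLIPTokenizer, CLIPTokenizerFast

# "path/to/my-clip" is a placeholder; it must contain the slow tokenizer files
slow = CLIPTokenizer.from_pretrained("path/to/my-clip")
fast = CLIPTokenizerFast.from_pretrained("path/to/my-clip", from_slow=True)

# From v4.17.0 the two tokenizers should produce identical encodings,
# including for sentence pairs, which use the new template
text, pair = "a photo of a cat", "a photo of a dog"
assert slow(text, pair)["input_ids"] == fast(text, pair)["input_ids"]
```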

What's Changed

* Fix tests_fetcher by sgugger in https://github.com/huggingface/transformers/pull/15376
* Fix code format for Accelerate doc by stevhliu in https://github.com/huggingface/transformers/pull/15335
* Add init to BORT by LysandreJik in https://github.com/huggingface/transformers/pull/15378
* Set syncfree AdamW as the default optimizer for xla:gpu device in amp mode by ymwangg in https://github.com/huggingface/transformers/pull/15361
* Fixing support for `batch_size` and `num_return_sequences` in `text-generation` pipeline by Narsil in https://github.com/huggingface/transformers/pull/15318
* Fix `bad_words_ids` not working with sentencepiece-based tokenizers by ngoquanghuy99 in https://github.com/huggingface/transformers/pull/15343
* [docs] fix wrong file name in `pr_check` by ngoquanghuy99 in https://github.com/huggingface/transformers/pull/15380
* Prepare deprecated ONNX exporter for torch v1.11 by lewtun in https://github.com/huggingface/transformers/pull/15388
* [Fix doc example] FlaxMarianPreTrainedModel by ydshieh in https://github.com/huggingface/transformers/pull/15391
* Make links explicit by Rocketknight1 in https://github.com/huggingface/transformers/pull/15395
* [deepspeed] saving checkpoint fallback when fp16 weights aren't saved by stas00 in https://github.com/huggingface/transformers/pull/14948
* Fix missing eps arg for LayerNorm in ElectraGeneratorPredictions by ydshieh in https://github.com/huggingface/transformers/pull/15332
* Use argument for preprocessing workers in run_summarization by sgugger in https://github.com/huggingface/transformers/pull/15394
* Add support for XLM-R XL and XXL models via modeling_xlm_roberta_xl.py by Soonhwan-Kwon in https://github.com/huggingface/transformers/pull/13727
* Fix the inconsistency of loss calculation between PT/TF XLNetLMHeadModel by ydshieh in https://github.com/huggingface/transformers/pull/15298
* [XGLMTokenizer] fix init and add in AutoTokenizer by patil-suraj in https://github.com/huggingface/transformers/pull/15406
* Add SegformerFeatureExtractor to Auto API by NielsRogge in https://github.com/huggingface/transformers/pull/15410
* Fix additional DataTrainingArguments documentation by FremyCompany in https://github.com/huggingface/transformers/pull/15408
* Add (M)Luke model training for Token Classification in the examples by jplu in https://github.com/huggingface/transformers/pull/14880
* Update README.md by kamalkraj in https://github.com/huggingface/transformers/pull/15430
* [Robust Speech Challenge] Add missing LR parameter by jonatasgrosman in https://github.com/huggingface/transformers/pull/15428
* [XGLM] fix gradient checkpointing by patil-suraj in https://github.com/huggingface/transformers/pull/15427
* [Hotfix] Fix Swin model outputs by NielsRogge in https://github.com/huggingface/transformers/pull/15414
* add t5 ner finetuning by ToluClassics in https://github.com/huggingface/transformers/pull/15432
* Add doc for add-new-model-like command by sgugger in https://github.com/huggingface/transformers/pull/15433
* [Swin] Add missing header by NielsRogge in https://github.com/huggingface/transformers/pull/15434
* [deepspeed doc] fix import, extra notes by stas00 in https://github.com/huggingface/transformers/pull/15400
* Fix loss calculation in TFXXXForTokenClassification models by ydshieh in https://github.com/huggingface/transformers/pull/15294
* Fix spurious warning in TF TokenClassification models by Rocketknight1 in https://github.com/huggingface/transformers/pull/15435
* Change REALM checkpoint to new ones by sgugger in https://github.com/huggingface/transformers/pull/15439
* [Trainer] suppress warning for length-related columns by patrickvonplaten in https://github.com/huggingface/transformers/pull/15421
* [examples/Flax] add a section about GPUs by patil-suraj in https://github.com/huggingface/transformers/pull/15198
* Fix TFLEDModel by ydshieh in https://github.com/huggingface/transformers/pull/15356
* [XGLMTokenizer] correct positional emb size by patil-suraj in https://github.com/huggingface/transformers/pull/15441
* [RobertaTokenizer] remove inheritance on GPT2Tokenizer by patil-suraj in https://github.com/huggingface/transformers/pull/15429
* Misfiring tf warnings by Rocketknight1 in https://github.com/huggingface/transformers/pull/15442
* Add 'with torch.no_grad()' to BEiT integration test forward passes by itsTurner in https://github.com/huggingface/transformers/pull/14961
* Update modeling_wav2vec2.py by peregilk in https://github.com/huggingface/transformers/pull/15423
* Error when group_by_length is used with an IterableDataset by sgugger in https://github.com/huggingface/transformers/pull/15437
* skip large generations pipeline test for XGLM by patil-suraj in https://github.com/huggingface/transformers/pull/15445
* [generate] fix synced_gpus default by stas00 in https://github.com/huggingface/transformers/pull/15446
* Remove "inputs" in tf common test script (no longer required) by ydshieh in https://github.com/huggingface/transformers/pull/15262
* Fix TF Causal LM models' returned logits by ydshieh in https://github.com/huggingface/transformers/pull/15256
* fix from_vision_text_pretrained doc example by ydshieh in https://github.com/huggingface/transformers/pull/15453
* [M2M100, XGLM] fix positional emb resize by patil-suraj in https://github.com/huggingface/transformers/pull/15444
* Update README.md by kamalkraj in https://github.com/huggingface/transformers/pull/15462
* replace assert with exception for padding_side arg in `PreTrainedTokenizerBase` `__init__` by SaulLu in https://github.com/huggingface/transformers/pull/15454
* fix the `tokenizer_config.json` file for the slow tokenizer when a fast version is available by SaulLu in https://github.com/huggingface/transformers/pull/15319
* use mean instead of elementwise_mean in XLMPredLayer by ydshieh in https://github.com/huggingface/transformers/pull/15436
* [BartTokenizer] remove inheritance on RobertaTokenizer by patil-suraj in https://github.com/huggingface/transformers/pull/15461
* `Trainer.push_to_hub` always tries to push to the Hub by sgugger in https://github.com/huggingface/transformers/pull/15463
* Harder check for IndexErrors in QA scripts by sgugger in https://github.com/huggingface/transformers/pull/15438
* Add option to resize like torchvision's Resize by NielsRogge in https://github.com/huggingface/transformers/pull/15419
* [Wav2Vec2ProcessorWithLM] add alpha & beta to batch decode & decode by patrickvonplaten in https://github.com/huggingface/transformers/pull/15465
* Adding support for `microphone` streaming within pipeline. by Narsil in https://github.com/huggingface/transformers/pull/15046
* fix error posted in issue 15448 by bugface in https://github.com/huggingface/transformers/pull/15480
* Fix docstring of ASR pipeline by sgugger in https://github.com/huggingface/transformers/pull/15481
* Add W&B backend for hyperparameter sweep by AyushExel in https://github.com/huggingface/transformers/pull/14582
* Fix labels stored in model config for token classification examples by sgugger in https://github.com/huggingface/transformers/pull/15482
* fix set truncation attribute in `__init__` of `PreTrainedTokenizerBase` by SaulLu in https://github.com/huggingface/transformers/pull/15456
* Correct eos_token_id settings in generate by thinksoso in https://github.com/huggingface/transformers/pull/15403
* fix TFMarianMTModel output by ydshieh in https://github.com/huggingface/transformers/pull/15494
* Cleanup load_weight_prefix in TFEncoderDecoderModel by ydshieh in https://github.com/huggingface/transformers/pull/15101
* [Flax tests] Disable scheduled GPU tests by patrickvonplaten in https://github.com/huggingface/transformers/pull/15503
* Add general vision docstrings by NielsRogge in https://github.com/huggingface/transformers/pull/15501
* [deepspeed] fix a bug in a test by stas00 in https://github.com/huggingface/transformers/pull/15493
* Add preprocess_logits_for_metrics Trainer param by davidleonfdez in https://github.com/huggingface/transformers/pull/15473
* [deepspeed docs] memory requirements by stas00 in https://github.com/huggingface/transformers/pull/15506
* Remove loss from some flax models docs & examples by ydshieh in https://github.com/huggingface/transformers/pull/15492
* Fix TFElectraForMultipleChoice by ydshieh in https://github.com/huggingface/transformers/pull/15509
* Handle PyTorch to Flax conversion of 1D convolutions by sanchit-gandhi in https://github.com/huggingface/transformers/pull/15519
* Fix TFRemBertEncoder all_hidden_states by ydshieh in https://github.com/huggingface/transformers/pull/15510
* [parallelism docs] Megatron-Deepspeed info by stas00 in https://github.com/huggingface/transformers/pull/15488
* Standardize semantic segmentation models outputs by sgugger in https://github.com/huggingface/transformers/pull/15469
* [deepspeed docs] DeepSpeed ZeRO Inference by stas00 in https://github.com/huggingface/transformers/pull/15486
* Revert "Handle PyTorch to Flax conversion of 1D convolutions" by patrickvonplaten in https://github.com/huggingface/transformers/pull/15540
* [ASR pipeline] correct asr pipeline for seq2seq models by patrickvonplaten in https://github.com/huggingface/transformers/pull/15541
* [torch_int_div] Correct true division in generation by patrickvonplaten in https://github.com/huggingface/transformers/pull/15498
* [Trainer] Deeper length checks for IterableDatasetShard by anton-l in https://github.com/huggingface/transformers/pull/15539
* Add ASR CTC streaming example by anton-l in https://github.com/huggingface/transformers/pull/15309
* Wav2Vec2 models must either throw or deal with add_adapter by FremyCompany in https://github.com/huggingface/transformers/pull/15409
* Remove Longformers from ONNX-supported models by lewtun in https://github.com/huggingface/transformers/pull/15273
* Fix TF T5/LED missing cross attn in return values by ydshieh in https://github.com/huggingface/transformers/pull/15511
* Make TF Wav2Vec2 outputs the same as PT's version by ydshieh in https://github.com/huggingface/transformers/pull/15530
* FX tracing improvement by michaelbenayoun in https://github.com/huggingface/transformers/pull/14321
* electra is added to onnx supported model by arron1227 in https://github.com/huggingface/transformers/pull/15084
* [GPTJ] fix docs by patil-suraj in https://github.com/huggingface/transformers/pull/15558
* Force use_cache to be False in PyTorch by ydshieh in https://github.com/huggingface/transformers/pull/15385
* Add TFSpeech2Text by gante in https://github.com/huggingface/transformers/pull/15113
* feat(flax): allow encoder_outputs in generate by borisdayma in https://github.com/huggingface/transformers/pull/15554
* Add codecarbon callback to docs by nateraw in https://github.com/huggingface/transformers/pull/15563
* [Flax tests] fix test_model_outputs_equivalence by patil-suraj in https://github.com/huggingface/transformers/pull/15571
* logger.warn --> logger.warning by ydshieh in https://github.com/huggingface/transformers/pull/15572
* PoC for a ProcessorMixin class by sgugger in https://github.com/huggingface/transformers/pull/15549
* add model scaling section by lvwerra in https://github.com/huggingface/transformers/pull/15119
* Upgrade black to version ~=22.0 by LysandreJik in https://github.com/huggingface/transformers/pull/15565
* Make sure custom configs work with Transformers by sgugger in https://github.com/huggingface/transformers/pull/15569
* Add Wav2Vec2 Adapter Weights to Flax by sanchit-gandhi in https://github.com/huggingface/transformers/pull/15566
* Click new version by LysandreJik in https://github.com/huggingface/transformers/pull/15579
* [Flax tests/FlaxBert] make from_pretrained test faster by patil-suraj in https://github.com/huggingface/transformers/pull/15561
* Add implementation of typical sampling by cimeister in https://github.com/huggingface/transformers/pull/15504
* Constrained Beam Search [without disjunctive decoding] by cwkeam in https://github.com/huggingface/transformers/pull/15416
* Fix tests hub failure by sgugger in https://github.com/huggingface/transformers/pull/15580
* update serving_output for some TF models by ydshieh in https://github.com/huggingface/transformers/pull/15568
* [trainer docs] document how to select specific gpus by stas00 in https://github.com/huggingface/transformers/pull/15551
* [ViTMAE] Add link to script by NielsRogge in https://github.com/huggingface/transformers/pull/15588
* Expand tutorial for custom models by sgugger in https://github.com/huggingface/transformers/pull/15587
* Add Tensorflow handling of ONNX conversion by Albertobegue in https://github.com/huggingface/transformers/pull/13831
* Add example batch size to all commands by patrickvonplaten in https://github.com/huggingface/transformers/pull/15596
* Compute loss independent from decoder for TF EncDec models (as 14139) by ydshieh in https://github.com/huggingface/transformers/pull/15175
* Fix Seq2SeqTrainer for VisionEncoderDecoderModel by NielsRogge in https://github.com/huggingface/transformers/pull/15603
* Add local and TensorFlow ONNX export examples to docs by lewtun in https://github.com/huggingface/transformers/pull/15604
* [deepspeed docs] Correct JSON format by ngoquanghuy99 in https://github.com/huggingface/transformers/pull/15600
* Small clean up generate by patrickvonplaten in https://github.com/huggingface/transformers/pull/15611
* Mark "code in the Hub" API as experimental by sgugger in https://github.com/huggingface/transformers/pull/15624
* Enable ONNX export when PyTorch and TensorFlow installed in the same env by lewtun in https://github.com/huggingface/transformers/pull/15625
* TF: Add informative warning for inexistent CPU backprop ops by gante in https://github.com/huggingface/transformers/pull/15612
* Add aws studio notebooks by mishig25 in https://github.com/huggingface/transformers/pull/15606
* TF MT5 embeddings resize by gante in https://github.com/huggingface/transformers/pull/15567
* Fix broken link in CTRL docs by stevhliu in https://github.com/huggingface/transformers/pull/15615
* Fix _configuration_file argument getting passed to model by sgugger in https://github.com/huggingface/transformers/pull/15629
* [deepspeed docs] misc additions by stas00 in https://github.com/huggingface/transformers/pull/15585
* [research_projects] deal with security alerts by stas00 in https://github.com/huggingface/transformers/pull/15594
* Custom feature extractor by sgugger in https://github.com/huggingface/transformers/pull/15630
* Fix grammar in tokenizer_summary docs by derenrich in https://github.com/huggingface/transformers/pull/15614
* Add push to hub to feature extractor by sgugger in https://github.com/huggingface/transformers/pull/15632
* [Fix doc example] FlaxVisionEncoderDecoder by ydshieh in https://github.com/huggingface/transformers/pull/15626
* Fix a bug that QuestionAnsweringPipeline ignores max_seq_len parameter by wptoux in https://github.com/huggingface/transformers/pull/15238
* Report only the failed imports in `requires_backends` by tkukurin in https://github.com/huggingface/transformers/pull/15636
* Make Swin work with VisionEncoderDecoderModel by NielsRogge in https://github.com/huggingface/transformers/pull/15527
* Remove redundant error logging in from_pretrained() method by lewtun in https://github.com/huggingface/transformers/pull/15631
* Register feature extractor by sgugger in https://github.com/huggingface/transformers/pull/15634
* Fix bug where RNG states were not properly loaded, leading to an exception by muzhi1991 in https://github.com/huggingface/transformers/pull/15638
* [SpeechEncoderDecoder] Make sure no EOS is generated in test by patrickvonplaten in https://github.com/huggingface/transformers/pull/15655
* Require `tokenizers>=0.11.1` by aphedges in https://github.com/huggingface/transformers/pull/15266
* Fix ASR pipelines from local directories with wav2vec models that have language models attached by versae in https://github.com/huggingface/transformers/pull/15590
* Fix typo in speech2text2 doc by jonrbates in https://github.com/huggingface/transformers/pull/15617
* Allow custom code for Processors by sgugger in https://github.com/huggingface/transformers/pull/15649
* add scores to Wav2Vec2WithLMOutput by arampacha in https://github.com/huggingface/transformers/pull/15413
* Update bad_words_ids usage by ngoquanghuy99 in https://github.com/huggingface/transformers/pull/15641
* Updated the RAG training with the latest PyTorch Lightning library and Ray by shamanez in https://github.com/huggingface/transformers/pull/15653
* Add section about doc testing by patrickvonplaten in https://github.com/huggingface/transformers/pull/15659
* add a network debug script and document it by stas00 in https://github.com/huggingface/transformers/pull/15652
* Re-export `KeyDataset`. by Narsil in https://github.com/huggingface/transformers/pull/15645
* Add `decoder_kwargs` to send to LM on asr pipeline. by Narsil in https://github.com/huggingface/transformers/pull/15646
* TF generate refactor - Greedy Search by patrickvonplaten in https://github.com/huggingface/transformers/pull/15562
* [pipeline doc] fix api by stas00 in https://github.com/huggingface/transformers/pull/15660
* Fix TFSequenceSummary's activation by ydshieh in https://github.com/huggingface/transformers/pull/15643
* Fix model equivalence tests by LysandreJik in https://github.com/huggingface/transformers/pull/15670
* Fix vit test by LysandreJik in https://github.com/huggingface/transformers/pull/15671
* Add a missing space in a deprecation message by bryant1410 in https://github.com/huggingface/transformers/pull/15651
* [t5/t0/mt5 models] faster/leaner custom layer norm by stas00 in https://github.com/huggingface/transformers/pull/14656
* Add push_to_hub method to processors by sgugger in https://github.com/huggingface/transformers/pull/15668
* Usage examples for logger by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15657
* Fix dec_attn_mask in TFTransfoXLMainLayer by ydshieh in https://github.com/huggingface/transformers/pull/15665
* 🔥 Remove build_doc_test github action by coyotte508 in https://github.com/huggingface/transformers/pull/15680
* Add register method to AutoProcessor by sgugger in https://github.com/huggingface/transformers/pull/15669
* [Wav2Vec2ProcessorWithLM] Fix auto processor with lm by patrickvonplaten in https://github.com/huggingface/transformers/pull/15683
* Fix Funnel configuration doc by ydshieh in https://github.com/huggingface/transformers/pull/15686
* Implementation of activations as pytorch modules by eldarkurtic in https://github.com/huggingface/transformers/pull/15616
* Add image classification notebook by NielsRogge in https://github.com/huggingface/transformers/pull/15667
* Minor fix on README.md by ydshieh in https://github.com/huggingface/transformers/pull/15688
* Fix shape by gchhablani in https://github.com/huggingface/transformers/pull/15696
* Add SimMIM by NielsRogge in https://github.com/huggingface/transformers/pull/15586
* Adding a model, more doc for pushing to the hub by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15690
* fix CLIP fast tokenizer and change some properties of the slow version by SaulLu in https://github.com/huggingface/transformers/pull/15067
* Fix SiluActivation by sgugger in https://github.com/huggingface/transformers/pull/15718
* Add initializer_std to TFFunnelModelTester with a default value 0.02 by ydshieh in https://github.com/huggingface/transformers/pull/15684
* Fix DETR model deprecation warnings for int div by gautierdag in https://github.com/huggingface/transformers/pull/15702
* Fix LongformerModel hidden states by ydshieh in https://github.com/huggingface/transformers/pull/15537
* style_doc handles decorators in examples by sgugger in https://github.com/huggingface/transformers/pull/15719
* Fix auto model tests by LysandreJik in https://github.com/huggingface/transformers/pull/15706
* Fix `HfDeepSpeedConfig` argument in `Trainer` by jaketae in https://github.com/huggingface/transformers/pull/15711
* fix bug in PT speech-encoder-decoder by sanchit-gandhi in https://github.com/huggingface/transformers/pull/15699
* Fix undoing preprocessing step in summarization example by SSardorf in https://github.com/huggingface/transformers/pull/15741
* Fix minor comment typos by Crabzmatic in https://github.com/huggingface/transformers/pull/15740
* add VisionTextDualEncoder and CLIP fine-tuning script by patil-suraj in https://github.com/huggingface/transformers/pull/15701
* Add layer_idx to CrossAttention of GPT2 model by hyunwoongko in https://github.com/huggingface/transformers/pull/15730
* TF text classification examples by gante in https://github.com/huggingface/transformers/pull/15704
* revert temporary addition to test next version of CLIPTokenizerFast by SaulLu in https://github.com/huggingface/transformers/pull/15717
* added link to our writing-doc document by FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15756
* TF train_step docstring by gante in https://github.com/huggingface/transformers/pull/15755
* Gelu10 by mfuntowicz in https://github.com/huggingface/transformers/pull/15676
* fixed pipeline code by Moumeneb1 in https://github.com/huggingface/transformers/pull/15607
* Fix typo on examples/pytorch/question-answering by dreamgonfly in https://github.com/huggingface/transformers/pull/15644
* Cleanup transformers-cli by julien-c in https://github.com/huggingface/transformers/pull/15767
* Fix `HfArgumentParser` when passing a generator by bryant1410 in https://github.com/huggingface/transformers/pull/15758
* Adding ZeroShotImageClassificationPipeline by Narsil in https://github.com/huggingface/transformers/pull/12119
* [M2M100, XGLM] fix create_position_ids_from_inputs_embeds by patil-suraj in https://github.com/huggingface/transformers/pull/15751
* Supporting Merges.txt files than contain an endline. (`hf-internal-testing/tiny-clip` for instance) by Narsil in https://github.com/huggingface/transformers/pull/15782
* [CLIP] fix gradient checkpointing by patil-suraj in https://github.com/huggingface/transformers/pull/15789
* [ViLT] Fix checkpoint url in config by patil-suraj in https://github.com/huggingface/transformers/pull/15790
* Enable `image-segmentation` on `AutoModelForSemanticSegmentation` by Narsil in https://github.com/huggingface/transformers/pull/15647
* [doc] custom_models: mention security features of the Hub by julien-c in https://github.com/huggingface/transformers/pull/15768
* [Wav2Vec2FeatureExtractor] Align documentation with code by lsb in https://github.com/huggingface/transformers/pull/15468
* HTML dev docs by coyotte508 in https://github.com/huggingface/transformers/pull/15678
* Fix indent in doc-builder CI by coyotte508 in https://github.com/huggingface/transformers/pull/15798
* [Test refactor 1/5] Per-folder tests reorganization by LysandreJik in https://github.com/huggingface/transformers/pull/15725
* [Test refactor 2/5] Tests fetcher by LysandreJik in https://github.com/huggingface/transformers/pull/15726
* [Test refactor 3/5] Notification service improvement by LysandreJik in https://github.com/huggingface/transformers/pull/15727
* [Test refactor 4/5] Improve the scheduled tests by LysandreJik in https://github.com/huggingface/transformers/pull/15728
* [Test refactor 5/5] Build docker images by LysandreJik in https://github.com/huggingface/transformers/pull/15729
* Fix build_documentation CI by coyotte508 in https://github.com/huggingface/transformers/pull/15803
* Fix model templates by LysandreJik in https://github.com/huggingface/transformers/pull/15806
* Fix add-new-model-like when old model checkpoint is not found by sgugger in https://github.com/huggingface/transformers/pull/15805
* Fix from_pretrained with default base_model_prefix by sgugger in https://github.com/huggingface/transformers/pull/15814
* Revert changes in logit size for semantic segmentation models by sgugger in https://github.com/huggingface/transformers/pull/15722
* [Unispeech] Fix slow tests by patrickvonplaten in https://github.com/huggingface/transformers/pull/15818
* [Barthez Tokenizer] Fix saving by patrickvonplaten in https://github.com/huggingface/transformers/pull/15815
* [TFXLNet] Correct tf xlnet generate by patrickvonplaten in https://github.com/huggingface/transformers/pull/15822
* Fixes the "push" CI run by LysandreJik in https://github.com/huggingface/transformers/pull/15807
* Fix semantic segmentation pipeline test by sgugger in https://github.com/huggingface/transformers/pull/15826
* Fix dummy_inputs() to dummy_inputs in symbolic_trace doc string by pbelevich in https://github.com/huggingface/transformers/pull/15776
* Add model specific output classes to PoolFormer model docs by heytanay in https://github.com/huggingface/transformers/pull/15746
* HFTracer.trace should use self.graph to be compatible with torch.fx.Tracer by pbelevich in https://github.com/huggingface/transformers/pull/15824
* Fix tf.concatenate + test past_key_values for TF models by ydshieh in https://github.com/huggingface/transformers/pull/15774
* [examples/summarization and translation] fix readme by patil-suraj in https://github.com/huggingface/transformers/pull/15833
* Add ONNX Runtime quantization for text classification notebook by echarlaix in https://github.com/huggingface/transformers/pull/15817
* Re-enable doctests for the quicktour by sgugger in https://github.com/huggingface/transformers/pull/15828
* Framework split model report by LysandreJik in https://github.com/huggingface/transformers/pull/15825
* [UniSpeechSat] Revert previous incorrect change of slow tests by patrickvonplaten in https://github.com/huggingface/transformers/pull/15847
* Flax Speech-Encoder-Decoder Model by sanchit-gandhi in https://github.com/huggingface/transformers/pull/15613
* Fix (deprecated) ONNX exporter to account for new tf2onnx API by lewtun in https://github.com/huggingface/transformers/pull/15856
* Fixing the timestamps with chunking. by Narsil in https://github.com/huggingface/transformers/pull/15843
* [TF-PT-Tests] Fix PyTorch - TF tests for different GPU devices by patrickvonplaten in https://github.com/huggingface/transformers/pull/15846
* [Benchmark tools] Deprecate all by patrickvonplaten in https://github.com/huggingface/transformers/pull/15848
* Add PT + TF automatic builds by LysandreJik in https://github.com/huggingface/transformers/pull/15860
* Update TF LM examples by gante in https://github.com/huggingface/transformers/pull/15855
* [ViLT] Add link to notebooks by NielsRogge in https://github.com/huggingface/transformers/pull/15791
* Scatter should run on CUDA by LysandreJik in https://github.com/huggingface/transformers/pull/15872
* [vision] Add problem_type support by NielsRogge in https://github.com/huggingface/transformers/pull/15851
* use python 3.7 for flax self-push tests by patil-suraj in https://github.com/huggingface/transformers/pull/15865
* Bump up doc node version to 16 by mishig25 in https://github.com/huggingface/transformers/pull/15874
* No self-hosted by LysandreJik in https://github.com/huggingface/transformers/pull/15710
* fix deepspeed tests by stas00 in https://github.com/huggingface/transformers/pull/15881
* Remove stash for now by LysandreJik in https://github.com/huggingface/transformers/pull/15882
* M2M100 support for ONNX export by michaelbenayoun in https://github.com/huggingface/transformers/pull/15193
* [Bart] Fix implementation note doc by patrickvonplaten in https://github.com/huggingface/transformers/pull/15879
* Add TF generate sample tests with all logit processors by gante in https://github.com/huggingface/transformers/pull/15852
* TF: Update QA example by gante in https://github.com/huggingface/transformers/pull/15870
* Updates in Trainer to support new features in SM Model Parallel library by rahul003 in https://github.com/huggingface/transformers/pull/15877
* Fix tiny typo in docs by rhjohnstone in https://github.com/huggingface/transformers/pull/15884
* Fix Bug in FlaxWav2Vec2 Slow Test by sanchit-gandhi in https://github.com/huggingface/transformers/pull/15887
* [SegFormer] Add deprecation warning by NielsRogge in https://github.com/huggingface/transformers/pull/15889
* TF generate refactor - Sample by gante in https://github.com/huggingface/transformers/pull/15793
* [XGLM] run sampling test on CPU to be deterministic by patil-suraj in https://github.com/huggingface/transformers/pull/15892
* Fix SegformerForImageClassification by NielsRogge in https://github.com/huggingface/transformers/pull/15895
* Update delete-dev-doc job to match build-dev-doc by sgugger in https://github.com/huggingface/transformers/pull/15891

Impressive community contributors

The community contributors below have significantly contributed to the v4.17.0 release. Thank you!

- sayakpaul, for contributing the TensorFlow version of ConvNext
- gchhablani, for contributing PLBart
- edugp, for contributing Data2Vec

New Contributors
* Soonhwan-Kwon made their first contribution in https://github.com/huggingface/transformers/pull/13727
* jonatasgrosman made their first contribution in https://github.com/huggingface/transformers/pull/15428
* ToluClassics made their first contribution in https://github.com/huggingface/transformers/pull/15432
* peregilk made their first contribution in https://github.com/huggingface/transformers/pull/15423
* bugface made their first contribution in https://github.com/huggingface/transformers/pull/15480
* AyushExel made their first contribution in https://github.com/huggingface/transformers/pull/14582
* thinksoso made their first contribution in https://github.com/huggingface/transformers/pull/15403
* davidleonfdez made their first contribution in https://github.com/huggingface/transformers/pull/15473
* sanchit-gandhi made their first contribution in https://github.com/huggingface/transformers/pull/15519
* arron1227 made their first contribution in https://github.com/huggingface/transformers/pull/15084
* cimeister made their first contribution in https://github.com/huggingface/transformers/pull/15504
* cwkeam made their first contribution in https://github.com/huggingface/transformers/pull/15416
* Albertobegue made their first contribution in https://github.com/huggingface/transformers/pull/13831
* derenrich made their first contribution in https://github.com/huggingface/transformers/pull/15614
* tkukurin made their first contribution in https://github.com/huggingface/transformers/pull/15636
* muzhi1991 made their first contribution in https://github.com/huggingface/transformers/pull/15638
* versae made their first contribution in https://github.com/huggingface/transformers/pull/15590
* jonrbates made their first contribution in https://github.com/huggingface/transformers/pull/15617
* arampacha made their first contribution in https://github.com/huggingface/transformers/pull/15413
* FrancescoSaverioZuppichini made their first contribution in https://github.com/huggingface/transformers/pull/15657
* coyotte508 made their first contribution in https://github.com/huggingface/transformers/pull/15680
* heytanay made their first contribution in https://github.com/huggingface/transformers/pull/15531
* gautierdag made their first contribution in https://github.com/huggingface/transformers/pull/15702
* SSardorf made their first contribution in https://github.com/huggingface/transformers/pull/15741
* Crabzmatic made their first contribution in https://github.com/huggingface/transformers/pull/15740
* dreamgonfly made their first contribution in https://github.com/huggingface/transformers/pull/15644
* lsb made their first contribution in https://github.com/huggingface/transformers/pull/15468
* pbelevich made their first contribution in https://github.com/huggingface/transformers/pull/15776
* sayakpaul made their first contribution in https://github.com/huggingface/transformers/pull/15750
* rahul003 made their first contribution in https://github.com/huggingface/transformers/pull/15877
* rhjohnstone made their first contribution in https://github.com/huggingface/transformers/pull/15884

**Full Changelog**: https://github.com/huggingface/transformers/compare/v4.16.0...v4.17.0

4.16.2

Not secure
- Add header (#15434)
- [Hotfix] Fix Swin model outputs (#15414)

**Full Changelog**: https://github.com/huggingface/transformers/compare/v4.16.1...v4.16.2

4.16.1

Not secure
Add init to BORT (#15378) by LysandreJik

4.16.0

Not secure
New models

Nyströmformer

The Nyströmformer model was proposed in [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, and Vikas Singh.

The Nyströmformer model overcomes the quadratic complexity of self-attention on the input sequence length by adapting the Nyström method to approximate standard self-attention, enabling longer sequences with thousands of tokens as input.

* Add Nystromformer by novice03 in https://github.com/huggingface/transformers/pull/14659

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=nystromformer
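
As a quick illustration, here is a minimal sketch of masked-language-model inference with one of those checkpoints (the `uw-madison/nystromformer-512` checkpoint name is an assumption based on the hub link above):

```python
import torch
from transformers import AutoTokenizer, NystromformerForMaskedLM

# assumed checkpoint name; see the hub link above for available models
tokenizer = AutoTokenizer.from_pretrained("uw-madison/nystromformer-512")
model = NystromformerForMaskedLM.from_pretrained("uw-madison/nystromformer-512")

inputs = tokenizer("Paris is the [MASK] of France.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# pick the highest-scoring token for the masked position
mask_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
print(tokenizer.decode(logits[0, mask_index].argmax(-1)))
```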

REALM

The REALM model was proposed in [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.

It’s a retrieval-augmented language model that first retrieves documents from a textual knowledge corpus and then uses the retrieved documents to answer questions.

* Add REALM by qqaatw in https://github.com/huggingface/transformers/pull/13292
* Add FastTokenizer to REALM by qqaatw in https://github.com/huggingface/transformers/pull/15211

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=realm
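
As an illustration, the embedder component can be used on its own to produce dense query embeddings for retrieval; a minimal sketch (the checkpoint name below is an assumption based on the hub link above):

```python
import torch
from transformers import RealmEmbedder, RealmTokenizer

# assumed checkpoint name; see the hub link above for available models
tokenizer = RealmTokenizer.from_pretrained("google/realm-cc-news-pretrained-embedder")
model = RealmEmbedder.from_pretrained("google/realm-cc-news-pretrained-embedder")

inputs = tokenizer("What is the capital of France?", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# projected_score is the dense embedding used to score documents for retrieval
print(outputs.projected_score.shape)  # (batch_size, retriever_proj_size)
```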


ViTMAE

The ViTMAE model was proposed in [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377v2) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.

The paper shows that, by pre-training a Vision Transformer (ViT) to reconstruct pixel values for masked patches, one can get results after fine-tuning that outperform supervised pre-training.

* Add MAE by NielsRogge in https://github.com/huggingface/transformers/pull/15120

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=vit_mae
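
A minimal sketch of the pre-training objective, where the model masks random patches and reconstructs their pixels (the `facebook/vit-mae-base` checkpoint name is an assumption based on the hub link above):

```python
import requests
import torch
from PIL import Image
from transformers import AutoFeatureExtractor, ViTMAEForPreTraining

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# assumed checkpoint name; see the hub link above for available models
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/vit-mae-base")
model = ViTMAEForPreTraining.from_pretrained("facebook/vit-mae-base")

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.loss)          # reconstruction loss on the masked patches
print(outputs.logits.shape)  # per-patch pixel predictions
```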

ViLT

The ViLT model was proposed in [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim.

ViLT incorporates text embeddings into a Vision Transformer (ViT), allowing it to have a minimal design for Vision-and-Language Pre-training (VLP).

* Add ViLT by NielsRogge in https://github.com/huggingface/transformers/pull/14895

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=vilt
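
A minimal visual question answering sketch (the `dandelin/vilt-b32-finetuned-vqa` checkpoint name is an assumption based on the hub link above):

```python
import requests
from PIL import Image
from transformers import ViltForQuestionAnswering, ViltProcessor

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# assumed checkpoint name; see the hub link above for available models
processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")

# the processor prepares both the image and the question in one call
encoding = processor(image, "How many cats are there?", return_tensors="pt")
logits = model(**encoding).logits
print(model.config.id2label[logits.argmax(-1).item()])
```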

Swin Transformer

The Swin Transformer was proposed in [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.

The Swin Transformer serves as a general-purpose backbone for computer vision. The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connection. This hierarchical architecture has the flexibility to model at various scales and has linear computational complexity with respect to image size.

* Add Swin Transformer by novice03 in https://github.com/huggingface/transformers/pull/15085

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=swin
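
A minimal image classification sketch (the `microsoft/swin-tiny-patch4-window7-224` checkpoint name is an assumption based on the hub link above):

```python
import requests
import torch
from PIL import Image
from transformers import AutoFeatureExtractor, SwinForImageClassification

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# assumed checkpoint name; see the hub link above for available models
feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/swin-tiny-patch4-window7-224")
model = SwinForImageClassification.from_pretrained("microsoft/swin-tiny-patch4-window7-224")

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])  # ImageNet class name
```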

YOSO

The YOSO model was proposed in [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714)
by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.

YOSO approximates standard softmax self-attention via a Bernoulli sampling scheme based on Locality Sensitive Hashing (LSH). In principle, all the Bernoulli random variables can be sampled with a single hash.

* Add YOSO by novice03 in https://github.com/huggingface/transformers/pull/15091

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=yoso
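
A minimal sketch encoding a sequence far beyond the usual 512-token limit of standard self-attention (the `uw-madison/yoso-4096` checkpoint name is an assumption based on the hub link above):

```python
import torch
from transformers import AutoTokenizer, YosoModel

# assumed checkpoint name; see the hub link above for available models
tokenizer = AutoTokenizer.from_pretrained("uw-madison/yoso-4096")
model = YosoModel.from_pretrained("uw-madison/yoso-4096")

# a toy long input to exercise the linear-cost attention
long_text = " ".join(["transformers"] * 2000)
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=4096)
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```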

Add model like

To help contributors add new models to Transformers more easily, there is a new command that will clone an existing model and set up the various hooks in the library, so that you only have to write the tweaks needed to the modeling file. Just run `transformers-cli add-new-model-like` and fill in the questionnaire!

* Add model like by sgugger in https://github.com/huggingface/transformers/pull/14992

Training scripts

New training scripts were introduced: one for speech seq2seq models and an image pre-training script leveraging the ViTMAE models.
Finally, an image captioning example in Flax was added to the library.

* Add Speech Seq2Seq Training script by patrickvonplaten in https://github.com/huggingface/transformers/pull/14792
* [ViTMAE] Add image pretraining script by NielsRogge in https://github.com/huggingface/transformers/pull/15242
* Add Flax image captioning example by ydshieh in https://github.com/huggingface/transformers/pull/14864

Pipelines

Support is added for long files on the `automatic-speech-recognition` (ASR) pipeline, as well as for audio models combined with a language model, which reduces the word error rate (WER) on many tasks; [see the blogpost](https://huggingface.co/blog/wav2vec2-with-ngram).
Work also continues on homogenizing arguments and framework support across all pipelines. A short usage sketch follows the list below.

* Large audio chunking for the existing ASR pipeline by anton-l in https://github.com/huggingface/transformers/pull/14896
* Enabling `TF` on `image-classification` pipeline. by Narsil in https://github.com/huggingface/transformers/pull/15030
* Pipeline ASR with LM. by Narsil in https://github.com/huggingface/transformers/pull/15071
* ChunkPipeline: `batch_size` enabled on `zero-cls` and `qa` pipelines. by Narsil in https://github.com/huggingface/transformers/pull/14225
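
A rough sketch of transcribing a long file with chunking; the checkpoint and the file path below are illustrative assumptions, and `chunk_length_s` is the argument introduced for long-file support:

```python
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")

# chunk_length_s splits the audio into (strided) chunks so that files of
# arbitrary length can be transcribed; "interview.flac" is a placeholder path
output = asr("interview.flac", chunk_length_s=10)
print(output["text"])
```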

PyTorch improvements

The ELECTRA model can now be used as a decoder, enabling an ELECTRA encoder-decoder model.

* Add `ElectraForCausalLM` -> Enable Electra encoder-decoder model by stancld in https://github.com/huggingface/transformers/pull/14729
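
A minimal sketch of wiring two ELECTRA checkpoints into an encoder-decoder; the cross-attention weights are newly initialized, so the composite model still needs fine-tuning:

```python
from transformers import AutoTokenizer, EncoderDecoderModel

# the decoder copy is automatically instantiated with is_decoder=True
# and cross-attention layers added
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/electra-base-discriminator", "google/electra-base-discriminator"
)
tokenizer = AutoTokenizer.from_pretrained("google/electra-base-discriminator")

inputs = tokenizer("ELECTRA as an encoder-decoder.", return_tensors="pt")
outputs = model(input_ids=inputs.input_ids, decoder_input_ids=inputs.input_ids)
print(outputs.logits.shape)  # (batch, sequence_length, vocab_size)
```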

TensorFlow improvements

Several TensorFlow improvements land in this release, starting with a Keras metric callback that computes metrics on an evaluation set at the end of each epoch.

* Keras metric callback by Rocketknight1 and merveenoyan in https://github.com/huggingface/transformers/pull/14867

The vision encoder decoder model can now be used in TensorFlow.

* Add TFVisionEncoderDecoderModel by ydshieh in https://github.com/huggingface/transformers/pull/14148

CLIP gets ported to TensorFlow.

* Add TFCLIPModel by ydshieh in https://github.com/huggingface/transformers/pull/13967
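
A minimal image-text similarity sketch with the TensorFlow port (the `openai/clip-vit-base-patch32` checkpoint name is an assumption):

```python
import requests
import tensorflow as tf
from PIL import Image
from transformers import CLIPProcessor, TFCLIPModel

# assumed checkpoint name
model = TFCLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=image,
    return_tensors="tf",
    padding=True,
)
outputs = model(**inputs)
# image-text similarity scores, softmaxed over the candidate captions
probs = tf.nn.softmax(outputs.logits_per_image, axis=1)
print(probs.numpy())
```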

Flax improvements

RoFormer gets ported to Flax.

* Add Flax RoFormer by stancld in https://github.com/huggingface/transformers/pull/15005

Deprecations

* Deprecates AdamW and adds `--optim` by manuelciosici in https://github.com/huggingface/transformers/pull/14744
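
For example, the new argument can be set directly on `TrainingArguments`; the `adamw_torch` option name is taken from the PR and should be treated as an assumption:

```python
from transformers import TrainingArguments

# `optim` mirrors the new `--optim` command-line flag; "adamw_torch" selects
# PyTorch's native AdamW instead of the deprecated transformers implementation
args = TrainingArguments(output_dir="output", optim="adamw_torch")
```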

Documentation

The documentation has been fully migrated to Markdown. If you are making a contribution, make sure to read the upgraded guide on [how to write good docstrings](https://github.com/huggingface/transformers/tree/master/docs#writing-documentation---specification).

* Convert rst files by sgugger in https://github.com/huggingface/transformers/pull/14888
* Doc styler v2 by sgugger in https://github.com/huggingface/transformers/pull/14950
* Convert last rst file by sgugger in https://github.com/huggingface/transformers/pull/14952
* Doc styler examples by sgugger in https://github.com/huggingface/transformers/pull/14953
* [doc] consistent True/False/None default format by stas00 in https://github.com/huggingface/transformers/pull/14951
* [doc] :obj: hunt by stas00 in https://github.com/huggingface/transformers/pull/14954
* [doc] :class: hunt by stas00 in https://github.com/huggingface/transformers/pull/14955

Bugfixes and improvements

* Fix installation instructions for BART ONNX example by lewtun in https://github.com/huggingface/transformers/pull/14885
* Fix doc examples: ... takes no keyword arguments by ydshieh in https://github.com/huggingface/transformers/pull/14701
* Fix `AttributeError` from `PreTrainedTokenizerFast.decoder` by aphedges in https://github.com/huggingface/transformers/pull/14691
* Add 'with torch.no_grad()' to ALBERT integration test forward pass by henholm in https://github.com/huggingface/transformers/pull/14808
* Add ONNX support for MarianMT models by lewtun in https://github.com/huggingface/transformers/pull/14586
* add custom stopping criteria to human eval script by lvwerra in https://github.com/huggingface/transformers/pull/14897
* Set `run_name` in MLflowCallback by YangDong2002 in https://github.com/huggingface/transformers/pull/14894
* [AutoTokenizer] Fix incorrect from pretrained by patrickvonplaten in https://github.com/huggingface/transformers/pull/14900
* [Tests] Update speech diarization and WavLM tolerances by anton-l in https://github.com/huggingface/transformers/pull/14902
* [doc] post-porting by stas00 in https://github.com/huggingface/transformers/pull/14890
* [Generate] Remove attention_mask and integrate model_main_input_name by patrickvonplaten in https://github.com/huggingface/transformers/pull/14856
* Fix failing GPU trainer tests by sgugger in https://github.com/huggingface/transformers/pull/14903
* Better logic for getting tokenizer config in AutoTokenizer by sgugger in https://github.com/huggingface/transformers/pull/14906
* [doc] install - add link to jax installation by stas00 in https://github.com/huggingface/transformers/pull/14912
* [WavLM] fix wavlm docs by patrickvonplaten in https://github.com/huggingface/transformers/pull/14910
* Fix Perceiver docs by Sanster in https://github.com/huggingface/transformers/pull/14917
* fix to issue 14833 in data_collator - consider no labels by kleinay in https://github.com/huggingface/transformers/pull/14930
* Fix duplicate call to save_checkpoint when using deepspeed by MihaiBalint in https://github.com/huggingface/transformers/pull/14946
* [WavLM] give model more precision tolerance in tests by patrickvonplaten in https://github.com/huggingface/transformers/pull/14958
* [Speech Recognition Examples] Update README.md by patrickvonplaten in https://github.com/huggingface/transformers/pull/14965
* [Tests] Speed up tokenizer tests by patrickvonplaten in https://github.com/huggingface/transformers/pull/14964
* [Wav2Vec2] Rename model's feature extractor to feature encoder by patrickvonplaten in https://github.com/huggingface/transformers/pull/14959
* Replace assertion with exception by jaketae in https://github.com/huggingface/transformers/pull/14970
* remove absl workaround as it's no longer needed by stas00 in https://github.com/huggingface/transformers/pull/14909
* Fixing a pathological case for slow tokenizers by Narsil in https://github.com/huggingface/transformers/pull/14981
* [AutoProcessor] Correct AutoProcessor and automatically add processor… by patrickvonplaten in https://github.com/huggingface/transformers/pull/14881
* [Generate] correct encoder_outputs are passed without attention_mask by patrickvonplaten in https://github.com/huggingface/transformers/pull/14980
* Adding `num_return_sequences` support for text2text generation. by Narsil in https://github.com/huggingface/transformers/pull/14988
* Enabling `tokenizers` upgrade. by Narsil in https://github.com/huggingface/transformers/pull/14941
* Allow training to resume even if RNG states are not properly loaded by sgugger in https://github.com/huggingface/transformers/pull/14994
* Map model_type and doc pages names by sgugger in https://github.com/huggingface/transformers/pull/14944
* Fixing t2t pipelines lists outputs. by Narsil in https://github.com/huggingface/transformers/pull/15008
* Improve truncation_side by Narsil in https://github.com/huggingface/transformers/pull/14947
* Fix doc examples: name 'torch' is not defined by ydshieh in https://github.com/huggingface/transformers/pull/15016
* [Tests] Correct Wav2Vec2 & WavLM tests by patrickvonplaten in https://github.com/huggingface/transformers/pull/15015
* [doc] Update parallelism.mdx by hyunwoongko in https://github.com/huggingface/transformers/pull/15013
* Fix Code block speech pretraining example by flozi00 in https://github.com/huggingface/transformers/pull/14983
* Fix a little typo by milyiyo in https://github.com/huggingface/transformers/pull/15002
* Hotfix `chunk_length_s` instead of `_ms`. by Narsil in https://github.com/huggingface/transformers/pull/15029
* [doc] Update parallelism.mdx by hyunwoongko in https://github.com/huggingface/transformers/pull/15018
* [megatron convert] PYTHONPATH requirements by stas00 in https://github.com/huggingface/transformers/pull/14956
* Fix doc example: mask_time_indices (numpy) has no attribute 'to' by ydshieh in https://github.com/huggingface/transformers/pull/15033
* Adding QoL for `batch_size` arg (like others enabled everywhere). by Narsil in https://github.com/huggingface/transformers/pull/15027
* [CLIP] Fix PT test by patrickvonplaten in https://github.com/huggingface/transformers/pull/15041
* [SpeechEncoderDecoder] Fix from pretrained by patrickvonplaten in https://github.com/huggingface/transformers/pull/15043
* [CLIP] Fix TF test by patil-suraj in https://github.com/huggingface/transformers/pull/15042
* Wrap Roberta integration test forward passes with torch.no_grad() by mattchurgin in https://github.com/huggingface/transformers/pull/15037
* Add Detectron2 to Github actions by NielsRogge in https://github.com/huggingface/transformers/pull/15053
* Remove old asserts. by Narsil in https://github.com/huggingface/transformers/pull/15012
* Add 'with torch.no_grad()' to BertGeneration integration test forward passes by itsTurner in https://github.com/huggingface/transformers/pull/14963
* Update run_speech_recognition_seq2seq.py (max_eval_samples instead of train_samples) by flozi00 in https://github.com/huggingface/transformers/pull/14967
* [VisionTextDualEncoder] Fix doc example by ydshieh in https://github.com/huggingface/transformers/pull/15057
* Resubmit changes after rebase to master by kct22aws in https://github.com/huggingface/transformers/pull/14982
* [Fix doc examples] missing from_pretrained by ydshieh in https://github.com/huggingface/transformers/pull/15044
* [VisionTextDualEncoder] Add token_type_ids param by ydshieh in https://github.com/huggingface/transformers/pull/15073
* Fix convert for newer megatron-lm bert model by yoquankara in https://github.com/huggingface/transformers/pull/14082
* [Wav2Vec2 Speech Event] Add speech event v2 by patrickvonplaten in https://github.com/huggingface/transformers/pull/15083
* fix model table cell text alignment by ydshieh in https://github.com/huggingface/transformers/pull/14999
* Update check_repo.py by kamalkraj in https://github.com/huggingface/transformers/pull/15014
* Make OpenAIGPTTokenizer work with SpaCy 2.x and 3.x by cody-moveworks in https://github.com/huggingface/transformers/pull/15019
* Change assignee for tokenizers by LysandreJik in https://github.com/huggingface/transformers/pull/15088
* support the trocr small models by liminghao1630 in https://github.com/huggingface/transformers/pull/14893
* [Fix doc example] RagModel by ydshieh in https://github.com/huggingface/transformers/pull/15076
* Model summary doc page horizontal banners by mishig25 in https://github.com/huggingface/transformers/pull/15058
* Use tqdm.auto in Pipeline docs by bryant1410 in https://github.com/huggingface/transformers/pull/14920
* [doc] normalize HF Transformers string by stas00 in https://github.com/huggingface/transformers/pull/15023
* Happy New Year! by sgugger in https://github.com/huggingface/transformers/pull/15094
* [DOC] fix doc examples for bart-like models by patil-suraj in https://github.com/huggingface/transformers/pull/15093
* [performance doc] Power and Cooling by stas00 in https://github.com/huggingface/transformers/pull/14935
* Add test to check reported training loss by sgugger in https://github.com/huggingface/transformers/pull/15096
* Take gradient accumulation into account when defining samplers by sgugger in https://github.com/huggingface/transformers/pull/15095
* [Fix doc example] Speech2TextForConditionalGeneration by ydshieh in https://github.com/huggingface/transformers/pull/15092
* Fix cookiecutter by NielsRogge in https://github.com/huggingface/transformers/pull/15100
* [Wav2Vec2ProcessorWithLM] improve decoder download by patrickvonplaten in https://github.com/huggingface/transformers/pull/15040
* Adds IBERT to models exportable with ONNX by MaximovaIrina in https://github.com/huggingface/transformers/pull/14868
* change metric_key_prefix in seq2seq_trainer.py by JejuWayfarer in https://github.com/huggingface/transformers/pull/15099
* Print out durations of all scheduled tests by LysandreJik in https://github.com/huggingface/transformers/pull/15102
* Fix failing W2V2 test by LysandreJik in https://github.com/huggingface/transformers/pull/15104
* Doc styler tip by sgugger in https://github.com/huggingface/transformers/pull/15105
* Update ONNX docs by lewtun in https://github.com/huggingface/transformers/pull/14904
* Fix saving FlaubertTokenizer configs by vmaryasin in https://github.com/huggingface/transformers/pull/14991
* Update TF test_step to match train_step by Rocketknight1 in https://github.com/huggingface/transformers/pull/15111
* use block_size instead of max_seq_length in tf run_clm example by riklopfer in https://github.com/huggingface/transformers/pull/15036
* fix: switch from slow to generic tokenizer class by lvwerra in https://github.com/huggingface/transformers/pull/15122
* Fix TFEncoderDecoder labels handling 14357 by ydshieh in https://github.com/huggingface/transformers/pull/15001
* Add ONNX configuration classes to docs by lewtun in https://github.com/huggingface/transformers/pull/15121
* Add `with torch.no_grad()` to DistilBERT integration test forward pass by jaketae in https://github.com/huggingface/transformers/pull/14979
* mBART support for run_summarization.py by banda-larga in https://github.com/huggingface/transformers/pull/15125
* doc-builder -> doc-build by LysandreJik in https://github.com/huggingface/transformers/pull/15134
* [Fix doc example] - ProphetNetDecoder by ydshieh in https://github.com/huggingface/transformers/pull/15124
* [examples/flax/language-modeling] set loglevel by stas00 in https://github.com/huggingface/transformers/pull/15129
* Update model_sharing.mdx by carlos-aguayo in https://github.com/huggingface/transformers/pull/15142
* Enable AMP for xla:gpu device in trainer class by ymwangg in https://github.com/huggingface/transformers/pull/15022
* [deepspeed tests] fix summarization by stas00 in https://github.com/huggingface/transformers/pull/15149
* Check the repo consistency in model templates test by sgugger in https://github.com/huggingface/transformers/pull/15141
* Add TF glu activation function by gante in https://github.com/huggingface/transformers/pull/15146
* Make sure all submodules are properly registered by sgugger in https://github.com/huggingface/transformers/pull/15144
* [Fix doc example] - OpenAIGPTDoubleHeadsModel by ydshieh in https://github.com/huggingface/transformers/pull/15143
* fix BertTokenizerFast `tokenize_chinese_chars` arg by SaulLu in https://github.com/huggingface/transformers/pull/15158
* Fix typo in test_configuration_common.py by novice03 in https://github.com/huggingface/transformers/pull/15160
* Add "open in hf spaces" gradio button issue 73 by AK391 in https://github.com/huggingface/transformers/pull/15106
* TF Bert inference - support `np.ndarray` optional arguments by gante in https://github.com/huggingface/transformers/pull/15074
* Fixing flaky test (hopefully). by Narsil in https://github.com/huggingface/transformers/pull/15154
* Better dummies by sgugger in https://github.com/huggingface/transformers/pull/15148
* Update from keras2onnx to tf2onnx by gante in https://github.com/huggingface/transformers/pull/15162
* [doc] performance: Efficient Software Prebuilds by stas00 in https://github.com/huggingface/transformers/pull/15147
* [Speech models] Disable non-existing chunking in tests by patrickvonplaten in https://github.com/huggingface/transformers/pull/15163
* Added forward pass of test_inference_image_classification_head by MrinalTyagi in https://github.com/huggingface/transformers/pull/14777
* Fix dtype issue in TF BART by Rocketknight1 in https://github.com/huggingface/transformers/pull/15178
* [doc] new MoE paper by stas00 in https://github.com/huggingface/transformers/pull/15184
* Mark bad tokenizers version by sgugger in https://github.com/huggingface/transformers/pull/15188
* [Fix doc example] UniSpeechSatForPreTraining by ydshieh in https://github.com/huggingface/transformers/pull/15152
* `is_ctc` needs to be updated to `self.type == "ctc"` by Narsil in https://github.com/huggingface/transformers/pull/15194
* [Fix doc example] TFRagModel by ydshieh in https://github.com/huggingface/transformers/pull/15187
* Error when code examples are improperly closed by sgugger in https://github.com/huggingface/transformers/pull/15186
* Fix deprecation warnings for int div by sgugger in https://github.com/huggingface/transformers/pull/15180
* Copies and docstring styling by sgugger in https://github.com/huggingface/transformers/pull/15202
* [ASR pipeline] correct with lm pipeline by patrickvonplaten in https://github.com/huggingface/transformers/pull/15200
* Remove dependency to quiet Dependabot by sgugger in https://github.com/huggingface/transformers/pull/15205
* Ignore empty subfolders when identifying submodules by sgugger in https://github.com/huggingface/transformers/pull/15204
* [MBartTokenizer] remove dep on xlm-roberta tokenizer by patil-suraj in https://github.com/huggingface/transformers/pull/15201
* fix: 14486 do not use BertPooler in DPR by PaulLerner in https://github.com/huggingface/transformers/pull/15068
* [Fix doc example] Wrong checkpoint name by ydshieh in https://github.com/huggingface/transformers/pull/15079
* [Robust Speech Event] Add guides by patrickvonplaten in https://github.com/huggingface/transformers/pull/15155
* Enable tqdm toggling by jaketae in https://github.com/huggingface/transformers/pull/15167
* [FLAX] glue training example refactor by kamalkraj in https://github.com/huggingface/transformers/pull/13815
* Rename compute_loss in TF models by Rocketknight1 in https://github.com/huggingface/transformers/pull/15207
* Build dev documentation by LysandreJik in https://github.com/huggingface/transformers/pull/15210
* [Fix doc example] TFFunnelTokenizer' is not defined by ydshieh in https://github.com/huggingface/transformers/pull/15225
* Correct Speech Event Readme by patrickvonplaten in https://github.com/huggingface/transformers/pull/15226
* [ViTMAE] Various fixes by NielsRogge in https://github.com/huggingface/transformers/pull/15221
* [Speech Event] Fix speech event readme by patil-suraj in https://github.com/huggingface/transformers/pull/15227
* Fix typo in BERT tokenization file by qqaatw in https://github.com/huggingface/transformers/pull/15228
* Fix PR number by LysandreJik in https://github.com/huggingface/transformers/pull/15231
* Adapt Common Voice Talk Title and Abstract by patrickvonplaten in https://github.com/huggingface/transformers/pull/15233
* Update Trainer code example by NielsRogge in https://github.com/huggingface/transformers/pull/15070
* Make chunking smartly (long files) work on asr ctc_with_lm. by Narsil in https://github.com/huggingface/transformers/pull/15219
* Fix usage of additional kwargs in `from_encoder_decoder_pretrained` in encoder-decoder models by jsnfly in https://github.com/huggingface/transformers/pull/15056
* Update README.md by anton-l in https://github.com/huggingface/transformers/pull/15239
* Update README.md by anton-l in https://github.com/huggingface/transformers/pull/15246
* Update pipelines.mdx by kamalkraj in https://github.com/huggingface/transformers/pull/15243
* [Fix doc example] missing import by ydshieh in https://github.com/huggingface/transformers/pull/15240
* Fixes tf_default_data_collator sometimes guessing the wrong dtype for labels by Rocketknight1 in https://github.com/huggingface/transformers/pull/15234
* Make sure to raise NotImplementedError with correct method name by kumapo in https://github.com/huggingface/transformers/pull/15253
* Fix crash when logs are empty because Keras has wiped them out of spite by Rocketknight1 in https://github.com/huggingface/transformers/pull/15258
* Tentative workflow improvement by LysandreJik in https://github.com/huggingface/transformers/pull/15255
* Fix code examples by NielsRogge in https://github.com/huggingface/transformers/pull/15257
* Adds missing module_specs for usages of _LazyModule by jkuball in https://github.com/huggingface/transformers/pull/15230
* Prepare ONNX export for torch v1.11 by lewtun in https://github.com/huggingface/transformers/pull/15270
* Fix by novice03 in https://github.com/huggingface/transformers/pull/15276
* Move BART + ONNX example to research_projects by lewtun in https://github.com/huggingface/transformers/pull/15271
* Specify providers explicitly in ORT session initialization by wangyems in https://github.com/huggingface/transformers/pull/15235
* Fixes Benchmark example link by evandrosks in https://github.com/huggingface/transformers/pull/15278
* [Robust Speech Challenge] Add timeline by patrickvonplaten in https://github.com/huggingface/transformers/pull/15274
* [Fix doc example] TFLayoutLMForTokenClassification: missing import tf by ydshieh in https://github.com/huggingface/transformers/pull/15268
* [Wav2Vec2ProcessorWithLM] improve multi processing by patrickvonplaten in https://github.com/huggingface/transformers/pull/15247
* Refine errors for pretrained objects by sgugger in https://github.com/huggingface/transformers/pull/15261
* [PyTorch-nightly-test] Fix Wav2Vec2 LM & Phoneme tests by patrickvonplaten in https://github.com/huggingface/transformers/pull/15272
* Update eval.py by patrickvonplaten in https://github.com/huggingface/transformers/pull/15310
* Update CONTRIBUTING.md by kamalkraj in https://github.com/huggingface/transformers/pull/15290
* Fix a typo in tag addition by sgugger in https://github.com/huggingface/transformers/pull/15286
* Remove old debug code leftover. by Narsil in https://github.com/huggingface/transformers/pull/15306
* [Fix doc example] fix missing import jnp by ydshieh in https://github.com/huggingface/transformers/pull/15291
* [LayoutLMV2 Tests] Make sure input is on GPU by patrickvonplaten in https://github.com/huggingface/transformers/pull/15314
* Replace NystromformerTokenizer with AutoTokenizer by novice03 in https://github.com/huggingface/transformers/pull/15312
* [Beam Search] Correct returned beam scores by patrickvonplaten in https://github.com/huggingface/transformers/pull/14654
* [Examples] Correct run ner label2id for fine-tuned models by patrickvonplaten in https://github.com/huggingface/transformers/pull/15017
* Avoid using get_list_of_files by sgugger in https://github.com/huggingface/transformers/pull/15287
* [Tests] Fix test by NielsRogge in https://github.com/huggingface/transformers/pull/15324
* Add 🤗 Accelerate tutorial by stevhliu in https://github.com/huggingface/transformers/pull/15263
* Added missing code in exemplary notebook - custom datasets fine-tuning by Pawloch247 in https://github.com/huggingface/transformers/pull/15300
* Fix encoder-decoder models when labels is passed by ydshieh in https://github.com/huggingface/transformers/pull/15172
* Fix table formatting in SegFormer docs by deppen8 in https://github.com/huggingface/transformers/pull/15337
* Fix deepspeed docs by ngoquanghuy99 in https://github.com/huggingface/transformers/pull/15346
* Fix 'eval_split_name' described as defaulting to 'train' by FremyCompany in https://github.com/huggingface/transformers/pull/15348
* Update doc writing guide by sgugger in https://github.com/huggingface/transformers/pull/15350
* Add YOSO by novice03 in https://github.com/huggingface/transformers/pull/15091
* [docs] post-PR merge fix by stas00 in https://github.com/huggingface/transformers/pull/15355
* Fix YosoConfig doc by sgugger in https://github.com/huggingface/transformers/pull/15353
* [DocTests Speech] Add doc tests for all speech models by patrickvonplaten in https://github.com/huggingface/transformers/pull/15031
* Push to hub save by sgugger in https://github.com/huggingface/transformers/pull/15327
* Fix KerasMetricCallback prediction with generate() and inference of column names by Rocketknight1 in https://github.com/huggingface/transformers/pull/15351
* Add a device argument to the eval script by anton-l in https://github.com/huggingface/transformers/pull/15371
* improve saving strategy of sentencepiece tokenizer by SaulLu in https://github.com/huggingface/transformers/pull/15328
* Implement fixes for TrainingArguments doc by sgugger in https://github.com/huggingface/transformers/pull/15370
* Super-small fix stops us confusing Keras console logging by modifying… by Rocketknight1 in https://github.com/huggingface/transformers/pull/15373
* Add proper documentation for Keras callbacks by sgugger in https://github.com/huggingface/transformers/pull/15374
* Example script for PushToHubCallback by Rocketknight1 in https://github.com/huggingface/transformers/pull/15375


Impressive community contributors

The community contributors below have significantly contributed to the v4.16.0 release. Thank you!
- novice03, for contributing Nyströmformer, Swin Transformer and YOSO
- qqaatw, for contributing REALM
- stancld, for adding support for ELECTRA as a decoder, and porting RoFormer to Flax
- ydshieh, for a myriad of documentation fixes, the port of CLIP to TensorFlow, the addition of the TensorFlow vision encoder-decoder model, and the contribution of an image captioning example in Flax.

New Contributors

* YangDong2002 made their first contribution in https://github.com/huggingface/transformers/pull/14894
* Sanster made their first contribution in https://github.com/huggingface/transformers/pull/14917
* kleinay made their first contribution in https://github.com/huggingface/transformers/pull/14930
* MihaiBalint made their first contribution in https://github.com/huggingface/transformers/pull/14946
* milyiyo made their first contribution in https://github.com/huggingface/transformers/pull/15002
* mattchurgin made their first contribution in https://github.com/huggingface/transformers/pull/15037
* itsTurner made their first contribution in https://github.com/huggingface/transformers/pull/14963
* kct22aws made their first contribution in https://github.com/huggingface/transformers/pull/14982
* yoquankara made their first contribution in https://github.com/huggingface/transformers/pull/14082
* cody-moveworks made their first contribution in https://github.com/huggingface/transformers/pull/15019
* MaximovaIrina made their first contribution in https://github.com/huggingface/transformers/pull/14868
* JejuWayfarer made their first contribution in https://github.com/huggingface/transformers/pull/15099
* novice03 made their first contribution in https://github.com/huggingface/transformers/pull/14659
* banda-larga made their first contribution in https://github.com/huggingface/transformers/pull/15125
* manuelciosici made their first contribution in https://github.com/huggingface/transformers/pull/14744
* carlos-aguayo made their first contribution in https://github.com/huggingface/transformers/pull/15142
* gante made their first contribution in https://github.com/huggingface/transformers/pull/15146
* AK391 made their first contribution in https://github.com/huggingface/transformers/pull/15106
* MrinalTyagi made their first contribution in https://github.com/huggingface/transformers/pull/14777
* jsnfly made their first contribution in https://github.com/huggingface/transformers/pull/15056
* jkuball made their first contribution in https://github.com/huggingface/transformers/pull/15230
* wangyems made their first contribution in https://github.com/huggingface/transformers/pull/15235
* evandrosks made their first contribution in https://github.com/huggingface/transformers/pull/15278
* Pawloch247 made their first contribution in https://github.com/huggingface/transformers/pull/15300
* deppen8 made their first contribution in https://github.com/huggingface/transformers/pull/15337
* ngoquanghuy99 made their first contribution in https://github.com/huggingface/transformers/pull/15346

**Full Changelog**: https://github.com/huggingface/transformers/compare/v4.15.0...v4.16.0
