Transformers


3.2.0

Bert Seq2Seq models, FSMT, LayoutLM, Funnel Transformer, LXMERT

BERT Seq2Seq models

The BertGeneration model is a BERT model that can be leveraged for sequence-to-sequence tasks using EncoderDecoderModel as proposed in [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.

It was added to the library in PyTorch with the following checkpoints:

- `google/roberta2roberta_L-24_bbc`
- `google/roberta2roberta_L-24_gigaword`
- `google/roberta2roberta_L-24_cnn_daily_mail`
- `google/roberta2roberta_L-24_discofuse`
- `google/roberta2roberta_L-24_wikisplit`
- `google/bert2bert_L-24_wmt_de_en`
- `google/bert2bert_L-24_wmt_en_de`
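
As a usage illustration, these checkpoints load like regular encoder-decoder models. A minimal sketch, assuming a recent `transformers` version and using the DiscoFuse sentence-fusion checkpoint listed above (the input text is only an example):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# sentence-fusion checkpoint from the list above
tokenizer = AutoTokenizer.from_pretrained("google/roberta2roberta_L-24_discofuse")
model = AutoModelForSeq2SeqLM.from_pretrained("google/roberta2roberta_L-24_discofuse")

text = "This is the first part. This is the second part of the same thought."
input_ids = tokenizer(text, return_tensors="pt").input_ids

# the checkpoint was trained to fuse the two sentences into one
output_ids = model.generate(input_ids)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```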

Contributions:

- Add "Leveraging Pretrained Checkpoints for Generation" Seq2Seq models. 6594 (patrickvonplaten)

FSMT (FairSeq MachineTranslation)

FSMT (FairSeq MachineTranslation) models were introduced in [Facebook FAIR’s WMT19 News Translation Task Submission](https://arxiv.org/abs/1907.06616) by Nathan Ng, Kyra Yee, Alexei Baevski, Myle Ott, Michael Auli, Sergey Edunov.

It was added to the library in PyTorch, with the following checkpoints:

- `facebook/wmt19-en-ru`
- `facebook/wmt19-en-de`
- `facebook/wmt19-ru-en`
- `facebook/wmt19-de-en`
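
A minimal translation sketch with one of these checkpoints, loosely following the FSMT documentation (the input sentence is just an example):

```python
from transformers import FSMTForConditionalGeneration, FSMTTokenizer

mname = "facebook/wmt19-en-de"
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)

input_ids = tokenizer("Machine learning is great, isn't it?", return_tensors="pt").input_ids
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```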

Contributions:

- [ported model] FSMT (FairSeq MachineTranslation) 6940 (stas00)
- build/eval/gen-card scripts for fsmt 7155 (stas00)
- skip failing FSMT CUDA tests until investigated 7220 (stas00)
- [fsmt] rewrite SinusoidalPositionalEmbedding + USE_CUDA test fixes + new TranslationPipeline test 7224 (stas00)
- [s2s] adjust finetune + test to work with fsmt 7263 (stas00)
- [fsmt] SinusoidalPositionalEmbedding no need to pass device 7292 (stas00)
- Adds FSMT to LM head AutoModel 7312 (LysandreJik)

LayoutLM

The LayoutLM model was proposed in [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, and Ming Zhou. It’s a simple but effective pre-training method of text and layout for document image understanding and information extraction tasks, such as form understanding and receipt understanding.

It was added to the library in PyTorch with the following checkpoints:

- `layoutlm-base-uncased`
- `layoutlm-large-uncased`
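
LayoutLM consumes token ids together with normalized (0-1000) bounding boxes from an OCR engine. A minimal sketch, assuming the checkpoints are accessed under their current `microsoft/` hub ids and using dummy all-zero boxes in place of real OCR output:

```python
import torch
from transformers import LayoutLMTokenizer, LayoutLMModel

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMModel.from_pretrained("microsoft/layoutlm-base-uncased")

encoding = tokenizer("Invoice total: 42.00 EUR", return_tensors="pt")
# every token (including [CLS]/[SEP]) needs a 0-1000 normalized box;
# a real pipeline would map each word's OCR box to its sub-tokens
bbox = torch.zeros(encoding.input_ids.shape[0], encoding.input_ids.shape[1], 4, dtype=torch.long)

outputs = model(input_ids=encoding.input_ids, bbox=bbox, attention_mask=encoding.attention_mask)
print(outputs.last_hidden_state.shape)
```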

Contributions:

- Add LayoutLM Model 7064 (liminghao1630)
- Fixes for LayoutLM 7318 (sgugger)

Funnel Transformer

The Funnel Transformer model was proposed in the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236). It is a bidirectional transformer model, like BERT, but with a pooling operation after each block of layers, a bit like in traditional convolutional neural networks (CNN) in computer vision.

It was added to the library in both PyTorch and TensorFlow, with the following checkpoints:

- `funnel-transformer/small`
- `funnel-transformer/small-base`
- `funnel-transformer/medium`
- `funnel-transformer/medium-base`
- `funnel-transformer/intermediate`
- `funnel-transformer/intermediate-base`
- `funnel-transformer/large`
- `funnel-transformer/large-base`
- `funnel-transformer/xlarge`
- `funnel-transformer/xlarge-base`
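
A minimal feature-extraction sketch with one of the checkpoints above; the `small` variant (without the `-base` suffix) includes the upsampling decoder, so hidden states come back at the full input length:

```python
from transformers import FunnelTokenizer, FunnelModel

tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/small")
model = FunnelModel.from_pretrained("funnel-transformer/small")

inputs = tokenizer("Funnel pools the sequence between blocks, a bit like a CNN.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)
```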

Contributions:

- Funnel transformer 6908 (sgugger)
- Add TF Funnel Transformer 7029 (sgugger)

LXMERT

The LXMERT model was proposed in [LXMERT: Learning Cross-Modality Encoder Representations from Transformers](https://arxiv.org/abs/1908.07490) by Hao Tan & Mohit Bansal. It is a series of bidirectional transformer encoders (one for the vision modality, one for the language modality, and then one to fuse both modalities) pre-trained using a combination of masked language modeling, visual-language text alignment, ROI-feature regression, masked visual-attribute modeling, masked visual-object modeling, and visual-question answering objectives. The pretraining consists of multiple multi-modal datasets: MSCOCO, Visual-Genome + Visual-Genome Question Answering, VQA 2.0, and GQA.

It was added to the library in TensorFlow with the following checkpoints:

- `unc-nlp/lxmert-base-uncased`
- `unc-nlp/lxmert-vqa-uncased`
- `unc-nlp/lxmert-gqa-uncased`
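
LXMERT expects pre-extracted region-of-interest features and normalized box coordinates in addition to text. A minimal PyTorch sketch (both PyTorch and TensorFlow implementations exist in current versions of the library) with random tensors standing in for the Faster R-CNN visual features; shapes follow the default config and are only illustrative:

```python
import torch
from transformers import LxmertTokenizer, LxmertModel

tokenizer = LxmertTokenizer.from_pretrained("unc-nlp/lxmert-base-uncased")
model = LxmertModel.from_pretrained("unc-nlp/lxmert-base-uncased")

inputs = tokenizer("What is on the table?", return_tensors="pt")
visual_feats = torch.rand(1, 36, 2048)  # (batch, num_boxes, visual_feat_dim)
visual_pos = torch.rand(1, 36, 4)       # (batch, num_boxes, normalized box coordinates)

outputs = model(**inputs, visual_feats=visual_feats, visual_pos=visual_pos)
print(outputs.language_output.shape, outputs.vision_output.shape)
```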

Contributions:

- Adding the LXMERT pretraining model (MultiModal languageXvision) to HuggingFace's suite of models 5793 (eltoto1219)
- [LXMERT] Fix tests on gpu 6946 (patrickvonplaten)

New pipelines

The following pipeline was added to the library:

- [pipelines] Text2TextGenerationPipeline 6744 (patil-suraj)
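
A minimal sketch of the new pipeline; `t5-small` is just an illustrative choice of seq2seq model:

```python
from transformers import pipeline

text2text = pipeline("text2text-generation", model="t5-small")
print(text2text("translate English to German: A new pipeline was added to the library."))
# e.g. [{'generated_text': '...'}]
```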

Notebooks

The following community notebooks were contributed to the library:

- Demoing LXMERT with raw images by incorporating the FRCNN model for roi-pooled extraction and bounding-box prediction on the GQA answer set. 6986 (eltoto1219)
- [Community notebooks] Add notebook on fine-tuning GPT-2 Model with Trainer Class 7005 (philschmid)
- Add "Fine-tune ALBERT for sentence-pair classification" notebook to the community notebooks 7255 (NadirEM)
- added multilabel text classification notebook using distilbert to community notebooks 7201 (DhavalTaunk08)

Encoder-decoder architectures

An additional encoder-decoder architecture was added:

- [EncoderDecoder] Add xlm-roberta to encoder decoder 6878 (patrickvonplaten)

Bug fixes and improvements

- TF Flaubert w/ pre-norm 6841 (LysandreJik)
- Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task 6644 (HuangLianzhe)
- Fix in Adafactor docstrings 6845 (sgugger)
- Fix resuming training for Windows 6847 (sgugger)
- Only access loss tensor every logging_steps 6802 (jysohn23)
- Marian distill scripts + integration test 6799 (sshleifer)
- Add checkpointing to Ray Tune HPO 6747 (krfricke)
- Split hp search methods 6857 (sgugger)
- Update ONNX notebook to include section on quantization. 6831 (mfuntowicz)
- Fix marian slow test 6854 (sshleifer)
- [s2s] command line args for faster val steps 6833 (sshleifer)
- Bart can make decoder_input_ids from labels 6758 (sshleifer)
- add a final report to all pytest jobs 6861 (stas00)
- Logging doc 6852 (sgugger)
- Restore PaddingStrategy.MAX_LENGTH on QAPipeline while no v2. 6875 (mfuntowicz)
- [Generate] Facilitate PyTorch generate using `ModelOutputs` 6735 (patrickvonplaten)
- Add cache_dir to save features TextDataset 6879 (jysohn23)
- [Docs, Examples] Fix QA example for PT 6890 (patrickvonplaten)
- Update modeling_bert.py 6897 (parthe)
- [Electra] fix warning for position ids 6884 (patrickvonplaten)
- minor docs grammar fixes 6889 (harrywang)
- Fix error class instantiation 6634 (tamuhey)
- Output attention takes an s 6903 (sgugger)
- [testing] fix ambiguous test 6898 (stas00)
- test_tf_common: remove un_used mixin class parameters 6866 (PuneethaPai)
- Template updates 6914 (sgugger)
- Changed link to the correct paper in the second paragraph 6905 (sengl)
- tweak tar command in readme 6919 (brettkoonce)
- [s2s]: script to convert pl checkpoints to hf checkpoints 6911 (sshleifer)
- [s2s] allow task_specific_params=summarization_xsum 6923 (sshleifer)
- move wandb/comet logger init to train() to allow parallel logging 6850 (krfricke)
- [s2s] use --eval_beams command line arg 6926 (sshleifer)
- [s2s] support early stopping based on loss, rather than rouge 6927 (sshleifer)
- Fix mixed precision issue in TF DistilBert 6915 (chiapas)
- [docstring] misc arg doc corrections 6932 (stas00)
- [s2s] distill: --normalize_hidden --supervise_forward 6834 (sshleifer)
- [s2s] run_eval.py parses generate_kwargs 6948 (sshleifer)
- [doc] remove the implied defaults to :obj:`None`, s/True/ :obj:`True/, etc. 6956 (stas00)
- [s2s] warn if --fp16 for torch 1.6 6977 (sshleifer)
- feat: allow prefix for any generative model 5885 (borisdayma)
- Trainer with grad accum 6930 (sgugger)
- Cannot index `None` 6984 (LysandreJik)
- [docstring] missing arg 6933 (stas00)
- [testing] add dependency: parametrize 6958 (stas00)
- Fixed the default number of attention heads in Reformer Configuration 6973 (tznurmin)
- [gen utils] missing else case 6980 (stas00)
- match CI's version of flake8 6941 (stas00)
- Conversion scripts shouldn't have relative imports 6991 (LysandreJik)
- Add missing arguments for BertWordPieceTokenizer 5810 (monologg)
- fixed trainer tr_loss memory leak 6999 (StuartMesham)
- Floating-point operations logging in trainer 6768 (TevenLeScao)
- Fixing FLOPS merge by checking if torch is available 7013 (LysandreJik)
- [Longformer] Fix longformer documentation 7016 (patrickvonplaten)
- pegasus.rst: fix expected output 7017 (sshleifer)
- adding TRANSFORMERS_VERBOSITY env var 6961 (stas00)
- [generation] consistently add eos tokens 6982 (stas00)
- [from_pretrained] Allow tokenizer_type ≠ model_type 6995 (julien-c)
- replace torch.triu with onnx compatible code 6929 (HenryDashwood)
- Batch encore plus and overflowing tokens fails when non existing overflowing tokens for a sequence 6677 (LysandreJik)
- add -y to bypass prompt for transformers-cli upload 7035 (stas00)
- Fix confusing warnings during TF2 import from PyTorch 6623 (jcrocholl)
- Albert pretrain datasets/ datacollator 6168 (yl-to)
- Fix template 7040 (LysandreJik)
- Small fixes in tf template 7044 (sgugger)
- Add "Leveraging Pretrained Checkpoints for Generation" Seq2Seq models. 6594 (patrickvonplaten)
- fix to ensure that returned tensors after the tokenization is Long 7039 (GeetDsa)
- [BertGeneration] Correct Doc Title 7048 (patrickvonplaten)
- [BertGeneration, Docs] Fix another old name in docs 7050 (patrickvonplaten)
- [xlm tok] config dict: fix str into int to match definition 7034 (stas00)
- [s2s] --eval_max_generate_length 7018 (sshleifer)
- Fix CI with change of name of nlp 7054 (sgugger)
- [wip/s2s] DistributedSortishSampler 7056 (sshleifer)
- these tests require non-multigpu env 7059 (stas00)
- [BertGeneration] Clean naming 7068 (patrickvonplaten)
- Document the dependcy on datasets 7058 (sgugger)
- Automate the lists in auto-xxx docs 7061 (sgugger)
- Add tests and fix various bugs in ModelOutput 7073 (sgugger)
- Compute loss method 7074 (sgugger)
- [T5Tokenizer] remove prefix_tokens 7078 (patil-suraj)
- [s2s] run_eval supports --prefix clarg. 6953 (sshleifer)
- fix bug in pegasus converter 7094 (sshleifer)
- [s2s] two stage run_distributed_eval.py 7105 (sshleifer)
- Update xsum length penalty to better values 7107 (sshleifer)
- [s2s] distributed eval cleanup 7110 (sshleifer)
- [s2s distill] allow pegasus-12-12 7104 (sshleifer)
- Temporarily skip failing tests due to dependency change 7118 (LysandreJik)
- fix link to paper 7116 (btel)
- ignore FutureWarning in tests 7079 (stas00)
- fix deprecation warnings 7033 (stas00)
- [examples testing] restore code 7099 (stas00)
- Clean up autoclass doc 7081 (sgugger)
- Add Mirror Option for Downloads 6679 (JetRunner)
- [s2s] distributed eval in one command 7124 (sshleifer)
- [QOL] add signature for prepare_seq2seq_batch 7108 (sshleifer)
- Fix reproducible tests in Trainer 7119 (sgugger)
- [logging] remove no longer needed verbosity override 7100 (stas00)
- Fix TF Trainer loss calculation 6998 (chiapas)
- Add quotes to paths in MeCab arguments 7142 (polm)
- Multi predictions trainer 7126 (sgugger)
- fix ZeroDivisionError and epoch counting 7125 (chiapas)
- [EncoderDecoderModel] fix indentation error 7131 (patrickvonplaten)
- [docs] add testing documentation 7101 (stas00)
- Refactoring the TF activations functions 7150 (jplu)
- fix the warning message of overflowed sequence 7151 (xiye17)
- [doc] [testing] improve/expand the Parametrization section 7156 (stas00)
- Add empty random document case to DataCollatorForNextSentencePrediction 7161 (choidongyeon)
- [s2s run_eval] new features 7109 (stas00)
- use the correct add_start_docstrings 7174 (stas00)
- [s2s] distributed eval cleanup 7186 (sshleifer)
- remove duplicated code 7173 (stas00)
- remove deprecated flag 7171 (stas00)
- Transformer-XL: Remove unused parameters 7087 (RafaelWO)
- Trainer multi label 7191 (sgugger)
- Change to use relative imports in some files & Add python prompt symbols to example codes 7202 (soheeyang)
- [s2s] run_eval/run_eval_search tweaks 7192 (stas00)
- [s2s] dynamic batch size with --max_tokens_per_batch 7030 (sshleifer)
- [s2s] remove double assert 7223 (sshleifer)
- Add customized text to widget 7204 (mrm8488)
- Rewrites BERT in Flax to the new Linen API 7211 (marcvanzee)
- token-classification: update url of GermEval 2014 dataset 6571 (stefan-it)
- Fix a few countings (steps / epochs) in trainer_tf.py 7175 (chiapas)
- Add new pre-trained models BERTweet and PhoBERT 6129 (datquocnguyen)
- [s2s] distributed_eval.py saves better speed info 7242 (sshleifer)
- [testing doc] slow has to be last 7251 (stas00)
- examples/seq2seq/__init__.py mutates sys.path 7194 (stas00)
- [Bug fix] Fixed target_mapping preparation for XLNet (Pytorch) 7267 (guillaume-be)
- [example/glue] fix compute_metrics_fn for bart like models 7248 (patil-suraj)
- Disable missing weight warning for RobertaForMaskedLM/CamembertForMaskedLM 7282 (raphael0202)
- Fix 7284 7289 (sgugger)
- [s2s tests] fix test_run_eval_search 7297 (stas00)
- [s2s] s/alpha_loss_encoder/alpha_encoder_loss/ 7298 (stas00)
- [s2s] save hostname with repo info 7301 (sshleifer)
- Copy code from Bert to Roberta and add safeguard script 7219 (sgugger)
- Fix 7304 7305 (sgugger)
- Fix saving TF custom models 7291 (jplu)
- is_pretokenized -> is_split_into_words 7236 (sgugger)
- Add possibility to evaluate every epoch 7302 (sgugger)
- Support for Windows in check_copies 7316 (sgugger)
- Create an XLA parameter and fix the mixed precision 7311 (jplu)

3.1.0

Pegasus, mBART, DPR, self-documented outputs and new pipelines

Pegasus

The Pegasus model from [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh, Peter J. Liu, was added to the library in PyTorch.

The model was implemented as a collaboration between Jingqing Zhang and sshleifer in 6340.

- PegasusForConditionalGeneration (torch version) 6340
- add pegasus finetuning script 6811 [script](https://github.com/huggingface/transformers/blob/master/examples/seq2seq/finetune_pegasus_xsum.sh). (warning very slow)
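
A minimal summarization sketch with the XSum checkpoint (checkpoint id and input text are only illustrative):

```python
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

tokenizer = PegasusTokenizer.from_pretrained("google/pegasus-xsum")
model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-xsum")

article = (
    "PEGASUS is pre-trained by removing important sentences from a document "
    "and asking the model to regenerate them, which transfers well to abstractive summarization."
)
batch = tokenizer([article], truncation=True, padding="longest", return_tensors="pt")
summary_ids = model.generate(**batch)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True))
```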


DPR

The DPR model from [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih was added to the library in PyTorch.

- Add DPR model 5279 (lhoestq)
- Fix tests imports dpr 5576 (lhoestq)
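
A minimal sketch of embedding a question with the DPR question encoder (the Natural Questions checkpoint below is one of several published DPR checkpoints):

```python
from transformers import DPRQuestionEncoder, DPRQuestionEncoderTokenizer

tokenizer = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
model = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")

inputs = tokenizer("What is dense passage retrieval?", return_tensors="pt")
embedding = model(**inputs).pooler_output  # dense vector to compare against passage embeddings
print(embedding.shape)
```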

DeeBERT

The DeeBERT model from [DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference](https://www.aclweb.org/anthology/2020.acl-main.204/) by Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, Jimmy Lin has been added to the `examples/` folder alongside its training script, in PyTorch.

- Add DeeBERT (entropy-based early exiting for *BERT) 5477 (Ji-Xin)

Self-documented outputs

In addition to returning tuples, PyTorch and TensorFlow models can now return an appropriate subclass of `ModelOutput`. A `ModelOutput` is a dataclass containing all model returns, which allows for easier inspection and self-documenting model outputs.

- Change model outputs types to self-document outputs 5438 (sgugger)
- Tf model outputs 6247 (sgugger)

Models return tuples by default, and return self-documented outputs if the `return_dict` configuration flag is set to `True` or if the `return_dict=True` keyword argument is passed to the forward/call method.

Summary of the behavior:

```python
from transformers import BertForSequenceClassification

# The new outputs are opt-in: you have to activate them explicitly with `return_dict=True`,
# either at instantiation
model = BertForSequenceClassification.from_pretrained('bert-base-cased', return_dict=True)
# or when calling the model (`inputs` is a tokenized batch that also contains `labels`)
outputs = model(**inputs, return_dict=True)

# You can access the elements of the outputs with:
# (1) named attributes
loss = outputs.loss
logits = outputs.logits

# (2) their names as strings, like a dict
loss = outputs["loss"]
logits = outputs["logits"]

# (3) their index as integers or slices, as in the pre-3.1.0 output tuples
loss = outputs[0]
logits = outputs[1]
loss, logits = outputs[:2]

# One **breaking behavior** of these new outputs (and the reason they are opt-in):
# iterating over the outputs now returns the names (keys) instead of the values:
print([element for element in outputs])
# >>> ['loss', 'logits']
# Thus you cannot unpack the output like pre-3.1.0 (you would get the string names instead
# of the values), but you can query a slice as shown in (3) above:
loss_key, logits_key = outputs
```


Encoder-Decoder framework

The encoder-decoder framework has been enhanced to allow more encoder-decoder model combinations, *e.g.* Bert2Bert, Bert2GPT2, Roberta2Roberta, Longformer2Roberta, ...

- [EncoderDecoder] Add encoder-decoder for roberta/ vanilla longformer 6411 (patrickvonplaten)
- [EncoderDecoder] Add Cross Attention for GPT2 6415 (patrickvonplaten)
- [EncoderDecoder] Add functionality to tie encoder decoder weights 6538 (patrickvonplaten)
- Multiple combinations of EncoderDecoder models have been fine-tuned and evaluated on CNN/Daily-Mail summarization: https://huggingface.co/models?search=cnn_dailymail-fp16 (patrickvonplaten)
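
A minimal sketch of warm-starting such a combination; `from_encoder_decoder_pretrained` wires up the cross-attention automatically, and the resulting model still needs fine-tuning (e.g. on CNN/DailyMail) before it is useful:

```python
from transformers import BertTokenizer, EncoderDecoderModel

# Bert2Bert: both encoder and decoder start from the same pretrained BERT checkpoint
model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

inputs = tokenizer("The encoder-decoder framework ties two pretrained models together.", return_tensors="pt")
outputs = model(input_ids=inputs.input_ids, decoder_input_ids=inputs.input_ids, labels=inputs.input_ids)
print(outputs.loss)  # training signal for seq2seq fine-tuning
```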

TensorFlow as a first-class citizen

As we continue working towards having TensorFlow be a first-class citizen, we continually improve on our TensorFlow API and models.

- [Almost all TF models] TF clean up: add missing CLM / MLM loss; fix T5 naming and keras compile 5395 (patrickvonplaten)
- [Benchmark] Add benchmarks for TF Training 5594 (patrickvonplaten)

Machine Translation


MarianMTModel

- [en-zh](https://huggingface.co/Helsinki-NLP/opus-mt-en-zh?text=My+name+is+Wolfgang+and+I+live+in+Berlin) and **357** other checkpoints for machine translation were added from the Helsinki-NLP group's Tatoeba Project (sshleifer + jorgtied). There are now > 1300 supported pairs for machine translation (see the usage sketch after this list).
- Marian converter updates 6342 (sshleifer)
- Marian distill scripts + integration test 6799 (sshleifer)
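
A minimal translation sketch with the en-zh checkpoint referenced above (input text is only an example):

```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-zh"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

batch = tokenizer(["My name is Wolfgang and I live in Berlin."], return_tensors="pt", padding=True)
generated = model.generate(**batch)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```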

mBART

The mBART model from [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) can now be accessed through `MBartForConditionalGeneration`.

- Add mbart-large-cc25, support translation finetuning 5129 (sshleifer)
- [mbart] prepare_translation_batch passes **kwargs to allow DeprecationWarning 5581 (sshleifer)
- MBartForConditionalGeneration 6441 (patil-suraj)
- [fix] mbart_en_ro_generate test now identical to fairseq 5731 (sshleifer)
- [Doc] explaining romanian postprocessing for MBART BLEU hacking 5943 (sshleifer)
- [test] partial coverage for train_mbart_enro_cc25.sh 5976 (sshleifer)
- MbartTokenizer: do not hardcode vocab size 5998 (sshleifer)
- MBART: support summarization tasks where max_src_len > max_tgt_len 6003 (sshleifer)
- Fix 6096: MBartTokenizer's mask token 6098 (sshleifer)
- [s2s] Document better mbart finetuning command 6229 (sshleifer)
- mBART Conversion script 6230 (sshleifer)
- [s2s] add BartTranslationDistiller for distilling mBART 6363 (sshleifer)
- [Doc] add more MBart and other doc 6490 (patil-suraj)
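
A minimal English-to-Romanian sketch with `MBartForConditionalGeneration`, written against the current API (checkpoint and sentence are only illustrative):

```python
from transformers import MBartForConditionalGeneration, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro", src_lang="en_XX")
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-en-ro")

inputs = tokenizer("UN Chief Says There Is No Military Solution in Syria", return_tensors="pt")
# mBART decodes with a target language code as the first generated token
generated = model.generate(**inputs, decoder_start_token_id=tokenizer.convert_tokens_to_ids("ro_RO"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```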

examples/seq2seq

- examples/seq2seq/finetune.py supports --task translation
- All sequence-to-sequence tokenizers (T5, Bart, Marian, Pegasus) expose a `prepare_seq2seq_batch` method that makes batches for sequence-to-sequence training (see the sketch below).
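
A minimal sketch of `prepare_seq2seq_batch` as it existed in the 3.1.0-era API (the method has since been removed in favor of calling the tokenizer directly with `text_target`), using T5 and made-up sentence pairs:

```python
# assumes transformers==3.1.x, where seq2seq tokenizers exposed prepare_seq2seq_batch
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
batch = tokenizer.prepare_seq2seq_batch(
    src_texts=["translate English to German: I love cats."],
    tgt_texts=["Ich liebe Katzen."],
    return_tensors="pt",
)
print(batch.keys())  # encoder inputs plus target ids, ready for seq2seq training
```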

PRs:

- Seq2SeqDataset uses linecache to save memory 5792 (Pradhy729)
- [examples/seq2seq]: add --label_smoothing option 5919 (sshleifer)
- seq2seq/run_eval.py can take decoder_start_token_id 5949 (sshleifer)
- [examples (seq2seq)] fix preparing decoder_input_ids for T5 5994 (patil-suraj)
- [s2s] add support for overriding config params 6149 (stas00)
- s2s: fix LR logging, remove some dead code. 6205 (sshleifer)
- [s2s] tiny QOL improvement: run_eval prints scores 6341 (sshleifer)
- [s2s] fix label_smoothed_nll_loss 6344 (patil-suraj)
- [s2s] fix --gpus clarg collision 6358 (sshleifer)
- [s2s] Script to save wmt data to disk 6403 (sshleifer)
- rename prepare_translation_batch -> prepare_seq2seq_batch 6103 (sshleifer)
- Mult rouge by 100: standard units 6359 (sshleifer)
- allow spaces in bash args with "$" 6521 (sshleifer)
- [seq2seq] MAX_LEN env var for MT commands 5837 (sshleifer)
- [seq2seq] distillation.py accepts trainer arguments 5865 (sshleifer)
- [s2s]Use prepare_translation_batch for Marian finetuning 6293 (sshleifer)
- [BartTokenizer] add prepare s2s batch 6212 (patil-suraj)
- [T5Tokenizer] add prepare_seq2seq_batch method 6122 (patil-suraj)
- [s2s] round runtime in run_eval 6798 (sshleifer)
- [s2s README] Add more dataset download instructions 6737 (sshleifer)
- [s2s] round bleu, rouge to 4 digits 6704 (sshleifer)
- [s2s] command line args for faster val steps 6833


New documentation

Several new documentation pages have been added and older documentation has been tweaked to be more accurate and understandable. An "Open in Colab" button has been added on the tutorial pages.

- Guide to fixed-length model perplexity evaluation 5449 (joeddav)
- Improvements to PretrainedConfig documentation 5642 (sgugger)
- Document model outputs 5673 (sgugger)
- docs(wandb): explain how to use W&B integration 5607 (borisdayma)
- Model utils doc 6005 (sgugger)
- ONNX documentation 5992 (mfuntowicz)
- Tokenizer documentation 6110 (sgugger)
- Pipeline documentation 6175 (sgugger)
- Encoder decoder config docs 6195 (afcruzs)
- Colab button 6389 (sgugger)
- Generation documentation 6470 (sgugger)
- Add custom datasets tutorial 6466 (joeddav)
- Logging documentation 6852 (sgugger)

Trainer updates

New additions to the `Trainer`

- Added data collator for permutation (XLNet) language modeling and related calls 5522 (shngt)
- Trainer support for iterabledataset 5834 (Pradhy729)
- Adding PaddingDataCollator 6442 (sgugger)
- Add hyperparameter search to Trainer 6576 (sgugger)
- [examples] Add trainer support for question-answering 4829 (patil-suraj)
- Adds comet_ml to the list of auto-experiment loggers 6176 (dsblank)
- Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task 6644 (HuangLianzhe)

New models & model architectures

The following model architectures have been added to the library

- FlaubertForTokenClassification 5644 (stas00)
- TFXLMForTokenClassification 5614 (LysandreJik)
- TFXLMForMultipleChoice 5614 (LysandreJik)
- TFFlaubertForTokenClassification 5614 (LysandreJik)
- TFFlaubertForMultipleChoice 5614 (LysandreJik)
- TFElectraForSequenceClassification 6227 (jplu)
- TFElectraForMultipleChoice 6227 (jplu)
- TF Longformer 5764 (patrickvonplaten)
- CamembertForCausalLM 6577 (patil-suraj)

Regression testing on TPU & TPU CI

Thanks to zcain117 we now have access to TPU CI for the PyTorch/xla framework. This enables regression testing on the TPU aspects of the `Trainer`, and offers very simple regression testing on model training performance.

- Test XLA examples 5583
- Add setup for TPU CI to run every hour. 6219 (zcain117)
- Add missing docker arg for TPU CI. 6393 (zcain117)
- Get GKE logs via kubectl logs instead of gcloud logging read. 6446 (zcain117)

New pipelines

New pipelines have been added:

- Zero shot classification pipeline 5760 (joeddav)
- Addition of a DialoguePipeline 5516 (guillaume-be)
- Add targets arg to fill-mask pipeline 6239 (joeddav)
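
A minimal sketch of the zero-shot classification pipeline added above, using an NLI model such as `facebook/bart-large-mnli` and made-up candidate labels:

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier(
    "The new release adds Pegasus, mBART and DPR.",
    candidate_labels=["machine learning", "cooking", "sports"],
)
print(result["labels"][0], result["scores"][0])
```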

Community notebooks

- [Fine-tune Electra and interpret with Integrated Gradients](https://github.com/elsanns/xai-nlp-notebooks/blob/master/electra_fine_tune_interpret_captum_ig.ipynb) #6321 (elsanns)
- Update ONNX notebook to include section on quantization. 6831 (mfuntowicz)

Centralized logging

Logging is now centralized. The library offers methods to handle the verbosity level of all loggers contained in the library:

- Centralize logging 6434 (LysandreJik)
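
A minimal sketch of the centralized verbosity controls, exposed under `transformers.logging`:

```python
import transformers

transformers.logging.set_verbosity_error()  # silence everything below ERROR across the library
transformers.logging.set_verbosity_info()   # or be chattier while debugging
print(transformers.logging.get_verbosity())
```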

Bug fixes and improvements

- [Reformer] Adapt Reformer MaskedLM Attn mask 5560 (patrickvonplaten)
- Make T5 compatible with ONNX 5518 (abelriboulot)
- [Bart] enable test_torchscript, update test_tie_weights 5457 (sshleifer)
- [docs] fix model_doc links in model summary 5566 (patil-suraj)
- [Benchmark] Readme for benchmark 5363 (patrickvonplaten)
- Fix Inconsistent NER Grouping (Pipeline) 4987 (enzoampil)
- QA pipeline BART compatible 5496 (mfuntowicz)
- More explicit error when failing to tensorize overflowing tokens 5633 (LysandreJik)
- Should check that torch TPU is available 5636 (LysandreJik)
- Add forum link in the docs 5637 (sgugger)
- Fixed TextGenerationPipeline on torch + GPU 5629 (TevenLeScao)
- Fixed use of memories in XLNet (caching for language generation + warning when loading improper memoryless model) 5632 (TevenLeScao)
- [squad] add version tag to squad cache 5669 (lazovich)
- Deprecate old past arguments 5671 (sgugger)
- Pipeline model type check 5679 (JetRunner)
- rename the functions to match the rest of the test convention 5692 (stas00)
- doc improvements 5688 (stas00)
- Fix Trainer in DataParallel setting 5685 (sgugger)
- [Longformer] fix longformer global attention output 5659 (patrickvonplaten)
- [Fix] github actions CI by reverting 5138 5686 (sshleifer)
- [Reformer classification head] Implement the reformer model classification head for text classification 5198 (as-stevens)
- Cleanup bart caching logic 5640 (sshleifer)
- [AutoModels] Fix config params handling of all PT and TF AutoModels 5665 (patrickvonplaten)
- [cleanup] T5 test, warnings 5761 (sshleifer)
- [fix] T5 ONNX test: model.to(torch_device) 5769 (mfuntowicz)
- [Benchmark] fix benchmark non standard model 5801 (patrickvonplaten)
- [Benchmark] Fix models without `architectures` param in config 5808 (patrickvonplaten)
- [Longformer] fix longformer slow-down 5811 (patrickvonplaten)
- [seq2seq] pack_dataset.py rewrites dataset in max_tokens format 5819 (sshleifer)
- [seq2seq] Don't copy self.source in sortishsampler 5818 (sshleifer)
- [cleanups] make Marian save as Marian 5830 (sshleifer)
- [Reformer] - Cache hidden states and buckets to speed up inference 5578 (patrickvonplaten)
- Lightning Updates for v0.8.5 5798 (nateraw)
- Update tokenizers to 0.8.1.rc to fix Mac OS X issues 5867 (sepal)
- Xlnet outputs 5883 (TevenLeScao)

- DataParallel fixes 5733 (stas00)
- [cleanup] squad processor 5868 (sshleifer)
- Improve doc of use_cache 5912 (sgugger)
- [Fix] seq2seq pack_dataset.py actually packs 5913 (sshleifer)
- Add AlbertForPretraining to doc 5914 (sgugger)
- DataParallel fix: multi gpu evaluation 5926 (csarron)
- Clarify arg class 5916 (sgugger)

- [CI] self-scheduled runner tests examples/ 5927 (sshleifer)
- Update doc to new model outputs 5946 (sgugger)
- [CI] Install examples/requirements.txt 5956 (sshleifer)
- Expose padding_strategy on squad processor to fix QA pipeline performance regression 5932 (mfuntowicz)
- [docs] Add integration test example to copy pasta template 5961 (sshleifer)
- Cleanup Trainer and expose customization points 5982 (sgugger)
- Avoid unnecessary warnings when loading pretrained model 5922 (sgugger)
- Ensure OpenAI GPT position_ids is correctly initialized and registered at init. 5773 (mfuntowicz)
- [CI] Don't test apex 6021 (sshleifer)
- add a summary report flag for run_examples on CI 6035 (stas00)
- don't complain about missing W&B when WANDB_DISABLED=true 6036 (stas00)
- Allow to set Adam beta1, beta2 in TrainingArgs 5592 (gonglinyuan)
- Fix the return documentation rendering for all model outputs 6022 (sgugger)

- Fix typo (model saving TF) 5734 (Colanim)
- Add new AutoModel classes in pipeline 6062 (patil-suraj)
- [pack_dataset] don't sort before packing, only pack train 5954 (sshleifer)
- CL util to convert models to fp16 before upload 5953 (sshleifer)
- Add fire to setup.cfg to make isort happy 6066 (sgugger)
- [fix] no warning for position_ids buffer 6063 (sshleifer)
- Pipelines should use tuples instead of namedtuples 6061 (LysandreJik)
- Moving transformers package import statements to relative imports in some files 5796 (afcruzs)
- github issue template suggests who to tag 5790 (sshleifer)
- Make all data collators accept dict 6065 (sgugger)
- Add inference widget examples 5825 (clmnt)
- [s2s] Delete useless method, log tokens_per_batch 6081 (sshleifer)
- Logs should not be hidden behind a logger.info 6097 (LysandreJik)
- Fix zero-shot pipeline single seq output shape 6104 (joeddav)
- [fix] add bart to LM_MAPPING 6099 (sshleifer)
- [Fix] position_ids tests again 6100 (sshleifer)
- Fix deebert tests 6102 (sshleifer)
- Use FutureWarning to deprecate 6111 (sgugger)
- Added capability to quantize a model while exporting through ONNX. 6089 (mfuntowicz)
- XLNet PLM Readme 6121 (LysandreJik)
- Fix TF CTRL model naming 6134 (jplu)
- Use google style to document properties 6130 (sgugger)
- Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} 5614
- Rework TF trainer 6038 (jplu)
- Actually the extra_id are from 0-99 and not from 1-100 5967 (orena1)
- add another e.g. to avoid confusion 6055 (orena1)
- Tf trainer cleanup 6143 (sgugger)
- Switch from return_tuple to return_dict 6138 (sgugger)
- Fix FlauBERT GPU test 6142 (LysandreJik)
- Enable ONNX/ONNXRuntime optimizations through converter script 6131 (mfuntowicz)
- Add Pytorch Native AMP support in Trainer 6151 (prajjwal1)
- enable easy checkout switch 5645 (stas00)
- Replace mecab-python3 with fugashi for Japanese tokenization 6086 (polm)
- parse arguments from dict 4869 (patil-suraj)
- Harmonize both Trainers API 6157 (sgugger)
- Model output test 6155 (sgugger)
- [s2s] clean up + doc 6184 (stas00)
- Add script to convert BERT tf2.x checkpoint to PyTorch 5791 (mar-muel)
- Empty assert hunt 6056 (TevenLeScao)
- Fix saved model creation 5468 (jplu)
- Adds train_batch_size, eval_batch_size, and n_gpu to to_sanitized_dict output for logging. 5331 (jaymody)
- [DataCollatorForLanguageModeling] fix labels 6213 (patil-suraj)
- Fix _shift_right function in TFT5PreTrainedModel 6214 (maurice-g)
- Remove outdated BERT tips 6217 (JetRunner)
- run_hans label fix 6221 (VictorSanh)
- Make the order of additional special tokens deterministic 5704 (gonglinyuan)
- cleanup torch unittests 6196 (stas00)
- test_tokenization_common.py: Remove redundant coverage 6224 (sshleifer)
- [Reformer] fix reformer fp16 test 6237 (patrickvonplaten)
- [Reformer] Make random seed generator available on random seed and not on model device 6244 (patrickvonplaten)
- Update to match renamed attributes in fairseq master 5972 (LilianBordeau)
- [WIP] lightning_base: support --lr_scheduler with multiple possibilities 6232 (stas00)
- Trainer + wandb quality of life logging tweaks 6241 (TevenLeScao)
- Add strip_accents to basic BertTokenizer. 6280 (PhilipMay)
- Argument to set GPT2 inner dimension 6296 (TevenLeScao)
- [Reformer] fix default generators for pytorch < 1.6 6300 (patrickvonplaten)
- Remove redundant line in run_pl_glue.py 6305 (xujiaze13)
- [Fix] text-classification PL example 6027 (bhashithe)
- fix the shuffle agrument usage and the default 6307 (stas00)
- CI dependency wheel caching 6287 (LysandreJik)
- Patch GPU failures 6281 (LysandreJik)
- fix consistency CrossEntropyLoss in modeling_bart 6265 (idoh)
- Add a script to check all models are tested and documented 6298 (sgugger)
- Fix the tests for Electra 6284 (jplu)
- [examples] consistently use --gpus, instead of --n_gpu 6315 (stas00)

- refactor almost identical tests 6339 (stas00)
- Small docfile fixes 6328 (sgugger)
- Patch models 6326 (LysandreJik)
- Ci GitHub caching 6382 (LysandreJik)
- Fix links for open in colab 6391 (sgugger)
- [EncoderDecoderModel] add a `add_cross_attention` boolean to config 6377 (patrickvonplaten)

- Feed forward chunking 6024 (Pradhy729)
- add pl_glue example test 6034 (stas00)
- testing utils: capturing std streams context manager 6231 (stas00)
- Fix tokenizer saving and loading error 6026 (yobekiko)
- Warn if debug requested without TPU 6390 (dmlap)
- [Performance improvement] "Bad tokens ids" optimization 6064 (guillaume-be)
- pl version: examples/requirements.txt is single source of truth 6309 (stas00)
- [s2s] wmt download script use less ram 6405 (stas00)

- [pl] restore lr logging behavior for glue, ner examples 6314 (stas00)
- lr_schedulers: add get_polynomial_decay_schedule_with_warmup 6361 (stas00)
- [examples] add pytest dependency 6425 (sshleifer)
- [test] replace capsys with the more refined CaptureStderr/CaptureStdout 6422 (stas00)
- Fixes to make life easier with the nlp library 6423 (sgugger)
- Move prediction_loss_only to TrainingArguments 6426 (sgugger)
- Activate check on the CI 6427 (sgugger)
- cleanup tf unittests: part 2 6260 (stas00)
- Fix docs and bad word tokens generation_utils.py 6387 (ZhuBaohe)
- Test model outputs equivalence 6445 (LysandreJik)
- add LongformerTokenizerFast in AutoTokenizer 6463 (patil-suraj)
- add BartTokenizerFast in AutoTokenizer 6464 (patil-suraj)
- Add POS tagging and Phrase chunking token classification examples 6457 (vblagoje)

- Clean directory after script testing 6453 (JetRunner)
- Use hash to clean the test dirs 6475 (JetRunner)
- Sort unique_no_split_tokens to make it deterministic 6461 (lhoestq)
- Fix TPU Convergence bug 6488 (jysohn23)
- Support additional dictionaries for BERT Japanese tokenizers 6515 (singletongue)
- [doc] Summary of the models fixes 6511 (stas00)
- Remove deprecated assertEquals 6532 (JetRunner)
- [testing] a new TestCasePlus subclass + get_auto_remove_tmp_dir() 6494 (stas00)
- [sched] polynomial_decay_schedule use default power=1.0 6473 (stas00)
- Fix flaky ONNX tests 6531 (mfuntowicz)

- [doc] make the text more readable, fix some typos, add some disambiguation 6508 (stas00)

- [doc] multiple corrections to "Summary of the tasks" 6509 (stas00)
- replace _ with __ rst links 6541 (stas00)
- Fixed label datatype for STS-B 6492 (amodaresi)
- fix incorrect codecov reports 6553 (stas00)
- [docs] Fix wrong newline in the middle of a paragraph 6573 (romainr)
- [docs] Fix number of 'ug' occurrences in tokenizer_summary 6574 (romainr)
- add BartConfig.force_bos_token_to_be_generated 6526 (sshleifer)
- Fix bart base test 6587 (sshleifer)
- Feed forward chunking others 6365 (Pradhy729)
- tf generation utils: remove unused kwargs 6591 (sshleifer)
- [BartTokenizerFast] add prepare_seq2seq_batch 6543 (patil-suraj)
- [doc] lighter 'make test' 6512 (stas00)
- [docs] Copy code button misses '...' prefixed code 6518 (romainr)
- removed redundant arg in prepare_inputs 6614 (prajjwal1)
- add intro to nlp lib & dataset links to custom datasets tutorial 6583 (joeddav)
- Add tests to Trainer 6605 (sgugger)
- TFTrainer dataset doc & fix evaluation bug 6618 (joeddav)
- Add tests/test_tokenization_reformer.py 6485 (D-Roberts)
- [Tests] fix attention masks in Tests 6621 (patrickvonplaten)
- XLNet Bug when training with apex 16-bit precision 6567 (johndolgov)
- Move threshold up for flaky test with Electra 6622 (sgugger)
- Regression test for pegasus bugfix 6606 (sshleifer)
- Trainer automatically drops unused columns in nlp datasets 6449 (sgugger)
- [Docs model summaries] Add pegasus to docs 6640 (patrickvonplaten)
- [Doc model summary] add MBart model summary 6649 (patil-suraj)
- Specify config filename in HfArgumentParser 6626 (jarednielsen)
- Don't reset the dataset type + plug for rm unused columns 6683 (sgugger)
- Fixed DataCollatorForLanguageModeling not accepting lists of lists 6685 (TevenLeScao)
- Update repo to isort v5 6686 (sgugger)
- Fix PL token classification examples 6682 (vblagoje)
- Lat fix for Ray HP search 6691 (sgugger)
- Create PULL_REQUEST_TEMPLATE.md 6660 (stas00)
- [doc] remove BartForConditionalGeneration.generate 6659 (stas00)
- Move unused args to kwargs 6694 (sgugger)
- [fixdoc] Add import to pegasus usage doc 6698 (sshleifer)
- Fix hyperparameter_search doc 6695 (sgugger)
- Remove hard-coded uses of float32 to fix mixed precision use 6648 (schmidek)
- Add DPR to models summary 6690 (lhoestq)
- Add typing.overload for convert_ids_tokens 6637 (tamuhey)
- Allow tests in examples to use cuda or fp16,if they are available 5512 (Joel-hanson)
- ci/gh/self-scheduled: add newline to make examples tests run even if src/ tests fail 6706 (sshleifer)
- Use separate tqdm progressbars 6696 (sgugger)
- More tests to Trainer 6699 (sgugger)
- Add tokenizer to Trainer 6689 (sgugger)
- tensor.nonzero() is deprecated in PyTorch 1.6 6715 (mfuntowicz)
- [Albert] Add position ids to allowed uninitialized weights 6719 (patrickvonplaten)
- Fix ONNX test_quantize unittest 6716 (mfuntowicz)
- [squad] make examples and dataset accessible from SquadDataset object 6710 (lazovich)
- Fix pegasus-xsum integration test 6726 (sshleifer)
- T5Tokenizer adds EOS token if not already added 5866 (sshleifer)
- Install nlp for github actions test 6728 (sgugger)
- [Torchscript] Fix docs 6740 (patrickvonplaten)
- Add "tie_word_embeddings" config param 6692 (patrickvonplaten)
- Fix tf boolean mask in graph mode 6741 (JayYip)
- Fix TF optimizer 6717 (jplu)
- [TF Longformer] Improve Speed for TF Longformer 6447 (patrickvonplaten)
- add __init__.py to utils 6754 (joeddav)
- [s2s] run_eval.py QOL improvements and cleanup 6746 (sshleifer)
- s2s distillation uses AutoModelForSeqToSeqLM 6761 (sshleifer)
- Add AdaFactor optimizer from fairseq 6722 (moscow25)
- Adds Adafactor to the docs and slightly fixes the formatting 6765 (LysandreJik)
- Fix the TF Trainer gradient accumulation and the TF NER example 6713 (jplu)
- Fix run_squad.py to work with BART 6756 (tomgrek)
- Add NLP install to self-scheduled CI 6767 (sshleifer)
- [testing] replace hardcoded paths to allow running tests from anywhere 6523 (stas00)
- [test schedulers] adjust to test the first step's reading 6429 (stas00)
- new Makefile target: docs 6510 (stas00)
- [transformers-cli] fix logger getter 6777 (stas00)
- PL: --adafactor option 6776 (sshleifer)
- [style] set the minimal required version for `black` 6784 (stas00)
- Transformer-XL: Improved tokenization with sacremoses 6322 (RafaelWO)
- prepare_seq2seq_batch makes labels/ decoder_input_ids made later. 6654 (sshleifer)
- t5 model should make decoder_attention_mask 6800 (sshleifer)
- [s2s] Test hub configs in self-scheduled CI 6809 (sshleifer)
- [bart] rename self-attention -> attention 6708 (sshleifer)
- [tests] fix typos in inputs 6818 (stas00)
- Fixed open in colab link 6825 (PandaWhoCodes)
- clarify shuffle 6312 (xujiaze13)
- TF Flaubert w/ pre-norm 6841 (LysandreJik)
- Fix resuming training for Windows 6847 (sgugger)
- Only access loss tensor every logging_steps 6802 (jysohn23)
- Add checkpointing to Ray Tune HPO 6747 (krfricke)
- Split hp search methods 6857 (sgugger)
- Fix marian slow test 6854 (sshleifer)
- Bart can make decoder_input_ids from labels 6758 (sshleifer)
- add a final report to all pytest jobs 6861 (stas00)
- Restore PaddingStrategy.MAX_LENGTH on QAPipeline while no v2. 6875 (mfuntowicz)
- [Generate] Facilitate PyTorch generate using `ModelOutputs` 6735 (patrickvonplaten)

3.0.2

Tokenizer fixes

Fixes bugs introduced by v3.0.0 and v3.0.1 in tokenizers.

3.0.1

Better backward-compatibility for tokenizers following v3.0.0 refactoring

Version v3.0.0 included a refactoring of the tokenizers' backend to allow a simpler and more flexible [user-facing API](https://huggingface.co/transformers/preprocessing.html).

This refactoring was conducted with a particular focus on keeping backward compatibility for the v2.X encoding, truncation and padding API but still led to two breaking changes that could have been avoided.

This patch aims to bring back better backward compatibility, by implementing the following updates:
- the `prepare_for_model` method is now publicly exposed again for both slow and fast tokenizers with an API compatible with both the v2.X truncation/padding API and [the v3.0 recommended API](https://huggingface.co/transformers/preprocessing.html).
- the truncation strategy now defaults again to `longest_first` instead of `only_first`.

Bug fixes and improvements:
- Better support for TransfoXL tokenizer when using TextGenerationPipeline https://github.com/huggingface/transformers/pull/5465 (TevenLeScao)
- Fix use of meme Transformer-XL generations https://github.com/huggingface/transformers/pull/4826 (tommccoy1)
- Fixing a bug in the NER pipeline which led to discarding the last identified entity https://github.com/huggingface/transformers/pull/5439 (mfuntowicz and enzoampil)
- Better QAPipelines https://github.com/huggingface/transformers/pull/5429 (mfuntowicz)
- Add Question-Answering and MLM heads to the Reformer model https://github.com/huggingface/transformers/pull/5433 (patrickvonplaten)
- Refactoring the LongFormer https://github.com/huggingface/transformers/pull/5219 (patrickvonplaten)
- Various fixes on tokenizers and tests (sshleifer)
- Many improvements to the doc and tutorials (sgugger)
- Fix TensorFlow dataset generator in run_glue https://github.com/huggingface/transformers/pull/4881 (jplu)
- Update Bertabs example to work again https://github.com/huggingface/transformers/pull/5355 (MichaelJanz)
- Move GenerationMixin to separate file https://github.com/huggingface/transformers/pull/5254 (yjernite)

3.0.0

New tokenizer API, TensorFlow improvements, enhanced documentation & tutorials

Breaking changes since `v2`

- In 4874 the language modeling BERT has been split in two: `BertForMaskedLM` and `BertLMHeadModel`. `BertForMaskedLM` therefore cannot do causal language modeling anymore, and cannot accept the `lm_labels` argument.
- The `Trainer` data collator is now a method instead of a class
- Directly setting a tokenizer's special token attributes (e.g. `tokenizer.mask_token = '<mask>'`) now only associates the token with the attribute of the tokenizer but doesn't add the token to the vocabulary if it is not already in the vocabulary. Tokens are only added by using the `tokenizer.add_special_tokens()` and `tokenizer.add_tokens()` methods (see the sketch after this list).
- The `prepare_for_model` method was removed as part of the new tokenizer API.
- The truncation method is now `only_first` by default.
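
A minimal sketch of the new token-adding behavior described above (the token strings are made up):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
print(len(tokenizer))  # base vocabulary size

# assigning e.g. `tokenizer.mask_token = "<mask>"` alone no longer grows the vocabulary;
# new tokens must be added explicitly:
tokenizer.add_tokens(["new_tok1", "new_tok2"])
tokenizer.add_special_tokens({"additional_special_tokens": ["<ctx>"]})
print(len(tokenizer))  # vocabulary grew by 3

# remember to resize the model embeddings afterwards:
# model.resize_token_embeddings(len(tokenizer))
```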

New Tokenizer API (n1t0, thomwolf, mfuntowicz)

The tokenizers have evolved quickly in version 2, with the addition of Rust tokenizers. They now have a simpler and more flexible API aligned between Python (slow) and Rust (fast) tokenizers. This new API lets you control truncation and padding more deeply, allowing things like dynamic padding or padding to a multiple of 8.

The redesigned API is explained in detail here 4510 and here: https://huggingface.co/transformers/master/preprocessing.html

Notable changes:

- it's now possible to truncate to the max input length of a model while padding the longest sequence in a batch
- padding and truncation are decoupled and easier to control
- it's possible to pad to a multiple of a predefined length, e.g. 8, which can give significant speed-ups on recent NVIDIA GPUs (V100)
- a generic wrapper using `tokenizer.__call__` can be used for all cases (single sequences, pairs of sequences, batches, etc.); see the sketch after this list
- tokenizers now accept pre-tokenized inputs (when the input is already split in word strings e.g. for NER)
- All the Rust tokenizers are now fully tested like slow tokenizers
- A new class `AddedToken` can be used to have more fine-grained control over how added tokens behave during tokenization. In particular the user can control (1) whether left and right spaces are removed around the token during tokenization, (2) whether the token will be identified inside another word, and (3) whether the token will be recognized in normalized forms (e.g. in lower case if the tokenizer uses lower-casing)
- Serialization issues were fixed
- Possibility to create NumPy tensors when using the `return_tensors` parameter on tokenizers.
- Introduced a new enum `TensorType` to map all the possible tensor backends we support: `TensorType.TENSORFLOW`, `TensorType.PYTORCH`, `TensorType.NUMPY`
- Tokenizers now accept `TensorType` enum on `encode(...)`, `encode_plus(...)`, `batch_encode_plus(...)` tokenizer method for `return_tensors` parameters.
- `BatchEncoding` new property `is_fast` indicates if the `BatchEncoding` comes from a Python (slow) tokenizer or a Rust (fast) tokenizer.
- Slow and Fast Tokenizers are now picklable. So is their output, the dict sub-class `BatchEncoding`.
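
A minimal sketch of the unified `tokenizer.__call__` API (model id and sentences are only illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# single sequences, pairs and batches all go through the same call
batch = tokenizer(
    ["A short sentence.", "A much longer second sentence that will be truncated."],
    padding="longest",     # pad_to_multiple_of=8 is also available as a separate argument
    truncation=True,
    max_length=16,
    return_tensors="pt",
)
print(batch.input_ids.shape, batch.is_fast)
```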

Several PRs to make the API more stable have been made:

- [tokenizers] Fix 5081 and improve backward compatibility 5125 (thomwolf)
- Tokenizers API developments 5103 (thomwolf)
- Clearer error message in the use-case of 5169 (thomwolf)
- Add more tests on tokenizers serialization - fix bugs 5056 (thomwolf)
- [Tokenization] Fix 5181 - make 5155 more explicit - move back the default logging level in tests to WARNING 5252 (thomwolf)
- [tokenizers] Several small improvements and bug fixes 5287
- Add `pad_to_multiple_of` on tokenizers (reimport) 5054 (mfuntowicz)
- [tokenizers] Updates data processors, docstring, examples and model cards to the new API 5308

TensorFlow improvements (jplu, dzorlu, LysandreJik)

Very big release for TensorFlow!
- TensorFlow models can now compute the loss themselves, using the `TFPreTrainedModel.compute_loss` method. 4530
- Can now resize token embeddings in TensorFlow 4351
- Cleaning TensorFlow models 5229

Enhanced documentation (sgugger)

We welcome sgugger as a team member in New York. He already introduced a lot of very cool documentation changes:

- Added a [model summary](https://huggingface.co/transformers/master/model_summary.html) #4789
- Expose classes used in documentation 4808
- Explain how to preview the docs in a PR 4795
- Clean documentation 4849
- Remove old doc page and add note about cache in installation 5027
- Fix all sphynx warnings 5068 (sgugger)
- Update pipeline examples to doctest syntax 5030
- Reorganize documentation 5064
- Update installation page and add contributing to the doc 5084
- Update glossary 5148
- Quick tour 5145
- Switch master/stable doc and add older releases 5193
- Add version control menu 5222
- Don't recreate old docs 5243
- Tokenization tutorial 5257
- Remove links for all docs 5280
- New model sharing tutorial 5323

Training & fine-tuning quickstart

- Our own joeddav added a training & fine-tuning quickstart to the documentation 5034!

MobileBERT

The MobileBERT from [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou, was added to the library for both PyTorch and TensorFlow.

A single checkpoint is added: `mobilebert-uncased` which is the `uncased_L-24_H-128_B-512_A-4_F-4_OPT` checkpoint converted to our API.

This model was first implemented in PyTorch by lonePatient, ported to the library by vshampor, then finalized and implemented in Tensorflow by LysandreJik.

Eli5 examples (yjernite) 4968

- The examples/eli5 folder contains training code for the dense retriever and to fine-tune a BART model, the jupyter notebook for the blog post, and the code for the live demo.

- The RetriBert model implements the dense passage retriever. It's basically a wrapper for two Bert models and projection matrices, but it does gradient checkpointing in a way that is very different from a concurrent PR, so Yacine thought it would be easier to write its own class for now and see if it can be merged into the BART code later.

Enhanced examples/seq2seq (sshleifer)

- the `examples/seq2seq` [folder](https://github.com/huggingface/transformers/blob/master/examples/seq2seq/README.md) is a combination of the old `examples/summarization` and `examples/translation` folders.
- Finetuning works well for summarization, more experiments needed for translation. Finetuning works on multi-gpu, saves rouge scores during validation, and provides `--freeze_encoder` and `--freeze_embeds` options. These options make finetuning BART 5x faster on the cnn/dailymail dataset.
- Distillbart code is added in distillation.py. It only supports summarization, for now.
- Evaluation works well for both summarization and translation.
- New weights and biases [shared task](https://github.com/huggingface/transformers/blob/master/examples/seq2seq/README.md#xsum-shared-task) for collaboration on the XSUM summarization task

Distilbart (sshleifer)
- Distilbart models are smaller versions of `bart-large-cnn` and `bart-large-xsum`. They can be loaded using `BartForConditionalGeneration.from_pretrained('sshleifer/distilbart-xsum-12-6')`, for example. See this [tweet](https://twitter.com/sam_shleifer/status/1276160367853547522?s=20) for more info on available models and their speed/performance.
- Commands to reproduce are available in the `examples/seq2seq` [folder](https://github.com/huggingface/transformers/blob/master/examples/seq2seq/README.md)

BERT Loses Patience (JetRunner)

Add BERT Loses Patience (Patience-based Early Exit) based on the paper https://arxiv.org/abs/2006.04152 and the official implementation https://github.com/JetRunner/PABEE

Unifying `label` arguments (sgugger) 4722

- Deprecate any label argument that isn't `labels` (like `masked_lm_labels`, `lm_labels`, etc.) in favor of `labels`.
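
A minimal sketch of the unified argument, written against the current API; the old `masked_lm_labels` / `lm_labels` keywords are simply replaced by `labels`:

```python
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

inputs = tokenizer("Paris is the capital of France.", return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])  # `labels`, not `masked_lm_labels`
print(outputs.loss)
```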

NumPy type in tokenizers (mfuntowicz) 4585

Introduce a new tensor type for return_tensors on tokenizer for NumPy.

- As we're introducing more than two tensor backend alternatives, I created an enum `TensorType` listing all the possible tensors we can create: `TensorType.TENSORFLOW`, `TensorType.PYTORCH`, `TensorType.NUMPY`. This might help newcomers who don't know about "tf" and "pt".
*Note: TensorType values are compatible with the previous "tf", "pt" and now "np" strings to allow backward compatibility (+ unit test)*

- NumPy is now a possible target when creating tensors. This is useful for JAX.
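
A minimal sketch of requesting NumPy tensors, with the string alias and the new enum being interchangeable:

```python
from transformers import AutoTokenizer, TensorType

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

np_batch = tokenizer("NumPy tensors are handy for JAX.", return_tensors="np")
same_batch = tokenizer("NumPy tensors are handy for JAX.", return_tensors=TensorType.NUMPY)
print(type(np_batch["input_ids"]))  # <class 'numpy.ndarray'>
```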

Community notebooks

- Adding notebooks for Fine Tuning 4732 (abhimishra91):
- Multi-class classification: Using DistilBert
- Multi-label classification: Using Bert
- Summarization: Using T5 - Model Tracking with WandB
- [Speed up Fine-Tuning in Transformers with Dynamic Padding / Bucketing](https://github.com/ELS-RD/transformers-notebook/blob/master/Divide_Hugging_Face_Transformers_training_time_by_2_or_more.ipynb) #5195 (pommedeterresautee)
- [How to use Benchmarks](https://github.com/huggingface/transformers/blob/master/notebooks/05-benchmark.ipynb) (patrickvonplaten) #5312

Benchmarks (patrickvonplaten)

The benchmark script was consolidated and some features were added:

It adds the functionality to measure the following for both TF and PT (4912):

- TensorFlow:
  - Inference: CPU, GPU, GPU + XLA, GPU + eager mode, CPU + eager mode, TPU
- PyTorch:
  - Inference: CPU, CPU + torchscript, GPU, GPU + torchscript, GPU + mixed precision, Torch/XLA TPU
  - Training: CPU, GPU, GPU + mixed precision, Torch/XLA TPU

- [Benchmark] Add encoder decoder to benchmark and clean labels 4810
- [Benchmark] add tpu and torchscipt for benchmark 4850
- [Benchmark] Extend Benchmark to all model type extensions 5241
- [Benchmarks] improve Example Plotter 5245

Hidden states, attentions and cache

Before v3.0.0, the way to handle attentions, model hidden states, and whether to use the cache in models that have it for sequential decoding was to specify an argument in the configuration. In version v3.0.0, while we do maintain that argument for backwards compatibility, we introduce a new way of handling these through the `forward` and `call` methods.

- Output attentions 4538 (Bharat123rox)
- Output hidden states 4978 (drjosephliu)
- Use cache 5194 (patrickvonplaten)

Revamped `AutoModel`s (patrickvonplaten)

The `AutoModelWithLMHead` encompasses all models with a language modeling head, not making the distinction between causal, masked and seq2seq models. Three new auto models are added:

- `AutoModelForCausalLM` for Autoregressive models
- `AutoModelForMaskedLM` for Autoencoding models
- `AutoModelForSeq2SeqCausalLM` for Sequence-to-sequence models with causal LM for the decoder
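
A minimal sketch of the new auto classes (checkpoints are only illustrative; in current releases the sequence-to-sequence variant is exposed as `AutoModelForSeq2SeqLM`):

```python
from transformers import AutoModelForCausalLM, AutoModelForMaskedLM, AutoModelForSeq2SeqLM

causal_lm = AutoModelForCausalLM.from_pretrained("gpt2")               # autoregressive
masked_lm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")  # autoencoding
seq2seq_lm = AutoModelForSeq2SeqLM.from_pretrained("t5-small")         # encoder-decoder
```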

New model & tokenizer architectures

- XLMRobertaForQuestionAnswering 4855 (sgugger)
- ElectraForQuestionAnswering 4913 (patil-suraj)
- Add AlbertForMultipleChoice 4959 (sgugger)
- BartForQuestionAnswering 4908 (patil-suraj)
- BartTokenizerFast 4878 (patil-suraj)
- Add DistilBertForMultipleChoice 5032 (sgugger)
- ElectraForMultipleChoice 4954 (sgugger)

ONNX

- Fixed a bug causing invalid ordering of the inputs in the underlying ONNX IR.
- Increased logging to give the user more information about the exported variables.

Bug fixes and improvements

- TFRobertaModelIntegrationTest requires tf 4726 (sshleifer)
- Cleanup glue for TPU 4621 (jysohn23)
- [Reformer] Improved memory if input is shorter than chunk length 4720 (patrickvonplaten)
- Pipelines: miscellanea of QoL improvements and small features 4632 (julien-c)
- Fix bug when changing the <EOS> token for generate 4745 (patrickvonplaten)
- never_split on slow tokenizers should not split 4723 (mfuntowicz)
- PretrainedModel.generate: remove unused kwargs 4761 (sshleifer)
- Codecov is now setup differently to have better insights into code coverage 4768 (LysandreJik)
- Don't access pad_token_id if there is no pad_token 4773 (sgugger)
- Removed deprecated use of Variable API from pplm example 4619 (prajjwal1)
- Add drop_last arg for data loader 4757 4925 (setu4993)
- No silent error when XLNet's `d_head` is already in the configuration 4747 (LysandreJik)
- MarianTokenizer: delete unused constants 4802 (sshleifer)
- NER: Add new WNUT’17 example 4681 (stefan-it)
- [EncoderDecoderConfig] automatically set decoder config to decoder 4809 (patrickvonplaten)
- Add matplotlib to known 3rd party dependencies 4800 (sshleifer)
- Pipelines test and new kwarg 4812 (sshleifer)
- Updated path "cd examples/text-generation/pplm" 4778 (Mr-Ruben)
- [marian tests] pass device to pipeline 4815 (sshleifer)
- Export PretrainedBartModel from __init__ 4819 (BramVanroy)
- Updates args in tf squad example. 4820 (daniel-shan)
- [Generate] beam search should generate without replacement (patrickvonplaten)
- TFTrainer: Align how the checkpoints are managed the same way than in the PyTorch trainer. 4831 (jplu)
- [Longformer] Remove redundant code 4839 (ZhuBaohe)
- [cleanup] consolidate some prune_heads logic 4799 (sshleifer)
- Fix the __getattr__ method in BatchEncoding 4772 (jplu)
- Consolidate summarization examples 4837 (aretius)
- Fix a bug in the initialization and serialization of TFRobertaClassificationHead 4884 (harkous)
- [examples] Cleanup summarization docs 4876 (sshleifer)
- run_pplm.py bug fix 4867 (songyouwei)
- Remove unused arguments in Multiple Choice example 4853 (sgugger)
- Deal with multiple choice in common tests 4886 (sgugger)
- Fix the CI 4903 (sgugger)
- [All models] fix docs after adding output attentions to all forward functions 4909 (patrickvonplaten)
- Add more models to common tests 4910 (sgugger)
- [ctrl] fix pruning of MultiHeadAttention 4904 (aretius)
- Don't init TPU device twice 4916 (patrickvonplaten)
- Run a single wandb instance per TPU run 4851 (LysandreJik)
- check type before logging in trainer to ensure values are scalars 4883 (mgoldey)
- Split LMBert model in two 4874 (sgugger)
- Make multiple choice models work with input_embeds 4921 (sgugger)
- Fix resize_token_embeddings for Transformer-XL 4759 (RafaelWO)
- [mbart] Fix fp16 testing logic 4949 (sshleifer)
- Hans data with newer tokenizer API 4854 (sgugger)
- Fix parameter 'output_attentions' docstring 4976 (ZhuBaohe)
- Improve ONNX logging 4999 (mfuntowicz)
- NER: fix construction of input examples for RoBERTa 4943 (stefan-it)
- Possible fix to make AMP work with DDP in the trainer 4728 (BramVanroy)
- Make DataCollator a callable 5015 (sgugger)
- Increase pipeline support for ONNX export. 5005 (mfuntowicz)
- Fix importing transformers on Windows - SIGKILL not defined 4997 (mfuntowicz)
- TFTrainer: improve logging 4946 (borisdayma)
- Add position_ids in TFElectra models docstring 5021 (sgugger)
- [Bart] Question Answering Model is added to tests 5024 (patrickvonplaten)
- Ability to pickle/unpickle BatchEncoding pickle (reimport) 5039 (mfuntowicz)
- refactor(wandb): consolidate import 5044 (borisdayma)
- [cleanup] Hoist ModelTester objects to top level 4939 (aretius)
- Convert hans to Trainer 5025 (sgugger)
- Fix marian tokenizer save pretrained 5043 (sshleifer)
- [cleanup] examples test_run_squad uses tiny model 5059 (sshleifer)
- Add header and fix command for HANS 5082 (sgugger)
- [examples] SummarizationModule improvements 4951 (sshleifer)
- Some changes to simplify the generation function 5031 (yjernite)
- Make default_data_collator more flexible and deprecate old behavior 5060 (sgugger)
- [MarianTokenizer] Switch to sacremoses for punc normalization 5092 (sshleifer)
- [style] add pandas to setup.cfg 5093 (sshleifer)
- [ElectraForQuestionAnswering] fix qa example in doc 4929 (patil-suraj)
- Fixing TPU training by disabling wandb.watch gradients logging 4926 (patil-suraj)
- [docs] fix T5 training doc 5080 (patil-suraj)
- support local_files_only option for tf models 5116 (ogarin)
- [cleanup] generate_beam_search comments 5115 (sshleifer)
- [fix] Move _adjust_logits above postprocess to fix Marian.generate 5126 (sshleifer)
- Pin `sphinx-rtd-theme` 5128 (LysandreJik)
- Add missing arg in 02-transformers notebook 5085 (pri-ax)
- [cleanup] remove redundant code in SummarizationDataset 5119 (sshleifer)
- AutoTokenizer supports mbart-large-en-ro 5121 (sshleifer)
- Fix in Reformer Config documentation 5138 (erickrf)
- [bart-mnli] Fix class flipping bug 5141 (sshleifer)
- [MobileBert] fix dropout 5150 (ZhuBaohe)
- SummarizationPipeline: init required task name 5086 (julien-c)
- [examples] fixes arguments for summarization finetune scripts 5157 (ieBoytsov)
- Fixing docs for Encoder Decoder Config 5171 (mikaelsouza)
- fix bart doc 5132 (fuzihaofzh)
- Added feature to move added tokens in vocabulary for Transformer-XL 4953 (RafaelWO)
- Add support for gradient checkpointing in BERT 4659 (ibeltagy)
- Fix for IndexError when Roberta Tokenizer is called on empty text 4209 (malteos)
- Add TF auto model to the docs + fix sphinx warnings (again) 5187 (sgugger)
- Have documentation fail on warning 5189 (LysandreJik)
- Cleaner warning when loading pretrained models 4557 (thomwolf)
- Upgrade examples to pl=0.8.1 5146 (sshleifer)
- [fix] mobilebert had wrong path, causing slow test failure 5205 (sshleifer)
- [fix] remove unused import 5206 (sshleifer)
- [Reformer] Axial Pos Emb Improve mem usage reformer 5209 (patrickvonplaten)
- [pl_examples] revert deletion of optimizer_step 5227 (sshleifer)
- [bart] add config.extra_pos_embeddings to facilitate reuse 5190 (sshleifer)
- Only put tensors on a device 5223 (sgugger)
- Fix PABEE division by zero error 5233 (JetRunner)
- Use the script in utils 5224 (sgugger)
- Delay decay schedule until the end of warmup 4940 (amodaresi)
- Replace pad_token with -100 for LM loss calculation 4718 (setu4993)
- examples/seq2seq supports translation 5202 (sshleifer)
- Fix convert_graph_to_onnx script 5230 (n1t0)
- Refactor Code samples; Test code samples 5036 (LysandreJik)
- [Generation] fix docs for decoder_input_ids 5306 (patrickvonplaten)
- [pipelines] Change summarization default to distilbart-cnn-12-6 5289 (sshleifer)
- Add BART-base modeling and configuration 5315 (JetRunner)
- CircleCI stores cleaner output at test_outputs.txt 5291 (sshleifer)
- [pl_examples] default warmup steps=0 5316 (sshleifer)

2.11.0

Longformer

- Longformer (ibeltagy)
- Longformer for QA (patil-suraj + patrickvonplaten)
- Longformer fast tokenizer (patil-suraj)
- Longformer for sequence classification (patil-suraj)
- Longformer for token classification (patil-suraj)
- Longformer for Multiple Choice (patrickvonplaten)
- More user-friendly handling of global attention mask vs local attention mask (patrickvonplaten)
- Fix longformer attention mask type casting when using APEX (peskotivesgeroff)
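
A minimal sketch of the global vs. local attention mask handling mentioned above (checkpoint and text are only illustrative):

```python
import torch
from transformers import LongformerModel, LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

inputs = tokenizer("Long documents need sparse attention to fit in memory.", return_tensors="pt")
global_attention_mask = torch.zeros_like(inputs.input_ids)
global_attention_mask[:, 0] = 1  # give the first (<s>) token global attention

outputs = model(**inputs, global_attention_mask=global_attention_mask)
print(outputs.last_hidden_state.shape)
```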

New community notebooks!

- Long Sequence Modeling with Reformer (patrickvonplaten)
- Fine-tune BART for summarization (ohmeow)
- Fine-tune a pre-trained Transformer on anyone's tweets (borisdayma, lavanyashukla)
- A step-by-step guide to tracking hugging face model performance with wandb (jxmorris12, lavanyashukla)
- Fine-tune Longformer for QA (patil-suraj)
- Pretrain Longformer (ibeltagy)
- Fine-tune T5 for sentiment span extraction (enzoampil)

URLs to model weights are not hardcoded anymore (julien-c)

Archive maps were dictionaries linking pre-trained models to their S3 URLs. Since the arrival of the model hub, these have become obsolete.

⚠️ This PR is breaking for the following models: BART, Flaubert, bert-japanese, bert-base-finnish, bert-base-dutch. ⚠️
Those models now have to be instantiated with their full model id:

> "cl-tohoku/bert-base-japanese"
> "cl-tohoku/bert-base-japanese-whole-word-masking"
> "cl-tohoku/bert-base-japanese-char"
> "cl-tohoku/bert-base-japanese-char-whole-word-masking"
> "TurkuNLP/bert-base-finnish-cased-v1"
> "TurkuNLP/bert-base-finnish-uncased-v1"
> "wietsedv/bert-base-dutch-cased"
> "flaubert/flaubert_small_cased"
> "flaubert/flaubert_base_uncased"
> "flaubert/flaubert_base_cased"
> "flaubert/flaubert_large_cased"
>
> all variants of "facebook/bart"

Update: ⚠️ This PR is also breaking for ALBERT from Tensorflow. See issue 4806 for discussion and resolution ⚠️

Fixes and improvements

- Fix convert_token_type_ids_from_sequences for fast tokenizers (n1t0, 4503)
- Fixed the default tokenizer of the summarization pipeline (sshleifer, 4506)
- The `max_len` attribute is now more robust, and warns the user about deprecation (mfuntowicz, 4528)
- Added type hints to `modeling_utils.py` (bglearning, 3911)
- MMBT model now has `nn.Module` as a superclass (shoarora, 4533)
- Fixing tokenization of extra_id symbols in the T5 tokenizer (mansimov, 4353)
- Slow GPU tests run daily (julien-c, 4465)
- Removed PyTorch artifacts in TensorFlow XLNet implementation (ZhuBaohe, 4410)
- Fixed the T5 Cross Attention Position Bias (ZhuBaohe, 4499)
- The `transformers-cli` is now cross-platform (BramVanroy, 4131) + (patrickvonplaten, 4614)
- GPT-2, CTRL: Accept `input_ids` and `past` of variable length (patrickvonplaten, 4581)
- Added back `--do_lower_case` to SQuAD examples.
- Correct framework test requirement for language generation tests (sshleifer, 4616)
- Fix `add_special_tokens` on fast tokenizers (n1t0, 4531)
- MNLI & SST-2 bugs were fixed (stdcoutzyx, 4546)
- Fixed BERT example for NSP and multiple choice (siboehm, 3953)
- Encoder/decoder fix initialization and save/load bug (patrickvonplaten, 4680)
- Fix onnx export input names order (RensDimmendaal, 4641)
- Configuration: ensure that id2label always takes precedence over `num_labels` (julien-c, direct commit to `master`)
- Make docstring match argument (sgugger, 4711)
- Specify PyTorch versions for examples (LysandreJik, 4710)
- Override get_vocab for fast tokenizers (mfuntowicz, 4717)
- Tokenizer should not add special tokens for text generation (patrickvonplaten, 4686)
