ProphetNet, Blenderbot, SqueezeBERT, DeBERTa
ProphetNet
Two new models are released as part of the ProphetNet implementation: `ProphetNet` and `XLM-ProphetNet`.
ProphetNet is an encoder-decoder model that can predict the n future tokens for “n-gram” language modeling instead of just the next token.
XLM-ProphetNet is an encoder-decoder model with an identical architecture to ProphetNet, but the model was trained on the multi-lingual “wiki100” Wikipedia dump.
The ProphetNet model was proposed in [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063), by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou on 13 Jan, 2020.
It was added to the library in PyTorch with the following checkpoints:
- `microsoft/xprophetnet-large-wiki100-cased-xglue-ntg`
- `microsoft/prophetnet-large-uncased`
- `microsoft/prophetnet-large-uncased-cnndm`
- `microsoft/xprophetnet-large-wiki100-cased`
- `microsoft/xprophetnet-large-wiki100-cased-xglue-qg`
Contributions:
- ProphetNet 7157 (qiweizhen, patrickvonplaten)
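As a quick illustration, here is a minimal usage sketch (not taken from the release notes) of the CNN/DailyMail summarization checkpoint listed above with the library's ProphetNet classes:

```python
from transformers import ProphetNetTokenizer, ProphetNetForConditionalGeneration

# Summarization checkpoint from the list above
tokenizer = ProphetNetTokenizer.from_pretrained("microsoft/prophetnet-large-uncased-cnndm")
model = ProphetNetForConditionalGeneration.from_pretrained("microsoft/prophetnet-large-uncased-cnndm")

article = "The US has passed the peak on new coronavirus cases, ..."
inputs = tokenizer(article, return_tensors="pt")
summary_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=100, early_stopping=True)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True))
```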
Blenderbot
Blenderbot is an encoder-decoder model for open-domain chat. It uses a standard transformer-based seq2seq architecture.
The Blender chatbot model was proposed in [Recipes for building an open-domain chatbot](https://arxiv.org/pdf/2004.13637.pdf) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston on 30 Apr 2020.
It was added to the library in PyTorch with the following checkpoints:
- `facebook/blenderbot-90M`
- `facebook/blenderbot-3B`
Contributions:
- Blenderbot 7418 (sshleifer)
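A minimal conversational sketch (assumed usage, not quoted from the release notes), using the 3B checkpoint listed above:

```python
from transformers import BlenderbotTokenizer, BlenderbotForConditionalGeneration

tokenizer = BlenderbotTokenizer.from_pretrained("facebook/blenderbot-3B")
model = BlenderbotForConditionalGeneration.from_pretrained("facebook/blenderbot-3B")

inputs = tokenizer("Hello, how are you doing today?", return_tensors="pt")
reply_ids = model.generate(**inputs)
print(tokenizer.batch_decode(reply_ids, skip_special_tokens=True))
```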
SqueezeBERT
The SqueezeBERT model was proposed in [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, Kurt W. Keutzer. It’s a bidirectional transformer similar to the BERT model. The key difference between the BERT architecture and the SqueezeBERT architecture is that SqueezeBERT uses grouped convolutions instead of fully-connected layers for the Q, K, V and FFN layers.
It was added to the library in PyTorch with the following checkpoints:
- `squeezebert/squeezebert-mnli`
- `squeezebert/squeezebert-uncased`
- `squeezebert/squeezebert-mnli-headless`
Contributions:
- SqueezeBERT architecture 7083 (forresti)
- Fix squeezebert docs 7587 (LysandreJik)
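A minimal sketch (not from the release notes) of running the MNLI checkpoint above for sentence-pair classification:

```python
import torch
from transformers import SqueezeBertTokenizer, SqueezeBertForSequenceClassification

tokenizer = SqueezeBertTokenizer.from_pretrained("squeezebert/squeezebert-mnli")
model = SqueezeBertForSequenceClassification.from_pretrained("squeezebert/squeezebert-mnli")

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs)[0]  # first element of the output is the classification logits
print(logits.argmax(dim=-1))    # predicted MNLI label id
```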
DeBERTa
The DeBERTa model was proposed in [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It is based on Google’s BERT model released in 2018 and Facebook’s RoBERTa model released in 2019.
It was added to the library in PyTorch with the following checkpoints:
- `microsoft/deberta-base`
- `microsoft/deberta-large`
Contributions:
- Add DeBERTa model 5929 (BigBird01)
- Fix DeBERTa integration tests 7729 (LysandreJik)
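A minimal sketch (assumed usage, not part of the release notes) of extracting hidden states with the base checkpoint above:

```python
import torch
from transformers import DebertaTokenizer, DebertaModel

tokenizer = DebertaTokenizer.from_pretrained("microsoft/deberta-base")
model = DebertaModel.from_pretrained("microsoft/deberta-base")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
last_hidden_state = outputs[0]  # shape: (batch_size, sequence_length, hidden_size)
```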
Both SentencePiece and Tokenizers are now optional libraries
Support for SentencePiece is now part of the `tokenizers` library! Thanks to this, we now have near-full support for fast tokenizers in the library.
With this new feature, we slightly change the paradigm regarding installation:
- SentencePiece is now an optional dependency, paving the way to a fully-featured conda install in the near future
- Tokenizers is now also an optional dependency, making it possible to install and use the library even when Rust cannot be compiled on the machine.
- [Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies 7659 (thomwolf)
The main `__init__` has been improved to always import the same functions and classes. If someone then tries to use a class that requires an optional dependency, an `ImportError` will be raised at init (with instructions on how to install the missing dependency) 7537 (sgugger)
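For illustration, here is a small sketch of the soft-dependency behavior (the error message in the comment is illustrative, not quoted from the library): the class can still be imported, but using it without SentencePiece installed raises an `ImportError` explaining how to install the missing dependency.

```python
# Imports from the main __init__ always succeed, even if SentencePiece is missing.
from transformers import T5Tokenizer

try:
    tokenizer = T5Tokenizer.from_pretrained("t5-small")
except ImportError as err:
    # e.g. "... requires the SentencePiece library but it was not found in your environment ..."
    print(err)
```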
Improvements made to the `Trainer`
The `Trainer` API has been improved to work with models requiring several labels or returning several outputs, and to have clearer progress tracking. A new `TrainerCallback` class has been added to allow the user to easily customize the default training loop.
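For example, a callback can hook into the training loop like this (a minimal sketch, assuming a `Trainer` has already been set up; not taken from the release notes):

```python
from transformers import TrainerCallback

class EpochEndCallback(TrainerCallback):
    """Print a short status message at the end of each epoch."""

    def on_epoch_end(self, args, state, control, **kwargs):
        print(f"Finished epoch {state.epoch} after {state.global_step} steps")

# Passed to the Trainer through the new `callbacks` argument, e.g.:
# trainer = Trainer(model=model, args=training_args, train_dataset=train_ds, callbacks=[EpochEndCallback()])
```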
- Remove config assumption in Trainer 7464 (sgugger)
- Clean the Trainer state 7490 (sgugger)
- Small QOL improvements to TrainingArguments 7475 (sgugger)
- Allow nested tensors in predicted logits 7542 (sgugger)
- Trainer callbacks 7596 (sgugger)
- Add specific notebook ProgressCallback 7793 (sgugger)
- Small fixes to NotebookProgressCallback 7813 (sgugger)
- Add predict step accumulation 7767 (sgugger)
- Don't use `store_xxx` on optional bools 7786 (sgugger)
Seq2Seq Trainer
A child of `Trainer` specialized for training seq2seq models, from patil-suraj and sshleifer. Accessible through `examples/seq2seq/finetune_trainer.py`.
- example scripts at `examples/seq2seq/builtin_trainer/`
- same functionality as `examples/seq2seq/finetune.py`, but better TPU support.
- [examples/s2s] clean up finetune_trainer 7509 (patil-suraj)
- [s2s] trainer scripts: Remove --run_name, thanks sylvain! 7521 (sshleifer)
- [s2s] Adafactor support for builtin trainer 7522 (sshleifer)
- [s2s] add config params like Dropout in Seq2SeqTrainingArguments 7532 (patil-suraj)
- Distributed Trainer: 2 little fixes 7461 (sshleifer)
- [s2sTrainer] test + code cleanup 7467 (sshleifer)
- Seq2SeqDataset: avoid passing src_lang everywhere 7470 (amanpreet692)
- [s2strainer] fix eval dataset loading 7477 (patil-suraj)
- [pseudolabels] cleanup markdown table 7653 (sshleifer)
Distributed Generation
- You can run `model.generate` in PyTorch on a large dataset and split the work across multiple GPUs using `examples/seq2seq/run_distributed_eval.py` (see the sketch after this list)
- [s2s] release pseudolabel links and instructions 7639 (sshleifer)
- [s2s] Fix t5 warning for distributed eval 7487 (sshleifer)
- [s2s] fix kwargs style 7488 (sshleifer)
- [s2s] fix lockfile and peg distillation constants 7545 (sshleifer)
- [s2s] fix nltk pytest race condition with FileLock 7515 (sshleifer)
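The sketch below illustrates the idea (a simplified, assumed example rather than the actual script): each process generates on its own shard of the data, and the per-rank results are gathered afterwards.

```python
import torch.distributed as dist
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumes one process per GPU, e.g. launched with torch.distributed.launch
dist.init_process_group("nccl")
rank, world_size = dist.get_rank(), dist.get_world_size()

tokenizer = AutoTokenizer.from_pretrained("sshleifer/distilbart-cnn-12-6")
model = AutoModelForSeq2SeqLM.from_pretrained("sshleifer/distilbart-cnn-12-6").to(rank).eval()

examples = ["First article to summarize ...", "Second article ...", "Third article ..."]
shard = examples[rank::world_size]  # each rank only processes its own slice of the dataset

outputs = []
for text in shard:
    batch = tokenizer(text, return_tensors="pt").to(rank)
    generated = model.generate(**batch, num_beams=4, max_length=60)
    outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
# The per-rank outputs can then be gathered, e.g. with dist.all_gather_object on recent PyTorch.
```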
Notebooks
- Train T5 in TensorFlow 2 Community Notebook 7428 (HarrisDePerceptron)
General improvements and bugfixes
- remove codecov PR comments 7400 (sshleifer)
- Get a better error when check_copies fails 7457 (sgugger)
- Multi-GPU Testing setup 7453 (LysandreJik)
- Fix LXMERT with DataParallel 7471 (LysandreJik)
- Number of GPUs for multi-gpu 7472 (LysandreJik)
- Make transformers install check positive 7473 (FremyCompany)
- Alphabetize model lists 7478 (sgugger)
- Bump isort version. 7484 (sgugger)
- Add forgotten return_dict argument in the docs 7483 (sgugger)
- Enable pegasus fp16 by clamping large activations 7243 (sshleifer)
- Update LayoutLM doc 7388 (al31415)
- Report Tune metrics in final evaluation 7507 (krfricke)
- Fix Ray Tune progress_reporter kwarg 7508 (krfricke)
- [Seq2Seq] Fix a couple of bugs and clean examples 7474 (patrickvonplaten)
- [Attention Mask] Fix data type 7513 (patrickvonplaten)
- Fix seq2seq example test 7518 (sgugger)
- Remove labels from the RagModel example 7560 (sgugger)
- added script for fine-tuning roberta for sentiment analysis task 7505 (DhavalTaunk08)
- LayoutLM: add exception handling for bbox values 7452 (al31415)
- Cleanup documentation for BART, Marian, MBART and Pegasus 7523 (sgugger)
- Add Electra unexpected keys 7569 (LysandreJik)
- Fix tokenization in SQuAD for RoBERTa, Longformer, BART 7387 (tholor)
- docs(pretrained_models): fix num parameters 7575 (amineabdaoui)
- Update Code example according to deprecation of AutoModelWithLMHead 7555 (jshamg)
- Allow soft dependencies in the namespace with ImportErrors at use 7537 (sgugger)
- Fix post_init of some TrainingArguments 7525 (sgugger)
- Check and update model list in index.rst automatically 7527 (sgugger)
- Expand test to locate flakiness 7580 (sgugger)
- Custom TF weights loading 7422 (jplu)
- Documentation fixes 7585 (sgugger)
- Documentation framework toggle should stick 7586 (LysandreJik)
- Support T5 Distillation w/hidden state supervision 7599 (sshleifer)
- [makefile] check only .py files 7588 (stas00)
- [TF generation] Fix typo 7582 (SidJain1412)
- change return dictionary for DataCollatorForNextSentencePrediction from masked_lm_labels to labels 7595 (gmihaila)
- Docker GPU Images: Add NVIDIA/apex to the cuda images with pytorch 7598 (AdrienDS)
- typo fix 7611 (agemagician)
- [bart] fix config.classif_dropout 7593 (sshleifer)
- [s2s] save first batch to json for debugging purposes 6810 (sshleifer)
- Add GPT2ForSequenceClassification based on DialogRPT 7501 (LysandreJik)
- Fix wrong reference name/filename in docstring of `SquadProcessor` 7616 (phiyodr)
- Fix tokenizer UnboundLocalError when padding is set to PaddingStrategy.MAX_LENGTH 7610 (GabrielePicco)
- Add GPT2 to sequence classification auto model 7630 (LysandreJik)
- Replaced torch.load for loading the pretrained vocab of TransformerXL tokenizer to pickle.load 6935 (w4nderlust)
- Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer 7141 (thomwolf)
- Green tests: update torch-hub test dependencies (add protobuf and pin tokenizer 0.9.0-RC2) 7658 (thomwolf)
- Fix RobertaForCausalLM docs 7642 (LysandreJik)
- [s2s] configure lr_scheduler from command line 7641 (patil-suraj)
- [pseudo] Switch URLS to CDN 7661 (sshleifer)
- [s2s] Switch README urls to cdn 7670 (sshleifer)
- fix nn.DataParallel compatibility with PyTorch 1.5 7671 (guhur)
- Update XLM-RoBERTa pretrained model details 7669 (noahtren)
- Fix dataset cardinality 7678 (jplu)
- [pegasus] Faster tokenizer tests 7672 (stas00)
- Delete extra test file in repo root 7681 (sshleifer)
- Better links for models in README and doc index 7680 (sgugger)
- Import integration libraries first 7650 (dsblank)
- Fix title level in Blenderbot doc 7687 (sgugger)
- Fix flaky test in test_trainer 7689 (sgugger)
- Adds license information for default and distilbert models 7688 (ankane)
- Fix docstring in AutoModel class 7694 (al31415)
- [examples] bump pl=0.9.0 7053 (sshleifer)
- Corrected typo: maked → masked 7703 (miggymigz)
- fixed typo in warning line 207. 7718 (Berowne)
- Fix typo in all model docs 7714 (sgugger)
- Fix check for xla in PreTrainedModel.save_pretrained() 7699 (fteufel)
- Minor spelling corrections in docstrings. "information" is uncountable in English and has no plural. 7696 (AndreaSottana)
- The input training data files (multiple files in glob format). 7717 (kfkelvinng)
- Fix trainer callback 7720 (cccntu)
- Fix tf text class 7724 (jplu)
- Fix 7731 7732 (LysandreJik)
- Fix 3 failing slow bart/blender tests 7652 (sshleifer)
- Add license info to nlptown/bert-base-multilingual-uncased-sentiment 7738 (alexcombessie)
- [marian] Automate Tatoeba-Challenge conversion 7709 (sshleifer)
- ElectraTokenizerFast 7754 (LysandreJik)
- Gpt1 for sequence classification 7683 (fmcurti)
- [Rag] Fix loading of pretrained Rag Tokenizer 7756 (patrickvonplaten)
- Do not softmax when num_labels==1 7726 (LysandreJik)
- Avoid unnecessary DDP synchronization when gradient_accumulation_steps > 1 7742 (noamwies)
- fixed lots of typos. 7758 (NieTiger)
- Adding optional trial argument to model_init 7759 (madlag)
- Faster pegasus tokenization test with reduced data size 7762 (sshleifer)
- Fix bert position ids in DPR convert script 7776 (lhoestq)
- Add batch inferencing support for GPT2LMHeadModel 7552 (cccntu)
- fix examples/rag imports, tests 7712 (sshleifer)
- Fix TF savedmodel in Roberta 7795 (jplu)
- Improving Pipelines by defaulting to framework='tf' when pytorch seems unavailable. 7728 (Narsil)
- Upgrading in pipelines TFAutoModelWithLMHead to new Causal/Masked/Seq2Seq LM classes 7730 (Narsil)
- fix wandb/comet problems 7830 (stas00)
- [utils/check_copies.py] fix DeprecationWarning 7834 (stas00)
- [DOC] Typo and fix the input of labels to `cross_entropy` 7841 (katarinaslama)
- [seq2seq] get_git_info fails gracefully 7843 (stas00)
- [Pipelines] Fix links to model lists 7826 (julien-c)
- Herbert polish model 7798 (rmroczkowski)
- [cleanup] assign todos, faster bart-cnn test 7835 (sshleifer)
- Remove masked_lm_labels from returned dictionary in DataCollatorForNextSentencePrediction 7818 (vblagoje)
- [testing] fix/hide warnings 7837 (stas00)
- Small fixes to HP search 7839 (sgugger)
- [testing] disable FutureWarning in examples tests 7842 (stas00)
- Fix missing reference titles in retrieval evaluation of RAG 7817 (lhoestq)
- [seq2seq testing] improve readability 7845 (stas00)
- [s2s testing] turn all to unittests, use auto-delete temp dirs 7859 (stas00)
- Fix Rag example docstring 7872 (patrickvonplaten)
- Remove duplicated mish activation function 7856 (Razcle)
- [tests] fix slow bart cnn test, faster marian tests 7888 (sshleifer)
- Fix small type hinting error 7820 (AndreaSottana)
- Add support to provide initial tokens to decoder of encoder-decoder type models 7577 (ayushtiku5)
- style: fix typo 7883 (rememberYou)
- [testing] remove USE_CUDA 7861 (stas00)
- [CIs] report slow tests add --durations=0 to some pytest jobs 7884 (stas00)
- style: fix typo in the README 7882 (rememberYou)
- [RAG] Propagating of n_docs as parameter to all RagModel's related functions 7891 (lalitpagaria)
- Trainer with Iterable Dataset 7858 (j-rossi-nl)
- Allow Custom Dataset in RAG Retriever 7763 (lhoestq)
- Modelling Encoder-Decoder | Error :- `decoder_config` used before initialisation 7903 (ayubSubhaniya)
- [Docstring] fix t5 training docstring 7911 (patrickvonplaten)
- Raise error when using AMP on non-CUDA device 7869 (BramVanroy)
- [EncoderDecoder] Fix Typo 7915 (patrickvonplaten)
- [testing] rename skip targets + docs 7863 (stas00)