ProphetNet, Blenderbot, SqueezeBERT, DeBERTa
ProphetNet
Two new models are released as part of the ProphetNet implementation: `ProphetNet` and `XLM-ProphetNet`.
ProphetNet is an encoder-decoder model that can predict the n future tokens for “n-gram” language modeling instead of just the next token.
XLM-ProphetNet is an encoder-decoder model with an identical architecture to ProphetNet, but the model was trained on the multi-lingual “wiki100” Wikipedia dump.
The ProphetNet model was proposed in [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063), by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou on 13 Jan, 2020.
It was added to the library in PyTorch with the following checkpoints:
- `microsoft/xprophetnet-large-wiki100-cased-xglue-ntg`
- `microsoft/prophetnet-large-uncased`
- `microsoft/prophetnet-large-uncased-cnndm`
- `microsoft/xprophetnet-large-wiki100-cased`
- `microsoft/xprophetnet-large-wiki100-cased-xglue-qg`
Contributions:
- ProphetNet 7157 (qiweizhen, patrickvonplaten)
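As a quick illustration, here is a minimal usage sketch (not taken from the release notes) of the CNN/DailyMail summarization checkpoint listed above with the library's ProphetNet classes:

```python
from transformers import ProphetNetTokenizer, ProphetNetForConditionalGeneration

# Summarization checkpoint from the list above
tokenizer = ProphetNetTokenizer.from_pretrained("microsoft/prophetnet-large-uncased-cnndm")
model = ProphetNetForConditionalGeneration.from_pretrained("microsoft/prophetnet-large-uncased-cnndm")

article = "The US has passed the peak on new coronavirus cases, ..."
inputs = tokenizer(article, return_tensors="pt")
summary_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=100, early_stopping=True)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True))
```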
Blenderbot
Blenderbot is an encoder-decoder model for open-domain chat. It uses a standard transformer-based seq2seq architecture.
The Blender chatbot model was proposed in [Recipes for building an open-domain chatbot](https://arxiv.org/pdf/2004.13637.pdf) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston on 30 Apr 2020.
It was added to the library in PyTorch with the following checkpoints:
- `facebook/blenderbot-90M`
- `facebook/blenderbot-3B`
Contributions:
- Blenderbot 7418 (sshleifer)
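A minimal conversational sketch (assumed usage, not quoted from the release notes), using the 3B checkpoint listed above:

```python
from transformers import BlenderbotTokenizer, BlenderbotForConditionalGeneration

tokenizer = BlenderbotTokenizer.from_pretrained("facebook/blenderbot-3B")
model = BlenderbotForConditionalGeneration.from_pretrained("facebook/blenderbot-3B")

inputs = tokenizer("Hello, how are you doing today?", return_tensors="pt")
reply_ids = model.generate(**inputs)
print(tokenizer.batch_decode(reply_ids, skip_special_tokens=True))
```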
SqueezeBERT
The SqueezeBERT model was proposed in [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, Kurt W. Keutzer. It’s a bidirectional transformer similar to the BERT model. The key difference between the BERT architecture and the SqueezeBERT architecture is that SqueezeBERT uses grouped convolutions instead of fully-connected layers for the Q, K, V and FFN layers.
It was added to the library in PyTorch with the following checkpoints:
- `squeezebert/squeezebert-mnli`
- `squeezebert/squeezebert-uncased`
- `squeezebert/squeezebert-mnli-headless`
Contributions:
- SqueezeBERT architecture 7083 (forresti)
- Fix squeezebert docs 7587 (LysandreJik)
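A minimal sketch (not from the release notes) of running the MNLI checkpoint above for sentence-pair classification:

```python
import torch
from transformers import SqueezeBertTokenizer, SqueezeBertForSequenceClassification

tokenizer = SqueezeBertTokenizer.from_pretrained("squeezebert/squeezebert-mnli")
model = SqueezeBertForSequenceClassification.from_pretrained("squeezebert/squeezebert-mnli")

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs)[0]  # first element of the output is the classification logits
print(logits.argmax(dim=-1))    # predicted MNLI label id
```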
DeBERTa
The DeBERTa model was proposed in [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It is based on Google’s BERT model released in 2018 and Facebook’s RoBERTa model released in 2019.
It was added to the library in PyTorch with the following checkpoints:
- `microsoft/deberta-base`
- `microsoft/deberta-large`
Contributions:
- Add DeBERTa model 5929 (BigBird01)
- Fix DeBERTa integration tests 7729 (LysandreJik)
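A minimal sketch (assumed usage, not part of the release notes) of extracting hidden states with the base checkpoint above:

```python
import torch
from transformers import DebertaTokenizer, DebertaModel

tokenizer = DebertaTokenizer.from_pretrained("microsoft/deberta-base")
model = DebertaModel.from_pretrained("microsoft/deberta-base")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
last_hidden_state = outputs[0]  # shape: (batch_size, sequence_length, hidden_size)
```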
Both SentencePiece and Tokenizers are now optional libraries
Support for SentencePiece is now part of the `tokenizers` library! Thanks to this, we now have near-full support for fast tokenizers in the library.
With this new feature, we slightly change the paradigm regarding installation:
- SentencePiece is now an optional dependency, paving the way to a fully-featured conda install in the near future
- Tokenizers is now also an optional dependency, making it possible to install and use the library even when Rust cannot be compiled on the machine.
- [Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies 7659 (thomwolf)
The main `__init__` has been improved to always import the same functions and classes. If someone then tries to use a class that requires an optional dependency, an `ImportError` will be raised at init (with instructions on how to install the missing dependency) 7537 (sgugger)
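For illustration, here is a small sketch of the soft-dependency behavior (the error message in the comment is illustrative, not quoted from the library): the class can still be imported, but using it without SentencePiece installed raises an `ImportError` explaining how to install the missing dependency.

```python
# Imports from the main __init__ always succeed, even if SentencePiece is missing.
from transformers import T5Tokenizer

try:
    tokenizer = T5Tokenizer.from_pretrained("t5-small")
except ImportError as err:
    # e.g. "... requires the SentencePiece library but it was not found in your environment ..."
    print(err)
```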
Improvements made to the `Trainer`
The `Trainer` API has been improved to work with models requiring several labels or returning several outputs, and to have clearer progress tracking. A new `TrainerCallback` class has been added to allow the user to easily customize the default training loop.
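For example, a callback can hook into the training loop like this (a minimal sketch, assuming a `Trainer` has already been set up; not taken from the release notes):

```python
from transformers import TrainerCallback

class EpochEndCallback(TrainerCallback):
    """Print a short status message at the end of each epoch."""

    def on_epoch_end(self, args, state, control, **kwargs):
        print(f"Finished epoch {state.epoch} after {state.global_step} steps")

# Passed to the Trainer through the new `callbacks` argument, e.g.:
# trainer = Trainer(model=model, args=training_args, train_dataset=train_ds, callbacks=[EpochEndCallback()])
```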
- Remove config assumption in Trainer 7464 (sgugger)
- Clean the Trainer state 7490 (sgugger)
- Small QOL improvements to TrainingArguments 7475 (sgugger)
- Allow nested tensors in predicted logits 7542 (sgugger)
- Trainer callbacks 7596 (sgugger)
- Add specific notebook ProgressCallback 7793 (sgugger)
- Small fixes to NotebookProgressCallback 7813 (sgugger)
- Add predict step accumulation 7767 (sgugger)
- Don't use `store_xxx` on optional bools 7786 (sgugger)
Seq2Seq Trainer
A child of `Trainer` specialized for training seq2seq models, from patil-suraj and sshleifer. Accessible through `examples/seq2seq/finetune_trainer.py`.
- example scripts at `examples/seq2seq/builtin_trainer/`
- same functionality as `examples/seq2seq/finetune.py`, but better TPU support.
- [examples/s2s] clean up finetune_trainer 7509 (patil-suraj)
- [s2s] trainer scripts: Remove --run_name, thanks sylvain! 7521 (sshleifer)
- [s2s] Adafactor support for builtin trainer 7522 (sshleifer)
- [s2s] add config params like Dropout in Seq2SeqTrainingArguments 7532 (patil-suraj)
- Distributed Trainer: 2 little fixes 7461 (sshleifer)
- [s2sTrainer] test + code cleanup 7467 (sshleifer)
- Seq2SeqDataset: avoid passing src_lang everywhere 7470 (amanpreet692)
- [s2strainer] fix eval dataset loading 7477 (patil-suraj)
- [pseudolabels] cleanup markdown table 7653 (sshleifer)
Distributed Generation
- You can run `model.generate` in PyTorch on a large dataset and split the work across multiple GPUs using `examples/seq2seq/run_distributed_eval.py` (see the sketch after this list)
- [s2s] release pseudolabel links and instructions 7639 (sshleifer)
- [s2s] Fix t5 warning for distributed eval 7487 (sshleifer)
- [s2s] fix kwargs style 7488 (sshleifer)
- [s2s] fix lockfile and peg distillation constants 7545 (sshleifer)
- [s2s] fix nltk pytest race condition with FileLock 7515 (sshleifer)
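The sketch below illustrates the idea (a simplified, assumed example rather than the actual script): each process generates on its own shard of the data, and the per-rank results are gathered afterwards.

```python
import torch.distributed as dist
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumes one process per GPU, e.g. launched with torch.distributed.launch
dist.init_process_group("nccl")
rank, world_size = dist.get_rank(), dist.get_world_size()

tokenizer = AutoTokenizer.from_pretrained("sshleifer/distilbart-cnn-12-6")
model = AutoModelForSeq2SeqLM.from_pretrained("sshleifer/distilbart-cnn-12-6").to(rank).eval()

examples = ["First article to summarize ...", "Second article ...", "Third article ..."]
shard = examples[rank::world_size]  # each rank only processes its own slice of the dataset

outputs = []
for text in shard:
    batch = tokenizer(text, return_tensors="pt").to(rank)
    generated = model.generate(**batch, num_beams=4, max_length=60)
    outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
# The per-rank outputs can then be gathered, e.g. with dist.all_gather_object on recent PyTorch.
```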
Notebooks
- Train T5 in TensorFlow 2 Community Notebook 7428 (HarrisDePerceptron)
General improvements and bugfixes
- remove codecov PR comments 7400 (sshleifer)
- Get a better error when check_copies fails 7457 (sgugger)
- Multi-GPU Testing setup 7453 (LysandreJik)
- Fix LXMERT with DataParallel 7471 (LysandreJik)
- Number of GPUs for multi-gpu 7472 (LysandreJik)
- Make transformers install check positive 7473 (FremyCompany)
- Alphabetize model lists 7478 (sgugger)
- Bump isort version. 7484 (sgugger)
- Add forgotten return_dict argument in the docs 7483 (sgugger)
- Enable pegasus fp16 by clamping large activations 7243 (sshleifer)
- Update LayoutLM doc 7388 (al31415)
- Report Tune metrics in final evaluation 7507 (krfricke)
- Fix Ray Tune progress_reporter kwarg 7508 (krfricke)
- [Seq2Seq] Fix a couple of bugs and clean examples 7474 (patrickvonplaten)
- [Attention Mask] Fix data type 7513 (patrickvonplaten)
- Fix seq2seq example test 7518 (sgugger)
- Remove labels from the RagModel example 7560 (sgugger)
- added script for fine-tuning roberta for sentiment analysis task 7505 (DhavalTaunk08)
- LayoutLM: add exception handling for bbox values 7452 (al31415)
- Cleanup documentation for BART, Marian, MBART and Pegasus 7523 (sgugger)
- Add Electra unexpected keys 7569 (LysandreJik)
- Fix tokenization in SQuAD for RoBERTa, Longformer, BART 7387 (tholor)
- docs(pretrained_models): fix num parameters 7575 (amineabdaoui)
- Update Code example according to deprecation of AutoModelWithLMHead 7555 (jshamg)
- Allow soft dependencies in the namespace with ImportErrors at use 7537 (sgugger)
- Fix post_init of some TrainingArguments 7525 (sgugger)
- Check and update model list in index.rst automatically 7527 (sgugger)
- Expand test to locate flakiness 7580 (sgugger)
- Custom TF weights loading 7422 (jplu)
- Documentation fixes 7585 (sgugger)
- Documentation framework toggle should stick 7586 (LysandreJik)
- Support T5 Distillation w/hidden state supervision 7599 (sshleifer)
- [makefile] check only .py files 7588 (stas00)
- [TF generation] Fix typo 7582 (SidJain1412)
- change return dictionary for DataCollatorForNextSentencePrediction from masked_lm_labels to labels 7595 (gmihaila)
- Docker GPU Images: Add NVIDIA/apex to the cuda images with pytorch 7598 (AdrienDS)
- typo fix 7611 (agemagician)
- [bart] fix config.classif_dropout 7593 (sshleifer)
- [s2s] save first batch to json for debugging purposes 6810 (sshleifer)
- Add GPT2ForSequenceClassification based on DialogRPT 7501 (LysandreJik)
- Fix wrong reference name/filename in docstring of `SquadProcessor` 7616 (phiyodr)
- Fix tokenizer UnboundLocalError when padding is set to PaddingStrategy.MAX_LENGTH 7610 (GabrielePicco)
- Add GPT2 to sequence classification auto model 7630 (LysandreJik)
- Replaced torch.load for loading the pretrained vocab of TransformerXL tokenizer to pickle.load 6935 (w4nderlust)
- Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer 7141 (thomwolf)
- Green tests: update torch-hub test dependencies (add protobuf and pin tokenizer 0.9.0-RC2) 7658 (thomwolf)
- Fix RobertaForCausalLM docs 7642 (LysandreJik)
- [s2s] configure lr_scheduler from command line 7641 (patil-suraj)
- [pseudo] Switch URLS to CDN 7661 (sshleifer)
- [s2s] Switch README urls to cdn 7670 (sshleifer)
- fix nn.DataParallel compatibility with PyTorch 1.5 7671 (guhur)
- Update XLM-RoBERTa pretrained model details 7669 (noahtren)
- Fix dataset cardinality 7678 (jplu)
- [pegasus] Faster tokenizer tests 7672 (stas00)
- Delete extra test file in repo root 7681 (sshleifer)
- Better links for models in README and doc index 7680 (sgugger)
- Import integration libraries first 7650 (dsblank)
- Fix title level in Blenderbot doc 7687 (sgugger)
- Fix flaky test in test_trainer 7689 (sgugger)
- Adds license information for default and distilbert models 7688 (ankane)
- Fix docstring in AutoModel class 7694 (al31415)
- [examples] bump pl=0.9.0 7053 (sshleifer)
- Corrected typo: maked → masked 7703 (miggymigz)
- fixed typo in warning line 207. 7718 (Berowne)
- Fix typo in all model docs 7714 (sgugger)
- Fix check for xla in PreTrainedModel.save_pretrained() 7699 (fteufel)
- Minor spelling corrections in docstrings. "information" is uncountable in English and has no plural. 7696 (AndreaSottana)
- The input training data files (multiple files in glob format). 7717 (kfkelvinng)
- Fix trainer callback 7720 (cccntu)
- Fix tf text class 7724 (jplu)
- Fix 7731 7732 (LysandreJik)
- Fix 3 failing slow bart/blender tests 7652 (sshleifer)
- Add license info to nlptown/bert-base-multilingual-uncased-sentiment 7738 (alexcombessie)
- [marian] Automate Tatoeba-Challenge conversion 7709 (sshleifer)
- ElectraTokenizerFast 7754 (LysandreJik)
- Gpt1 for sequence classification 7683 (fmcurti)
- [Rag] Fix loading of pretrained Rag Tokenizer 7756 (patrickvonplaten)
- Do not softmax when num_labels==1 7726 (LysandreJik)
- Avoid unnecessary DDP synchronization when gradient_accumulation_steps > 1 7742 (noamwies)
- fixed lots of typos. 7758 (NieTiger)
- Adding optional trial argument to model_init 7759 (madlag)
- Faster pegasus tokenization test with reduced data size 7762 (sshleifer)
- Fix bert position ids in DPR convert script 7776 (lhoestq)
- Add batch inferencing support for GPT2LMHeadModel 7552 (cccntu)
- fix examples/rag imports, tests 7712 (sshleifer)
- Fix TF savedmodel in Roberta 7795 (jplu)
- Improving Pipelines by defaulting to framework='tf' when pytorch seems unavailable. 7728 (Narsil)
- Upgrading in pipelines TFAutoModelWithLMHead to new Causal/Masked/Seq2Seq LM classes 7730 (Narsil)
- fix wandb/comet problems 7830 (stas00)
- [utils/check_copies.py] fix DeprecationWarning 7834 (stas00)
- [DOC] Typo and fix the input of labels to `cross_entropy` 7841 (katarinaslama)
- [seq2seq] get_git_info fails gracefully 7843 (stas00)
- [Pipelines] Fix links to model lists 7826 (julien-c)
- Herbert polish model 7798 (rmroczkowski)
- [cleanup] assign todos, faster bart-cnn test 7835 (sshleifer)
- Remove masked_lm_labels from returned dictionary in DataCollatorForNextSentencePrediction 7818 (vblagoje)
- [testing] fix/hide warnings 7837 (stas00)
- Small fixes to HP search 7839 (sgugger)
- [testing] disable FutureWarning in examples tests 7842 (stas00)
- Fix missing reference titles in retrieval evaluation of RAG 7817 (lhoestq)
- [seq2seq testing] improve readability 7845 (stas00)
- [s2s testing] turn all to unittests, use auto-delete temp dirs 7859 (stas00)
- Fix Rag example docstring 7872 (patrickvonplaten)
- Remove duplicated mish activation function 7856 (Razcle)
- [tests] fix slow bart cnn test, faster marian tests 7888 (sshleifer)
- Fix small type hinting error 7820 (AndreaSottana)
- Add support to provide initial tokens to decoder of encoder-decoder type models 7577 (ayushtiku5)
- style: fix typo 7883 (rememberYou)
- [testing] remove USE_CUDA 7861 (stas00)
- [CIs] report slow tests add --durations=0 to some pytest jobs 7884 (stas00)
- style: fix typo in the README 7882 (rememberYou)
- [RAG] Propagating of n_docs as parameter to all RagModel's related functions 7891 (lalitpagaria)
- Trainer with Iterable Dataset 7858 (j-rossi-nl)
- Allow Custom Dataset in RAG Retriever 7763 (lhoestq)
- Modelling Encoder-Decoder | Error :- `decoder_config` used before initialisation 7903 (ayubSubhaniya)
- [Docstring] fix t5 training docstring 7911 (patrickvonplaten)
- Raise error when using AMP on non-CUDA device 7869 (BramVanroy)
- [EncoderDecoder] Fix Typo 7915 (patrickvonplaten)
- [testing] rename skip targets + docs 7863 (stas00)