
Transformers

Latest version: v4.41.0


Page 23 of 26

2.2.2

Not secure

Patched an error where the tokenizers would split special tokens.

2.2.1

Not secure

Input shapes

This patch fixes a bug related to the input shape in several models in TensorFlow.

Tokenization message

A tokenization message was printed too often and cluttered the output, hiding the relevant information. It has been removed.

2.2.0

Not secure

New model architectures: ALBERT, CamemBERT, GPT2-XL, DistilRoberta

Four new models have been added in v2.2.0

- ALBERT (Pytorch & TF) (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
- CamemBERT (Pytorch) (from Facebook AI Research, INRIA, and La Sorbonne Université), as the first large-scale Transformer language model for French. Released alongside the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suarez, Yoann Dupont, Laurent Romary, Eric Villemonte de la Clergerie, Djame Seddah, and Benoît Sagot. It was added by louismartin with the help of julien-c.
- DistilRoberta (Pytorch & TF) from VictorSanh as the third distilled model after DistilBERT and DistilGPT-2.
- GPT-2 XL (Pytorch & TF) as the last GPT-2 checkpoint released by OpenAI

Encoder-Decoder architectures

We welcome the possibility to create fully seq2seq models by incorporating Encoder-Decoder architectures using a `PreTrainedEncoderDecoder` class that can be initialized from pre-trained models. The base BERT class has been modified so that it may behave as a decoder.

Furthermore, a `Model2Model` class that simplifies the definition of an encoder-decoder when both encoder and decoder are based on the same model has been added. rlouf

Benchmarks and performance improvements

tlkh and LysandreJik benchmarked the library's models with different technologies: TensorFlow and Pytorch, mixed precision (AMP and FP-16), and model tracing (Torchscript and XLA). A new section of the documentation, [benchmarks](https://huggingface.co/transformers/benchmarks.html), points to Google Sheets with the results.

Breaking changes

__Tokenizers now add special tokens by default.__ LysandreJik
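To illustrate what this default means for calling code, here is a minimal, library-independent sketch. The `ToyTokenizer` class, its vocabulary, and the ids 101 and 102 are hypothetical stand-ins, not the library's implementation; only the `add_special_tokens` flag name mirrors the argument of the library's `encode` method.

```python
class ToyTokenizer:
    """Hypothetical whitespace tokenizer; not the library's implementation."""

    CLS_ID = 101  # assumed id for a start-of-sequence marker
    SEP_ID = 102  # assumed id for an end-of-sequence marker

    def __init__(self, vocab):
        self.vocab = vocab

    def encode(self, text, add_special_tokens=True):
        # Map each whitespace-separated word to its id.
        ids = [self.vocab[word] for word in text.split()]
        if add_special_tokens:
            # Default behaviour: wrap the sequence in special tokens.
            return [self.CLS_ID] + ids + [self.SEP_ID]
        return ids


tokenizer = ToyTokenizer({"hello": 7, "world": 8})
with_special = tokenizer.encode("hello world")
without_special = tokenizer.encode("hello world", add_special_tokens=False)
```

Code that used to add special tokens by hand after calling `encode` may now end up with them twice, unless it opts out with `add_special_tokens=False`.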

New model templates

Model templates to ease the addition of new models to the library have been added. thomwolf

Inputs Embeddings

A new input has been added to all models' `forward` (for Pytorch) and `call` (for TensorFlow) methods. These `inputs_embeds` are a direct embedded representation. This is useful as it gives more control over how to convert `input_ids` indices into associated vectors than the model's internal embedding lookup matrix. julien-c
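The relationship between `input_ids` and `inputs_embeds` can be sketched without any framework. The lookup table, the `toy_forward` function, and its summing "model body" below are hypothetical; real models operate on tensors, and the exact call signature should be checked in the documentation.

```python
# Hypothetical embedding matrix: row i is the vector for token id i.
EMBEDDING_MATRIX = [
    [0.0, 0.0],
    [1.0, 2.0],
    [3.0, 4.0],
]


def embed(input_ids):
    """Internal lookup: convert token ids to vectors."""
    return [EMBEDDING_MATRIX[i] for i in input_ids]


def toy_forward(input_ids=None, inputs_embeds=None):
    """Accept either token ids or ready-made embeddings, but not both."""
    if (input_ids is None) == (inputs_embeds is None):
        raise ValueError("Specify exactly one of input_ids or inputs_embeds")
    if inputs_embeds is None:
        inputs_embeds = embed(input_ids)
    # Stand-in for the rest of the model: sum each vector's components.
    return [sum(vector) for vector in inputs_embeds]


from_ids = toy_forward(input_ids=[1, 2])

# Caller-controlled embedding: the average of two token vectors,
# something the plain id lookup cannot express.
mixed = [[(a + b) / 2 for a, b in zip(EMBEDDING_MATRIX[1], EMBEDDING_MATRIX[2])]]
from_embeds = toy_forward(inputs_embeds=mixed)
```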

Getters and setters for input and output embeddings

A new API for the input and output embeddings is available. These methods are model-independent and allow easy acquisition/modification of the models' embeddings. thomwolf

Additional architectures

New model architectures are available, namely: `DistilBertForTokenClassification`, `CamembertForTokenClassification` stefan-it

Community additions/bug-fixes/improvements

- The Fairseq RoBERTa model conversion script has been patched. louismartin
- einsum now runs in FP-16 in the library's examples slayton58
- In-depth work on the squad script for XLNet to reproduce the original paper's results hlums
- Additional improvements on the run_squad script by WilliamTambellini, orena1
- The run_generation script has seen several improvements by leo-du
- The RoBERTaTensorFlow model has been patched for several use-cases: TPU and keras.fit LysandreJik
- The documentation is now versioned, links are available on the github readme LysandreJik
- The run_ner script has seen several improvements mmaybeno, oneraghavan, manansanghi
- The run_tf_glue script now works for all GLUE tasks LysandreJik
- The run_lm_finetuning script now correctly evaluates perplexity on MLM tasks altsoph
- An issue related to the XLM TensorFlow implementation's training has been fixed tlkh
- run_bertology has been updated to be closer to the run_glue example adrianbg
- Fixed added special tokens in decoded sequences LysandreJik
- Several performance improvements have been made to the tokenizers iedmrc
- A memory leak has been identified and patched in the library's schedulers rlouf
- Correct warning when encoding a sequence too long while specifying a maximum length LysandreJik
- Resizing the token embeddings now works as expected in the run_lm_finetuning script iedmrc
- The difference in versions between Pypi/source in order to run the examples has been clarified rlouf

2.1.1

Not secure

New model architectures: CTRL, DistilGPT-2

Two new models have been added since release 2.0.

- CTRL (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858), by Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, Richard Socher. This model has been added to the library by keskarnitish with the help of thomwolf.
- DistilGPT-2 (from HuggingFace), as the second distilled model after DistilBERT in version 1.2.0. Released alongside the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108)

Distillation

Several updates have been made to the distillation script, including the possibility to distill GPT-2 and to distill on the SQuAD task. By VictorSanh.

Pytorch TPU support

The `run_glue.py` example script can now run on a Pytorch TPU.

Updates to example scripts

Several example scripts have been improved and refactored to use the full potential of the new tokenizer functions:

- `run_multiple_choice.py` has been refactored to include `encode_plus` by julien-c and erenup
- `run_lm_finetuning.py` has been improved with the help of dennymarcels, jinoobaek-qz and LysandreJik
- `run_glue.py` has been improved with the help of brian41005

QOL enhancements on the tokenizer

Enhancements have been made on the tokenizers. Two new methods have been added: `get_special_tokens_mask` and `truncate_sequences`.

The former returns a mask indicating which tokens are special tokens in a token list, and which are tokens from the initial sequences. The latter truncates sequences according to a strategy.

Both of those methods are called by the `encode_plus` method, which itself is called by the `encode` method. The `encode_plus` now returns a larger dictionary which holds information about the special tokens, as well as the overflowing tokens.

Thanks to julien-c, thomwolf, and LysandreJik for these additions.
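As a rough illustration of the idea behind `get_special_tokens_mask`, the sketch below marks special tokens with 1 and ordinary sequence tokens with 0. The function name `toy_special_tokens_mask` and the token ids are hypothetical, and the library's own method has a different signature.

```python
def toy_special_tokens_mask(token_ids, special_ids):
    """Return 1 for special tokens and 0 for ordinary sequence tokens."""
    return [1 if token_id in special_ids else 0 for token_id in token_ids]


CLS_ID, SEP_ID = 101, 102  # assumed special token ids
# A sequence pair laid out as: CLS, first sequence, SEP, second sequence, SEP.
encoded_pair = [CLS_ID, 7, 8, SEP_ID, 9, SEP_ID]
mask = toy_special_tokens_mask(encoded_pair, {CLS_ID, SEP_ID})
```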

New German BERT models

- Support for new German BERT models (cased and uncased) from stefan-it dbmdz

Breaking changes

- The two methods `add_special_tokens_single_sequence` and `add_special_tokens_sequence_pair` have been removed. They have been replaced by the single method `build_inputs_with_special_tokens`, which has a more comprehensible name and manages both sequence singletons and pairs.

- The boolean parameter `truncate_first_sequence` has been removed in tokenizers' `encode` and `encode_plus` methods, being replaced by a strategy in the form of a string: 'longest_first', 'only_second', 'only_first' or 'do_not_truncate' are accepted strategies.

- When the `encode` or `encode_plus` methods are called with a specified `max_length`, the sequences will now always be truncated or throw an error if overflowing.
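The four strategy strings listed above can be pictured with a small, library-independent sketch. The `toy_truncate` function is hypothetical: the tie-breaking rule for 'longest_first', the choice to raise `ValueError`, and the fact that special tokens are not counted against `max_length` are all assumptions made for illustration, not the library's exact behaviour.

```python
def toy_truncate(ids_a, ids_b, max_length, strategy="longest_first"):
    """Trim a pair of token lists so their combined length fits max_length."""
    if max_length < 0:
        raise ValueError("max_length must be non-negative")
    ids_a, ids_b = list(ids_a), list(ids_b)
    excess = len(ids_a) + len(ids_b) - max_length
    if excess <= 0:
        return ids_a, ids_b  # already fits, nothing to do

    if strategy == "longest_first":
        # Drop one token at a time from whichever sequence is longer
        # (ties remove from the second sequence in this sketch).
        for _ in range(excess):
            if len(ids_a) > len(ids_b):
                ids_a.pop()
            else:
                ids_b.pop()
    elif strategy == "only_first":
        if excess > len(ids_a):
            raise ValueError("First sequence is too short to absorb the overflow")
        ids_a = ids_a[: len(ids_a) - excess]
    elif strategy == "only_second":
        if excess > len(ids_b):
            raise ValueError("Second sequence is too short to absorb the overflow")
        ids_b = ids_b[: len(ids_b) - excess]
    elif strategy == "do_not_truncate":
        raise ValueError("Input is longer than max_length and truncation is disabled")
    else:
        raise ValueError("Unknown truncation strategy: " + repr(strategy))
    return ids_a, ids_b


first, second = [1, 2, 3], [4, 5, 6, 7, 8]
longest = toy_truncate(first, second, max_length=6)
only_first = toy_truncate(first, second, max_length=6, strategy="only_first")
```

With these inputs, 'longest_first' trims only the longer second sequence, while 'only_first' leaves the second sequence untouched and shortens the first.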

Guidelines and requirements

New contributing guidelines have been added, alongside library development requirements by rlouf, the newest member of the HuggingFace team.

Community additions/bug-fixes/improvements

- GLUE Processors have been refactored to handle inputs for all tasks coming from the `tensorflow_datasets`. This work has been done by agrinh and philipp-eisen.
- The padding_idx is now correctly initialized to 1 in randomly initialized RoBERTa models. ikuyamada
- The documentation CSS has been adapted to work on older browsers. TimYagan
- An addition concerning the management of hidden states has been added to the README by BramVanroy.
- Integration of TF 2.0 models with other Keras modules thomwolf
- Past values are now opt-out: they can be disabled. thomwolf

2.1.0

Not secure

2.0.0

Not secure

Name change: welcome 🤗 Transformers

Following the extension to TensorFlow 2.0, `pytorch-transformers` => `transformers`

Install with `pip install transformers`

Also, note that PyTorch is **no longer in the requirements so don't forget to install TensorFlow 2.0 and/or PyTorch** to be able to use (and load) the models.

TensorFlow 2.0 - PyTorch

All the PyTorch `nn.Module` classes now have their counterpart in TensorFlow 2.0 as `tf.keras.Model` classes. TensorFlow 2.0 classes have the same name as their PyTorch counterparts prefixed with `TF`.

The interoperability between TensorFlow and PyTorch is actually **a lot deeper** than what is usually meant when talking about libraries with multiple backends:
- each model (not just the static computation graph) can be seamlessly moved from one framework to the other during the lifetime of the model for training/evaluation/usage (`from_pretrained` can load weights saved from models saved in one or the other framework),
- an example is given in the quick-tour on TF 2.0 and PyTorch in the readme in which a model is trained using keras.fit before being opened in PyTorch for quick debugging/inspection.

Remaining unsupported operations in TF 2.0 (to be added later):
- resizing input embeddings to add new tokens
- pruning model heads

TPU support

Training on free TPUs provided by the TensorFlow Research Cloud (TFRC) program is possible, but it requires implementing a custom training loop (this is not possible with keras.fit at the moment).
We will add an example of such a custom training loop soon.

Improved tokenizers

Tokenizers have been improved to provide an extended encoding method, `encode_plus`, and additional arguments to `encode`. Please refer to the doc for detailed usage of the new options.

Breaking changes

Positional order of some model keyword inputs changed (better TorchScript support)

To make better use of Torchscript on both CPU and GPU (see 1010, 1204 and 1195), the order of some models' **keyword inputs** (`attention_mask`, `token_type_ids`...) has been changed.

If you used to call the models with keyword names for keyword arguments, e.g. `model(inputs_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)`, this should not cause any breaking change.

If you used to call the models with positional inputs for keyword arguments, e.g. `model(inputs_ids, attention_mask, token_type_ids)`, you should double-check the exact order of input arguments.
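The difference between the two calling styles can be shown with plain Python functions. `old_forward` and `new_forward` are hypothetical stand-ins for a model's call signature before and after the change; the specific parameter order shown is an assumption for illustration, so check the actual signature of the model you use.

```python
def old_forward(input_ids, token_type_ids=None, attention_mask=None):
    """Hypothetical signature before the reordering."""
    return {"attention_mask": attention_mask, "token_type_ids": token_type_ids}


def new_forward(input_ids, attention_mask=None, token_type_ids=None):
    """Hypothetical signature after the reordering."""
    return {"attention_mask": attention_mask, "token_type_ids": token_type_ids}


ids, mask, types = [5, 6], [1, 1], [0, 0]

# Keyword call: unaffected by the reordering.
keyword_call = new_forward(ids, attention_mask=mask, token_type_ids=types)

# Positional call written for the old order: the two inputs are
# silently swapped, and no error is raised.
positional_call = new_forward(ids, types, mask)
```

Because the swap raises no error, passing these inputs by keyword is the safer habit.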

Dependency requirements have changed

PyTorch is no longer in the requirements so don't forget to install TensorFlow 2.0 and/or PyTorch to be able to use (and load) the models.

Renamed method

The method `add_special_tokens_sentence_pair` has been renamed to the more appropriate name `add_special_tokens_sequence_pair`.
The same holds true for the method `add_special_tokens_single_sentence` which has been changed to `add_special_tokens_single_sequence`.

Community additions/bug-fixes/improvements
- new German model (Timoeller)
- new script for MultipleChoice training (SWAG, RocStories...) (erenup)
- better fp16 support (ziliwang and bryant1410)
- fix evaluation in run_lm_finetuning (SKRohit)
- fix LM finetuning to prevent crashing on assert len(tokens_b)>=1 (searchivarius)
- Various doc and docstring fixes (sshleifer, Maxpa1n, mattolson93, t080)
