Transformers

2.10.0

Reformer (patrickvonplaten)

- Added a new model, Reformer (https://arxiv.org/abs/2001.04451), to the library. The original Trax code (https://github.com/google/trax/tree/master/trax/models/reformer) was translated to PyTorch.

- Reformer uses chunked attention and reversible layers to model sequences as long as 500,000 tokens.

- Reformer is currently available as a causal language model and will soon also be available as an encoder-only (BERT-like) model.

- Two sets of pretrained weights have been uploaded: https://huggingface.co/models?search=google%2Freformer

- https://huggingface.co/google/reformer-enwik8 is the first character-level language model in the library (a basic usage sketch follows below)
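
A minimal usage sketch (not taken from the release notes) of causal generation with one of the uploaded checkpoints; the `google/reformer-crime-and-punishment` model name is an assumption based on the model hub search linked above:

```python
from transformers import ReformerModelWithLMHead, ReformerTokenizer

# Assumed checkpoint name; see https://huggingface.co/models?search=google%2Freformer
model_name = "google/reformer-crime-and-punishment"
tokenizer = ReformerTokenizer.from_pretrained(model_name)
model = ReformerModelWithLMHead.from_pretrained(model_name)

# Encode a prompt and let the causal language model continue it.
input_ids = tokenizer.encode("A few months later", return_tensors="pt")
output_ids = model.generate(input_ids, do_sample=True, max_length=100)
print(tokenizer.decode(output_ids[0]))
```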

Additional architectures

- The `ElectraForSequenceClassification` model was added by liuzzi

Trainer Tweaks and fixes (LysandreJik, julien-c)

TPU (LysandreJik):
- Fixed mid-training saving of the model, optimizer and scheduler, which was hanging
- Fixed the optimizer weight updates

Trainer (julien-c)
- Fixed the `nn.DataParallel` support compatibility for PyTorch `v1.5.0`
- Distributed evaluation: SequentialDistributedSampler + gather all results
- Move model to correct device
- Map optimizer to correct device after loading from checkpoint (shaoyent)

QOL: Tokenization, Pipelines

- New method for all tokenizers: `tokenizer.decode_batch`, to decode an entire batch (sshleifer)
- The NER pipeline now returns entity groups (enzoampil); see the sketch below
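
A hedged sketch of the two quality-of-life additions; the `grouped_entities` flag is an assumption about the argument name used for entity grouping, while `decode_batch` is the method named in the note above:

```python
from transformers import pipeline

# NER with entity grouping (flag name assumed to be `grouped_entities`).
ner = pipeline("ner", grouped_entities=True)
print(ner("Hugging Face is based in New York City"))
# -> entities such as "Hugging Face" and "New York City" come back as single groups

# Batch decoding, using the method name given in the notes:
# texts = tokenizer.decode_batch(batch_of_token_ids)
```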

ONNX Conversion script (mfuntowicz)

- Added a [conversion](https://github.com/huggingface/transformers/blob/master/src/transformers/convert_graph_to_onnx.py) script to convert both PyTorch and TensorFlow models to ONNX (a hedged usage sketch follows below).
- Added a [notebook](https://github.com/huggingface/transformers/blob/master/notebooks/04-onnx-export.ipynb) explaining how it works
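
A hedged sketch of calling the conversion programmatically; the `convert` helper and its exact signature are assumptions based on the linked script:

```python
from pathlib import Path

# Hypothetical usage; helper name and parameters assumed from convert_graph_to_onnx.py.
from transformers.convert_graph_to_onnx import convert

convert(
    framework="pt",                          # "pt" for PyTorch, "tf" for TensorFlow
    model="bert-base-cased",                 # model id on the hub or a local path
    output=Path("onnx/bert-base-cased.onnx"),
    opset=11,
)
```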

Community notebooks

We've started adding community notebooks to the repository. Three notebooks have made their way into our codebase:

- [Training T5 on TPU](https://github.com/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb) patil-suraj
- [Fine-tuning T5](https://github.com/patil-suraj/exploring-T5/blob/master/t5_fine_tuning.ipynb)
- [Fine-tuning DialoGPT](https://github.com/ncoop57/i-am-a-nerd/blob/master/_notebooks/2020-05-12-chatbot-part-1.ipynb) ncoop57

Predict stage for GLUE tasks, easy submission to gluebenchmark.com

- Added a predict stage for GLUE tasks, generating result files that can be submitted to gluebenchmark.com (stdcoutzyx)

Fixes and improvements

- Support flake8 3.8 (julien-c)
- Tests are now faster thanks to the use of smaller dummy models (sshleifer)
- Fixed the eval loss in the trainer (patil-suraj)
- Fixed the `p_mask` in SQuAD pre-processing (LysandreJik)
- GitHub Actions PyTorch tests are no longer pinned to `torch==1.4.0` (mfuntowicz)
- Fixed the multiple-choice script with overflowing tokens (LysandreJik)
- Allow for `None` values in `GradientAccumulator` (jarednielsen, improved by jplu)
- MBart tokenizer saving/loading was fixed (Mehrad0711)
- TF generation: fixed an issue with batch output generation of different output lengths (patrickvonplaten)
- Fixed the FP-16 support in the T5 model (patrickvonplaten)
- `run_language_modeling` fix: actually use the `overwrite_cache` argument (borisdayma)
- Better, version compatible way to get the learning rate in the trainer (rakeshchada)
- Fixed the slow tests that were failing on GPU (sshleifer, patrickvonplaten, LysandreJik)
- ONNX conversion tokenizer fix (RensDimmendaal)
- Correct TF formatting to exclude LayerNorms from weight decay (oliverastrand)
- Removed warning of deprecation (Colanim)
- Fixed missing no-grad handling in the second pruning step of `run_bertology` (TobiasLee)

2.9.1

Marian (sshleifer)
- A new model architecture, `MarianMTModel` with 1,008+ pretrained weights is available for machine translation in PyTorch.
- The corresponding `MarianTokenizer` uses a `prepare_translation_batch` method to prepare model inputs (see the sketch below).
- All pretrained model names use the following format: `Helsinki-NLP/opus-mt-{src}-{tgt}`
- See [docs](https://huggingface.co/transformers/model_doc/marian.html) for information on pretrained model discovery and naming, or find your language [here](https://huggingface.co/models?search=Helsinki-NLP)
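
A minimal sketch of the workflow above, assuming the `Helsinki-NLP/opus-mt-en-de` checkpoint:

```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-de"  # {src}-{tgt} = en-de
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# prepare_translation_batch builds input_ids and attention_mask for the model.
batch = tokenizer.prepare_translation_batch(["Machine translation is fun."])
translated_ids = model.generate(**batch)
print(tokenizer.decode(translated_ids[0], skip_special_tokens=True))
```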

AlbertForPreTraining (jarednielsen)

A new model architecture has been added: `AlbertForPreTraining` in both PyTorch and TensorFlow

TF 2.2 compatibility (mfuntowicz, jplu)

Changes have been made to both the TensorFlow scripts and our internals so that we are compatible with TensorFlow 2.2

TFTrainer now supports new tasks

- Multiple choice has been added to the TFTrainer (ViktorAlm)
- Question Answering has been added to the TFTrainer (jplu)

Fixes and improvements

- Fixed a bug with the tf generation pipeline (patrickvonplaten)
- Fixed the XLA spawn (julien-c)
- Fixed the sentiment analysis pipeline, whose tokenizer was cased while the model was uncased (mfuntowicz)
- Albert was added to the conversion CLI (fgaim)
- CamemBERT's token type ID generation was removed from the tokenizer, as was done for RoBERTa, since the model does not use token type IDs (LysandreJik)
- Additional migration documentation was added (guoquan)
- GPT-2 can now be exported to ONNX (tianleiwu)
- Simplify cache vars and allow for TRANSFORMERS_CACHE env (BramVanroy)
- Remove hard-coded pad token id in distilbert and albert (monologg)
- BART tests were fixed on GPU (julien-c)
- Better wandb integration (vanpelt, borisdayma, julien-c)

2.9

Trainer & TFTrainer (julien-c)

Version 2.9 introduces a new `Trainer` class for PyTorch, and its equivalent `TFTrainer` for TF 2. This allowed us to reorganize the example scripts completely for a cleaner codebase.

The main features of the Trainer are:
- Same user-facing API for PyTorch and TF 2
- Support for CPU, GPU, Multi-GPU, and TPU
- Easier than ever to share your fine-tuned models

**The TFTrainer was largely contributed by awesome community member jplu!** 🔥 🔥

A few additional features of the example scripts, illustrated in the sketch below, are:
- Generate argparsers from type hints on dataclasses
- Can load arguments from json files
- Logging through TensorBoard and wandb
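
A hedged sketch of the dataclass-driven argument parsing; `HfArgumentParser` and the field below are illustrative assumptions:

```python
from dataclasses import dataclass, field

from transformers import HfArgumentParser, TrainingArguments

@dataclass
class ModelArguments:
    # Illustrative field; CLI flags are generated from the type hints.
    model_name_or_path: str = field(
        default="bert-base-uncased",
        metadata={"help": "Checkpoint to fine-tune."},
    )

parser = HfArgumentParser((ModelArguments, TrainingArguments))
model_args, training_args = parser.parse_args_into_dataclasses()
# The same arguments can also be loaded from a JSON file:
# model_args, training_args = parser.parse_json_file("args.json")
```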

Documentation for the Trainer is still a work in progress; please consider contributing improvements.

TPU Support

- Both the TensorFlow and PyTorch trainers have TPU support (jplu, LysandreJik, julien-c). An additional utility is added so that the TPU scripts may be launched in a similar manner to `torch.distributed`.
- This was built with the support of jysohn23, a member of the Google TPU team

---

Multilingual BART (sshleifer)

A new BART checkpoint was converted: this adds the `mbart-en-ro` model, a BART variant fine-tuned on English-Romanian translation.

Improved support for `huggingface/tokenizers`

- Additional tests and support have been added for the `huggingface/tokenizers` tokenizers (mfuntowicz, thomwolf)
- TensorFlow models work out-of-the-box with the new tokenizers (LysandreJik)

Decoder caching for T5 (patrickvonplaten)

Auto-regressive decoding for T5 has been greatly sped up by storing past key/value states. Work done on both PyTorch and TensorFlow.

Breaking change

This introduces a breaking change: the default output tuple length of `T5Model` and `T5ForConditionalGeneration` increases from 4 to 5, the additional element being the `past_key_value_states`.
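
A minimal sketch of the sped-up auto-regressive decoding, assuming the `t5-small` checkpoint; `generate()` now reuses the cached key/value states at every decoding step:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

input_ids = tokenizer.encode(
    "translate English to German: The table is round.", return_tensors="pt"
)
# Each decoding step attends over cached past key/value states instead of
# recomputing them for the whole prefix.
output_ids = model.generate(input_ids, max_length=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```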

Encoder-Decoder enhancements

- Apply Encoder Decoder 1.5GB memory savings to TF as well (patrickvonplaten, translation of same work on PyTorch models by sshleifer)
- BART Summarization fine-tuning script now works for T5 as well (sshleifer)
- Clean Encoder-Decoder models with Bart/T5-like API and add generate possibility (patrickvonplaten)

Additional model architectures

Question Answering support for ALBERT and RoBERTa in TF (Pierrci):
- `TFAlbertForQuestionAnswering`
- `TFRobertaForQuestionAnswering`

Pipelines

- The question answering pipeline now handles impossible answers (bryant1410)
- Remove tqdm logging (mfuntowicz)
- Sentiment analysis pipeline can now handle more than two sequences (xxbidiao)
- Rewritten batch support in pipelines (mfuntowicz)

Text Generation pipeline (enzoampil)

Implements a text generation pipeline, `GenerationPipeline`, which works with any `ModelWithLMHead`.
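
A hedged sketch; the `"text-generation"` task name and the `gpt2` checkpoint are assumptions, since the notes only name the `GenerationPipeline` class:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
print(generator("In a shocking turn of events,", max_length=30))
```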

Fixes and improvements

- Clean the generate testing functions (patrickvonplaten)
- Notebooks updated in the documentation (LysandreJik)
- Fix RoBERTa/XLNet Pad Token in run_multiple_choice.py (ethanjperez)
- Fixed RoBERTa conversion script (myleott)
- Speedup torch summarization tests (sshleifer)
- Optimize causal mask using torch.where (Akababa)
- Improved benchmarking utils (patrickvonplaten)
- Fixed edge case for bert tokenization (patrickvonplaten)
- SummarizationDataset cleanup (sshleifer)
- BART: Replace config.output_past with use_cache kwarg (sshleifer)
- Better documentation for Summarization and Translation pipeline (julien-c)
- Additional documentation for model cards (julien-c)
- Fix force_download of files on Windows (calpt)
- Fix shuffling issue for distributed training (elk-cloner)
- Shift labels internally within TransfoXLLMHeadModel when called with labels (TevenLeScao)
- Remove `output_past` everywhere and replace by `use_cache` argument (patrickvonplaten)
- Added unit test for run_bart_sum (sshleifer)
- Cleaner code by factoring a few methods back into `PreTrainedModel` (sshleifer)
- [Bert] remove hard-coded pad token id (patrickvonplaten)
- Clean pipelines test and remove unnecessary code (patrickvonplaten)
- JITting is not compatible with PyTorch/XLA or other frameworks that require serialization; the JITted methods were removed (LysandreJik)
- Change newstest2013 to newstest2014 and clean up (patrickvonplaten)
- Factor out tensor conversion method in `PretrainedTokenizer` (sshleifer)
- Remove tanh torch warnings (aryanshomray)
- Fix token_type_id in BERT question-answering example (siboehm)
- Add CircleCI workflow to build docs for preview (harupy)
- Higher tolerance for past testing in T5 and TF T5 (patrickvonplaten)
- XLM tokenizer should encode with bos token (LysandreJik, patrickvonplaten)
- Fix summarization `do_predict` (sshleifer)
- Encode to max length of input not max length of tokenizer for batch input (patrickvonplaten)
- Add `qas_id` to SquadResult and SquadExample (jarednielsen)
- Fix bug in run_*.py scripts: double wrap into DataParallel during eval (and-kul)
- Fix torchhub integration (julien-c)
- Fix TFAlbertForSequenceClassification classifier dropout probability (jarednielsen)
- Change uses of pow(x, 3) to pow(x, 3.0) (mneilly-et)
- Shuffle train subset for summarization example (Colanim)
- Removed the boto3 dependency (julien-c)
- Add dialogpt training tips (patrickvonplaten)
- Generation can now start with an empty prompt (patrickvonplaten)
- GPT-2 is now traceable (jazzcook15)
- Add known 3rd party to setup.cfg; removes local/circle ci isort discrepancy. (sshleifer)
- Allow a more backward compatible behavior of max_len_single_sentence and max_len_sentences_pair (thomwolf)
- Now using CDN urls for weights (julien-c)
- [Fix common tests on GPU] send model, ids to torch_device (sshleifer)
- Fix TF input docstrings to refer to tf.Tensor rather than torch.Float (jarednielsen)
- Additional metadata added to training arguments (parmarsuraj99)
- [ci] Load pretrained models into the default (long-lived) cache (julien-c)
- add timeout_decorator to tests (sshleifer)
- Added XLM-R to the multilingual section in the documentation (stefan-it)
- Better `num_labels` in configuration objects
- Updated pytorch lightning scripts (williamFalcon)
- Tests now pass with torch 1.5.0 (LysandreJik)
- Ensure fast tokenizer can construct single-element tensor without pad token (mfuntowicz)


2.8.0

ELECTRA Model (LysandreJik)

ELECTRA is a new method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish "real" input tokens vs "fake" input tokens generated by another neural network, similar to the discriminator of a GAN. At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the SQuAD 2.0 dataset.

This release comes with 6 ELECTRA checkpoints:

- `google/electra-small-discriminator`
- `google/electra-small-generator`
- `google/electra-base-discriminator`
- `google/electra-base-generator`
- `google/electra-large-discriminator`
- `google/electra-large-generator`
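
A minimal sketch (not from the release notes) of the discriminator objective described above, using one of these checkpoints: `ElectraForPreTraining` emits one logit per token, positive where the model believes the token was replaced by the generator.

```python
from transformers import ElectraForPreTraining, ElectraTokenizer

tokenizer = ElectraTokenizer.from_pretrained("google/electra-small-discriminator")
model = ElectraForPreTraining.from_pretrained("google/electra-small-discriminator")

input_ids = tokenizer.encode(
    "the quick brown fox jumps over the lazy dog", return_tensors="pt"
)
logits = model(input_ids)[0]    # shape: (batch, sequence_length)
predicted_fake = logits > 0     # True where a token is flagged as replaced
print(predicted_fake)
```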

Related:

- [Paper](https://openreview.net/pdf?id=r1xMH1BtvB)
- [Original implementation in TF 1.x](https://github.com/google-research/electra)
- 📊 [**Model cards** on huggingface.co](https://huggingface.co/models?search=electra)
- [Docs](https://huggingface.co/transformers/model_doc/electra.html)

Thanks to the author clarkkev for his help during the implementation.

Thanks to community members hfl-rc, stefan-it, and shoarora for already sharing more fine-tuned ELECTRA variants!

Bad word filters in `generate` (patrickvonplaten)

The `generate` method now has a bad word filter.
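
A hedged sketch; the `bad_words_ids` argument name is an assumption about the new `generate()` parameter, shown here with GPT-2:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Token-id sequences that generation must never produce. The leading space
# matters: GPT-2 tokenizes " word" differently from "word".
bad_words_ids = [tokenizer.encode(" politics"), tokenizer.encode(" religion")]

input_ids = tokenizer.encode("Tonight the debate turns to", return_tensors="pt")
output_ids = model.generate(input_ids, max_length=30, bad_words_ids=bad_words_ids)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```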

Fixes and improvements

- Decoder input ids are not necessary for T5 training anymore (patrickvonplaten)
- Update encoder and decoder on set_input_embedding for BART (sshleifer)
- Using loaded checkpoint with --do_predict (instead of random init) for Pytorch-lightning scripts (ethanjperez)
- Clean summarization and translation example testing files for T5 and Bart (patrickvonplaten)
- Cleaner examples (julien-c)
- Extensive testing for T5 model (patrickvonplaten)
- Force model outputs to always have batch_size as their first dimension (patrickvonplaten)
- Fix for continuing training in some scripts (xeb)
- Resize the embedding matrix before sending it to the optimizer (ngarneau)
- BertJapaneseTokenizer now accepts options for MeCab (tamuhey)
- Speed up GELU computation with torch.jit (mryab)
- Fix the argument order of the `update_mems` function in the TF version (patrickvonplaten, dmytyar)
- Split generate test function into beam search, no beam search (patrickvonplaten)

2.7.0

T5 Model (patrickvonplaten, thomwolf)

T5 is a powerful encoder-decoder model that casts every NLP problem into a text-to-text format. It achieves state-of-the-art results on a variety of NLP tasks (summarization, question answering, ...).

Five sets of pre-trained weights (pre-trained on a multi-task mixture of unsupervised and supervised tasks) are released. In ascending order from 60 million parameters to 11 billion parameters:

`t5-small`, `t5-base`, `t5-large`, `t5-3b`, `t5-11b`

T5 can now be used with the translation and summarization pipelines.
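
A minimal sketch of the text-to-text interface through the summarization pipeline, assuming the `t5-small` checkpoint:

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="t5-small", tokenizer="t5-small")
article = (
    "The tower is 324 metres tall, about the same height as an 81-storey "
    "building, and was the tallest man-made structure in the world for 41 years."
)
# T5 receives the text with its summarization prefix and generates the summary.
print(summarizer(article, min_length=10, max_length=40))
```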

Related:
- [paper](https://arxiv.org/pdf/1910.10683.pdf)
- [official code](https://github.com/google-research/text-to-text-transfer-transformer)
- model available in Hugging Face's [community models](https://huggingface.co/models?search=t5)
- [docs](https://huggingface.co/transformers/model_doc/t5.html)

Big thanks to the original authors, especially craffel who helped answer our questions, reviewed PRs and tested T5 extensively.

New BART checkpoint: `bart-large-xsum` (sshleifer)

These weights are from BART fine-tuned on the XSum abstractive summarization challenge, which encourages shorter (more abstractive) summaries. It achieves state-of-the-art results.

BART summarization example with pytorch-lightning (acarrera94)

New example: BART for summarization, using PyTorch Lightning. It trains on CNN/DailyMail and evaluates.

Translation pipeline (patrickvonplaten)

A new pipeline is available, leveraging the T5 model. The T5 model was added to the summarization pipeline as well.

Memory improvements with BART (sshleifer)

Several improvements have been made to the model to reduce the memory footprint and compute needed to run inference on BART:

- Remove the LM head and use the embedding matrix instead (~200MB)
- Call encoder before expanding input_ids (~1GB)
- SelfAttention only returns weights if config.output_attentions (~500MB)
- Two separate, smaller decoder attention masks (~500MB)
- Drop columns that consist exclusively of pad_token_id from input_ids in the `evaluate_cnn` example.

TensorFlow models may now be serialized (gthb)

JSON serialization of Keras layers is now supported by overriding `get_config`, so that they can be sent to TensorBoard to display a conceptual graph of the model. TensorFlow models may now be saved using `model.save`, like other Keras models.
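
A hedged sketch, assuming `TFBertModel` and the `bert-base-cased` checkpoint; any TF 2 model in the library should behave the same way now that its layers implement `get_config`:

```python
from transformers import TFBertModel

model = TFBertModel.from_pretrained("bert-base-cased")
# Standard Keras saving; the resulting SavedModel can be inspected in TensorBoard.
model.save("saved_bert")
```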

New model: XLMForTokenClassification (sakares)

A new head was added to XLM: `XLMForTokenClassification`.
