This allowed us to completely reorganize the example scripts for a cleaner codebase.
The main features of the Trainer are listed below, followed by a minimal usage sketch:
- Same user-facing API for PyTorch and TF 2
- Support for CPU, GPU, Multi-GPU, and TPU
- Easier than ever to share your fine-tuned models
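A minimal usage sketch of the new PyTorch Trainer; `load_train_dataset()` is a hypothetical placeholder for any `torch.utils.data.Dataset` of pre-tokenized features, and the checkpoint name is illustrative.

```python
# Minimal Trainer sketch. load_train_dataset() is a hypothetical placeholder
# standing in for any torch.utils.data.Dataset of pre-tokenized features.
from transformers import (
    BertForSequenceClassification,
    Trainer,
    TrainingArguments,
)

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

training_args = TrainingArguments(
    output_dir="./results",  # checkpoints and logs are written here
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=load_train_dataset(),  # hypothetical helper, see lead-in
)
trainer.train()
```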
**The TFTrainer was largely contributed by awesome community member jplu!** 🔥 🔥
A few additional features of the example scripts are:
- Generate argument parsers from type hints on dataclasses (see the sketch after this list)
- Load arguments from JSON files
- Logging through TensorBoard and wandb
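As an illustration of the first two bullets, a sketch using `HfArgumentParser`, the utility that backs the example scripts; the dataclass fields and the JSON file name are illustrative.

```python
# Sketch: an argparser generated from dataclass type hints via HfArgumentParser.
from dataclasses import dataclass, field

from transformers import HfArgumentParser, TrainingArguments


@dataclass
class ModelArguments:
    model_name_or_path: str = field(
        metadata={"help": "Path to a pretrained model or a model identifier"}
    )


parser = HfArgumentParser((ModelArguments, TrainingArguments))

# Parse from the command line...
model_args, training_args = parser.parse_args_into_dataclasses()
# ...or from a JSON file (the path is illustrative):
# model_args, training_args = parser.parse_json_file("args.json")
```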
Documentation for the Trainer is still a work in progress; please consider contributing improvements.
TPU Support
- Both the TensorFlow and PyTorch trainers have TPU support (jplu, LysandreJik, julien-c). An additional utility was added so that TPU scripts may be launched in a manner similar to `torch.distributed`.
- This was built with the support of jysohn23, a member of the Google TPU team
---
Multilingual BART (sshleifer)
New BART checkpoint converted: this adds the `mbart-en-ro` model, a BART variant fine-tuned on English-Romanian translation.
Improved support for `huggingface/tokenizers`
- Additional tests and support have been added for the `huggingface/tokenizers` tokenizers (mfuntowicz, thomwolf)
- TensorFlow models work out-of-the-box with the new tokenizers (LysandreJik)
Decoder caching for T5 (patrickvonplaten)
Auto-regressive decoding for T5 has been greatly sped up by storing past key/value states; this work was done for both PyTorch and TensorFlow.
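A sketch of the sped-up decoding in use; the checkpoint and prompt are illustrative, and `generate` transparently reuses the cached key/value states between steps.

```python
# Sketch: auto-regressive T5 decoding, sped up by cached key/value states.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

input_ids = tokenizer.encode(
    "translate English to German: The house is wonderful.", return_tensors="pt"
)
# Each decoding step reuses the cached states from previous steps instead of
# recomputing attention over the entire generated prefix.
output_ids = model.generate(input_ids, max_length=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```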
Breaking change
This introduces a breaking change: the default number of outputs of `T5Model` and `T5ForConditionalGeneration` increases from 4 to 5, since the outputs now include the `past_key_value_states`.
Encoder-Decoder enhancements
- Apply the encoder-decoder 1.5GB memory savings to TF as well (patrickvonplaten, a translation of the same work done on PyTorch models by sshleifer)
- The BART summarization fine-tuning script now works for T5 as well (sshleifer)
- Clean the encoder-decoder models to expose a BART/T5-like API and add the possibility to generate (patrickvonplaten); see the sketch after this list
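A sketch of the cleaned-up API, assuming the `EncoderDecoderModel` class and its `from_encoder_decoder_pretrained` initializer; the checkpoint names are illustrative.

```python
# Sketch: composing a seq2seq model from two pretrained checkpoints with the
# cleaned-up encoder-decoder API (names as assumed in the lead-in).
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"  # encoder and decoder checkpoints
)

input_ids = tokenizer.encode("This is a test sentence.", return_tensors="pt")
# The composed model follows the BART/T5-like API, including generate().
outputs = model(input_ids=input_ids, decoder_input_ids=input_ids)
```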
Additional model architectures
Question answering support for ALBERT and RoBERTa in TensorFlow (Pierrci):
- `TFAlbertForQuestionAnswering`
- `TFRobertaForQuestionAnswering`
Pipelines
- The question answering pipeline now handles impossible answers (bryant1410); see the sketch after this list
- Remove tqdm logging (mfuntowicz)
- Sentiment analysis pipeline can now handle more than two sequences (xxbidiao)
- Rewritten batch support in pipelines (mfuntowicz)
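A sketch of the impossible-answer handling, assuming the pipeline exposes a `handle_impossible_answer` flag; the question and context are illustrative.

```python
# Sketch: letting the QA pipeline return an empty answer when the context
# contains none (the handle_impossible_answer flag is assumed, see lead-in).
from transformers import pipeline

qa = pipeline("question-answering")

result = qa(
    question="Who wrote the symphony?",
    context="The Eiffel Tower is located in Paris.",
    handle_impossible_answer=True,
)
# With the flag set, an unanswerable question can yield an empty answer string
# with its score, instead of a spurious span.
print(result)
```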
Text Generation pipeline (enzoampil)
Implements a text generation pipeline, `GenerationPipeline`, which works with any model that has a language modeling head (`ModelWithLMHead`).
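A sketch of the new pipeline, assuming it is registered under the `"text-generation"` task name; the model and prompt are illustrative, and call-time keyword arguments are forwarded to `generate`.

```python
# Sketch: text generation through the pipeline API (the "text-generation"
# task name is assumed, see lead-in).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
# Keyword arguments such as max_length are forwarded to generate().
print(generator("In a distant future,", max_length=30))
```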
Fixes and improvements
- Clean the generate testing functions (patrickvonplaten)
- Notebooks updated in the documentation (LysandreJik)
- Fix RoBERTa/XLNet Pad Token in run_multiple_choice.py (ethanjperez)
- Fixed RoBERTa conversion script (myleott)
- Speedup torch summarization tests (sshleifer)
- Optimize causal mask using torch.where (Akababa)
- Improved benchmarking utils (patrickvonplaten)
- Fixed edge case for BERT tokenization (patrickvonplaten)
- SummarizationDataset cleanup (sshleifer)
- BART: Replace config.output_past with use_cache kwarg (sshleifer)
- Better documentation for Summarization and Translation pipeline (julien-c)
- Additional documentation for model cards (julien-c)
- Fix force_download of files on Windows (calpt)
- Fix shuffling issue for distributed training (elk-cloner)
- Shift labels internally within TransfoXLLMHeadModel when called with labels (TevenLeScao)
- Remove `output_past` everywhere and replace by `use_cache` argument (patrickvonplaten)
- Added unit test for run_bart_sum (sshleifer)
- Cleaner code by factoring a few methods back into `PreTrainedModel` (sshleifer)
- [Bert] remove hard-coded pad token id (patrickvonplaten)
- Clean pipelines test and remove unnecessary code (patrickvonplaten)
- JITting is not compatible with PyTorch/XLA or any other framework that requires serialization, so the JITted methods were removed (LysandreJik)
- Change newstest2013 to newstest2014 and clean up (patrickvonplaten)
- Factor out tensor conversion method in `PretrainedTokenizer` (sshleifer)
- Remove tanh torch warnings (aryanshomray)
- Fix token_type_id in BERT question-answering example (siboehm)
- Add CircleCI workflow to build docs for preview (harupy)
- Higher tolerance for past testing in T5 and TF T5 (patrickvonplaten)
- XLM tokenizer should encode with bos token (LysandreJik, patrickvonplaten)
- Fix summarization `do_predict` (sshleifer)
- Encode to the max length of the input, not the max length of the tokenizer, for batch input (patrickvonplaten)
- Add `qas_id` to SquadResult and SquadExample (jarednielsen)
- Fix bug in run_*.py scripts: double wrap into DataParallel during eval (and-kul)
- Fix torchhub integration (julien-c)
- Fix TFAlbertForSequenceClassification classifier dropout probability (jarednielsen)
- Change uses of `pow(x, 3)` to `pow(x, 3.0)` (mneilly-et)
- Shuffle train subset for summarization example (Colanim)
- Removed the boto3 dependency (julien-c)
- Add dialogpt training tips (patrickvonplaten)
- Generation can now start with an empty prompt (patrickvonplaten)
- GPT-2 is now traceable (jazzcook15)
- Add known 3rd parties to setup.cfg; removes the local/CircleCI isort discrepancy (sshleifer)
- Allow a more backward compatible behavior of max_len_single_sentence and max_len_sentences_pair (thomwolf)
- Now using CDN urls for weights (julien-c)
- [Fix common tests on GPU] send model, ids to torch_device (sshleifer)
- Fix TF input docstrings to refer to tf.Tensor rather than torch.Float (jarednielsen)
- Added additional metadata to training arguments (parmarsuraj99)
- [ci] Load pretrained models into the default (long-lived) cache (julien-c)
- Add `timeout_decorator` to tests (sshleifer)
- Added XLM-R to the multilingual section in the documentation (stefan-it)
- Better `num_labels` in configuration objects
- Updated pytorch lightning scripts (williamFalcon)
- Tests now pass with torch 1.5.0 (LysandreJik)
- Ensure fast tokenizer can construct single-element tensor without pad token (mfuntowicz)