Changelogs » Tensor2tensor



* New `RTransformer` model, a recurrent Transformer
* New English-Estonian translation dataset thanks to stefan-it
* New `ROC_AUC` metric thanks to jjtan
* Various fixes, improvements, additions, etc.
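
For readers unfamiliar with the quantity the `ROC_AUC` metric reports, here is a minimal pure-Python sketch of ROC AUC using the rank-sum identity. It is not the T2T implementation; the function name `roc_auc` and its signature are assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: sequence of 0/1 ground-truth labels.
    scores: sequence of predicted scores, higher means "more positive".
    """
    pairs = sorted(zip(scores, labels))
    # Assign 1-based ranks, averaging the rank across tied scores.
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    # U statistic of the positives, normalised by the number of pos/neg pairs.
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
```

A value of 1.0 means every positive example outscores every negative one; 0.5 corresponds to chance-level ranking.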


Minor fix to 1.13.3, please see release notes there.


TODO(afrozm): Document more.

* Various PRs.
* Development on TRAX


* jax, jaxlib moved to extras in

* Fixed get_standardized_layers spelling in 1529, thanks cbockman
* Serving utils fixes in 1495, thanks Drunkar !
* Fixing a checkpoint name bug in 1487, thanks lzhang10

* DeepMind Math dataset
* VideoGlow paper added to T2T Papers.
* Mixture Transformer
* A very basic PPO implementation in TRAX.
* More TRAX and RL changes.

Correct flat CIFAR modality to not consider 0 as padding


Bug Fixes:
* RL fixes for Model Based RL in 1505 - thanks koz4k
* Serving util corrections in 1495 by Drunkar -- thanks!
* Fix step size extraction in checkpoints by lzhang10 in 1487 -- thanks!


**Modalities refactor**: Thanks to Dustin, all modalities are now an enum and just functions, making it easier to understand what's happening in the model. Thanks Dustin!

**Model-Based Reinforcement Learning for Atari** using T2T, please find a nice writeup in at -- thanks a lot to all the authors! lukaszkaiser mbz piotrmilos blazejosinski Roy Campbell konradczechowski doomie Chelsea Finn koz4k Sergey Levine rsepassi George Tucker and henrykmichalewski !

**TRAX = T2T + JAX** - please try it out and give us feedback at 1478

New Models:
* Evolved Transformer, thanks stefan-it for adding the paper in 1426
* textCNN model by ybbaigo in 1421

Documentation and Logging:
* MultiProblem by cwbeitel in 1399
* ML Engine logging in 1390 by lgeiger

Thanks again cwbeitel and lgeiger -- good docs and logging go a long way for understandability.

Bugs fixed:
* t2t_decoder checkpoint fix in 1471 by wanqizhu
* xrange fix for py3 in 1468 by lgeiger
* Fixing COCO dataset in 1466 by hbrylkowski
* Fix math problems by artitw
* Decoding rev problems enzh by googlehjx on 1389
* And honourable mentions to qixiuai , 1440

Many many thanks wanqizhu lgeiger hbrylkowski artitw googlehjx and qixiuai for finding and fixing these and sorry for missing anyone else -- this is really really helpful.

Code Cleanups:
* Registry refactor and optimizer registry by jackd in 1410 and 1401
* Numerous very nice cleanup PRs ex: 1454 1451 1446 1444 1424 1411 1350 by lgeiger

Many thanks for the cleanups jackd and lgeiger -- and sorry if I missed anyone else.

Summary of changes:

* A lot of code cleanup thanks a ton to lgeiger ! This goes a long way with regards to code maintainability and is much appreciated. Ex: PR 1361 , 1350 , 1344 , 1346 , 1345 , 1324
* Fixing LM decode, thanks mikeymezher - PR 1282
* More fast decoding by gcampax, thanks! - PR 999
* Avoid error on beam search - PR 1302 by aeloyq , thanks!
* Fix invalid list comprehension, unicode simplifications, py3 fixes 1343, 1318, 1321, 1258 - thanks cclauss !
* Fix is_generate_per_split hard to spot bug, thanks a lot to kngxscn in PR 1322
* Fix py3 compatibility issues in PR 1300 by ywkim , thanks a lot again!
* Separate train and test data in MRPC and fix broken link in PR 1281 and 1247  by ywkim - thanks for the hawk eyed change!
* Fix universal transformer decoding by artitw in PR 1257
* Fix babi generator by artitw in PR 1235
* Fix transformer moe in 1233 by twilightdema - thanks!
* Universal Transformer bugs corrected in 1213 by cfiken - thanks!
* Change beam decoder stopping condition, makes decode faster in 965 by mirkobronzi - many thanks!
* Bug fix, problem_0_steps variable by senarvi in 1273
* Fixing a typo, by hsm207 in PR 1329 , thanks a lot!

New Model and Problems:
* New problem and model by artitw in PR 1290 - thanks!
* New model for scalar regression in PR 1332 thanks to Kotober
* Text CNN for classification in PR 1271 by ybbaigo - thanks a lot!
* en-ro translation by lukaszkaiser !
* CoNLL2002 Named Entity Recognition problem added in 1253 by ybbaigo - thanks!

New Metrics:
* Pearson Correlation metrics in 1274 by luffy06 - thanks a lot!
* Custom evaluation metrics, one of the most requested features, thanks a lot ywkim in PR 1336
* Word Error Rate metric by stefan-falk in PR 1242 , many thanks!
* SARI score for paraphrasing added.
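
As a reminder of what a Word Error Rate metric measures, the sketch below computes WER as word-level edit distance divided by the reference length. It is an illustrative stand-alone version, not the code added in PR 1242; the name `word_error_rate` and the whitespace tokenisation are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words consumed
    # so far (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)
```

Note that WER can exceed 1.0 when the hypothesis contains many more words than the reference.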

* Fast decoding !! Huge thanks to aeloyq in 1295
* Fast GELU unit
* Relative dot product visualization PR 1303 thanks aeloyq !
* New MTF models and enhancements, thanks to Noam, Niki and the MTF team
* Custom eval hooks in PR 1284 by theorm - thanks a lot !

Lots of commits to Model Based Reinforcement Learning code by konradczechowski koz4k blazejosinski piotrmilos - thanks all !


* Bug fixes in the insight server thanks to haukurb !
* Fix weights initialization in 1196 by mikeymezher - thanks !
* Fix Universal Transformer convergence by MostafaDehghani and rllin-fathom in 1194 and 1192  - thanks !
* Fix add problem hparams after parsing the overrides in 1053 thanks gcampax !
* Fixing error of passing wrong dir in 1185 by stefan-falk , thanks !

New Problems:
* Wikipedia Multiproblems by urvashik - thanks !
* New LM problems in de, fr, ro by lukaszkaiser - thanks !

* Continual addition to Model Based RL by piotrmilos , konradczechowski koz4k and blazejosinski !

Video Models:
* Many continual updates thanks to mbz and MechCoder - thanks all !


- MTF code in Tensor2Tensor has been moved to - thanks dustinvtran

New Problems:
- English-Setswana translation problem, thanks jaderabbit

New layers, models, etc:
- Add Bayesian feedforward layer, thanks dustinvtran
- Lots of changes to the RL pipeline, thanks koz4k , blazejosinski , piotrmilos , lukaszkaiser , konradczechowski
- Lots of work on video models, thanks mbz , MechCoder
- Image transformer with local1d and local 2d spatial partitioning, thanks nikiparmar vaswani

- Support DistributionStrategy in Tensor2Tensor for multi-GPU, thanks smit-hinsu !
- Pass data_dir to feature_encoders, thanks stefan-falk
- variable_scope wrapper for avg_checkpoints, thanks Mehrad0711
- Modalities cleanup, thanks dustinvtran
- Avoid NaN while adding sinusoidal timing signals, thanks peakji
- Avoid an ASCII codec error in CNN/DailyMail, thanks shahzeb1
- Allow exporting T2T models as tfhub modules, thanks cyfra
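
The entry on sinusoidal timing signals does not describe how the NaN arose. One plausible failure mode, assumed here for illustration, is a division by `num_timescales - 1` when only one timescale is used: in floating-point tensor math this gives infinity and then `0 * inf = NaN`. A pure-Python sketch with a clamped denominator (the function name and defaults are hypothetical, not the T2T code):

```python
import math


def timing_signal_1d(length, channels, min_timescale=1.0, max_timescale=1.0e4):
    """Return a length x channels list of sinusoidal position signals."""
    num_timescales = channels // 2
    # With channels == 2 there is a single timescale, so (num_timescales - 1)
    # is 0. Clamping the denominator to at least 1 keeps the increment finite.
    log_increment = (math.log(max_timescale / min_timescale)
                     / max(num_timescales - 1, 1))
    inv_timescales = [min_timescale * math.exp(-i * log_increment)
                      for i in range(num_timescales)]
    signal = []
    for pos in range(length):
        scaled = [pos * inv for inv in inv_timescales]
        row = [math.sin(s) for s in scaled] + [math.cos(s) for s in scaled]
        # Pad with a single zero when `channels` is odd.
        row += [0.0] * (channels % 2)
        signal.append(row)
    return signal
```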


PRs accepted:
* Cleaning up the code for gru/lstm as transition function for universal transformer. Thanks MostafaDehghani !
* Clipwrapper by piotrmilos !
* Corrected transformer spelling mistake - Thanks jurasofish!
* Fix to universal transformer update weights - Thanks cbockman and cyvius96 !
* Common Voice problem fixes and refactoring - Thanks tlatkowski !
* Infer observation datatype and shape from the environment - Thanks koz4k !

New Problems / Models:
* Added a simple discrete autoencoder video model. Thanks lukaszkaiser !
* DistributedText2TextProblem, a base class for Text2TextProblem for large-datasets. Thanks afrozenator!
* Stanford Natural Language Inference problem added as `StanfordNLI`. Thanks urvashik !
* `Text2TextRemotedir` added for problems with a persistent remote directory. Thanks rsepassi !
* Add a separate binary for vocabulary file generation for subclasses of Text2TextProblem. Thanks afrozenator!
* Added support for non-deterministic ATARI modes and sticky keys. Thanks mbz !
* Pretraining schedule added to MultiProblem and reweighting losses. Thanks urvashik !
* `SummarizeWikiPretrainSeqToSeq32k` and `Text2textElmo` added.
* `AutoencoderResidualVAE` added, thanks lukaszkaiser !
* Discriminator changes by lukaszkaiser  and aidangomez
* Allow scheduled sampling in basic video model, simplify default video modality. Thanks lukaszkaiser !

Code Cleanups:
* Use standard vocab naming and fixing translate data generation. Thanks rsepassi !
* Replaced manual ops w/ dot_product_attention in masked_local_attention_1d. Thanks dustinvtran !
* Eager tests! Thanks dustinvtran !
* Separate out a `video/` directory in `models/`. Thanks lukaszkaiser !
* Speed up RL test - thanks lukaszkaiser !

Bug Fixes:
* Don't daisy-chain variables in Universal Transformer. Thanks lukaszkaiser !
* Corrections to mixing, dropout and sampling in autoencoders. Thanks lukaszkaiser !
* WSJ parsing only to use 1000 examples for building vocab.
* Fixed scoring crash on empty targets. Thanks David Grangier!
* Bug fix in

Enhancements to MTF, Video Models and much more!


Introducing **MeshTensorFlow** - this enables training really big models with O(billions) of parameters.

* Layers Added: NAC and NALU from Thanks lukaszkaiser !
* Added a sparse graph neural net message passing layer to tensor2tensor.
* Targeted dropout added to ResNet. Thanks aidangomez !
* Added VQA models in `models/research/vqa_*`
* Added `Weight Normalization` layer from

* MSCoCo paraphrase problem added by tlatkowski - many thanks!
* `VideoBairRobotPushingWithActions` by mbz !

* Code cleanup in autoencoder, works both on image and text. Thanks lukaszkaiser
* Set the default value of Text2TextProblem.max_subtoken_length to 200, this prevents very long vocabulary generation times. Thanks afrozenator
* Add examples to, update support for async training, and simplify run_std_server codepath. Thanks rsepassi !
* Store variable scopes in T2TModel; add T2TModel.initialize_from_ckpt. Thanks rsepassi !
* Undeprecate exporting the model from the trainer Thanks gcampax !
* Doc fixes, thanks to stefan-it :)
* Added t2t_prune: simple magnitude-based pruning script for T2T Thanks aidangomez !
* Added task sampling support for more than two tasks. Thanks urvashik !
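
To make "simple magnitude-based pruning" concrete, here is a minimal sketch of the general technique: zero out the fraction of weights with the smallest absolute values. This is not the `t2t_prune` script itself; `magnitude_prune` and its arguments are hypothetical names used for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude entries zeroed.

    weights: flat list of floats.
    sparsity: fraction in [0, 1] of entries to set to zero.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    num_to_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:num_to_prune]:
        pruned[i] = 0.0
    return pruned
```

In practice one would sweep `sparsity` and re-evaluate the model at each level to see how much accuracy is retained.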

Bug Fixes:
* Override serving_input_fn for video problems.
* `StackWrapper` eliminates problem with repeating actions. Thanks blazejosinski !
* Calculated lengths of sequences using _raw in
* Update to fix TypeError Thanks zxqchat !

* Serving tests re-enabled on Travis using Docker. Thanks rsepassi !

Many more fixes, tests and work on RL, Glow, SAVP, Video and other models and problems.


* Added a MultiProblem class for Multitask Learning. Thanks urvashik !
* Added decoding option to pass through the features dictionary to predictions. Thanks rsepassi !
* Enabled MLEngine path to use Cloud TPUs. Thanks rsepassi !
* Added a simple One-Hot Symbol modality. Thanks mbz !
* Added Cleverhans integration. Thanks aidangomez !

* Problem definitions added for:
  * Allen Brain Atlas problems. Thanks cwbeitel !
  * LSUN Bedrooms dataset.
  * Added various NLP datasets. Thanks urvashik !
    * MSR Paraphrase Corpus,
    * Quora Question Pairs,
    * Stanford Sentiment Treebank,
    * Question Answering NLI classification problems,
    * Recognizing Textual Entailment,
    * Corpus of Linguistic Acceptability,
    * Winograd NLI
  * Added a data generator for WSJ parsing.

* Model additions:
  * Implemented Targeted Dropout for Posthoc Pruning. Thanks aidangomez !
  * Added self attention to VQA attention model.
  * Added fast block parallel transformer model
  * Implemented auxiliary losses from Stochastic Activation Pruning for Robust Adversarial Defense. Thanks alexyku !
  * Added probability based scheduled sampling for SV2P problem. Thanks mbz !
  * Reimplemented Autoencoder and Eval. Thanks piotrmilos !
  * Relative memory efficient unmasked self-attention.

* Notable bug fixes:
  * bug with data_gen in style transfer problem. Thanks tlatkowski !
  * wmt_enfr dataset should not use vocabulary based on "small" dataset. Thanks nshazeer !

* **Many more fixes, tests and work on Model based RL, Transformer, Video and other models and problems.**


* added Mozilla Common Voice as a Problem, a style transfer one, and others!
* improvements to ASR data preprocessing (thanks to jarfo)
* decoding works for Transformer on TPUs and for timeseries problems
* corrections and refactoring of the RL part
* Removed deprecated Experiment API code, and support SessionRunHooks on TPU.
* many other corrections and work on video problems, latent variables and other

Great thanks to everyone!


* `registry.hparams` now returns an `HParams` object instead of a function that returns an `HParams` object
* New `MultistepAdamOptimizer` thanks to fstahlberg
* New video models and problems and improvements to `VideoProblem`
* Added `pylintrc` and lint tests to Travis CI
* Various fixes, improvements, and additions
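
The `registry.hparams` change affects call sites: code that previously called the returned function now receives the object directly. The toy registry below only illustrates that difference in calling convention. It is not the tensor2tensor API; every name in it (`HParams`, `register_hparams`, `hparams_old_style`, `hparams_new_style`, `toy_base`) is a hypothetical stand-in.

```python
# Toy stand-in for an hparams registry; NOT the tensor2tensor API.
class HParams(dict):
    """Minimal placeholder for a hyperparameter container."""


_REGISTRY = {}


def register_hparams(fn):
    """Register a function that builds an HParams object under its name."""
    _REGISTRY[fn.__name__] = fn
    return fn


def hparams_old_style(name):
    """Old behaviour: hand back the registered function itself."""
    return _REGISTRY[name]


def hparams_new_style(name):
    """New behaviour: call the function and hand back its HParams result."""
    return _REGISTRY[name]()


@register_hparams
def toy_base():
    return HParams(hidden_size=512, learning_rate=0.1)


old = hparams_old_style("toy_base")()  # caller had to add the extra ()
new = hparams_new_style("toy_base")    # caller uses the object directly
```

Under this reading, migrating a call site amounts to dropping the trailing `()`.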


* `--random_seed` is unset by default now. Set it to an integer value to get reproducible results.
* bAbI text understanding tasks added
* Have the ML Engine and TPU codepaths use TF 1.8
* Various cloud-related bug fixes
* `WikisumWeb` data generation fixes
* Various other fixes


* Lambada and wikitext103 datasets.
* ASR model with Transformer and iPython notebook.
* Many other improvements including RL code, autoencoders, the latent transformer (transformer_vae) and more.




* `--problems` command-line flag renamed to `--problem`
* `hparams.problems` renamed to `hparams.problem_hparams` and `hparams.problem_instances` renamed to `hparams.problem` (and neither is a list now)
* Dropped support for TensorFlow 1.4
* Various additions, fixes, etc.


* Distillation codepath added
* Improved support for serving language models
* New `TransformerScorer` model which returns log prob of targets on `infer`
* Support for `bfloat16` weights and activations on TPU
* SRU gate added to `common_layers`
* `--checkpoint_path` supported in interactive decoding
* Improved support for multiple outputs
* `VideoProblem` base class
* Various fixes, additions, etc.


* Scalar summary support on TPUs
* New `Squad` and `SquadConcat` problems for question answering (and relevant base class)
* New video problems
* `bfloat16` support for `Transformer` on TPUs
* New `SigmoidClassLabelModality` for binary classification
* Support batch prediction with Cloud ML Engine
* Various fixes, improvements, additions


* Updates to experimental RL codebase
* `ImageTransformer` on TPU
* Various updates, fixes, additions, etc.


* Updates to the RL codebase
* Tests updated to use TensorFlow 1.6
* Various fixes, additions, etc.


* More flexible Cloud ML Engine usage thanks to bbarnes52
* Fixes thanks to stefan-it wes-turner deasuke bwilbertz
* Various other additions, fixes, etc.


**Note**: The `Text2TextProblem` has been refactored so if you have subclassed it you may need to rename some methods. Some vocabulary files may need to be renamed as well.

* `Text2TextProblem`, `Text2ClassProblem` and `Text2SelfProblem` base classes make specifying new text-based problems easy.
* New models and problems, including for image generation and speech-to-text
* Various bug fixes, feature additions, improvements, etc.
* Test model export and serving for Python 2.7 and TensorFlow 1.5
* Update Travis tests to test against TensorFlow version 1.4, 1.5, and 1.6


* TF 1.4 compatibility bug fix for Cloud ML Engine


* Launch training on Cloud TPUs
* Launch training and hyperparameter tuning on Cloud ML Engine
* New `models/research` subdirectory for more experimental models
* Some documentation updates
* Bug fixes


* Cloud ML Engine support added
* New experimental RL module thanks to piotrmilos
* Various bug fixes, improvements, etc.


**Note**: Tensor2Tensor now requires TensorFlow 1.5.

* Working `t2t-bleu` thanks to martinpopel
* Improvements to image models: `resnet`, `revnet`, and `shake_shake`
* Image problems refactor: faster input pipeline, richer ImageNet data preprocessing. Note that `ImageModality.bottom` no longer normalizes images; that's now done in the input pipeline.
* Improvements for running on Google's Cloud TPUs, coming to you soon...
* Various bug fixes, improvements, and additions


* New export method for exporting to TensorFlow Serving
* Script for BLEU evaluation thanks to martinpopel
* Better TensorBoard metrics (what was removed has returned), with options to summarize gradients (`--hparams='summarize_grads=True'`)
* Various bug fixes, doc updates, new features, as usual


* Scripts in `bin/` are now thin and executable
* Main training utility library moved to [``](


* Support for multi-device evaluation
* Support for early stopping in distributed training
* Refactor Librispeech problem to use a new speech recognition base class


This release is a significant refactor of T2T internals.

* `T2TModel` subclasses now have the ability to override the entire Estimator model function with the `estimator_model_fn` method, making them much more flexible. Subclasses can also now override `bottom`, `body`, `top`, `loss`, and `optimize`.
* `Problem` subclasses now have the ability to override the entire Estimator input function with the `input_fn` method, making them much more flexible.
* The key components of the trainer and decoder - `Experiment`, `Estimator`, `RunConfig`, `HParams` - are all much more easily constructed and used by library callers through [``](
* We decided to drop support for MultiModel, i.e. training on multiple problems, because it added too much code complexity for the benefit gained. We will consider adding support back in a way that doesn't overcomplicate things too much if there's sufficient interest.
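
The override points listed above follow a template-method pattern: a base class wires the stages together and subclasses replace individual stages. The sketch below shows that pattern in plain Python. It borrows only the stage names `bottom`, `body`, `top`, and `loss` from the changelog; `ToyModel` and its subclasses are hypothetical and do not reflect the real `T2TModel` signatures.

```python
class ToyModel:
    """Illustrative base class; only the stage names come from the changelog."""

    def bottom(self, features):
        # Map raw inputs to the representation the body consumes.
        return features

    def body(self, hidden):
        raise NotImplementedError("subclasses provide the core computation")

    def top(self, hidden):
        # Map the body output to predictions.
        return hidden

    def loss(self, outputs, targets):
        # Mean squared error by default.
        return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)

    def __call__(self, features, targets):
        outputs = self.top(self.body(self.bottom(features)))
        return outputs, self.loss(outputs, targets)


class Doubler(ToyModel):
    """Overrides only `body`."""

    def body(self, hidden):
        return [2.0 * h for h in hidden]


class AbsLossDoubler(Doubler):
    """Additionally overrides `loss` with mean absolute error."""

    def loss(self, outputs, targets):
        return sum(abs(o - t) for o, t in zip(outputs, targets)) / len(targets)
```

Each subclass changes one stage and inherits the rest, which is the flexibility the release notes describe.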

There are also the usual new models, feature improvements, bug fixes.

* New `image_fashion_mnist` dataset
* New `revnet104` model, implementing a large Reversible Residual Network
* Set `--decode_hparams=write_beam_scores=True` to include beam scores when writing to a file
* Beginnings of new interactive visualization server at `insights/`