Tensor2Tensor Changelog




* New `RTransformer` model, a recurrent Transformer
  * New English-Estonian translation dataset, thanks to stefan-it
  * New `ROC_AUC` metric thanks to jjtan
  * Various fixes, improvements, additions, etc.


  * 1788 by mzilinec adding an option to select a different TPU zone.
  * More code cleanups around tf.compat.v1.


* Flush out some more contrib remnants.


* Changes to handle the TF 1.x to 2.x transition for tf.contrib
  * TODO(afrozm): Write more


Some changes needed to be able to import problems with TF 2.0


* Move away from tf.flags to absl-py's flags.
  * Move away from std::string to tensorflow::string
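For context, a minimal sketch of the absl-py flags API that `tf.flags` call sites migrate to (the flag name here is illustrative, not an actual T2T flag):

```python
from absl import flags

FLAGS = flags.FLAGS
flags.DEFINE_string("data_dir", None, "Directory with training data.")

# app.run(main) normally parses sys.argv; calling FLAGS(argv) directly
# is convenient for scripts and tests.
FLAGS(["t2t-trainer", "--data_dir=/tmp/t2t"])
print(FLAGS.data_dir)  # /tmp/t2t
```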


Final T2T major release
  It is now in maintenance mode — we keep it running and welcome bug-fixes, but encourage users to use the successor library Trax
  PRs Merged
  - 1724 by Separius - use batch_size in _test_img2img_transformer thanks!
  - 1726 by senarvi  - Fix decoding in prepend mode thanks!
  - 1733 by prasastoadi - En-Id untokenized parallel corpora thanks!
  - 1748 by gabegrand adding a Text2RealProblem class -- thanks a lot gabegrand
  Bug Fixes
  - Fix features and decoding on TPUs by mts42000
  - iansimon and Kristy Choi around shape assertions and modalities
  - superbobry fixed cases where tf.TensorShape was constructed with float dimensions
  - Trax was moved into its own repo.


PRs Merged
  * 1720 thanks przemb
  * 1698 1699 test/util file fixes thanks to Vooblin
  * Fix serving response from Cloud ML Engine (1688) thanks to evalphobia
  * Refine automatic mixed precision support via hyper param (1681) thanks vinhngx
  * correct return shape of rel_pos2abs_pos() (1686) thanks to Separius
  * save attention weights for relative attention v2 (1682) thanks to Ghostvv
  * Update (1674) thanks to TanguyUrvoy
  * Transformer tutorial (1675) many thanks to Styleoshin
  * 4 new dialog problems by ricsinaruto in 1642
  * Extend NeuralStack to support Dequeue by reading/writing in both directions, thanks narphorium
  * Lots of work on SimPLe tuning hyperparameters by koz4k , lukaszkaiser and afrozenator
  * async data collection for RL in TRAX
  * New memory efficient Transformer using Reversible layers, thanks to Nikita Kitaev, lukaszkaiser and Anselm Levskaya
  * Losses and metrics are layers now in trax, thanks to lukaszkaiser
  * Activations in TRAX thanks to joaogui1 in 1684 and 1666
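The reversible-layer idea behind the memory-efficient Transformer above can be sketched in a few lines (a generic residual-swap sketch, not the actual implementation):

```python
# Reversible residual block: activations can be recomputed from the
# outputs, so intermediate activations need not be stored for backprop.
def rev_forward(x1, x2, f, g):
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def rev_inverse(y1, y2, f, g):
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

# Any functions f, g work; in a Transformer they would be the attention
# and feed-forward sublayers.
f = lambda v: 2.0 * v
g = lambda v: v + 1.0
assert rev_inverse(*rev_forward(3.0, 4.0, f, g), f, g) == (3.0, 4.0)
```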


Models / Layers:
  * NeuralStack and NeuralQueue added - thanks narphorium !
  * Open-sourcing the search space used in the Evolved Transformer.
  * Masked local n-D attention added.
  * Add English-Spanish translation problem (1626) thanks voluntadpear !
  * MovingMNist added, thanks MechCoder !
  Bug Fixes:
  * Loss twice multiplied with loss_coef (1627) by davidmrau - thanks a lot David!
  * Fix log_prob accumulation during decoding, thanks lmthang !
  * Fixed high usage of TPU HBM "Arguments" during serving, thanks ziy !
  * Should not generate summary during decoding in dot_product_relative_attention (1618) thanks phamthuonghai !
  Misc changes:
  * Implement sequence packing as a transformation - thanks robieta !
  * Lots of work on t2t_distill and model exporting by ziy - thanks ziy !
  * Introduce Rainbow (1607) by konradczechowski.
  * Changes to MBRL by konradczechowski and koz4k in multiple PRs.
  * Adding automatic mixed precision support (1637) thanks a lot to vinhngx !
  * Documentation for creating your own model (1589) thanks hbrylkowski !
  * Adding extra linear to semantic hashing discretization bottleneck. 1578 thanks martiansideofthemoon !
  * Using partial targets at inference time. (1596) thanks EugKar !
  * Updated link to DeepMind Math dataset (1583) thanks MaxSobolMark !
  * Only strip end of line (1577) thanks funtion !
  * correct typo in add_timing_signal_nd (1651) many thanks to Separius !
  * fix decode bug (1645) many thanks to dong-s !
  * Change confusing function name (1669) thanks lazylife7157 !
  * Forked optimizers from JAX and made them objects.
  * Trax layers are now stateful and support custom gradients.
  * Multi-device capability added.
  * Memory-efficient trainer added! Thanks Nikita Kitaev!
  * Adafactor optimizer added in TRAX.
  * Demo Colab added, thanks levskaya !
  * Demo Colab for Trax layers.
  * Transformer, TransformerLM, Reversible Transformer, PositionLookupTransformer and Resnet50 are some of the models that TRAX now supports.
  * Many PPO changes to be able to work on Atari.
  * Distributed PPO where the envs can run in multiple parallel machines using gRPC
  * SimulatedEnvProblem by koz4k - a gym env that simulates a step taken by a trainer of a neural network.
  * Implemented SerializedSequenceSimulatedEnvProblem, by koz4k.
  * Transformer can be used as a policy now, thanks to koz4k !


Minor fix to 1.13.3, please see release notes there.


TODO(afrozm): Document more.
  * Various PRs.
  * Development on TRAX


* jax, jaxlib moved to extras.
  * Fixed get_standardized_layers spelling, thanks cbockman in 1529
  * Serving utils fixes - thanks Drunkar ! in 1495
  * Fixing a checkpoint name bug in 1487, thanks lzhang10
  * DeepMind Math dataset.
  * VideoGlow paper added to T2T Papers.
  * Mixture Transformer.
  * A very basic PPO implementation in TRAX.
  * More TRAX and RL changes.
  * Correct flat CIFAR modality to not consider 0 as padding.


Bug Fixes:
  * RL fixes for Model Based RL in 1505 - thanks koz4k
  * Serving util corrections in 1495 by Drunkar -- thanks!
  * Fix step size extraction in checkpoints by lzhang10 in 1487 -- thanks!


* **Modalities refactor:** all modalities are now an enum and just functions, making it easier to understand what's happening in the model. Thanks Dustin!
  * **Model-Based Reinforcement Learning for Atari** using T2T, please find a nice writeup -- thanks a lot to all the authors! lukaszkaiser mbz piotrmilos blazejosinski Roy Campbell konradczechowski doomie Chelsea Finn koz4k Sergey Levine rsepassi George Tucker and henrykmichalewski !
  * **TRAX** = T2T + JAX - please try it out and give us feedback at 1478
  New Models:
  * Evolved Transformer, thanks stefan-it for adding the paper in 1426
  * textCNN model by ybbaigo in 1421
  Documentation and Logging:
  * MultiProblem by cwbeitel in 1399
  * ML Engine logging in 1390 by lgeiger
  Thanks again cwbeitel and lgeiger -- good docs and logging goes a long way for understandability.
  Bugs fixed:
  * t2t_decoder checkpoint fix in 1471 by wanqizhu
  * xrange fix for py3 by lgeiger in 1468
  * Fixing COCO dataset in 1466 by hbrylkowski
  * Fix math problems by artitw
  * Decoding rev problems en-zh by googlehjx in 1389
  * And honourable mentions to qixiuai in 1440
  Many many thanks wanqizhu lgeiger hbrylkowski artitw googlehjx and qixiuai for finding and fixing these and sorry for missing anyone else -- this is really really helpful.
  Code Cleanups:
  * Registry refactor and optimizer registry by jackd in 1410 and 1401
  * Numerous very nice cleanup PRs ex: 1454 1451 1446 1444 1424 1411 1350 by lgeiger
  Many thanks for the cleanups jackd and lgeiger -- and sorry if I missed anyone else.
  Summary of changes:
  * A lot of code cleanup thanks a ton to lgeiger ! This goes a long way with regards to code maintainability and is much appreciated. Ex: PR 1361 , 1350 , 1344 , 1346 , 1345 , 1324
  * Fixing LM decode, thanks mikeymezher - PR 1282
  * More fast decoding by gcampax, thanks! - PR 999
  * Avoid error on beam search - PR 1302 by aeloyq , thanks!
  * Fix invalid list comprehension, unicode simplifications, py3 fixes 1343, 1318 , 1321, 1258   thanks cclauss !
  * Fix is_generate_per_split hard to spot bug, thanks a lot to kngxscn in PR 1322
  * Fix py3 compatibility issues in PR 1300 by ywkim , thanks a lot again!
  * Separate train and test data in MRPC and fix broken link in PR 1281 and 1247  by ywkim - thanks for the hawk eyed change!
  * Fix universal transformer decoding by artitw in PR 1257
  * Fix babi generator by artitw in PR 1235
  * Fix transformer moe in 1233 by twilightdema - thanks!
  * Universal Transformer bugs corrected in 1213 by cfiken - thanks!
  * Change beam decoder stopping condition, makes decode faster in 965 by mirkobronzi - many thanks!
  * Bug fix, problem_0_steps variable by senarvi in 1273
  * Fixing a typo, by hsm207 in PR 1329 , thanks a lot!
  New Model and Problems:
  * New problem and model by artitw in PR 1290 - thanks!
  * New model for scalar regression in PR 1332 thanks to Kotober
  * Text CNN for classification in PR 1271 by ybbaigo - thanks a lot!
  * en-ro translation by lukaszkaiser !
  * CoNLL2002 Named Entity Recognition problem added in 1253 by ybbaigo - thanks!
  New Metrics:
  * Pearson Correlation metrics in 1274 by luffy06 - thanks a lot!
  * Custom evaluation metrics, this was one of the most asked features, thanks a lot ywkim in PR 1336
  * Word Error Rate metric by stefan-falk in PR 1242 , many thanks!
  * SARI score for paraphrasing added.
  * Fast decoding !! Huge thanks to aeloyq in 1295
  * Fast GELU unit
  * Relative dot product visualization PR 1303 thanks aeloyq !
  * New MTF models and enhancements, thanks to Noam, Niki and the MTF team
  * Custom eval hooks in PR 1284 by theorm - thanks a lot !
  Lots of commits to Model Based Reinforcement Learning code by konradczechowski koz4k blazejosinski piotrmilos - thanks all !


  * Bug fixes in the insight server thanks to haukurb !
  * Fix weights initialization in 1196 by mikeymezher - thanks !
  * Fix Universal Transformer convergence by MostafaDehghani and rllin-fathom in 1194 and 1192  - thanks !
  * Fix add problem hparams after parsing the overrides in 1053 thanks gcampax !
  * Fixing error of passing wrong dir in 1185 by stefan-falk , thanks !
  New Problems:
  * Wikipedia Multiproblems by urvashik - thanks !
  * New LM problems in de, fr, ro by lukaszkaiser - thanks !
  * Continual addition to Model Based RL by piotrmilos , konradczechowski koz4k and blazejosinski !
  Video Models:
  * Many continual updates thanks to mbz and MechCoder - thanks all !


  - MTF code in Tensor2Tensor has been moved to its own repository - thanks dustinvtran
  New Problems:
  - English-Setswana translation problem, thanks jaderabbit
  New layers, models, etc:
  - Add Bayesian feedforward layer, thanks dustinvtran
  - Lots of changes to the RL pipeline, thanks koz4k , blazejosinski , piotrmilos , lukaszkaiser , konradczechowski
  - Lots of work on video models, thanks mbz , MechCoder
  - Image transformer with local 1-D and local 2-D spatial partitioning, thanks nikiparmar vaswani
  - Support DistributionStrategy in Tensor2Tensor for multi-GPU, thanks smit-hinsu !
  - Pass data_dir to feature_encoders, thanks stefan-falk
  - variable_scope wrapper for avg_checkpoints, thanks Mehrad0711
  - Modalities cleanup, thanks dustinvtran
  - Avoid NaN while adding sinusoidal timing signals, thanks peakji
  - Avoid an ASCII codec error in CNN/DailyMail, thanks shahzeb1
  - Allow exporting T2T models as tfhub modules, thanks cyfra
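For reference, the sinusoidal timing signal mentioned in the NaN fix above follows the standard Transformer formula (a plain-Python sketch, not the exact T2T implementation):

```python
import math

def timing_signal(length, channels, min_timescale=1.0, max_timescale=1.0e4):
    # One sin and one cos column per geometrically spaced timescale.
    num_timescales = channels // 2
    log_increment = (math.log(max_timescale / min_timescale)
                     / max(num_timescales - 1, 1))
    signal = []
    for pos in range(length):
        inv = [min_timescale * math.exp(-i * log_increment)
               for i in range(num_timescales)]
        signal.append([math.sin(pos * s) for s in inv] +
                      [math.cos(pos * s) for s in inv])
    return signal

sig = timing_signal(length=4, channels=8)
print(len(sig), len(sig[0]))  # 4 8
```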


PRs accepted:
  Cleaning up the code for gru/lstm as transition function for universal transformer. Thanks MostafaDehghani !
  Clipwrapper by piotrmilos !
  Corrected transformer spelling mistake - Thanks jurasofish!
  Fix to universal transformer update weights - Thanks cbockman and cyvius96 !
  Common Voice problem fixes and refactoring - Thanks tlatkowski !
  Infer observation datatype and shape from the environment - Thanks koz4k !
  New Problems / Models:
  * Added a simple discrete autoencoder video model. Thanks lukaszkaiser !
  * DistributedText2TextProblem, a base class for Text2TextProblem for large datasets. Thanks afrozenator!
  * Stanford Natural Language Inference problem `StanfordNLI` added. Thanks urvashik !
  * `Text2TextRemotedir` added for problems with a persistent remote directory. Thanks rsepassi !
  * Add a separate binary for vocabulary file generation for subclasses of Text2TextProblem. Thanks afrozenator!
  * Added support for non-deterministic ATARI modes and sticky keys. Thanks mbz !
  * Pretraining schedule added to MultiProblem and reweighting losses. Thanks urvashik !
  * `SummarizeWikiPretrainSeqToSeq32k` and `Text2textElmo` added.
  * `AutoencoderResidualVAE` added, thanks lukaszkaiser !
  * Discriminator changes by lukaszkaiser  and aidangomez
  * Allow scheduled sampling in basic video model, simplify default video modality. Thanks lukaszkaiser !
  Code Cleanups:
  * Use standard vocab naming and fixing translate data generation. Thanks rsepassi !
  * Replaced manual ops w/ dot_product_attention in masked_local_attention_1d. Thanks dustinvtran !
  * Eager tests! Thanks dustinvtran !
  * Separate out a video/ directory in models/. Thanks lukaszkaiser !
  * Speed up RL test - thanks lukaszkaiser !
  Bug Fixes:
  * Don't daisy-chain variables in Universal Transformer. Thanks lukaszkaiser !
  * Corrections to mixing, dropout and sampling in autoencoders. Thanks lukaszkaiser !
  * WSJ parsing now uses only 1000 examples for building the vocab.
  * Fixed scoring crash on empty targets. Thanks David Grangier!
  * Bug fix.
  Enhancements to MTF, Video Models and much more!


Introducing **MeshTensorFlow** - this enables training really big models with O(billions) of parameters.
  * Layers added: NAC and NALU. Thanks lukaszkaiser !
  * Added a sparse graph neural net message passing layer to tensor2tensor.
  * Targeted dropout added to ResNet. Thanks aidangomez !
  * Added VQA models in `models/research/vqa_*`
  * Added a `Weight Normalization` layer.
  * MSCoCo paraphrase problem added by tlatkowski - many thanks!
  * `VideoBairRobotPushingWithActions` by mbz !
  * Code cleanup in autoencoder; works both on image and text. Thanks lukaszkaiser
  * Set the default value of Text2TextProblem.max_subtoken_length to 200; this prevents very long vocabulary generation times. Thanks afrozenator
  * Add examples to, update support for async training, and simplify run_std_server codepath. Thanks rsepassi !
  * Store variable scopes in T2TModel; add T2TModel.initialize_from_ckpt. Thanks rsepassi !
  * Undeprecate exporting the model from the trainer. Thanks gcampax !
  * Doc fixes, thanks to stefan-it :)
  * Added t2t_prune: simple magnitude-based pruning script for T2T Thanks aidangomez !
  * Added task sampling support for more than two tasks. Thanks urvashik !
  Bug Fixes:
  * Override serving_input_fn for video problems.
  * `StackWrapper` eliminates problem with repeating actions. Thanks blazejosinski !
  * Calculated lengths of sequences using _raw.
  * Update to fix a TypeError. Thanks zxqchat !
  * Serving tests re-enabled on Travis using Docker. Thanks rsepassi !
  Many more fixes, tests and work on RL, Glow, SAVP, Video and other models and problems.


* Added a MultiProblem class for Multitask Learning. Thanks urvashik !
  * Added decoding option to pass through the features dictionary to predictions. Thanks rsepassi !
  * Enabled MLEngine path to use Cloud TPUs. Thanks rsepassi !
  * Added a simple One-Hot Symbol modality. Thanks mbz !
  * Added Cleverhans integration. Thanks aidangomez !
  * Problem definitions added for:
  * Allen Brain Atlas problems. Thanks cwbeitel !
  * LSUN Bedrooms dataset.
  * Added various NLP datasets. Thanks urvashik !
  * MSR Paraphrase Corpus,
  * Quora Question Pairs,
  * Stanford Sentiment Treebank,
  * Question Answering NLI classification problems,
  * Recognizing Textual Entailment,
  * Corpus of Linguistic Acceptability,
  * Winograd NLI
  * Added a data generator for WSJ parsing.
  * Model additions:
  * Implemented Targeted Dropout for Posthoc Pruning. Thanks aidangomez !
  * Added self attention to VQA attention model.
  * Added fast block parallel transformer model
  * Implemented auxiliary losses from Stochastic Activation Pruning for Robust Adversarial Defense. Thanks alexyku !
  * Added probability based scheduled sampling for SV2P problem. Thanks mbz !
  * Reimplemented Autoencoder and Eval. Thanks piotrmilos !
  * Relative memory efficient unmasked self-attention.
  * Notable bug fixes:
  * Fixed a bug with data_gen in the style transfer problem. Thanks tlatkowski !
  * wmt_enfr dataset should not use vocabulary based on "small" dataset. Thanks nshazeer !
  * **Many more fixes, tests and work on model-based RL, Transformer, Video and other models and problems.**


* Added Mozilla Common Voice as a Problem, plus a style transfer problem and others!
  * improvements to ASR data preprocessing (thanks to jarfo)
  * decoding works for Transformer on TPUs and for timeseries problems
  * corrections and refactoring of the RL part
  * Removed deprecated Experiment API code, and support SessionRunHooks on TPU.
  * many other corrections and work on video problems, latent variables and more
  Great thanks to everyone!


* `registry.hparams` now returns an `HParams` object instead of a function that returns an `HParams` object
  * New `MultistepAdamOptimizer` thanks to fstahlberg
  * New video models and problems and improvements to `VideoProblem`
  * Added `pylintrc` and lint tests to Travis CI
  * Various fixes, improvements, and additions
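The `registry.hparams` change above means a lookup now yields a ready `HParams` object rather than a factory to call. A toy sketch of the new contract (names illustrative, not the T2T internals):

```python
class HParams:
    """Tiny stand-in for the real HParams container."""
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

_HPARAMS_REGISTRY = {}

def register_hparams(name):
    def decorator(fn):
        _HPARAMS_REGISTRY[name] = fn
        return fn
    return decorator

def hparams(name):
    # New behavior: invoke the registered factory and hand back the object,
    # so callers no longer call the result themselves.
    return _HPARAMS_REGISTRY[name]()

@register_hparams("transformer_base")
def transformer_base():
    return HParams(hidden_size=512, num_hidden_layers=6)

hp = hparams("transformer_base")
print(hp.hidden_size)  # 512
```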


* `--random_seed` is unset by default now. Set it to an integer value to get reproducible results.
  * bAbI text understanding tasks added
  * Have the ML Engine and TPU codepaths use TF 1.8
  * Various cloud-related bug fixes
  * `WikisumWeb` data generation fixes
  * Various other fixes
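The `--random_seed` note above boils down to the usual seeding contract, sketched here with the stdlib (the real flag seeds TensorFlow and NumPy, not this toy RNG):

```python
import random

def sample_batch_order(num_batches, seed=None):
    # With a fixed seed the shuffle is reproducible; with seed=None
    # (the new default) every run draws a fresh ordering.
    rng = random.Random(seed)
    order = list(range(num_batches))
    rng.shuffle(order)
    return order

assert sample_batch_order(8, seed=42) == sample_batch_order(8, seed=42)
```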


* Lambada and wikitext103 datasets.
  * ASR model with Transformer and iPython notebook.
  * Many other improvements including RL code, autoencoders, the latent transformer (transformer_vae) and more.




* `--problems` command-line flag renamed to `--problem`
  * `hparams.problems` renamed to `hparams.problem_hparams` and `hparams.problem_instances` renamed to `hparams.problem` (and neither are lists now)
  * Dropped support for TensorFlow 1.4
  * Various additions, fixes, etc.


* Distillation codepath added
  * Improved support for serving language models
  * New `TransformerScorer` model which returns the log prob of targets on `infer`
  * Support for `bfloat16` weights and activations on TPU
  * SRU gate added to `common_layers`
  * `--checkpoint_path` supported in interactive decoding
  * Improved support for multiple outputs
  * `VideoProblem` base class
  * Various fixes, additions, etc.
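For intuition on `TransformerScorer` above: scoring returns the log probability of the reference targets, i.e. the sum of per-step token log-probs (a toy sketch with made-up numbers, not the T2T code):

```python
import math

def score_targets(step_probs):
    # step_probs[i] is the probability the model assigns to the i-th
    # reference target token; the sequence score is the sum of logs.
    return sum(math.log(p) for p in step_probs)

log_prob = score_targets([0.5, 0.25])
print(round(log_prob, 4))  # -2.0794
```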


* Scalar summary support on TPUs
  * New `Squad` and `SquadConcat` problem for question answering (and relevant base class)
  * New video problems
  * `bfloat16` support for `Transformer` on TPUs
  * New `SigmoidClassLabelModality` for binary classification
  * Support batch prediction with Cloud ML Engine
  * Various fixes, improvements, additions


* Updates to experimental RL codebase
  * `ImageTransformer` on TPU
  * Various updates, fixes, additions, etc.


* Updates to the RL codebase
  * Tests updated to use TensorFlow 1.6
  * Various fixes, additions, etc.


* More flexible Cloud ML Engine usage thanks to bbarnes52
  * Fixes thanks to stefan-it wes-turner deasuke bwilbertz
  * Various other additions, fixes, etc.