- Replaced multi-head attention with [interleaved_matmul_encdec](https://github.com/apache/incubator-mxnet/pull/16408) operators, which removes previously needed transposes and improves performance.
- Beam search states and model layers now assume time-major format.
2.1.26
Fixed
- Fixed a backwards incompatibility introduced in 2.1.17 that prevented models trained with prior versions from being used for inference.
2.1.25
Changed
- Reverted PR 772 because it causes issues with `amp`.
2.1.24
Changed
- Training now writes a final checkpoint when it stops due to `--max-updates`, `--max-samples`, or `--max-num-epochs`.
2.1.23
Changed
- Updated to [MXNet 1.7.0](https://github.com/apache/incubator-mxnet/tree/1.7.0).
- Re-introduced use of softmax with length parameter in DotAttentionCell (see PR 772).
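For readers unfamiliar with the idea behind a softmax with a length parameter, the following is a minimal pure-Python sketch. It is not the MXNet operator or the `DotAttentionCell` code; the function name `masked_softmax` and its signature are hypothetical, chosen only for illustration. It assumes the usual semantics: only the first `length` scores are normalized, and padded positions receive zero probability.

```python
import math


def masked_softmax(scores, length):
    """Softmax over the first `length` scores; later (padded) positions get 0.0.

    Hypothetical helper for illustration only, not Sockeye's or MXNet's API.
    """
    if not 0 < length <= len(scores):
        raise ValueError("length must be between 1 and len(scores)")
    valid = scores[:length]
    peak = max(valid)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in valid]
    total = sum(exps)
    # Normalize the valid positions and pad the remainder with zeros.
    return [e / total for e in exps] + [0.0] * (len(scores) - length)


# Attention scores for a source of true length 2, padded to 4 positions.
probs = masked_softmax([1.0, 2.0, 3.0, 4.0], length=2)
```

Passing the length to the softmax avoids building an explicit mask and adding large negative values to the padded scores beforehand.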
2.1.22
Added
- Re-introduced `--softmax-temperature` flag for `sockeye.score` and `sockeye.translate`.
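As a rough illustration of what a softmax temperature does, here is a minimal pure-Python sketch. It assumes the common convention of dividing the logits by the temperature before normalizing; how Sockeye applies the flag internally is not described in this changelog, and the function name `softmax_with_temperature` is hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature.

    Hypothetical helper for illustration only, not Sockeye's implementation.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [1.0, 2.0, 3.0]
sharp = softmax_with_temperature(logits, temperature=0.5)  # more peaked
flat = softmax_with_temperature(logits, temperature=2.0)   # closer to uniform
```

Under this convention, temperatures below 1 concentrate probability mass on the highest-scoring token, while temperatures above 1 flatten the distribution.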