nmtpytorch

Latest version: v4.0.0


4.0.0

- **Critical**: `NumpyDataset` now returns tensors of shape `HxW, N, C` for 3D/4D convolutional features and `1, N, C` for 2D feature files. Models should be adjusted to this new shaping.
- An `order_file` per split (`ord: path/to/txt file with integer per line`) can be given in the configuration to change the feature order of numpy tensors, allowing them to be flexibly reverted, shuffled, tiled, etc.
- Better dimension checking to catch tensor shape mismatches early.
- Added `LabelDataset` for single label input/outputs with associated `Vocabulary` for integer mapping.
- Added a `handle_oom=(True|False)` argument to the `[train]` section to recover from **GPU out-of-memory (OOM)** errors during training. This is disabled by default; you need to enable it from the experiment configuration file. Note that an OOM can still occur during validation perplexity computation; if you hit one, reduce the `eval_batch_size` parameter.
- Added `de-hyphen` post-processing filter to stitch back the aggressive hyphen splitting of Moses during early-stopping evaluations.
- Added optional projection layer and layer normalization to `TextEncoder`.
- Added `enc_lnorm, sched_sampling` options to `NMT` to enable layer normalization for the encoder and to use **scheduled sampling** at a given probability.
- `ConditionalDecoder` can now be initialized with max-pooled encoder states or with the last encoder state.
- You can now experiment with different decoders for `NMT` by changing the `dec_variant` option.
- Collect all attention weights in `self.history` dictionary of the decoders.
- Added **n-best** output to `nmtpy translate` with the argument `-N`.
- Changed the way `-S` works for `nmtpy translate`. You now always give the split name with `-s`, while `-S` overrides the input data sources defined for that split in the configuration file.
- Removed decoder-initialized multimodal NMT `MNMTDecInit`. Same functionality exists within the `NMT` model by using the model option `dec_init=feats`.
- **New model MultimodalNMT:** that supports encoder initialization, decoder initialization, both, concatenation of embeddings with visual features, prepending and appending. This model covers almost all the models from [LIUM-CVC's WMT17 multimodal systems](https://arxiv.org/abs/1707.04481) except the multiplicative interaction variants such as `trgmul`.
- **New model MultimodalASR:** encoder-decoder initialized ASR model. See the [paper](https://arxiv.org/abs/1811.03865).
- **New Model AttentiveCaptioning:** Similar but not an exact reproduction of show-attend-and-tell, it uses feature files instead of raw images.
- **New model AttentiveMNMTFeaturesFA:** [LIUM-CVC's WMT18 multimodal system](https://arxiv.org/abs/1809.00151), i.e. filtered attention.
- **New (experimental) model NLI:** A simple LSTM-based NLI baseline for the [SNLI](https://nlp.stanford.edu/projects/snli/) dataset:
  - `direction` should be defined as `direction: pre:Text, hyp:Text -> lb:Label`
  - `pre, hyp` and `lb` keys point to plain text files with one sentence per line. A vocabulary should be constructed even for the labels to fit the nmtpy architecture.
  - `acc` should be added to `eval_metrics` to compute accuracy.
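Put together, the options above could appear in an experiment file roughly as follows. This is a hedged sketch of the configuration layout, not an exact nmtpytorch file; only the keys mentioned in this changelog (`handle_oom`, `eval_metrics`, `direction`) are taken from the source.

```ini
[train]
; attempt to recover from GPU OOM errors during training (off by default)
handle_oom: True
; 'acc' computes label accuracy, e.g. for the NLI model
eval_metrics: loss,acc

[model]
; NLI task definition: two text inputs, one label output
direction: pre:Text, hyp:Text -> lb:Label
```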

3.0.0

Major release that brings support for **Pytorch 0.4** and drops support for **0.3**.

Training and testing on **CPUs** are now supported thanks to the simpler device
semantics of Pytorch 0.4: just pass `-d cpu` to `nmtpy` to switch to CPU mode.
- NOTE: Training on CPUs is only practical for debugging; otherwise it is very slow.
- NOTE: `device_id` is no longer a configuration option and should be removed
  from your old configurations.
- Multi-GPU is not supported. Always restrict to a single GPU using the
  `CUDA_VISIBLE_DEVICES` environment variable.

You can now override the config options used to train a model at inference
time, e.g. `nmtpy translate (...) -x model.att_temp:0.9`

`nmtpy train` now detects invalid/old `[train]` options and refuses to
train the model.
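The idea behind this validation can be sketched as follows. The option names and the function itself are illustrative, not the actual nmtpy implementation:

```python
# Hypothetical whitelist of recognized [train] options; the real set
# lives inside nmtpytorch's config handling.
VALID_TRAIN_OPTS = {"batch_size", "eval_batch_size", "handle_oom", "patience"}

def check_train_section(opts):
    """Raise on unknown/obsolete [train] options instead of silently
    ignoring them (e.g. the removed 'device_id' option)."""
    unknown = set(opts) - VALID_TRAIN_OPTS
    if unknown:
        raise ValueError(f"Invalid [train] option(s): {sorted(unknown)}")
    return opts
```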

**New sampler:** `ApproximateBucketBatchSampler`
Similar to the default `BucketBatchSampler` but more efficient for sparsely
distributed sequence lengths, as in speech recognition. It bins similar-length
items into buckets. It no longer guarantees that batches consist entirely of
same-length sequences, so **care has to be taken in the encoders**
to support packing/padding/masking. `TextEncoder` already does this automatically,
while the speech encoder `BiLSTMp` does not.

**EXPERIMENTAL**: You can decode an ASR system using the approximate sampler
even though the model does not take care of the padded positions (a warning
will be printed at each batch).
The loss was 0.2% WER on a specific dataset that we tried. So although the
computations in the encoder become noisy and not strictly correct, the model
handles this noise quite robustly:

`$ nmtpy translate -s val -o hyp -x model.sampler_type:approximate best_asr.ckpt`

This type of batching cuts ASR decoding time roughly by a factor of 2-3.
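The core trick can be illustrated with a minimal re-implementation of the bucketing idea (this is a sketch, not the actual `ApproximateBucketBatchSampler` code): sort items by length so each batch contains similar, but not necessarily identical, lengths, then shuffle the batch order.

```python
import random

def approximate_bucket_batches(lengths, batch_size, seed=0):
    """Group item indices into batches of roughly similar sequence length.

    Sorting first means every batch spans a narrow length range (cheap
    padding); shuffling the batches afterwards avoids a fixed
    short-to-long curriculum across the epoch.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches = [order[i:i + batch_size]
               for i in range(0, len(order), batch_size)]
    random.Random(seed).shuffle(batches)
    return batches
```

Because batches are only *approximately* uniform in length, downstream encoders must mask or pack the padded positions, as noted above.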

Other changes
- Vocabularies generated by `nmtpy-build-vocab` now contain frequency
information as well. The code is backward-compatible with old vocab files.
- `Batch` objects should now be explicitly moved to the allocated device
using `.device()` method. See `mainloop.py` and `test_performance()` from
the `NMT` model.
- Training no longer shows the cached GPU allocation from the `nvidia-smi`
output, as calling `nvidia-smi` periodically was in the end a hacky thing. We
plan to use `torch.cuda.*` to get an estimate of memory consumption.
- NOTE: Multi-process data loading is temporarily disabled as it was
crashing from time to time, so `num_workers > 0` has no effect
in this release.
- `Attention` is separated into `DotAttention` and `MLPAttention` and a
convenience function `get_attention()` is provided to select between them
during model construction.
- `get_activation_fn()` should be used to select between non-linearities
dynamically instead of `getattr(nn.functional, activ)`; the latter
will not work for `tanh` and `sigmoid` in upcoming Pytorch releases.
- Simplification: `ASR` model is now derived from `NMT`.
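The idea behind `get_activation_fn()` can be sketched as an explicit lookup table. This is illustrative only: the table below uses plain-math stand-ins rather than the real torch functions, and the real nmtpytorch helper may differ.

```python
import math

# Explicit name -> callable table instead of getattr(nn.functional, name),
# which breaks once 'tanh'/'sigmoid' move out of nn.functional.
_ACTIVATIONS = {
    "tanh": math.tanh,
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "relu": lambda x: max(0.0, x),
    "linear": lambda x: x,
}

def get_activation_fn(name):
    """Return the activation callable for `name`, failing loudly on typos."""
    try:
        return _ACTIVATIONS[name]
    except KeyError:
        raise ValueError(f"Unknown activation: {name!r}")
```

An explicit table also gives a clear error for unknown names, instead of the `AttributeError` that `getattr` would raise.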

2.0.0

- Ability to install through `pip`.
- Advanced layers are now organized into subfolders.
- New basic layers: Convolution over sequence, MaxMargin.
- New attention layers: Co-attention, multi-head attention, hierarchical attention.
- New encoders: Arbitrary sequence-of-vectors encoder, BiLSTMp speech feature encoder.
- New decoders: Multi-source decoder, switching decoder, vector decoder.
- New datasets: Kaldi dataset (.ark/.scp reader), Shelve dataset, Numpy sequence dataset.
- Added learning rate annealing: See `lr_decay*` options in `config.py`.
- Removed subword-nmt and METEOR files from the repository. We now depend on
the PyPI package for subword-nmt; for METEOR, `nmtpy-install-extra` should
be run after installation.
- More multi-task and multi-input/output `translate` and `training` regimes.
- New early-stopping metrics: character and word error rate (`cer`, `wer`) and ROUGE (`rouge`).
- Curriculum learning option for the `BucketBatchSampler`, i.e. length-ordered batches.
- New models:
  - ASR: Listen-attend-and-spell-like automatic speech recognition.
  - Multitask*: Experimental multi-tasking & scheduling between many inputs/outputs.
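The new `wer` early-stopping metric boils down to a word-level edit distance. A minimal sketch (the actual nmtpytorch implementation may differ in normalization and tokenization):

```python
def word_error_rate(ref, hyp):
    """Word-level Levenshtein distance divided by reference length."""
    r, h = ref.split(), hyp.split()
    # prev[j] = edit distance between the processed prefix of r and h[:j]
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        cur = [i]
        for j, hw in enumerate(h, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (rw != hw)))   # substitution
        prev = cur
    return prev[-1] / max(len(r), 1)
```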

1.4.0

- Add `environment.yml` for easy installation using `conda`. You can now
create a ready-to-use `conda` environment by just calling `conda env create -f environment.yml`.
- Make `NumpyDataset` memory efficient by keeping `float16` arrays as they are
until batch-creation time.
- Rename `Multi30kRawDataset` to `Multi30kDataset`, which now supports both
raw image files and pre-extracted visual feature files stored as `.npy`.
- Add CNN feature extraction script under `scripts/`.
- Add doubly stochastic attention to `ShowAttendAndTell` and multimodal NMT.
- New model `MNMTDecinit` to initialize decoder with auxiliary features.
- New model `AMNMTFeatures`: the attentive MMT, but with feature files
instead of the memory-hungry end-to-end feature extraction.
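The `float16` memory optimization mentioned above amounts to storing features at half precision and upcasting only the slice needed for the current batch. A sketch under that assumption (class and method names are hypothetical, not the actual `NumpyDataset` code):

```python
import numpy as np

class LazyFloat16Features:
    """Keep features as float16 at rest; upcast per batch on demand."""

    def __init__(self, feats):
        # half-precision storage halves resident memory
        self.feats = feats.astype(np.float16)

    def get_batch(self, idxs):
        # only the requested rows are upcast at batch-creation time
        return self.feats[idxs].astype(np.float32)
```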

1.3.2

- Updates to `ShowAttendAndTell` model.

1.3.1

- Removed old `Multi30kDataset`.
- Sort batches by source sequence length instead of target.
- Fix `ShowAttendAndTell` model. It should now work.
