
Torch

0.3.1

- Add CPU-only binary releases that are 10x smaller in size than the full binary with CUDA capabilities.

As always, links to our binaries are on http://pytorch.org

New features
- Add [Cosine Annealing Learning Rate Scheduler](http://pytorch.org/docs/0.3.1/optim.html#torch.optim.lr_scheduler.CosineAnnealingLR)   https://github.com/pytorch/pytorch/pull/3311
- Add a `reduce` argument to `PoissonNLLLoss` to be able to compute unreduced losses   https://github.com/pytorch/pytorch/pull/3770
- Allow `target.requires_grad=True` in `l1_loss` and `mse_loss` (compute loss wrt `target`)   https://github.com/pytorch/pytorch/pull/3876
- Add [`random_split`](https://github.com/pytorch/pytorch/blob/master/torch/utils/data/dataset.py#L105-L113) that randomly splits a dataset into non-overlapping new datasets of given lengths https://github.com/pytorch/pytorch/pull/4435
- Introduce scopes to annotate ONNX graphs for better [TensorBoard visualization of models](https://github.com/lanpa/tensorboard-pytorch) https://github.com/pytorch/pytorch/pull/5153
- Allow `map_location` in `torch.load` to be a string, such as `map_location='cpu'` or `map_location='cuda:2'`   https://github.com/pytorch/pytorch/pull/4203
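
For example, a checkpoint saved on a GPU machine can now be loaded onto a CPU-only machine with just the string form (a minimal sketch; `model.pt` is a hypothetical file):

python
>>> state_dict = torch.load('model.pt', map_location='cpu')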

Bug Fixes

Data Loader / Datasets / Multiprocessing
- Made DataLoader workers more verbose on bus error and segfault. Additionally, add a `timeout` option to the DataLoader that raises an error if sample loading takes longer than the given value. https://github.com/pytorch/pytorch/pull/3474
- DataLoader workers used to all have the same random number generator (RNG) seed because of the semantics of the `fork` syscall. Now, each worker has its RNG seed set to `base_seed + worker_id`, where `base_seed` is a random int64 value generated by the parent process. You may use `torch.initial_seed()` to access this value in `worker_init_fn`, which can be used to set other seeds (e.g. NumPy) before data loading; see the sketch after this list. `worker_init_fn` is an optional argument that is called on each worker subprocess with the worker id as input, after seeding and before data loading. https://github.com/pytorch/pytorch/pull/4018
- Add additional signal handling in DataLoader worker processes when workers abruptly die.
- Negative values for `num_workers` now give a ValueError https://github.com/pytorch/pytorch/pull/4019
- Fixed a typo in the `ConcatDataset.cumulative_sizes` attribute name https://github.com/pytorch/pytorch/pull/3534
- Accept longs in `default_collate` for DataLoader in Python 2   https://github.com/pytorch/pytorch/pull/4001
- Re-initialize autograd engine in child processes   https://github.com/pytorch/pytorch/pull/4158
- Fix distributed DataLoader so it pins memory to the current GPU rather than GPU 0.   https://github.com/pytorch/pytorch/pull/4196
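
A minimal sketch of the new seeding behavior (the dataset here is arbitrary; `timeout` and `worker_init_fn` are the options described above):

python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def worker_init_fn(worker_id):
    # Inside a worker, torch.initial_seed() returns base_seed + worker_id;
    # reuse it to seed other libraries such as NumPy before loading data.
    np.random.seed(torch.initial_seed() % (2 ** 32))

dataset = TensorDataset(torch.randn(100, 3), torch.arange(0, 100))
loader = DataLoader(dataset, batch_size=10, num_workers=2,
                    worker_init_fn=worker_init_fn, timeout=30)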

CUDA / CuDNN
- Allow cuDNN for fp16 batch norm https://github.com/pytorch/pytorch/pull/4021
- Use `enabled` argument in `torch.autograd.profiler.emit_nvtx` (was being ignored)   https://github.com/pytorch/pytorch/pull/4032
- Fix cuBLAS arguments for fp16 `torch.dot` https://github.com/pytorch/pytorch/pull/3660
- Fix CUDA index_fill_ boundary check with small tensor size   https://github.com/pytorch/pytorch/pull/3953
- Fix CUDA Multinomial checks   https://github.com/pytorch/pytorch/pull/4009
- Fix CUDA version typo in warning  https://github.com/pytorch/pytorch/pull/4175
- Initialize cuda before setting cuda tensor types as default   https://github.com/pytorch/pytorch/pull/4788
- Add missing lazy_init in cuda python module   https://github.com/pytorch/pytorch/pull/4907
- Fix lazy initialization order in set_device; lazy_init should not be called in getDevCount   https://github.com/pytorch/pytorch/pull/4918
- Make torch.cuda.empty_cache() a no-op when cuda is not initialized   https://github.com/pytorch/pytorch/pull/4936

CPU
- Assert MKL ld* conditions for ger, gemm, and gemv   https://github.com/pytorch/pytorch/pull/4056

torch operators
- Fix `tensor.repeat` when the underlying storage is not owned by `torch` (for example, coming from numpy)   https://github.com/pytorch/pytorch/pull/4084
- Add proper shape checking to torch.cat   https://github.com/pytorch/pytorch/pull/4087
- Add check for slice shape match in index_copy_ and index_add_.   https://github.com/pytorch/pytorch/pull/4342
- Fix use after free when advanced indexing tensors with tensors   https://github.com/pytorch/pytorch/pull/4559
- Fix triu and tril for zero-strided inputs on GPU   https://github.com/pytorch/pytorch/pull/4962
- Fix blas addmm (gemm) condition check   https://github.com/pytorch/pytorch/pull/5048
- Fix topk work size computation   https://github.com/pytorch/pytorch/pull/5053
- Fix reduction functions to respect the stride of the output   https://github.com/pytorch/pytorch/pull/4995
- Improve float precision stability of the `linspace` op (fixes #4419)   https://github.com/pytorch/pytorch/pull/4470

autograd
- Fix python gc race condition with THPVariable_traverse   https://github.com/pytorch/pytorch/pull/4437

nn layers
- Fix padding_idx getting ignored in backward for Embedding(sparse=True)   https://github.com/pytorch/pytorch/pull/3842
- Fix cosine_similarity's output shape   https://github.com/pytorch/pytorch/pull/3811
- Add rnn args check   https://github.com/pytorch/pytorch/pull/3925
- NLLLoss works for arbitrary dimensions https://github.com/pytorch/pytorch/pull/4654
- More strict shape check on Conv operators https://github.com/pytorch/pytorch/pull/4637
- Fix maxpool3d / avgpool3d crashes   https://github.com/pytorch/pytorch/pull/5052
- Fix setting using running stats in InstanceNorm*d   https://github.com/pytorch/pytorch/pull/4444

Multi-GPU
- Fix DataParallel scattering for empty lists / dicts / tuples   https://github.com/pytorch/pytorch/pull/3769
- Fix refcycles in DataParallel scatter and gather (fix elevated memory usage)  https://github.com/pytorch/pytorch/pull/4988
- Broadcast output requires_grad only if corresponding input requires_grad   https://github.com/pytorch/pytorch/pull/5061

core
- Remove hard file offset reset in load()   https://github.com/pytorch/pytorch/pull/3695
- Have __sizeof__ account for size of stored elements   https://github.com/pytorch/pytorch/pull/3821
- Fix undefined FileNotFoundError   https://github.com/pytorch/pytorch/pull/4384
- make torch.set_num_threads also set MKL threads (take 2)   https://github.com/pytorch/pytorch/pull/5002

others
- Fix wrong learning rate evaluation in CosineAnnealingLR in Python 2   https://github.com/pytorch/pytorch/pull/4656

Performance improvements
- Slightly simplified math in IndexToOffset   https://github.com/pytorch/pytorch/pull/4040
- Improve performance of max pooling backwards   https://github.com/pytorch/pytorch/pull/4106
- Add cublas batched gemm support.   https://github.com/pytorch/pytorch/pull/4151
- Rearrange dimensions for pointwise operations for better performance.   https://github.com/pytorch/pytorch/pull/4174
- Improve memory access patterns for index operations.   https://github.com/pytorch/pytorch/pull/4493
- Improve CUDA softmax performance   https://github.com/pytorch/pytorch/pull/4973
- Fixed double memory accesses of several pointwise operations.   https://github.com/pytorch/pytorch/pull/5068

Documentation and UX Improvements
- Better error messages for blas ops with cuda.LongTensor   https://github.com/pytorch/pytorch/pull/4160
- Add missing trtrs, orgqr, ormqr docs   https://github.com/pytorch/pytorch/pull/3720
- change doc for Adaptive Pooling   https://github.com/pytorch/pytorch/pull/3746
- Fix MultiLabelMarginLoss docs   https://github.com/pytorch/pytorch/pull/3836
- More docs for Conv1d Conv2d   https://github.com/pytorch/pytorch/pull/3870
- Improve Tensor.scatter_ doc   https://github.com/pytorch/pytorch/pull/3937
- [docs] rnn.py: Note zero defaults for hidden state/cell   https://github.com/pytorch/pytorch/pull/3951
- Improve Tensor.new doc   https://github.com/pytorch/pytorch/pull/3954
- Improve docs for torch and torch.Tensor   https://github.com/pytorch/pytorch/pull/3969
- Added explicit tuple dimensions to doc for Conv1d.   https://github.com/pytorch/pytorch/pull/4136
- Improve svd doc   https://github.com/pytorch/pytorch/pull/4155
- Correct instancenorm input size   https://github.com/pytorch/pytorch/pull/4171
- Fix StepLR example docs   https://github.com/pytorch/pytorch/pull/4478

0.4.0

>>> sum = torch.tensor([2, 3]).sum()
>>> sum
tensor(5)
>>> sum.size()
torch.Size([])


Accumulating losses

Consider the widely used pattern ``total_loss += loss.data[0]`` before 0.4.0. ``loss`` was a ``Variable`` wrapping a tensor of size ``(1,)``, but in 0.4.0 ``loss`` is now a scalar and has ``0`` dimensions. Indexing into a scalar doesn't make sense (it gives a warning now, but will be a hard error in 0.5.0): use ``loss.item()`` to get the Python number from a scalar.

Note that if you don't convert to a Python number when accumulating losses, you may find increased memory usage in your program. This is because the right-hand-side of the above expression used to be a Python float, while it is now a zero-dim Tensor.  The total loss is thus accumulating Tensors and their gradient history, which may keep around large autograd graphs for much longer than necessary.
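
A minimal sketch of the recommended pattern (the model, data loader, criterion, and optimizer here are assumed to already exist):

python
total_loss = 0.0
for input, target in loader:
    optimizer.zero_grad()
    loss = criterion(model(input), target)
    loss.backward()
    optimizer.step()
    total_loss += loss.item()   # a Python float; does not keep the autograd graph alive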


Deprecation of ``volatile`` flag

The ``volatile`` flag is now deprecated and has no effect. Previously, any computation that involved a ``Variable`` with ``volatile=True`` wasn't tracked by ``autograd``. This has now been replaced by [a set of more flexible context managers](http://pytorch.org/docs/0.4.0/torch.html#locally-disabling-gradient-computation) including ``torch.no_grad()``, ``torch.set_grad_enabled(grad_mode)``, and others.

python
>>> x = torch.zeros(1, requires_grad=True)
>>> with torch.no_grad():
...     y = x * 2
>>> y.requires_grad
False
>>>
>>> is_train = False
>>> with torch.set_grad_enabled(is_train):
...     y = x * 2
>>> y.requires_grad
False
>>> torch.set_grad_enabled(True)   # this can also be used as a function
>>> y = x * 2
>>> y.requires_grad
True
>>> torch.set_grad_enabled(False)
>>> y = x * 2
>>> y.requires_grad
False



[``dtypes``](http://pytorch.org/docs/0.4.0/tensor_attributes.html#torch.torch.dtype), [``devices``](http://pytorch.org/docs/0.4.0/tensor_attributes.html#torch.torch.device) and NumPy-style creation functions

In previous versions of PyTorch, we used to specify data type (e.g. float vs double), device type (cpu vs cuda) and layout (dense vs sparse) together as a "tensor type". For example, ``torch.cuda.sparse.DoubleTensor`` was the ``Tensor`` type representing ``double`` data type, living on CUDA devices, and with [COO sparse tensor layout](https://en.wikipedia.org/wiki/Sparse_matrix#Coordinate_list_(COO)).

In this release, we introduce [``torch.dtype``](http://pytorch.org/docs/0.4.0/tensor_attributes.html#torch.torch.dtype), [``torch.device``](http://pytorch.org/docs/0.4.0/tensor_attributes.html#torch.torch.device) and [``torch.layout``](http://pytorch.org/docs/0.4.0/tensor_attributes.html#torch.torch.layout) classes to allow better management of these properties via NumPy-style creation functions.

[``torch.dtype``](http://pytorch.org/docs/0.4.0/tensor_attributes.html#torch.torch.dtype)

Below is a complete list of available [``torch.dtype``](http://pytorch.org/docs/0.4.0/tensor_attributes.html#torch.torch.dtype)s (data types) and their corresponding tensor types.

| Data type                 | ``torch.dtype``                        | Tensor types              |
|:------------------------- |:-------------------------------------- | :------------------------ |
| 32-bit floating point     | ``torch.float32`` or ``torch.float``   | ``torch.*.FloatTensor``   |
| 64-bit floating point     | ``torch.float64`` or ``torch.double``  | ``torch.*.DoubleTensor``  |
| 16-bit floating point     | ``torch.float16`` or ``torch.half``    | ``torch.*.HalfTensor``    |
| 8-bit integer (unsigned)  | ``torch.uint8``                        | ``torch.*.ByteTensor``    |
| 8-bit integer (signed)    | ``torch.int8``                         | ``torch.*.CharTensor``    |
| 16-bit integer (signed)   | ``torch.int16``   or ``torch.short``   | ``torch.*.ShortTensor``   |
| 32-bit integer (signed)   | ``torch.int32``   or ``torch.int``     | ``torch.*.IntTensor``     |
| 64-bit integer (signed)   | ``torch.int64``   or ``torch.long``    | ``torch.*.LongTensor``    |

Use [``torch.set_default_dtype``](http://pytorch.org/docs/0.4.0/torch.html#torch.set_default_dtype) and [``torch.get_default_dtype``](http://pytorch.org/docs/0.4.0/torch.html#torch.get_default_dtype) to manipulate the default ``dtype`` for floating point tensors.
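
For example (a small sketch; note that changing the default also affects dtype inference in the creation functions described below):

python
>>> torch.get_default_dtype()
torch.float32
>>> torch.set_default_dtype(torch.float64)
>>> torch.tensor([1.2, 3]).dtype
torch.float64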

[``torch.device``](http://pytorch.org/docs/0.4.0/tensor_attributes.html#torch.torch.device)

A [``torch.device``](http://pytorch.org/docs/0.4.0/tensor_attributes.html#torch.torch.device) contains a device type (``'cpu'`` or ``'cuda'``) and an optional device ordinal (id) for the device type. It can be initialized with ``torch.device('{device_type}')`` or ``torch.device('{device_type}:{device_ordinal}')``.

If the device ordinal is not present, this represents the current device for the device type; e.g., ``torch.device('cuda')`` is equivalent to ``torch.device('cuda:X')`` where ``X`` is the result of ``torch.cuda.current_device()``.
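
For example (a small sketch; the printed representations match current releases):

python
>>> torch.device('cpu')
device(type='cpu')
>>> torch.device('cuda:1')
device(type='cuda', index=1)
>>> torch.device('cuda')          # the current CUDA device
device(type='cuda')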

[``torch.layout``](http://pytorch.org/docs/0.4.0/tensor_attributes.html#torch.torch.layout)

[``torch.layout``](http://pytorch.org/docs/0.4.0/tensor_attributes.html#torch.torch.layout) represents the data layout of a [``Tensor``](http://pytorch.org/docs/0.4.0/tensors.html). Currently ``torch.strided`` (dense tensors) and ``torch.sparse_coo`` (sparse tensors with COO format) are supported.
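
A small sketch (using the ``torch.sparse_coo_tensor`` constructor as found in current releases):

python
>>> torch.zeros(2, 2).layout
torch.strided
>>> i = torch.tensor([[0, 1], [1, 0]])
>>> v = torch.tensor([3.0, 4.0])
>>> torch.sparse_coo_tensor(i, v, (2, 2)).layout
torch.sparse_coo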

Creating ``Tensor``s

[Methods that create a ``Tensor``](http://pytorch.org/docs/0.4.0/torch.html#creation-ops) now also take in ``dtype``, ``device``, ``layout``, and ``requires_grad`` options to specify the desired attributes on the returned ``Tensor``. For example,

python
>>> device = torch.device("cuda:1")
>>> x = torch.randn(3, 3, dtype=torch.float64, device=device)

1.3.0

Type Promotion: arithmetic operations between tensors of different dtypes now promote to a common dtype (see the Type Promotion section under Highlights below). In Version 1.3:

>>> torch.tensor(1) + torch.tensor(2.5)
tensor(3.5000)
>>> torch.tensor([1]) + torch.tensor(2.5)
tensor([3.5000])
>>> torch.tensor(True) + 5
tensor(6)

Type Promotion: in-place operations whose result_type is a lower dtype category (bool < integer < floating-point) than the in-place operand now throw an Error.  ([22273](https://github.com/pytorch/pytorch/pull/22273), [26981](https://github.com/pytorch/pytorch/pull/26981))

<p align="center">
<table align="center">
<tr><th>Version 1.2</th><th>Version 1.3</th></tr>
<tr valign="top">
<td><sub><pre lang="python">
>>> int_tensor = torch.tensor(1)
>>> int_tensor.add_(1.5)
tensor(2)
>>> bool_tensor = torch.tensor(True)
>>> bool_tensor.add_(5)
tensor(True)
</pre></sub></td>
<td><sub><pre lang="python">
>>> int_tensor = torch.tensor(1)
>>> int_tensor.add_(1.5)
RuntimeError: result type Float cannot be cast to the desired output type Long
>>> bool_tensor = torch.tensor(True)
>>> bool_tensor.add_(5)
RuntimeError: result type Long cannot be cast to the desired output type Bool
</pre></sub></td>
</tr>
</table>
</p>

These rules can be checked at runtime via [torch.can_cast](https://pytorch.org/docs/master/torch.html#torch.can_cast).
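
For example (a small sketch of the casting rules):

python
>>> torch.can_cast(torch.int64, torch.float32)
True
>>> torch.can_cast(torch.float32, torch.int64)
False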

`torch.flatten`: 0-dimensional inputs now return a 1-dim tensor.  ([25406](https://github.com/pytorch/pytorch/pull/25406)).


<p align="center">
<table align="center">
<tr><th>Version 1.2</th><th>Version 1.3</th></tr>
<tr valign="top">
<td><sub><pre lang="python">
>>> torch.flatten(torch.tensor(0))
tensor(0)
</pre></sub></td>
<td><sub><pre lang="python">
>>> torch.flatten(torch.tensor(0))
tensor([0])
</pre></sub></td>
</tr>
</table>
</p>

`nn.functional.affine_grid`: when `align_corners = True`, changed the behavior of 2D affine transforms on 1D data and 3D affine transforms on 2D data (i.e., when one of the spatial dimensions has unit size).

Previously, all grid points along a unit dimension were considered arbitrarily to be at -1, now they are considered to be at 0 (the center of the input image).

`torch.gels:` removed deprecated operator, use `torch.lstsq` instead.  ([26480](https://github.com/pytorch/pytorch/pull/26480)).

`utils.data.DataLoader:` made a number of Iterator attributes private (e.g. `num_workers`, `pin_memory`).  ([22273](https://github.com/pytorch/pytorch/pull/22273))

**[C++]** `Variable::backward` will no longer implicitly create a gradient for non-1-element Variables. Previously, a gradient tensor of all 1s would be implicitly created. This behavior matches the Python API.  ([26150](https://github.com/pytorch/pytorch/pull/26150))


auto x = torch::randn({5, 5}, torch::requires_grad());
auto y = x * x;
y.backward();
// ERROR: "grad can be implicitly created only for scalar outputs"
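// A possible fix (sketch): reduce to a scalar first, e.g. y.sum().backward();
// or pass an explicit gradient, e.g. y.backward(torch::ones_like(y));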


[C++] All option specifiers (e.g. `GRUOptions::bidirectional_`) are now private; use the function variants (e.g. `GRUOptions::bidirectional(...)`) instead. ([26419](https://github.com/pytorch/pytorch/pull/26419)).

Highlights

[Experimental]: Mobile Support

In PyTorch 1.3, we are launching experimental support for mobile. Now you can run any TorchScript model directly without any conversion. Here is the full list of features in this release:

* Support for full TorchScript inference on mobile;
* Prebuilt LibTorch libraries for Android/iOS on JCenter/CocoaPods;
* Java wrapper for Android with functionality to cover common inference cases (loading and invoking the model);
* Support for all forward ops on mobile CPU (backward ops are not supported yet);
* Some optimized fp32 operator implementations for ARM CPUs (based on Caffe2Go);
* Some optimized int8 operator implementations for ARM CPUs (based on QNNPACK);

We decided not to create a new framework for mobile so that you can use the same APIs you are already familiar with to run the same TorchScript models on Android/iOS devices without any format conversion. This way you can have the shortest path from research ideas to production-ready mobile apps.

The tutorials, demo apps and download links for prebuilt libraries can be found at: https://pytorch.org/mobile/

This is an experimental release. We are working on other features like customized builds to make PyTorch smaller, faster and better for your specific use cases. Stay tuned and give us your feedback!

[Experimental]: Named Tensor Support

Named Tensors aim to make tensors easier to use by allowing users to associate explicit names with tensor dimensions. In most cases, operations that take dimension parameters will accept dimension names, avoiding the need to track dimensions by position. In addition, named tensors use names to automatically check that APIs are being used correctly at runtime, providing extra safety. Names can also be used to rearrange dimensions, for example, to support "broadcasting by name" rather than "broadcasting by position".

Create a named tensor by passing a `names` argument into most tensor factory functions.

python
>>> tensor = torch.zeros(2, 3, names=('C', 'N'))
>>> tensor
tensor([[0., 0., 0.],
        [0., 0., 0.]], names=('C', 'N'))


Named tensors propagate names across operations.

python
>>> tensor.abs()
tensor([[0., 0., 0.],
        [0., 0., 0.]], names=('C', 'N'))


Rearrange to a desired ordering by using `align_to`.

python
>>> tensor = tensor.align_to('N', 'C', 'H', 'W')
>>> tensor.names, tensor.shape
(('N', 'C', 'H', 'W'), torch.Size([3, 2, 1, 1]))


And more! [Please see our documentation on named tensors.](https://pytorch.org/docs/master/named_tensor.html)

[Experimental]: Quantization support

PyTorch now supports quantization from the ground up, starting with support for quantized tensors. Convert a float tensor to a quantized tensor and back by:


x = torch.rand(10, 1, dtype=torch.float32)
xq = torch.quantize_per_tensor(x, scale=0.5, zero_point=8, dtype=torch.quint8)
# xq is a quantized tensor with data represented as quint8
xdq = xq.dequantize()
# convert back to floating point


We also support 8-bit quantized implementations of the most common operators in CNNs, including:

* Tensor operations:
  * view, clone, resize, slice
  * add, multiply, cat, mean, max, sort, topk
* Modules/Functionals (in torch.nn.quantized)
  * Conv2d
  * Linear
  * Avgpool2d, AdaptiveAvgpool2d, MaxPool2d, AdaptiveMaxPool2d
  * Interpolate
  * Upsample
* Fused operations for preserving better accuracy (in torch.nn.intrinsic)
  * ConvReLU2d, ConvBnReLU2d, ConvBn2d
  * LinearReLU
  * add_relu

We also support dynamic quantized operators, which take in floating point activations but use quantized weights (in torch.nn.quantized.dynamic); see the sketch after this list.

* LSTM
* Linear
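
A minimal usage sketch (assuming PyTorch 1.3+; the toy model here is purely illustrative):

python
import torch
import torch.nn as nn
import torch.quantization

# Dynamically quantize the supported module types (Linear, LSTM by default):
# weights are stored as int8, activations stay floating point and are
# quantized on the fly at execution time.
float_model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10)).eval()
quantized_model = torch.quantization.quantize_dynamic(float_model, dtype=torch.qint8)
out = quantized_model(torch.randn(1, 64))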

Quantization also requires support for methods to collect statistics from tensors and calculate quantization parameters (implementing interface torch.quantization.Observer). We support several methods to do so:

* MinMaxObserver
* MovingAverageMinMaxObserver
* PerChannelMinMaxObserver
* MovingAveragePerChannelMinMaxObserver
* HistogramObserver

For quantization aware training, we support fake-quantization operators and modules to mimic quantization during training:

* `torch.fake_quantize_per_tensor_affine`, `torch.fake_quantize_per_channel_affine`
* `torch.quantization.FakeQuantize`

In addition, we also support workflows in torch.quantization for the following (a sketch of the static post-training flow is shown after this list):

* post-training dynamic quantization
* post-training static quantization
* quantization-aware training
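
A compact sketch of static post-training quantization (assuming PyTorch 1.3+; the tiny module and calibration data are purely illustrative):

python
import torch
import torch.nn as nn
import torch.quantization

class TinyModel(nn.Module):
    def __init__(self):
        super(TinyModel, self).__init__()
        self.quant = torch.quantization.QuantStub()      # float -> quantized boundary
        self.fc = nn.Linear(4, 2)
        self.dequant = torch.quantization.DeQuantStub()  # quantized -> float boundary

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

m = TinyModel().eval()
m.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(m, inplace=True)   # insert observers
m(torch.randn(8, 4))                          # calibrate on representative data
torch.quantization.convert(m, inplace=True)   # swap in quantized modules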

All quantized operators are compatible with TorchScript.

For more details, see the documentation at: https://pytorch.org/docs/master/quantization.html

Type Promotion

Arithmetic and comparison operations may now perform mixed-type operations that promote to a common dtype.

The example below was not allowed in version 1.2. In version 1.3, the same code returns a tensor with `dtype=torch.float32`.


>>> torch.tensor([1], dtype=torch.int) + torch.tensor([1], dtype=torch.float32)


See the full [documentation](https://github.com/pytorch/pytorch/blob/master/docs/source/tensor_attributes.rst#type-promotion-doc) for more details.

* `torch.result_type` Provide function to determine result of mixed-type operations ([26012](https://github.com/pytorch/pytorch/pull/26012)).
* `torch.can_cast` Expose casting rules for type promotion ([26805](https://github.com/pytorch/pytorch/pull/26805)).
* `torch.promote_types` Expose promotion logic ([26655](https://github.com/pytorch/pytorch/pull/26655)).
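
A short sketch of the new helpers (promotion between a 32-bit integer and a 32-bit float resolves to float):

python
>>> torch.promote_types(torch.int32, torch.float32)
torch.float32
>>> torch.result_type(torch.tensor([1], dtype=torch.int32), torch.tensor([1.], dtype=torch.float32))
torch.float32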




Deprecations


`nn.functional.affine_grid` / `nn.functional.grid_sample`: Using the default value of `align_corners` is now deprecated, because the default will change in the 1.4 release.

The `align_corners` parameter was added in this release; the behavior in the previous release was equivalent to setting the parameter to `True`. This is also the current default value, but it will be changed to `False` from the 1.4 release. Note that using the default will trigger a warning as demonstrated below; set the value explicitly to remove the warning.

>>> torch.nn.functional.affine_grid(torch.randn(1, 2, 3),
...                                 (1, 3, 2, 2))
UserWarning: Default grid_sample and affine_grid behavior will be changed
to align_corners=False from 1.4.0.
See the documentation of grid_sample for details.
...

>>> torch.nn.functional.affine_grid(torch.randn(1, 2, 3),
...                                 (1, 3, 2, 2),
...                                 align_corners=True)
# NO WARNING!
...

[C++] Deprecate `torch::Tensor::data<T>()` in favor of `torch::Tensor::data_ptr<T>()` ([24847](https://github.com/pytorch/pytorch/pull/24847), [24886](https://github.com/pytorch/pytorch/pull/24886)).

New Features

TensorBoard: 3D Mesh and Hyperparameter Support

`torch.utils.tensorboard` supports 3D mesh and points plus hyperparameter logging. More details can be found in [the documentation](https://pytorch.org/docs/stable/tensorboard.html) for `SummaryWriter` with `add_mesh` and `add_hparams`.

A simple example exercising both methods:


import torch
from torch.utils.tensorboard import SummaryWriter

vertices_tensor = torch.as_tensor([
    [1, 1, 1],
    [-1, -1, 1],
    [1, -1, -1],
    [-1, 1, -1],
], dtype=torch.float).unsqueeze(0)
colors_tensor = torch.as_tensor([
    [255, 0, 0],
    [0, 255, 0],
    [0, 0, 255],
    [255, 0, 255],
], dtype=torch.int).unsqueeze(0)
faces_tensor = torch.as_tensor([
    [0, 2, 3],
    [0, 3, 1],
    [0, 1, 2],
    [1, 3, 2],
], dtype=torch.int).unsqueeze(0)

with SummaryWriter() as w:
    w.add_mesh('my_mesh', vertices=vertices_tensor, colors=colors_tensor, faces=faces_tensor)
    for i in range(5):
        w.add_hparams({'lr': 0.1 * i, 'bsize': i},
                      {'hparam/accuracy': 10 * i, 'hparam/loss': 10 * i})



Distributed

This release adds macOS support for `torch.distributed` with the Gloo backend. You can more easily switch from development (e.g. on macOS) to deployment (e.g. on Linux) without having to change a single line of code. The prebuilt binaries for macOS (stable and nightly) include support out of the box.


* `torch.distributed.all_reduce_coalesced` Support allreduce of a list of same-device tensors ([24949](https://github.com/pytorch/pytorch/pull/24949), [25470](https://github.com/pytorch/pytorch/pull/25470), [24876](https://github.com/pytorch/pytorch/pull/24876))
* `torch.distributed.all_reduce` Add bitwise reduction ops (BAND, BOR, BXOR) ([26824](https://github.com/pytorch/pytorch/pull/26824))
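
A minimal single-process sketch exercising the Gloo backend and one of the new bitwise reduction ops (the TCP address and world size here are illustrative):

python
import torch
import torch.distributed as dist

dist.init_process_group(backend="gloo",
                        init_method="tcp://127.0.0.1:23456",
                        rank=0, world_size=1)

t = torch.tensor([0b0101])
dist.all_reduce(t, op=dist.ReduceOp.BOR)   # bitwise OR across ranks
print(t)                                   # with world_size=1 this is just the input
dist.destroy_process_group()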

Libtorch Binaries with C++11 ABI

We now provide Libtorch binaries for building applications compatible with the C++11 ABI. The download links for libtorch binaries with the C++11 ABI can be found on https://pytorch.org/ under “QUICK START LOCALLY”.


New TorchScript features

* Add `not in` support for TorchScript ([23637](https://github.com/pytorch/pytorch/pull/23637)).
* You can now raise exceptions in one side of an if branch ([23565](https://github.com/pytorch/pytorch/pull/23565)).
* Add `torch.jit.is_scripting()` API ([25955](https://github.com/pytorch/pytorch/pull/25955)).
* Make assertions like `x is not None` unwrap the optional type of `x` ([23949](https://github.com/pytorch/pytorch/pull/23949)).
* Add dictionary augmented assignment (`+=`) support to TorchScript ([23639](https://github.com/pytorch/pytorch/pull/23639)).
* Support `grad` and `data` attribute for tensor in TorchScript ([23842](https://github.com/pytorch/pytorch/pull/23842)).
* Add `ignore` for TorchScript classes ([23614](https://github.com/pytorch/pytorch/pull/23614)).
* Support nn.GRU in script ([23266](https://github.com/pytorch/pytorch/pull/23266)).
* Support tensor as a key type in TorchScript ([23638](https://github.com/pytorch/pytorch/pull/23638)).
* Add support for ModuleDict ([25715](https://github.com/pytorch/pytorch/pull/25715)).
* Bind `set_grad_enabled()` into TorchScript ([25350](https://github.com/pytorch/pytorch/pull/25350)).
* Add `in` membership checks for lists ([25796](https://github.com/pytorch/pytorch/pull/25796)).
* Add `tuple` keyword ([25474](https://github.com/pytorch/pytorch/pull/25474)).
* Add `__getitem__` to class types ([25664](https://github.com/pytorch/pytorch/pull/25664)).
* Add `__setitem__` to class types ([25750](https://github.com/pytorch/pytorch/pull/25750)).
* Make JIT dicts ordered, matching Python 3.6+ semantics ([26465](https://github.com/pytorch/pytorch/pull/26465)).
* Added invert bitwise operation to TorchScript ([22324](https://github.com/pytorch/pytorch/pull/22324)).
* Add `min()` and `max()` for lists to TorchScript ([26351](https://github.com/pytorch/pytorch/pull/26351)).
* Support iterables and ranges in list comprehensions ([26768](https://github.com/pytorch/pytorch/pull/26768)).
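
A small scripted function exercising a few of the items above (`not in`, dictionary augmented assignment, and `max()` on lists), assuming PyTorch 1.3+:

python
import torch
from typing import Dict, List

@torch.jit.script
def summarize(words: List[str], nums: List[int]) -> Dict[str, int]:
    counts = torch.jit.annotate(Dict[str, int], {})
    for w in words:
        if w not in counts:        # `not in` support
            counts[w] = 0
        counts[w] += 1             # dict augmented assignment
    counts['max_num'] = max(nums)  # max() for lists
    return counts

print(summarize(["a", "b", "a"], [3, 1, 2]))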

Improvements



C++ Frontend Improvements

We are on our way to better API parity between our Python and C++ frontends. Specifically, we made the following improvements:

Autograd

* Tensor autograd APIs
  * `torch::Tensor::data` Added ([26008](https://github.com/pytorch/pytorch/pull/26008)).
  * `torch::Tensor::grad` Don’t create a gradient for non-1-element Variables [BC-breaking] ([26150](https://github.com/pytorch/pytorch/pull/26150)).
  * `torch::Tensor::is_leaf` Added ([26186](https://github.com/pytorch/pytorch/pull/26186)).
  * `torch::Tensor::output_nr` Added ([26216](https://github.com/pytorch/pytorch/pull/26216)).
  * `torch::Tensor::_version` Added ([26217](https://github.com/pytorch/pytorch/pull/26217)).
* Add support for custom autograd functions in the C++ API
  * For example usage, please see the PR description and test cases in ([23572](https://github.com/pytorch/pytorch/pull/23572), [23628](https://github.com/pytorch/pytorch/pull/23628), and [23803](https://github.com/pytorch/pytorch/pull/23803))
* `torch::autograd::backward` and `torch::autograd::grad` ([24342](https://github.com/pytorch/pytorch/pull/24342))
* `torch::autograd::Variable::register_hook` ([24393](https://github.com/pytorch/pytorch/pull/24393)).

New torch::nn modules

* Containers
  * torch::nn::ModuleList ([24317](https://github.com/pytorch/pytorch/pull/24317)).
* Linear layers
  * torch::nn::Identity ([26713](https://github.com/pytorch/pytorch/pull/26713)).
* Convolution layers
  * torch::nn::Fold ([24160](https://github.com/pytorch/pytorch/pull/24160)).
* Pooling layers
  * torch::nn::MaxPool1d / MaxPool2d / MaxPool3d ([24860](https://github.com/pytorch/pytorch/pull/24860), [26521](https://github.com/pytorch/pytorch/pull/26521)).
  * torch::nn::AvgPool1d / AvgPool2d / AvgPool3d ([25800](https://github.com/pytorch/pytorch/pull/25800)).
  * torch::nn::AdaptiveMaxPool1d / AdaptiveMaxPool2d / AdaptiveMaxPool3d ([26755](https://github.com/pytorch/pytorch/pull/26755), [26772](https://github.com/pytorch/pytorch/pull/26772), [26775](https://github.com/pytorch/pytorch/pull/26775)).
* Loss functions
  * torch::nn::L1Loss ([25902](https://github.com/pytorch/pytorch/pull/25902)).
* Distance functions
  * torch::nn::CosineSimilarity ([26424](https://github.com/pytorch/pytorch/pull/26424))
  * torch::nn::PairwiseDistance ([26424](https://github.com/pytorch/pytorch/pull/26424))

New torch::nn::functional functions

* Pooling functions
  * torch::nn::functional::max_pool1d / max_pool2d / max_pool3d ([26262](https://github.com/pytorch/pytorch/pull/26262)).
  * torch::nn::functional::max_pool1d_with_indices / max_pool2d_with_indices / max_pool3d_with_indices ([26521](https://github.com/pytorch/pytorch/pull/26521)).
  * torch::nn::functional::avg_pool1d / avg_pool2d / avg_pool3d ([26262](https://github.com/pytorch/pytorch/pull/26262)).
  * torch::nn::functional::adaptive_max_pool1d / adaptive_max_pool2d / adaptive_max_pool3d ([26755](https://github.com/pytorch/pytorch/pull/26755), [26772](https://github.com/pytorch/pytorch/pull/26772), [26775](https://github.com/pytorch/pytorch/pull/26775)).
  * torch::nn::functional::adaptive_max_pool1d_with_indices / adaptive_max_pool2d_with_indices / adaptive_max_pool3d_with_indices ([26755](https://github.com/pytorch/pytorch/pull/26755), [26772](https://github.com/pytorch/pytorch/pull/26772), [26775](https://github.com/pytorch/pytorch/pull/26775)).
* Distance functions
  * torch::nn::functional::cosine_similarity ([26424](https://github.com/pytorch/pytorch/pull/26424)).
  * torch::nn::functional::pairwise_distance ([26424](https://github.com/pytorch/pytorch/pull/26424)).

Tensor Construction API

* Add support for multidimensional inputs to `torch::tensor` ([26210](https://github.com/pytorch/pytorch/pull/26210), [26890](https://github.com/pytorch/pytorch/pull/26890), [26756](https://github.com/pytorch/pytorch/pull/26756)).
  * From now on, we can use `torch::tensor({{1, 2}, {3, 4}})` in C++ to construct the same tensor as `torch.tensor([[1, 2], [3, 4]])` in Python. Some caveats are noted in [this comment](https://github.com/pytorch/pytorch/blob/e0ae3ce5e4b5a98c8dd67b9ec1ea0f81dfc52fef/tools/autograd/templates/variable_factories.h#L184-L194).
* Add support for bool and BFloat16 dtypes to `torch::tensor` ([23337](https://github.com/pytorch/pytorch/pull/23337)).

Other C++ Improvements

* Add `torch::nn::Module::unregister_module` function, for unregistering a submodule from a `torch::nn::Module` ([26088](https://github.com/pytorch/pytorch/pull/26088)).

Distributed Improvements

* `torch.distributed` Detect and handle NCCL errors appropriately instead of blocking peers until timeout in `ProcessGroupNCCL` ([25012](https://github.com/pytorch/pytorch/pull/25012), [25905](https://github.com/pytorch/pytorch/pull/25905))
* `torch.distributed` Make scatter/gather arguments optional ([25575](https://github.com/pytorch/pytorch/pull/25575))
* `torch.distributed.launch` Add a -m flag to allow users to launch python modules ([24910](https://github.com/pytorch/pytorch/pull/24910)).
* `torch.distributed` Add function to get NCCL version for logging ([26583](https://github.com/pytorch/pytorch/pull/26583)).
* `torch.distributed` Add timeout parameter to connect function in TCPStore ([26554](https://github.com/pytorch/pytorch/pull/26554)).
* `torch.distributed` Use a timeout in the connect function to prevent an infinite loop ([26364](https://github.com/pytorch/pytorch/pull/26364)).
* `torch.nn.modules.batchnorm` Allow SyncBatchNorm to run without DDP in inference mode ([24815](https://github.com/pytorch/pytorch/pull/24815))

Performance Improvements

* `torch.argmax/argmin` Rewrite as TensorIterator reductions ([26181](https://github.com/pytorch/pytorch/pull/26181)).
* `torch.erfinv` Vectorize unary operator ([26629](https://github.com/pytorch/pytorch/pull/26629)).
* `torch.sin/cos/tan` Use intrinsics for trigonometric functions on CPU ([26431](https://github.com/pytorch/pytorch/pull/26431)).
* Fix possible deadlock in SharedCache inside a forked child proc ([25158](https://github.com/pytorch/pytorch/pull/25158)).
* `torch.qr` Fix a regression ([23591](https://github.com/pytorch/pytorch/pull/23591)).
* `nn.Conv` Use Caffe2's implementation of grouped depthwise 3x3 convolutions ([26556](https://github.com/pytorch/pytorch/pull/26556)).
* `nn.Conv` Use parallel_for in DepthwiseConvKernel ([26879](https://github.com/pytorch/pytorch/pull/26879)).
* `nn.Conv` Change shape for conv and unary ops ([25477](https://github.com/pytorch/pytorch/pull/25477)).
* Fix pin_memory_thread not exiting quickly ([23646](https://github.com/pytorch/pytorch/pull/23646)).
* Increase predefined_minimum_secs to reduce variation ([23734](https://github.com/pytorch/pytorch/pull/23734)).
* Enhance Tensor indexSelect performance ([23055](https://github.com/pytorch/pytorch/pull/23055)).
* Separate input shapes to reduce default execution time ([24136](https://github.com/pytorch/pytorch/pull/24136)).
* constraints.lower_cholesky Vectorize LowerCholeskyTransform ([24131](https://github.com/pytorch/pytorch/pull/24131)).
* Speed up an integer to the power of a positive integer on CPU ([26020](https://github.com/pytorch/pytorch/pull/26020)).
* [ROCm] Enable jit fusion ([22872](https://github.com/pytorch/pytorch/pull/22872)).
* [ROCm] Use MIOpen for transpose convolutions ([26172](https://github.com/pytorch/pytorch/pull/26172)).

JIT Improvements

* Enable CPU fused kernel on Windows ([25578](https://github.com/pytorch/pytorch/pull/25578)).
* Expose an API to iterate all the registered operators ([23207](https://github.com/pytorch/pytorch/pull/23207)).
* Include recursive class compilations in error call stack ([23454](https://github.com/pytorch/pytorch/pull/23454)).
* Substantial improvements to saved model format speed and size:
  * Compress debug symbols when serializing TorchScript models ([23659](https://github.com/pytorch/pytorch/pull/23659)).
  * Compress all non-Tensor components of a serialized TorchScript model ([23723](https://github.com/pytorch/pytorch/pull/23723)).
  * Perform string uniquing by value in pickle serialization ([23741](https://github.com/pytorch/pytorch/pull/23741)).
  * Implement a number of pickle serialization features that optimize for size ([23759](https://github.com/pytorch/pytorch/pull/23759)).
  * Implement more size-oriented opcodes in the depickler ([26454](https://github.com/pytorch/pytorch/pull/26454)).
* Cache node operators to speed up optimization ([24827](https://github.com/pytorch/pytorch/pull/24827)).
* Allow forward hooks in tracing ([23613](https://github.com/pytorch/pytorch/pull/23613)).
* Add Pickler C++ API ([23241](https://github.com/pytorch/pytorch/pull/23241)).
* Open up AliasAnalysisKind for any ops ([23810](https://github.com/pytorch/pytorch/pull/23810)).
* Add the ability to compile exports on traced modules ([24298](https://github.com/pytorch/pytorch/pull/24298)).
* Make `NoneType` a subtype of `Optional[T]` ([25361](https://github.com/pytorch/pytorch/pull/25361)).

ONNX Exporter Improvements

In PyTorch 1.3, we have added support for exporting graphs with ONNX IR v4 semantics, and set it as default. We have achieved good initial coverage for ONNX Opset 11, which was released recently with ONNX 1.6. Further enhancement to Opset 11 coverage will follow in the next release. We have enabled export for about 20 new PyTorch operators. Also, we have focused on enabling the export for all models in torchvision. We have introduced some necessary groundwork for that in this release, e.g., accepting PyTorch models with inputs/outputs of Dict or String. We continue to work on torchvision models, such as FasterRCNN and MaskRCNN, to enable their export.
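
A minimal export sketch showing the new opset option (assumes torchvision is installed; the model and file name are illustrative):

python
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=False).eval()
dummy_input = torch.randn(1, 3, 224, 224)
# Export with ONNX Opset 11, newly supported in this release.
torch.onnx.export(model, dummy_input, "resnet18.onnx", opset_version=11)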

Adding Support for ONNX IR v4

* Provide an option to exclude the weights from model inputs ([23284](https://github.com/pytorch/pytorch/pull/23284))
* Make graph inputs without weights as default ([26146](https://github.com/pytorch/pytorch/pull/26146))

Adding Support for ONNX Opset 11

* Introduce ONNX Opset 11 support ([23739](https://github.com/pytorch/pytorch/pull/23739))
* Add export for torch.Interpolate in Opset 11 ([24805](https://github.com/pytorch/pytorch/pull/24805), [27179](https://github.com/pytorch/pytorch/pull/27179))
* Add export for tensor.gather, tensor.scatter and tensor.scatter_add in Opset 11 ([24790](https://github.com/pytorch/pytorch/pull/24790))
* Add export for tensor.clamp in Opset 11 ([25797](https://github.com/pytorch/pytorch/pull/25797))
* Add export for torch.topk and torch.sort in Opset 11 ([25739](https://github.com/pytorch/pytorch/pull/25739))

Exporting More Torch Operators/Models to ONNX

* Export torch.pixel_shuffle ([23739](https://github.com/pytorch/pytorch/pull/23739))
* Export torch.multinomial ([23581](https://github.com/pytorch/pytorch/pull/23581))
* Export torch.norm’s frobenius_norm ([23536](https://github.com/pytorch/pytorch/pull/23536))
* Export torch.std ([22310](https://github.com/pytorch/pytorch/pull/22310))
* Export torch.empty and torch.empty_like ([24166](https://github.com/pytorch/pytorch/pull/24166))
* Export torch.rsqrt ([24153](https://github.com/pytorch/pytorch/pull/24153))
* Export torch.log1p ([25808](https://github.com/pytorch/pytorch/pull/25808))
* Export torch.unique ([25050](https://github.com/pytorch/pytorch/pull/25050))
* Export torch.gelu ([24475](https://github.com/pytorch/pytorch/pull/24475))
* Export tensor.index_fill and tensor.index_copy ([23052](https://github.com/pytorch/pytorch/pull/23052))
* Export torch.round ([26126](https://github.com/pytorch/pytorch/pull/26126))
* Export torch.baddbmm ([25738](https://github.com/pytorch/pytorch/pull/25738))
* Export torch.remainder ([24410](https://github.com/pytorch/pytorch/pull/24410))
* Export torch.cumsum ([24476](https://github.com/pytorch/pytorch/pull/24476))
* Export tensor.size with negative axis ([26436](https://github.com/pytorch/pytorch/pull/26436))
* Export RNN/LSTM with h0/c0 initial state ([22813](https://github.com/pytorch/pytorch/pull/22813))

Enhancing ONNX Export Infra

* Enable exporting PyTorch models which have Dict and String as inputs and outputs ([25889](https://github.com/pytorch/pytorch/pull/25889))
* Systematically solve mismatched types caused by implicit type conversion for binary arithmetic operators by adding an ONNX type-conversion pass. ([24378](https://github.com/pytorch/pytorch/pull/24378))
* Correctly validate dynamic axes names. ([23974](https://github.com/pytorch/pytorch/pull/23974))
* Enable ONNX Runtime tests for Opset 10 and partially for Opset 11 ([22993](https://github.com/pytorch/pytorch/pull/22993))

Other Improvements

* Error checking: many operators now check the strides of the output tensor and raise an error if it contains internal overlaps that would lead to incorrect results ([23063](https://github.com/pytorch/pytorch/issues/23063)).
* `torch.det/logdet/slogdet` Allow batching ([22909](https://github.com/pytorch/pytorch/pull/22909)).
* `torch.logical_not` Add new operator ([23839](https://github.com/pytorch/pytorch/pull/23839)).
* `torch.logical_xor` Add new operator ([23847](https://github.com/pytorch/pytorch/pull/23847)).
* `torch.symeig` Improve the stability of gradient updates ([23018](https://github.com/pytorch/pytorch/pull/23018)).
* `torch.eye` Enable for bool and half ([24148](https://github.com/pytorch/pytorch/pull/24148)).
* `torch.tril / triu` Enable for bool and half ([24163](https://github.com/pytorch/pytorch/pull/24163)).
* `torch.logical_not/xor` support non-bool tensors. ([23916](https://github.com/pytorch/pytorch/pull/23916), [23978](https://github.com/pytorch/pytorch/pull/23978)).
* `torch.index_select` Implement indexing methods for sparse tensors ([24937](https://github.com/pytorch/pytorch/pull/24937)).
* `torch.lu_solve` Enable broadcasting of batch dimensions ([24333](https://github.com/pytorch/pytorch/pull/24333)).
* `torch.cholesky` Enable batches greater than 262140 ([24438](https://github.com/pytorch/pytorch/pull/24438)).
* `torch.det` Simplify generation of singular matrices to avoid numerical issue on PowerPC ([25773](https://github.com/pytorch/pytorch/pull/25773)).
* `torch.erfinv` In the CUDA implementation, use erfinv() for double to preserve accuracy ([25337](https://github.com/pytorch/pytorch/pull/25337)).
* `torch.erfinv` Add a float version of erfinv on CPU ([26070](https://github.com/pytorch/pytorch/pull/26070)).
* `torch.cuda.stream` Updates autograd engine to respect streams set in forward ([8354](https://github.com/pytorch/pytorch/pull/8354)).
* `torch.backends.mkldnn.enabled` Allow disabling MKLDNN at runtime ([25459](https://github.com/pytorch/pytorch/pull/25459)).
* `torch.cholesky_solve` Add derivative ([26185](https://github.com/pytorch/pytorch/pull/26185)).
* `torch.cholesky_inverse` Add derivative ([26451](https://github.com/pytorch/pytorch/pull/26451)).
* `torch.polygamma` Ensure that `n` is non-negative ([26294](https://github.com/pytorch/pytorch/pull/26294)).
* `torch.pinverse` Enable batching ([26095](https://github.com/pytorch/pytorch/pull/26095)).
* `torch.digamma/trigamma` Fix type mismatches on CUDA ([25791](https://github.com/pytorch/pytorch/pull/25791)).
* `torch.where` Enable for bool tensor on CUDA ([26430](https://github.com/pytorch/pytorch/pull/26430)).
* `torch.load` default encoding change to 'utf-8' ([26421](https://github.com/pytorch/pytorch/pull/26421)).
* `torch.repeat_interleave` Respect the current stream ([26946](https://github.com/pytorch/pytorch/pull/26946)).
* `torch.bernoulli_` Implement for bool tensors ([25076](https://github.com/pytorch/pytorch/pull/25076)).
* `torch.norm` Fix nuclear norm with requires_grad=True ([26303](https://github.com/pytorch/pytorch/pull/26303)).
* `torch.hub.download_url_to_file` Make function public ([26723](https://github.com/pytorch/pytorch/pull/26723)).
* `nn.modules.conv` add padding_mode to repr ([23996](https://github.com/pytorch/pytorch/pull/23996)).
* `nn.Transformer` Extend to support BERT (gelu) ([24181](https://github.com/pytorch/pytorch/pull/24181)).
* `nn.BatchNorm2d` Add support for non-affine batch norm with float stats and half inputs ([22750](https://github.com/pytorch/pytorch/pull/22750)).
* `nn.Parameter` Fix type hints ([25586](https://github.com/pytorch/pytorch/pull/25586)).
* `nn.CTCLoss` Improve error message ([26325](https://github.com/pytorch/pytorch/pull/26325)).
* `nn.Conv` Allow batch size of 0 ([26214](https://github.com/pytorch/pytorch/pull/26214)).
* `nn.LSTM/GRU` enable double backward for non-cudnn ([26660](https://github.com/pytorch/pytorch/pull/26660)).
* `optim.Adagrad` Add epsilon argument ([24980](https://github.com/pytorch/pytorch/pull/24980)).
* `optim.LBFGS`  Change default tolerance_grad to 1e-7 ([25240](https://github.com/pytorch/pytorch/pull/25240)).
* `optim.lr_scheduler.OneCycleLR` Add new 1cycle learning rate scheduler ([25324](https://github.com/pytorch/pytorch/pull/25324)).
* `optimizer.step` Fix type annotation ([26930](https://github.com/pytorch/pytorch/pull/26930)).
* `bfloat16` Add support for sub, mul, and div on CPU ([22851](https://github.com/pytorch/pytorch/pull/22851)).
* `bfloat16` Enabled comparison ops on CPU ([24182](https://github.com/pytorch/pytorch/pull/24182)).
* `bfloat16` Enabled masked methods ([24183](https://github.com/pytorch/pytorch/pull/24183)).
* `bfloat16` Enabled torch.mm and torch.mv ([24224](https://github.com/pytorch/pytorch/pull/24224)).
* `bfloat16` Enable log_softmax and CrossEntropyLoss ([24457](https://github.com/pytorch/pytorch/pull/24457)).
* `bfloat16` Enabled conv methods ([26167](https://github.com/pytorch/pytorch/pull/26167)).
* `bfloat16` Enabled dtype on CUDA ([26407](https://github.com/pytorch/pytorch/pull/26407)).
* `quasirandom.SobolEngine` Use random seed if not specified ([24884](https://github.com/pytorch/pytorch/pull/24884)).
* `utils.data.dataloader` Add possible out of shared memory error message ([25730](https://github.com/pytorch/pytorch/pull/25730)).
* `cuda.set_rng_state` Add type hint ([26200](https://github.com/pytorch/pytorch/pull/26200)).
* Zero sized tensor support for repeat_interleave ([23717](https://github.com/pytorch/pytorch/pull/23717)).
* Recommend `~` and `bitwise_not()` when user tries to apply neg (`-`) on a bool tensor. ([23621](https://github.com/pytorch/pytorch/pull/23621)).
* Fix double backward of inplace op on view ([23502](https://github.com/pytorch/pytorch/pull/23502)).
* `autograd.grad` Validate shapes of outputs ([25349](https://github.com/pytorch/pytorch/pull/25349)).
* Enable libflame as a LAPACK choice ([25795](https://github.com/pytorch/pytorch/pull/25795)).
* Fix race condition in CUDA initialization ([25788](https://github.com/pytorch/pytorch/pull/25788)).
* Include `iteration_` in SGD optimizer serialization ([26906](https://github.com/pytorch/pytorch/pull/26906)).
* [C++] `torch::tensor` Fix an ambiguous overload issues in constructor ([26890](https://github.com/pytorch/pytorch/pull/26890)).
* [XLA] Check device before accessing data_ptr in PackLayer ([26056](https://github.com/pytorch/pytorch/pull/26056)).
* [XLA] Allow overwriting catch-all kernels ([25947](https://github.com/pytorch/pytorch/pull/25947)).



Bug Fixes

TensorBoard Bug Fixes

* `SummaryWriter.add_graph`: Fix empty graph output in some cases ([25599](https://github.com/pytorch/pytorch/pull/25599)).
* Update Caffe2 contrib TensorBoard logging to not require TensorFlow ([25259](https://github.com/pytorch/pytorch/pull/25259)).
* `SummaryWriter.make_video`: Fix write_gif call to moviepy for newer lib ([21218](https://github.com/pytorch/pytorch/pull/21218)).

C++ API Bug fixes

* Fixes mismatch of device and data type when computing `step_size` in LBFGS optimizer ([25909](https://github.com/pytorch/pytorch/pull/25909)).

JIT

* Fix list comprehensions that change the type of the original iterable ([24271](https://github.com/pytorch/pytorch/pull/24271)).
* Fix double copying of constants during recursive scripting ([24412](https://github.com/pytorch/pytorch/pull/24412)).
* Fix frontend error message ([23576](https://github.com/pytorch/pytorch/pull/23576)).
* Clear recursive error stack on each compilation ([23458](https://github.com/pytorch/pytorch/pull/23458)).
* Fix bugs in assignment to optionals ([25059](https://github.com/pytorch/pytorch/pull/25059)).
* Make `torch.jit.Attribute` work when `PYTORCH_ENABLED=0` ([23851](https://github.com/pytorch/pytorch/pull/23851)).
* Fix unicode in comments causing compilation errors ([24218](https://github.com/pytorch/pytorch/pull/24218)).
* Correctly raise an error if an `nn.Module` has not been initialized but you try to script it ([24852](https://github.com/pytorch/pytorch/pull/24852)).
* Fix annotated assignment to variables ([25094](https://github.com/pytorch/pytorch/pull/25094)).
* dictPop: dereference dict.find() iterator before calling dict.erase() ([25056](https://github.com/pytorch/pytorch/pull/25056)).
* fix closures which always throw. ([25278](https://github.com/pytorch/pytorch/pull/25278)).
* Add source location to class instantiation error ([24990](https://github.com/pytorch/pytorch/pull/24990)).
* Fix `AliasAnalysisKind::PURE` on MSVC ([25375](https://github.com/pytorch/pytorch/pull/25375)).
* Emit script function calls during tracing. ([25089](https://github.com/pytorch/pytorch/pull/25089)).
* Resolve `NamedTuple` types properly in Python ([26443](https://github.com/pytorch/pytorch/pull/26443)).
* Fix schema matching of tuples to vartype lists ([25944](https://github.com/pytorch/pytorch/pull/25944)).
* Correctly preserve ignored function return value type ([25262](https://github.com/pytorch/pytorch/pull/25262)).
* Fix missing newline in compiled from source range highlight ([25802](https://github.com/pytorch/pytorch/pull/25802)).
* Fix use-after-free bug in `optional` ([25965](https://github.com/pytorch/pytorch/pull/25965)).
* Fix torch.arange traced as constant ([25363](https://github.com/pytorch/pytorch/pull/25363)).
* Preserve module names in recursive script ([24505](https://github.com/pytorch/pytorch/pull/24505)).
* Properly resolve ignored module method type annotations ([26683](https://github.com/pytorch/pytorch/pull/26683)).
* Make `is_optional` check more robust ([26312](https://github.com/pytorch/pytorch/pull/26312)).
* Fix builtin lookup for Python functions ([26688](https://github.com/pytorch/pytorch/pull/26688)).
* Typevar matching fix + implicit conversions from Scalar to int/float ([26453](https://github.com/pytorch/pytorch/pull/26453)).
* Fix range for non-int inputs and pow implementation ([26926](https://github.com/pytorch/pytorch/pull/26926)).

Other Bug Fixes

* `torch.is_pinned` pin_memory should not copy on already pinned tensors ([23484](https://github.com/pytorch/pytorch/pull/23484)).
* `torch.cdist` Fix incorrect gradients on CUDA non-batch tensors ([22915](https://github.com/pytorch/pytorch/pull/22915)).
* `torch.from_numpy` Fix failure on windows for int32 ([25139](https://github.com/pytorch/pytorch/pull/25139)).
* `torch.tensor` Fix memory leak creating a tensor from numpy ([24267](https://github.com/pytorch/pytorch/pull/24267)).
* `torch.index` Don't save `self` in `index` backward ([25594](https://github.com/pytorch/pytorch/pull/25594)).
* `torch.bincount` Fix int32 overflow on CUDA ([25748](https://github.com/pytorch/pytorch/pull/25748)).
* `torch.bernoulli` Fix the distribution sampler ([26864](https://github.com/pytorch/pytorch/pull/26864)).
* `torch.pow` Fix precision ([25476](https://github.com/pytorch/pytorch/pull/25476)).
* `torch.cdist` Fix gradient computation when first arg is 1xn ([26254](https://github.com/pytorch/pytorch/pull/26254)).
* `torch.scatter_add_` Fix scatter CPU kernel when (input size, src size) > index size ([25839](https://github.com/pytorch/pytorch/pull/25839)).
* `nn.ConvTranspose2d` Fixed an error with float16 inputs and weights on CUDA.  ([23552](https://github.com/pytorch/pytorch/pull/23552)).
* `nn.CTCLoss` Fix zero-length targets on CUDA ([23298](https://github.com/pytorch/pytorch/pull/23298)).
* `nn.Conv2d` Correct an overflow in an error message ([25146](https://github.com/pytorch/pytorch/pull/25146)).
* `optim.Adam` apply a small mathematical fix. ([23737](https://github.com/pytorch/pytorch/pull/23737)).
* `dataloader` Fix IndexError on shutdown if not all workers are started ([23761](https://github.com/pytorch/pytorch/pull/23761)).
* `Tensor.repeat` Fix crash for 0 repeats ([23766](https://github.com/pytorch/pytorch/pull/23766)).
* `torch.pin_memory` only use one thread ([25111](https://github.com/pytorch/pytorch/pull/25111)).
* `distributions.Uniform,HalfCauchy,Gamma` Fix `log_prob` when value is a float ([23017](https://github.com/pytorch/pytorch/pull/23017)).
* Fix typing error for Padding with asymmetric signatures ([24895](https://github.com/pytorch/pytorch/pull/24895)).
* Avoid race condition in `intrusive_ptr.reset_()` ([24464](https://github.com/pytorch/pytorch/pull/24464)).
* `torch.hub`: Fix SSL cert issue for hub in Python 2 ([25042](https://github.com/pytorch/pytorch/pull/25042)).
* Fix int overflow issue in CUDA kernels. ([24818](https://github.com/pytorch/pytorch/pull/24818)).
* `Module.cuda` Fix type hints ([25018](https://github.com/pytorch/pytorch/pull/25018)).
* Fix bug in assertNotEqual for int tensors ([25412](https://github.com/pytorch/pytorch/pull/25412)).
* Fix `'in'` incorrectly returning true ([24156](https://github.com/pytorch/pytorch/pull/24156)).
* Fix bugs in bulk loader when `batch_size=None` or with namedtuple ([26065](https://github.com/pytorch/pytorch/pull/26065)).
* Fix serialization issue in big endian arch ([26383](https://github.com/pytorch/pytorch/pull/26383)).
* Fix `Vec256::abs()` for floating point when applied on -0.0 ([26422](https://github.com/pytorch/pytorch/pull/26422)).
* Fix cyclic reference in _LRScheduler ([25776](https://github.com/pytorch/pytorch/pull/25776)).
* Fix a build failure on s390x ([26233](https://github.com/pytorch/pytorch/pull/26233)).
* [XLA] Fix tensor construction from array ([24283](https://github.com/pytorch/pytorch/pull/24283)).

Documentation Updates

Distributed

* `torch.distributed` Improve error phrasing in torch.distributed helper functions ([25574](https://github.com/pytorch/pytorch/pull/25574))
* `torch.distributions.negative_binomial` Clarify an ambiguous docstring in NegativeBinomial ([25923](https://github.com/pytorch/pytorch/pull/25923))

JIT

* Add technical documentation for the serialization format ([23456](https://github.com/pytorch/pytorch/pull/23456)).
* Fix trace docs ([24191](https://github.com/pytorch/pytorch/pull/24191)).
* Add `trace_module` to docs ([24258](https://github.com/pytorch/pytorch/pull/24258)).
* Cleanup distinction around `script` and `trace` ([24208](https://github.com/pytorch/pytorch/pull/24208)).
* Fix `item()` call in docs ([25404](https://github.com/pytorch/pytorch/pull/25404)).
* Misc doc updates / fixes ([24371](https://github.com/pytorch/pytorch/pull/24371), [24445](https://github.com/pytorch/pytorch/pull/24445)).

Other documentation improvements

* `torch.record_stream` Add documentation ([24078](https://github.com/pytorch/pytorch/pull/24078)).
* `torch.fold` Describe the relation between fold and unfold operations ([24840](https://github.com/pytorch/pytorch/pull/24840)).
* `torch.argmax` Fix incorrect doc ([23775](https://github.com/pytorch/pytorch/pull/23775)).
* `torch.random` add docs ([23553](https://github.com/pytorch/pytorch/pull/23553)).
* `torch.empty_strided` Add docs ([23735](https://github.com/pytorch/pytorch/pull/23735)).
* `torch.bitwise_not` Document for bool tensors ([23800](https://github.com/pytorch/pytorch/pull/23800)).
* `torch.cdist` Add documentation ([25221](https://github.com/pytorch/pytorch/pull/25221)).
* `torch.where` Update parameter names in doc ([25554](https://github.com/pytorch/pytorch/pull/25554)).
* `torch.atan2` Clarify and correct the doc ([26180](https://github.com/pytorch/pytorch/pull/26180)).
* `nn.functional.bilinear` Added documentation ([24951](https://github.com/pytorch/pytorch/pull/24951)).
* `nn.functional.upsample` Fix align_corners doc ([23707](https://github.com/pytorch/pytorch/pull/23707)).
* `nn.Transformer` Fixed an error in the example ([24837](https://github.com/pytorch/pytorch/pull/24837)).
* `optim.lr_scheduler.CosineAnnealingWarmRestarts` Add documentation ([25421](https://github.com/pytorch/pytorch/pull/25421)).
* `optim.SGD` Updated with subscripts ([23985](https://github.com/pytorch/pytorch/pull/23985)).
* `optim.RMSprop` Highlighting in the doc that square root comes before adding epsilon ([26735](https://github.com/pytorch/pytorch/pull/26735)).
* `autograd.detect_anomaly` Add a warning ([26615](https://github.com/pytorch/pytorch/pull/26615)).
* Improve dataloader docs on when auto-batching is disabled ([23671](https://github.com/pytorch/pytorch/pull/23671)).
* Updated docs and added deprecation warnings to acknowledge a bool tensor ([22261](https://github.com/pytorch/pytorch/pull/22261)).
* Document benchmarking practice for CUDA ([23910](https://github.com/pytorch/pytorch/pull/23910)).
* Add ASAN instructions to CONTRIBUTING.md ([24848](https://github.com/pytorch/pytorch/pull/24848)).

>>> x = torch.randn(3, 3, dtype=torch.float64, device='cuda:1')
>>> x.requires_grad   # default is False
False
>>> x = torch.zeros(3, requires_grad=True)
>>> x.requires_grad
True


[``torch.tensor``](http://pytorch.org/docs/0.4.0/torch.html#torch.tensor)
[``torch.tensor``](http://pytorch.org/docs/0.4.0/torch.html#torch.tensor) is one of the newly added [tensor creation methods](http://pytorch.org/docs/0.4.0/torch.html#creation-ops). It takes in array-like data of all kinds and copies the contained values into a new ``Tensor``. As mentioned earlier, [``torch.tensor``](http://pytorch.org/docs/0.4.0/torch.html#torch.tensor) is the PyTorch equivalent of NumPy's ``numpy.array`` constructor. Unlike the ``torch.*Tensor`` methods, you can also create zero-dimensional ``Tensor``s (aka scalars) this way (a single Python number is treated as a size in the ``torch.*Tensor`` methods). Moreover, if a ``dtype`` argument isn't given, it will infer the suitable ``dtype`` given the data. It is the recommended way to create a tensor from existing data like a Python list. For example,

```python
>>> cuda = torch.device("cuda")
>>> torch.tensor([[1], [2], [3]], dtype=torch.half, device=cuda)
tensor([[ 1],
        [ 2],
        [ 3]], device='cuda:0')
>>> torch.tensor(1)                # scalar
tensor(1)
>>> torch.tensor([1, 2.3]).dtype   # type inference
torch.float32
>>> torch.tensor([1, 2]).dtype     # type inference
torch.int64
```


We've also added more tensor creation methods. Some of them have ``torch.*_like`` and/or ``tensor.new_*`` variants.

1. ``torch.*_like`` takes in an input ``Tensor`` instead of a shape. It returns a ``Tensor`` with the same attributes as the input ``Tensor`` by default, unless otherwise specified:

```python
>>> x = torch.randn(3, dtype=torch.float64)
>>> torch.zeros_like(x)
tensor([ 0.,  0.,  0.], dtype=torch.float64)
>>> torch.zeros_like(x, dtype=torch.int)
tensor([ 0,  0,  0], dtype=torch.int32)
```


2. ``tensor.new_*`` can also create ``Tensor``s with the same attributes as ``tensor``, but it always takes in a shape argument:

```python
>>> x = torch.randn(3, dtype=torch.float64)
>>> x.new_ones(2)
tensor([ 1.,  1.], dtype=torch.float64)
>>> x.new_ones(4, dtype=torch.int)
tensor([ 1,  1,  1,  1], dtype=torch.int32)
```


To specify the desired shape, you can either use a tuple (e.g., ``torch.zeros((2, 3))``) or variable arguments (e.g., ``torch.zeros(2, 3)``) in most cases.

| Name | Returned ``Tensor`` | ``torch.*_like`` variant | ``tensor.new_*`` variant |
|:-----|---------------------|--------------------------|--------------------------|
| [``torch.empty``](http://pytorch.org/docs/0.4.0/torch.html#torch.empty) | uninitialized memory | ✔ | ✔ |
| [``torch.zeros``](http://pytorch.org/docs/0.4.0/torch.html#torch.zeros) | all zeros | ✔ | ✔ |
| [``torch.ones``](http://pytorch.org/docs/0.4.0/torch.html#torch.ones) | all ones | ✔ | ✔ |
| [``torch.full``](http://pytorch.org/docs/0.4.0/torch.html#torch.full) | filled with a given value | ✔ | ✔ |
| [``torch.rand``](http://pytorch.org/docs/0.4.0/torch.html#torch.rand) | i.i.d. continuous ``Uniform[0, 1)`` | ✔ | |
| [``torch.randn``](http://pytorch.org/docs/0.4.0/torch.html#torch.randn) | i.i.d. ``Normal(0, 1)`` | ✔ | |
| [``torch.randint``](http://pytorch.org/docs/0.4.0/torch.html#torch.randint) | i.i.d. discrete Uniform in a given range | ✔ | |
| [``torch.randperm``](http://pytorch.org/docs/0.4.0/torch.html#torch.randperm) | random permutation of ``{0, 1, ..., n - 1}`` | | |
| [``torch.tensor``](http://pytorch.org/docs/0.4.0/torch.html#torch.tensor) | copied from existing data (`list`, NumPy `ndarray`, etc.) | | ✔ |
| [``torch.from_numpy``](http://pytorch.org/docs/0.4.0/torch.html#torch.from_numpy)* | from NumPy ``ndarray`` (sharing storage without copying) | | |
| [``torch.arange``](http://pytorch.org/docs/0.4.0/torch.html#torch.arange), <br>[``torch.range``](http://pytorch.org/docs/0.4.0/torch.html#torch.range), and <br>[``torch.linspace``](http://pytorch.org/docs/0.4.0/torch.html#torch.linspace) | uniformly spaced values in a given range | | |
| [``torch.logspace``](http://pytorch.org/docs/0.4.0/torch.html#torch.logspace) | logarithmically spaced values in a given range | | |
| [``torch.eye``](http://pytorch.org/docs/0.4.0/torch.html#torch.eye) | identity matrix | | |

*: [``torch.from_numpy``](http://pytorch.org/docs/0.4.0/torch.html#torch.from_numpy) only takes in a NumPy ``ndarray`` as its input argument.
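
To make the storage-sharing row above concrete, here is a minimal sketch (not part of the original notes) showing that mutating the source ``ndarray`` is visible through the tensor returned by ``torch.from_numpy``:

```python
import numpy as np
import torch

a = np.arange(3, dtype=np.float64)
t = torch.from_numpy(a)   # shares memory with `a`; no copy is made
a[0] = 100.0              # mutate the ndarray ...
print(t)                  # ... and the change shows up in the tensor:
                          # tensor([100., 1., 2.], dtype=torch.float64)
```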



Writing device-agnostic code

Previous versions of PyTorch made it difficult to write code that was device agnostic (i.e. that could run on both CUDA-enabled and CPU-only machines without modification).

PyTorch 0.4.0 makes this easier in two ways:
* The `device` attribute of a Tensor gives the [``torch.device``](http://pytorch.org/docs/0.4.0/tensor_attributes.html#torch.torch.device) for all Tensors (`get_device` only works for CUDA tensors)
* The `to` method of ``Tensors`` and ``Modules`` can be used to easily move objects to different devices (instead of having to call `cpu()` or `cuda()` based on the context)


We recommend the following pattern:
```python
# at the beginning of the script
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

...

# then whenever you get a new Tensor or Module
# this won't copy if they are already on the desired device
input = data.to(device)
model = MyModule(...).to(device)
```


Tensors

Full support for Advanced indexing

PyTorch now has full support for advanced indexing, following numpy's advanced indexing rules. The following examples are now possible:

```python
a = torch.rand(10, 10, 10, 10)

# the indexing elements can have other shapes than 1
b = a[[[3, 2]], :, [[1, 3]]]

# broadcasting also supported in the indices, as well as lists,
# negative indices, slices, ellipses, numbers
c = a[[1, -2], 2:4, :, [1]]

# can also support tensors as indices
index = torch.tensor([2, 4])
d = a[index]

# and the indices can be on the GPU
# or CPU
e = a[index.cuda()]
f = a.cuda()[index]
```

1.3.1

Significant Fixes

[Type Promotion](https://pytorch.org/docs/stable/tensor_attributes.html#torch-dtype): fixed a bug where type promotion, combined with non-contiguous tensors, could compute incorrect results.  ([28253](https://github.com/pytorch/pytorch/pull/28253/))

<p align="center">
<table align="center">
<tr><th>Version 1.3.0</th><th>Version 1.3.1</th></tr>
<tr valign="top">
<td><sub><pre lang="python">
>>> a = torch.tensor([[True,  True],
                      [False, True]])
# get a non-contiguous tensor
>>> a_transpose = a.t()
# type promote by comparing across dtypes (bool -> long)
>>> a_transpose == 0
POTENTIALLY INCORRECT VALUES
</pre></sub></td>
<td><sub><pre lang="python">
>>> a = torch.tensor([[True,  True],
                      [False, True]])
# get a non-contiguous tensor
>>> a_transpose = a.t()
# type promote by comparing across dtypes (bool -> long)
>>> a_transpose == 0
tensor([[False,  True],
        [False, False]])
</pre></sub></td>
</tr>
</table>
</p>

[Type Promotion](https://pytorch.org/docs/stable/tensor_attributes.html#torch-dtype) / Indexing: fixed a bug where mixed-dtype indexing and assignment could lead to incorrect results.  Mixed dtype operations of this form are currently disabled, as they were in 1.2.  ([28231](https://github.com/pytorch/pytorch/pull/28231))

<p align="center">
<table align="center">
<tr><th>Version 1.3.0</th><th>Version 1.3.1</th></tr>
<tr valign="top">
<td><sub><pre lang="python">
>>> a = torch.ones(5, 2, dtype=torch.float)
>>> b = torch.zeros(5, dtype=torch.long)
>>> a[:, [1]] = b.unsqueeze(-1)
>>> a
POTENTIALLY INCORRECT VALUES
</pre></sub></td>
<td><sub><pre lang="python">
>>> a = torch.ones(5, 2, dtype=torch.float)
>>> b = torch.zeros(5, dtype=torch.long)
>>> a[:, [1]] = b.unsqueeze(-1)
RuntimeError: expected dtype Float but got dtype Long
</pre></sub></td>
</tr>
</table>
</p>

[torch.where(condition, x, y)](https://pytorch.org/docs/stable/torch.html#torch.where): fixed a bug on CPU where incorrect results could be returned if `x` and `y` were of different dtypes.  Mixed dtype operations of this form are currently disabled, as they were in version 1.2.  ([29078](https://github.com/pytorch/pytorch/pull/29078))

<p align="center">
<table align="center">
<tr><th>Version 1.3.0</th><th>Version 1.3.1</th></tr>
<tr valign="top">
<td><sub><pre lang="python">
>>> x = torch.randn(2, 3)
>>> y = torch.randint(0, 10, (2, 3))
>>> torch.where(x < 0, x, y)
tensor(...)
POTENTIALLY INCORRECT VALUES
</pre></sub></td>
<td><sub><pre lang="python">
>>> x = torch.randn(2, 3)
>>> y = torch.randint(0, 10, (2, 3))
>>> torch.where(x < 0, x, y)
RuntimeError: expected scalar type Float but found Long
</pre></sub></td>
</tr>
</table>
</p>

Other Fixes

* `torch.argmax`: fix regression on CUDA that disabled support for `torch.float16` inputs.  ([28915](https://github.com/pytorch/pytorch/pull/28915/))
* NamedTensor: fix Python refcounting bug with `Tensor.names`.  ([28922](https://github.com/pytorch/pytorch/pull/28922))
* Quantization: support `deepcopy` for quantized tensors.  ([28612](https://github.com/pytorch/pytorch/pull/28612))
* Quantization: support `nn.quantized.ReLU` with `inplace=True`.  ([28710](https://github.com/pytorch/pytorch/pull/28710))
* Documentation: `torch.lgamma` and `torch.polygamma` are now documented.  ([28964](https://github.com/pytorch/pytorch/pull/28964))

1.3.0

Table of Contents

- Breaking Changes
- Highlights
* [Experimental]: Mobile Support
* [Experimental]: Named Tensor Support
* [Experimental]: Quantization support
* Type Promotion
* Deprecations
- New Features
* TensorBoard: 3D Mesh and Hyperparameter Support
* Distributed
* Libtorch Binaries with C++11 ABI
* New TorchScript features
- Improvements
* C++ Frontend Improvements
+ Autograd
+ New torch::nn modules
+ New torch::nn::functional functions
+ tensor Construction API
+ Other C++ Improvements
* Distributed Improvements
* Performance Improvements
* JIT Improvements
* ONNX Exporter Improvements
+ Adding Support for ONNX IR v4
+ Adding Support for ONNX Opset 11
+ Exporting More Torch Operators/Models to ONNX
+ Enhancing ONNX Export Infra
* Other Improvements
- Bug Fixes
+ TensorBoard Bug Fixes
+ C++ API Bug fixes
+ JIT
+ Other Bug Fixes
- Documentation Updates
+ Distributed
+ JIT
+ Other documentation improvements

Breaking Changes

Type Promotion: Mixed dtype operations may return a different dtype and value than in previous versions.  ([22273](https://github.com/pytorch/pytorch/pull/22273), [26981](https://github.com/pytorch/pytorch/pull/26981))

Previous versions of PyTorch supported a limited number of mixed dtype operations. These operations could result in loss of precision by, for example, truncating floating-point zero-dimensional tensors or Python numbers.

In Version 1.3, PyTorch supports NumPy-style type promotion (with slightly modified rules; see the [full documentation](https://pytorch.org/docs/master/tensor_attributes.html#torch-dtype)).  These rules will generally retain precision and be less surprising to users.
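
As a minimal illustrative sketch (the values below are assumed for illustration, not taken from the original comparison table), adding a Python float to an integer tensor now promotes to a floating-point result instead of truncating:

```python
>>> import torch
>>> torch.tensor(5) + 1.5   # int64 tensor + Python float
tensor(6.5000)              # 1.3: result is promoted to a floating-point dtype
                            # (1.2 would have truncated and returned tensor(6))
```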

<p align="center">
<table align="center">
<tr><th>Version 1.2</th><th>Version 1.3</th></tr>
<tr valign="top">
<td><sub><pre lang="python">

*Version 1.2:*

```python
class MyModule(torch.nn.Module):
    ...

# Construct an nn.Module instance
module = MyModule(args)

# Pass it to `torch.jit.script` to compile it into a ScriptModule.
my_torchscript_module = torch.jit.script(module)
```

`torch.jit.script()` will attempt to recursively compile the given `nn.Module`, including any submodules or methods called from `forward()`. See the [migration guide](https://pytorch.org/docs/master/jit.html#migrating-to-pytorch-1-2-recursive-scripting-api) for more info on what's changed and how to migrate.

[JIT] Improved TorchScript Python language coverage

In 1.2, TorchScript has significantly improved its support for Python language constructs and Python's standard library. Highlights include:

* Early returns, breaks and continues.
* Iterator-based constructs, like `for..in` loops, `zip()`, and `enumerate()`.
* `NamedTuples`.
* `math` and `string` library support.
* Support for most Python builtin functions.

See the detailed notes below for more information.
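
As a small sketch of the kind of code that now compiles directly (illustrative only; the function below is not from the release notes), assuming the constructs listed above:

```python
import torch
from typing import List

@torch.jit.script
def describe(xs: List[int]) -> str:
    # early return, for..in with enumerate(), and str() are now supported
    if len(xs) == 0:
        return "empty"
    total = 0
    for i, x in enumerate(xs):
        total += i * x
    return "weighted sum: " + str(total)

print(describe([1, 2, 3]))  # weighted sum: 8
```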

Expanded ONNX Export

In PyTorch 1.2, working with Microsoft, we’ve added full support for exporting ONNX Opset versions 7 (v1.2), 8 (v1.3), 9 (v1.4), and 10 (v1.5). We’ve also enhanced the constant folding pass to support Opset 10, the latest available version of ONNX. Additionally, users are now able to register their own symbolic functions to export custom ops, and to specify the dynamic dimensions of inputs during export. Here is a summary of all of the major improvements:

* Support for multiple Opsets including the ability to export dropout, slice, flip and interpolate in Opset 10.
* Improvements to ScriptModule including support for multiple outputs, tensor factories and tuples as inputs and outputs.
* More than a dozen additional PyTorch operators supported including the ability to export a custom operator.

Updated docs can be found [here](https://pytorch.org/docs/stable/onnx.html) and also a refreshed tutorial using ONNXRuntime can be found [here](https://pytorch.org/tutorials/advanced/super_resolution_with_onnxruntime.html).
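
For example, a hedged sketch of an export call that picks an opset and marks the batch dimension as dynamic (the model and names here are made up for illustration):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2)).eval()
dummy = torch.randn(3, 4)

torch.onnx.export(
    model, dummy, "tiny_model.onnx",
    opset_version=10,                       # one of the newly supported opsets
    input_names=["input"], output_names=["output"],
    dynamic_axes={"input": {0: "batch"},    # batch dimension is not fixed at export time
                  "output": {0: "batch"}},
)
```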

TensorBoard is no longer considered experimental

Read the [documentation](https://pytorch.org/docs/stable/tensorboard.html) or simply type `from torch.utils.tensorboard import SummaryWriter` to get started!
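
A minimal sketch of logging a scalar (the tag and values are arbitrary):

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()   # writes event files under ./runs/ by default
for step in range(100):
    writer.add_scalar("train/loss", 1.0 / (step + 1), global_step=step)
writer.close()
# Inspect the run with: tensorboard --logdir=runs
```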

NN.Transformer

We include a standard [nn.Transformer](https://pytorch.org/docs/stable/nn.html?highlight=transformer#torch.nn.Transformer) module, based on the paper “[_Attention is All You Need_](https://arxiv.org/abs/1706.03762)”.  The `nn.Transformer` module relies entirely on an [attention mechanism](https://pytorch.org/docs/stable/nn.html?highlight=nn%20multiheadattention#torch.nn.MultiheadAttention) to draw global dependencies between input and output.  The individual components of the `nn.Transformer` module are designed so they can be adopted independently.  For example, the [nn.TransformerEncoder](https://pytorch.org/docs/stable/nn.html?highlight=nn%20transformerencoder#torch.nn.TransformerEncoder) can be used by itself, without the larger `nn.Transformer`. New APIs include:

* `nn.Transformer`
* `nn.TransformerEncoder` and `nn.TransformerEncoderLayer`
* `nn.TransformerDecoder` and `nn.TransformerDecoderLayer`

See the [Transformer Layers](https://pytorch.org/docs/stable/nn.html#transformer-layers) documentation for more info.
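
A small usage sketch (hyperparameters chosen arbitrarily for illustration):

```python
import torch
import torch.nn as nn

transformer = nn.Transformer(d_model=32, nhead=4,
                             num_encoder_layers=2, num_decoder_layers=2)
src = torch.rand(10, 8, 32)   # (source length, batch, d_model)
tgt = torch.rand(20, 8, 32)   # (target length, batch, d_model)
out = transformer(src, tgt)   # -> shape (20, 8, 32)

# The encoder stack can also be used on its own:
encoder_layer = nn.TransformerEncoderLayer(d_model=32, nhead=4)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
memory = encoder(src)         # -> shape (10, 8, 32)
```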

Breaking Changes

The return dtype of comparison operations (`lt (<), le (<=), gt (>), ge (>=), eq (==), ne (!=)`) has changed from `torch.uint8` to `torch.bool` ([21113](https://github.com/pytorch/pytorch/pull/21113))

*Version 1.1:*


```python
>>> torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2])
tensor([1, 0, 0], dtype=torch.uint8)
```


*Version 1.2:*


```python
>>> torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2])
tensor([True, False, False])
```



For most programs, we don't expect that any changes will need to be made as a result of this change. There are a couple of possible exceptions listed below.

**Mask Inversion**

In prior versions of PyTorch, the idiomatic way to invert a mask was to call `1 - mask`.  This behavior is no longer supported; use the `~` or `bitwise_not()` operator instead.

*Version 1.1*:


```python
>>> 1 - (torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2]))
tensor([0, 1, 1], dtype=torch.uint8)
```


*Version 1.2:*


```python
>>> 1 - (torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2]))
RuntimeError: Subtraction, the `-` operator, with a bool tensor is not supported.
If you are trying to invert a mask, use the `~` or `bitwise_not()` operator instead.

>>> ~(torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2]))
tensor([False,  True,  True])
```


**sum(Tensor) (python built-in) does not upcast `dtype` like `torch.sum`**

Python's built-in `sum` returns results in the same `dtype` as the tensor itself, so it will not return the expected result if the value of the sum cannot be represented in the `dtype` of the tensor.

*Version 1.1*:


```python
# value can be represented in result dtype
>>> sum(torch.tensor([1, 2, 3, 4, 5]) > 2)
tensor(3, dtype=torch.uint8)

# value can NOT be represented in result dtype
>>> sum(torch.ones((300,)) > 0)
tensor(44, dtype=torch.uint8)

# torch.sum properly upcasts result dtype
>>> torch.sum(torch.ones((300,)) > 0)
tensor(300)
```


*Version 1.2:*


```python
# value cannot be represented in result dtype (now torch.bool)
>>> sum(torch.tensor([1, 2, 3, 4, 5]) > 2)
tensor(True)

# value cannot be represented in result dtype
>>> sum(torch.ones((300,)) > 0)
tensor(True)

# torch.sum properly upcasts result dtype
>>> torch.sum(torch.ones((300,)) > 0)
tensor(300)
```


**TLDR**: use `torch.sum` instead of the built-in `sum`.  Note that the built-in `sum()` behavior will more closely resemble `torch.sum` in the next release.

Note also that masking via `torch.uint8` Tensors is now deprecated, see the **Deprecations** section for more information.


`__invert__` / `~`: now calls `torch.bitwise_not` instead of `1 - tensor` and is supported for all integral+Boolean dtypes instead of only `torch.uint8`.  ([22326](https://github.com/pytorch/pytorch/pull/22326))

*Version 1.1*:


```python
>>> ~torch.arange(8, dtype=torch.uint8)
tensor([ 1, 0, 255, 254, 253, 252, 251, 250], dtype=torch.uint8)
```


*Version 1.2*:


```python
>>> ~torch.arange(8, dtype=torch.uint8)
tensor([255, 254, 253, 252, 251, 250, 249, 248], dtype=torch.uint8)
```




`torch.tensor(bool)` and `torch.as_tensor(bool)` now infer `torch.bool` dtype instead of `torch.uint8`.  ([19097](https://github.com/pytorch/pytorch/pull/19097))

*Version 1.1:*


```python
>>> torch.tensor([True, False])
tensor([1, 0], dtype=torch.uint8)
```


*Version 1.2:*


```python
>>> torch.tensor([True, False])
tensor([ True, False])
```




`nn.BatchNorm{1,2,3}D`: gamma (`weight`) is now initialized to all 1s rather than randomly initialized from *U(0, 1)*.  ([13774](https://github.com/pytorch/pytorch/pull/13774))

*Version 1.1:*


```python
>>> torch.nn.BatchNorm2d(5).weight
Parameter containing:
```
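
For comparison, a minimal sketch of the Version 1.2 behavior described above (gamma initialized to all ones; output shown for illustration):

```python
>>> torch.nn.BatchNorm2d(5).weight
Parameter containing:
tensor([1., 1., 1., 1., 1.], requires_grad=True)
```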

1.2.0

We have just released PyTorch v1.2.0.

It has over 1,900 commits and contains a significant amount of effort in areas spanning JIT, ONNX, Distributed, as well as Performance and Eager Frontend Improvements.

Highlights

[JIT] New TorchScript API

1.1.0

Note: CUDA 8.0 is no longer supported

Highlights

TensorBoard (currently experimental)

First-class and native support for visualization and model debugging with [TensorBoard](https://www.tensorflow.org/tensorboard), a web application suite for inspecting and understanding training runs, tensors, and graphs. PyTorch now supports TensorBoard logging with a simple `from torch.utils.tensorboard import SummaryWriter` command. Histograms, embeddings, scalars, images, text, graphs, and more can be visualized across training runs. TensorBoard support is currently experimental. You can browse the docs [here](https://pytorch.org/docs/stable/tensorboard.html).

![](https://github.com/gchanan/pytorch/raw/tensorboard_screenshot/Screen%20Shot%202019-04-25%20at%204.53.42%20PM.png)

[JIT] Attributes in ScriptModules
Attributes can be assigned on a `ScriptModule` by wrapping them with `torch.jit.Attribute` and specifying the type. Attributes are similar to parameters or buffers, but can be of any type. They will be serialized along with any parameters/buffers when you call `torch.jit.save()`, so they are a great way to store arbitrary state in your model. See [the docs](https://pytorch.org/docs/master/jit.html#module-attributes) for more info.

Example:

```python
import torch
from typing import Dict, List

class Foo(torch.jit.ScriptModule):
    def __init__(self, a_dict):
        super(Foo, self).__init__(False)
        self.words = torch.jit.Attribute([], List[str])
        self.some_dict = torch.jit.Attribute(a_dict, Dict[str, int])

    @torch.jit.script_method
    def forward(self, input: str) -> int:
        self.words.append(input)
        return self.some_dict[input]
```


[JIT] Dictionary and List Support in TorchScript
TorchScript now has robust support for list and dictionary types. They behave much like Python lists and dictionaries, supporting most built-in methods, as well as simple comprehensions and `for…in` constructs.
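
A small sketch (not from the release notes) of a scripted function that works with typed lists and dictionaries:

```python
import torch
from typing import Dict, List

@torch.jit.script
def lookup(table: Dict[str, int], keys: List[str]) -> List[int]:
    out = torch.jit.annotate(List[int], [])   # annotate the empty list's element type
    for k in keys:
        out.append(table[k])
    return out

print(lookup({"a": 1, "b": 2}, ["b", "a", "b"]))  # [2, 1, 2]
```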

[JIT] User-defined classes in TorchScript (experimental)
For more complex stateful operations, TorchScript now supports annotating a class with `torch.jit.script`. Classes used this way can be JIT-compiled and loaded in C++ like other TorchScript modules. See [the docs](https://pytorch.org/docs/master/jit.html#user-defined-types) for more info.

```python
import torch

@torch.jit.script
class Pair:
    def __init__(self, first, second):
        self.first = first
        self.second = second

    def sum(self):
        return self.first + self.second
```



DistributedDataParallel new functionality and tutorials

`nn.parallel.DistributedDataParallel`: can now wrap multi-GPU modules, which enables use cases such as model parallel ([tutorial](https://pytorch.org/tutorials/intermediate/model_parallel_tutorial.html)) on one server and data parallel ([tutorial](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html)) across servers.
([19271](https://github.com/pytorch/pytorch/pull/19271)).
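
A hedged sketch of wrapping a module that itself spans two GPUs (assumes the process group has already been initialized, e.g. with `torch.distributed.init_process_group`):

```python
import torch
import torch.nn as nn

class TwoDeviceModel(nn.Module):
    def __init__(self, dev0, dev1):
        super(TwoDeviceModel, self).__init__()
        self.net0 = nn.Linear(10, 10).to(dev0)
        self.net1 = nn.Linear(10, 5).to(dev1)
        self.dev0, self.dev1 = dev0, dev1

    def forward(self, x):
        x = self.net0(x.to(self.dev0))
        return self.net1(x.to(self.dev1))

model = TwoDeviceModel("cuda:0", "cuda:1")
# For a multi-device module, do not pass device_ids / output_device.
ddp_model = nn.parallel.DistributedDataParallel(model)
```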

Breaking Changes
* `Tensor.set_`: the `device` of a Tensor can no longer be changed via `Tensor.set_`.  This would most commonly happen when setting up a Tensor with the default CUDA device and later swapping in a `Storage` on a different CUDA device.  Instead, set up the Tensor on the correct device from the beginning.  ([18832](https://github.com/pytorch/pytorch/pull/18832)).
* `lr_scheduler.step()`: the expected calling order has changed; call `optimizer.step()` before `lr_scheduler.step()`. ([7889](https://github.com/pytorch/pytorch/pull/7889)).
* `torch.unique`: changed the default value of `sorted` to `True`.  ([15379](https://github.com/pytorch/pytorch/pull/15379)).
* **[JIT]** Rename isTensor api -> isCompleteTensor. [18437](https://github.com/pytorch/pytorch/pull/18437)
* **[JIT]** Remove GraphExecutor's python bindings. [19141](https://github.com/pytorch/pytorch/pull/19141)
* **[C++]**: many methods on `Type` no longer exist; use the functional or Tensor method equivalent.  ([17991](https://github.com/pytorch/pytorch/pull/17991)).
* **[C++]**: the `Backend` constructor of `TensorOptions` no longer exists.  ([18137](https://github.com/pytorch/pytorch/pull/18137)).
* **[C++, Distributed]**: c10d `ProcessGroup::getGroupRank` has been removed.  ([19147](https://github.com/pytorch/pytorch/pull/19147)).


New Features

Operators
* `torch.tril_indices`, `torch.triu_indices`: added operator with same behavior as NumPy.  ([14904](https://github.com/pytorch/pytorch/pull/14904), [15203](https://github.com/pytorch/pytorch/pull/15203)).
* `torch.combinations`, `torch.cartesian_prod`: added new `itertools`-like operators.  ([9393](https://github.com/pytorch/pytorch/pull/9393)).
* `torch.repeat_interleave`: new operator similar to `numpy.repeat`.  ([18395](https://github.com/pytorch/pytorch/pull/18395)).
* `torch.from_file`: new operator similar to `Storage.from_file`, but returning a tensor.  ([18688](https://github.com/pytorch/pytorch/pull/18688)).
* `torch.unique_consecutive`: new operator with semantics similar to `std::unique` in C++.  ([19060](https://github.com/pytorch/pytorch/pull/19060)).
* `torch.tril`, `torch.triu`, `torch.trtrs`: now support batching.  ([15257](https://github.com/pytorch/pytorch/pull/15257), [18025](https://github.com/pytorch/pytorch/pull/18025)).
* `torch.gather`: add support for `sparse_grad` option.  ([17182](https://github.com/pytorch/pytorch/pull/17182)).
* `torch.std`, `torch.max_values`, `torch.min_values`, `torch.logsumexp` can now operate over multiple dimensions at once.  ([14535](https://github.com/pytorch/pytorch/pull/14535), [15892](https://github.com/pytorch/pytorch/pull/15892), [16475](https://github.com/pytorch/pytorch/pull/16475)).
* `torch.cdist`: added operator equivalent to `scipy.spatial.distance.cdist`.  ([16168](https://github.com/pytorch/pytorch/pull/16168), [17173](https://github.com/pytorch/pytorch/pull/17173)).
* `torch.__config__.show()`: reports detailed version of all libraries.  ([18579](https://github.com/pytorch/pytorch/pull/18579)).

NN
* `nn.MultiheadAttention`: new module implementing multi-head attention from "Attention Is All You Need".  ([18334](https://github.com/pytorch/pytorch/pull/18334)).
* `nn.functional.interpolate`: added support for `bicubic`.  ([9849](https://github.com/pytorch/pytorch/pull/9849)).
* `nn.SyncBatchNorm`: support synchronous Batch Normalization.  ([14267](https://github.com/pytorch/pytorch/pull/14267)).
* `nn.Conv`: added support for Circular Padding via `mode='circular'`.  ([17240](https://github.com/pytorch/pytorch/pull/17240)).
* `nn.EmbeddingBag`: now supports trainable `per_sample_weights`.  ([18799](https://github.com/pytorch/pytorch/pull/18799)).
* `nn.EmbeddingBag`: add support for `from_pretrained` method, as in `nn.Embedding`.  ([15273](https://github.com/pytorch/pytorch/pull/15273)).
* `RNNs`: automatically handle unsorted variable-length sequences via `enforce_sorted`.  ([15225](https://github.com/pytorch/pytorch/pull/15225)).
* `nn.Identity`: new module for easier model surgery.  ([19249](https://github.com/pytorch/pytorch/pull/19249)).

Tensors / dtypes
* `torch.bool`: added support for `torch.bool` dtype and Tensors with that dtype (1-byte storage).  NumPy conversion is supported, but operations are currently limited.  ([16810](https://github.com/pytorch/pytorch/pull/16810)).

Optim
* `optim.lr_scheduler.CyclicLR`: Support for Cyclical Learning Rate and Momentum.  ([18001](https://github.com/pytorch/pytorch/pull/18001)).
* `optim.lr_scheduler.CosineAnnealingWarmRestarts`: new scheduler (Stochastic Gradient Descent with Warm Restarts).  ([17226](https://github.com/pytorch/pytorch/pull/17226)).
* Support multiple simultaneous LR schedulers.  ([14010](https://github.com/pytorch/pytorch/pull/14010))


Distributions
* `torch.distributions`: now support multiple inheritance.  ([16772](https://github.com/pytorch/pytorch/pull/16772)).

Samplers
* `quasirandom.SobolEngine`: new sampler.  ([10505](https://github.com/pytorch/pytorch/pull/10505)).

DistributedDataParallel
* `nn.parallel.DistributedDataParallel`: now supports modules with unused parameters (e.g. control flow, like adaptive softmax, etc). ([18251](https://github.com/pytorch/pytorch/pull/18251), [18953](https://github.com/pytorch/pytorch/pull/18953)).

TorchScript and Tracer
* Allow early returns from if-statements. ([15463](https://github.com/pytorch/pytorch/pull/15463))
* Add an `ignore` annotation, which statically tells the TorchScript compiler to ignore the Python function. ([16055](https://github.com/pytorch/pytorch/pull/16055))
* Simple `for...in`  loops on lists. ([16726](https://github.com/pytorch/pytorch/pull/16726))
* Ellipses (`...`) in Tensor indexing. ([17763](https://github.com/pytorch/pytorch/pull/17763))
* `None` in Tensor indexing. ([18615](https://github.com/pytorch/pytorch/pull/18615))
* Support for basic list comprehensions. ([17267](https://github.com/pytorch/pytorch/pull/17267))
* Add implicit unwrapping of optionals on `if foo is not None`. ([15587](https://github.com/pytorch/pytorch/pull/15587))
* Tensors, ints, and floats will once again be implicitly cast to bool if used in a conditional. ([18755](https://github.com/pytorch/pytorch/pull/18755)).
* Implement `to()`, `cpu()`, and `cuda()` on ScriptModules. ([15340](https://github.com/pytorch/pytorch/pull/15340) ,  [15904](https://github.com/pytorch/pytorch/pull/15904))
* Add support for various methods on lists: ([`clear()`](https://github.com/pytorch/pytorch/pull/17050), [`pop()`](https://github.com/pytorch/pytorch/pull/17015), [`reverse()`](https://github.com/pytorch/pytorch/pull/17001), [`copy()`](https://github.com/pytorch/pytorch/pull/17092) ,  [`extend()`](https://github.com/pytorch/pytorch/pull/17092),[`index()`](https://github.com/pytorch/pytorch/pull/17446), [`count()`](https://github.com/pytorch/pytorch/pull/17446), [`insert()`](https://github.com/pytorch/pytorch/pull/17200), [`remove()`](https://github.com/pytorch/pytorch/pull/17200) ).
* Add support for `sort()` on lists of specialized type (`Tensors`, `int`, `float`, `bool`). ([19572](https://github.com/pytorch/pytorch/pull/19572))
* Add support for various methods on strings: ([`index()`](https://github.com/pytorch/pytorch/pull/18247), [`slice()`](https://github.com/pytorch/pytorch/pull/18247), [`len()`](https://github.com/pytorch/pytorch/pull/19320))
* Support `Tensor.to()` in TorchScript. ( [15976](https://github.com/pytorch/pytorch/pull/15976) )
* Support for `torch.tensor()` in TorchScript. ([14913](https://github.com/pytorch/pytorch/pull/14913),  [19445](https://github.com/pytorch/pytorch/pull/19445))
* Support for `torch.manual_seed()` in TorchScript. ([19510](https://github.com/pytorch/pytorch/pull/19510))
* Support for `nn.LSTM` in TorchScript. ([15744](https://github.com/pytorch/pytorch/pull/15744))
* Support for `nn.init` in TorchScript. ([19640](https://github.com/pytorch/pytorch/pull/19640))
* Add `hash()` builtin. ([18258](https://github.com/pytorch/pytorch/pull/18258))
* Add `min()` and `max()` builtins for numerical types. ([15680](https://github.com/pytorch/pytorch/pull/15680))
* Add `isinstance()` builtin, which performs a static type check. ([15076](https://github.com/pytorch/pytorch/pull/15076))
* Add `train()` / `eval()` / `is_training()` to C++ ScriptModule API. ([16044](https://github.com/pytorch/pytorch/pull/16044))
* Allow List arguments to Python functions called from TorchScript. ([15721](https://github.com/pytorch/pytorch/pull/19086))
* Allow using `std::vector` and `std::unordered_map` as arguments to custom operators. ([17587](https://github.com/pytorch/pytorch/pull/17587))
* Tracer: now allows passing static dicts and lists as trace inputs. ([18092](https://github.com/pytorch/pytorch/pull/18092), [19580](https://github.com/pytorch/pytorch/pull/19580))
* Allow generic containers as ScriptModule inputs. ([16482](https://github.com/pytorch/pytorch/pull/16482))
* Allow `nn.Sequential` in ModuleList. ([16882](https://github.com/pytorch/pytorch/pull/16882))

Experimental Features
* [Quantization] **(API unstable)**: added limited support for quantized datatypes via `torch.qint8` dtype, `torch.quantize_linear` conversion function.  ([18230](https://github.com/pytorch/pytorch/pull/18230)).
* [MKLDNN tensor] **(API unstable)**: Added limited (opaque) support for `MKLDNN` tensors via `Tensor.to_mkldnn()`; operators are currently limited to ResNext101 operators.  ([17748](https://github.com/pytorch/pytorch/pull/17748)).

Improvements

* `torch.min`, `torch.max`, `torch.median`, `torch.mode`, `torch.kthvalue`, `torch.symeig`, `torch.eig`, `torch.pstrf`, `torch.qr`, `torch.geqrf`, `torch.solve`, `torch.slogdet`, `torch.sort`, `torch.topk`, `torch.gels`, `torch.triangular_solve`, `torch.svd` now return namedtuples describing their outputs. ([16186](https://github.com/pytorch/pytorch/pull/16186), [16950](https://github.com/pytorch/pytorch/pull/16950), [17093](https://github.com/pytorch/pytorch/pull/17093), [17195](https://github.com/pytorch/pytorch/pull/17195), [15429](https://github.com/pytorch/pytorch/pull/15429)).
* `torch.empty` (and other factory functions): now take a `pin_memory` kwarg; memory can now be pinned without going through the `torch.Storage` interface.  ([18455](https://github.com/pytorch/pytorch/pull/18455)).
* `torch.histc`: Now supported on CUDA.  ([15842](https://github.com/pytorch/pytorch/pull/15842))
* `torch.unique`: Add `return_counts`.  ([18391](https://github.com/pytorch/pytorch/pull/18391), [18651](https://github.com/pytorch/pytorch/pull/18651)).
* `torch.logspace`: add the ability to specify a `base`.  ([19542](https://github.com/pytorch/pytorch/pull/19542)).
* `torch.set_printoptions`: added scientific notation support.  ([16876](https://github.com/pytorch/pytorch/pull/16876)).
* `torch.btrifact` now handles tensors with greater than 3 dimensions.  ([14964](https://github.com/pytorch/pytorch/pull/14964)).
* `torch.kthvalue`: now supported on CUDA.  ([17544](https://github.com/pytorch/pytorch/pull/17544)).
* `torch.abs`: now supported on `uint8` and `int8` dtypes.  ([16893](https://github.com/pytorch/pytorch/pull/16893)).
* `torch.stack`, `torch.cat`: now supported for CPU half tensors.  ([16389](https://github.com/pytorch/pytorch/pull/16389)).
* `torch.cross`: added support for negative dimensions. ([17582](https://github.com/pytorch/pytorch/pull/17582)).
* `torch.lerp`: add support for `weight` as a Tensor.  ([17348](https://github.com/pytorch/pytorch/pull/17348)).
* `torch.transpose`: Made consistent with NumPy: 1-d and 0-d arrays are accepted and returned as-is.  ([17462](https://github.com/pytorch/pytorch/pull/17462), [17535](https://github.com/pytorch/pytorch/pull/17535)).
* `torch.linspace`, `torch.logspace` can now be used with `steps=1` and `start != end`.  ([14748](https://github.com/pytorch/pytorch/pull/14748)).
* `torch.cholesky`: changed the derivative from a triangular matrix to symmetric matrix.  ([19116](https://github.com/pytorch/pytorch/pull/19116)).
* `torch.lerp`: Improved numerical stability.  ([18871](https://github.com/pytorch/pytorch/pull/18871)).
* `torch.logdet`, `torch.slogdet`: improve numerical precision.  ([18449](https://github.com/pytorch/pytorch/pull/18449)).
* `Tensor.__contains__` is now supported. ([17733](https://github.com/pytorch/pytorch/pull/17733)).
* `Tensor.fill_` and `torch.zeros` now support half on CPU.  ([17536](https://github.com/pytorch/pytorch/pull/17536)).
* `Tensor.resize_as_`, `Tensor.view`: now supported on half CPU tensors.  ([18821](https://github.com/pytorch/pytorch/pull/18821)).
* `Tensor indexing`: allow indexing via NumPy booleans.  ([14932](https://github.com/pytorch/pytorch/pull/14932)).
* `nn.EmbeddingBag`: enable half precision dense backward.  ([19293](https://github.com/pytorch/pytorch/pull/19293)).
* `nn.Embedding`: fix dense Embedding to work with double backwards.  ([9078](https://github.com/pytorch/pytorch/pull/9078)).
* `nn.MaxPool1d`: Allow list and tuples to be passed as `output_size`.  ([16489](https://github.com/pytorch/pytorch/pull/16489)).
* `nn.CTCLoss`:  support zeroing infinite losses via `zero_infinity` argument.  ([16199](https://github.com/pytorch/pytorch/pull/16199)).
* `nn.Dropout`: add support for enabling during eval.  ([17549](https://github.com/pytorch/pytorch/pull/17549)).
* `nn.MSELoss`: add warning about unexpected broadcasting.  ([18349](https://github.com/pytorch/pytorch/pull/18349)).
* `nn.Module.load_state_dict`: also return `missing_keys` and `unexpected_keys`.  ([18668](https://github.com/pytorch/pytorch/pull/18668)).
* `nn.parallel.data_parallel`: Enforce devices match `device_ids`.  ([17129](https://github.com/pytorch/pytorch/pull/17129)).
* `torch.device`: handle in more places that used to accept only device ordinals.  ([14929](https://github.com/pytorch/pytorch/pull/14929))
* `dtype.int8` tensors can now be converted to NumPy arrays.  ([14710](https://github.com/pytorch/pytorch/pull/14710)).
* `nn.functional.gumbel_softmax`: allow multidimensional input with `dim` argument.  ([13339](https://github.com/pytorch/pytorch/pull/13339)).
* `nn.functional.cosine_similarity`: improved precision.  ([18250](https://github.com/pytorch/pytorch/pull/18250)).
* `torch.autograd`: Don't keep unnecessary saved_inputs alive, increasing memory efficiency.  ([16583](https://github.com/pytorch/pytorch/pull/16583)).
* `torch.autograd.profiler`: add Self (non-nested) CPU Time Total, CPU time total ([19378](https://github.com/pytorch/pytorch/pull/19378)).
* `DataLoader`: support accepting a custom memory pinning function.  ([16743](https://github.com/pytorch/pytorch/pull/16743)).
* `DataLoader`: retry libshm on EINTR.  ([15964](https://github.com/pytorch/pytorch/pull/15964)).
* `DataLoader`: fixed an issue with `pin_memory` and `PackedSequence`.  ([18079](https://github.com/pytorch/pytorch/pull/18079))
* `data.utils.collate`, `data.utils.pin_memory`: now preserve namedtuples.  ([16440](https://github.com/pytorch/pytorch/pull/16440))
* Use `IndexError` instead of `RuntimeError` on many indexing error cases.  ([17049](https://github.com/pytorch/pytorch/pull/17049), [17114](https://github.com/pytorch/pytorch/pull/17114)).
* Support indexing a `torch.float16` tensor on CPU.  ([17645](https://github.com/pytorch/pytorch/pull/17645)).
* Add (limited) error checking in case of internal overlap on inplace operators.  ([19317](https://github.com/pytorch/pytorch/pull/19317), [17927](https://github.com/pytorch/pytorch/pull/17927)).
* `utils.checkpoint.checkpoint`: support `None` as an argument to checkpoint function.  ([17969](https://github.com/pytorch/pytorch/pull/17969)).
* `torch.autograd`: added more information for `one of the variables needed for gradient computation has been modified by an inplace operation` exception.  ([18523](https://github.com/pytorch/pytorch/pull/18523)).
* `cuda.synchronize`: add a device argument.  ([19573](https://github.com/pytorch/pytorch/pull/19573)).
* `cuda.reset_max_memory_*`: now supported.  ([15985](https://github.com/pytorch/pytorch/pull/15985)).
* `distributions.Independent`:  can now calculate KL Divergence.  ([17681](https://github.com/pytorch/pytorch/pull/17681)).
* `torch.distributed.new_group`: now supports overriding default backend. ([18595](https://github.com/pytorch/pytorch/pull/18595)).
* `torch.distributed.init_process_group`: will now propagate timeout to underlying Store. ([16571](https://github.com/pytorch/pytorch/pull/16571)).
* **[JIT]** Preserve module hierarchy on traced modules. ([15101](https://github.com/pytorch/pytorch/pull/15101))
* **[JIT]** Add metadata for TracedModules. ([17311](https://github.com/pytorch/pytorch/pull/17311))
* **[JIT]** Improve portability of int and float checks. ([19532](https://github.com/pytorch/pytorch/pull/19532))
* **[JIT]** Preserve method parameter names during serialization. ([16750](https://github.com/pytorch/pytorch/pull/16750))
* **[JIT]** Add a correctness check for C++ types to custom operators. ([15247](https://github.com/pytorch/pytorch/pull/15247))
* **[JIT]** Added a few extra python bindings to help with walking the IR graph from Python. [17822](https://github.com/pytorch/pytorch/pull/17822)
* **[JIT Error Messages]** Print out operator suggestions for "unknown builtin op" error. ([15183](https://github.com/pytorch/pytorch/pull/15183))
* **[JIT Error Messages]** Better error message when creating a module instance in TorchScript. ([16416](https://github.com/pytorch/pytorch/pull/16416))
* **[JIT Error Messages]** Print suggestion to add `nn.Module` attributes to `__constants__` when they are used in TorchScript. ([18164](https://github.com/pytorch/pytorch/pull/18164))
* **[JIT Error Messages]** `torch.save()`: Improve error message when you try to save a ScriptModule. ([15321](https://github.com/pytorch/pytorch/pull/15321))
* **[JIT Error Messages]** `torch.jit.save()`: Improve error message when trying to save a model with Python code. ([16850](https://github.com/pytorch/pytorch/pull/16850))
* **[JIT Error Messages]** Better errors when trying to close over a Tensor with grad enabled while tracing. ([18298](https://github.com/pytorch/pytorch/pull/18298), [19645](https://github.com/pytorch/pytorch/pull/19645))
* **[JIT Error Messages]** Better error when trying to add a Tensor to `__constants__`. ([16724](https://github.com/pytorch/pytorch/pull/16724))
* **[JIT Error Messages]** Better error when a module list isn't added to `__constants__`. ([17167](https://github.com/pytorch/pytorch/pull/17167))
* **[JIT Error Messages]** Add a warning when attempting to trace legacy constructors. ([16770](https://github.com/pytorch/pytorch/pull/16770))
* **[JIT Error Messages]** Improve hint when trying to trace non-deterministic nodes. ([17957](https://github.com/pytorch/pytorch/pull/17957))
* **[C++]** `nn::Module`: added Python interop.  ([13481](https://github.com/pytorch/pytorch/pull/13481)).
* **[C++]** `autograd::profiler`: is now supported.  ([16580](https://github.com/pytorch/pytorch/pull/16580))
* **[C++]** allow detection of C++ ABI flag for cpp extensions from available runtime information.  ([18994](https://github.com/pytorch/pytorch/pull/18994)).
* **[C++]** `torch.argsort` is now supported in C++.  ([17099](https://github.com/pytorch/pytorch/pull/17099)).
* **[C++]** `Tensor.isnan`: now supported in C++.  ([15722](https://github.com/pytorch/pytorch/pull/15722)).
* **[C++]**: Added named submodule support to `nn::Sequential`.  ([17552](https://github.com/pytorch/pytorch/pull/17552)).
* **[C++]**: Kaiming Initialization.  ([14718](https://github.com/pytorch/pytorch/pull/14718)).
* **[C++]** `torch::data::transforms::Normalize`: now supported in C++.  ([15891](https://github.com/pytorch/pytorch/pull/15891)).
* **[C++]**: Support call operator on module holder calling forward.  ([15831](https://github.com/pytorch/pytorch/pull/15831)).
* **[C++]**: Random and Sequential distributed samplers.  ([16910](https://github.com/pytorch/pytorch/pull/16910)).
* **[C++]**: pretty printing of C++ Modules.  ([15326](https://github.com/pytorch/pytorch/pull/15326)).
* **[C++]** Support serializing `std::vector<torch::Tensor>`.  ([19677](https://github.com/pytorch/pytorch/pull/19677)).

Bug Fixes

Serious
* `torch.prod`: correct erroneous calculation on large tensors.  ([15653](https://github.com/pytorch/pytorch/pull/15653)).
* `torch.mean` (and other reductions): fix incorrect calculation on CUDA on large inputs.  ([16023](https://github.com/pytorch/pytorch/pull/16023)).
* `nn.Conv`: correctly handle non-contiguous inputs on MKLDNN convolution codepath.  ([16300](https://github.com/pytorch/pytorch/pull/16300)).
* `Tensor.eq_`:  Fix erroneous calculation.  ([15475](https://github.com/pytorch/pytorch/pull/15475)).
* `torch.mean`: Fix fp16 output calculation.  ([14878](https://github.com/pytorch/pytorch/pull/14878)).
* `nn.PoissonNLLLoss`:  Properly handle `reduction=None`.  ([17358](https://github.com/pytorch/pytorch/pull/17358)).
* **[JIT]** Fix bug where custom ops could get optimized out if their outputs weren't used. ([18711](https://github.com/pytorch/pytorch/pull/18711)).
* **[JIT]** Fix bug where the model serializer would accidentally reorder statements. ([17557](https://github.com/pytorch/pytorch/pull/17557)).

Other
* `Tensor.round` is now consistently half to even.  ([17443](https://github.com/pytorch/pytorch/pull/17443)).
* `Tensor.resize_`: Fix some 0-element cases.  ([14874](https://github.com/pytorch/pytorch/pull/14874)).
* `Tensor.numpy`: Fix conversion of `torch.int8` dtype.  ([15194](https://github.com/pytorch/pytorch/pull/15194)).
* `Tensor.grad`: correctly handle `del`.  ([16525](https://github.com/pytorch/pytorch/pull/16525)).
* `Tensor.clamp`: correctly handle NaN on CUDA.  ([15479](https://github.com/pytorch/pytorch/pull/15479)).
* `Tensor.topk`: properly set launch bounds on CUDA.  ([17296](https://github.com/pytorch/pytorch/pull/17296)).
* `Tensor.kthvalue`: treat NaN as bigger than any number.  ([17824](https://github.com/pytorch/pytorch/pull/17824)).
* `Tensor.copy_`: Properly synchronize on src and dst streams.  ([16966](https://github.com/pytorch/pytorch/pull/16966)).
* `Tensor indexing`: Fix incorrect dimension error message.  ([16495](https://github.com/pytorch/pytorch/pull/16495)).
* `Tensor.coalesce`, `Tensor.clone`, `Tensor.to_dense`: fixed for sparse 0-dimensional tensors.  ([17379](https://github.com/pytorch/pytorch/pull/17379)).
* `torch.isinf`: Don't error out on integral tensors.  ([15489](https://github.com/pytorch/pytorch/pull/15489)).
* `torch.argsort`, `torch.sort`: Match NumPy by considering NaNs to be larger than any number.  ([15886](https://github.com/pytorch/pytorch/pull/15886)).
* `torch.geqrf`, `torch.ormqr`: when an `out` parameter is specified, dispatch to the correct function.  ([16964](https://github.com/pytorch/pytorch/pull/16964)).
* `torch.cuda.get_device_name` / `torch.cuda.get_device_capability`: Fix handling of optional.  ([17222](https://github.com/pytorch/pytorch/pull/17222)).
* `Tensor.tril_` / `Tensor.triu_`: properly reuse input memory.  ([17031](https://github.com/pytorch/pytorch/pull/17031)).
* `torch.arange`: fix shape inconsistency between CPU and CUDA.  ([18462](https://github.com/pytorch/pytorch/pull/18462)).
* `torch.empty` (and other size-based factory functions): properly enforce non-negative sizes.  ([17077](https://github.com/pytorch/pytorch/pull/17077)).
* `torch.load`: support serializing / deserializing `pathlib.Path` object.  ([18562](https://github.com/pytorch/pytorch/pull/18562)).
* `nn.BatchNorm`: correctly handle very large batches.  ([17047](https://github.com/pytorch/pytorch/pull/17047)).
* `nn.Softmax` / `nn.LogSoftmax`: fix double backward for `torch.half`.  ([17330](https://github.com/pytorch/pytorch/pull/17330)).
* `nn.Softmax`: handle empty inputs in backward.  ([17259](https://github.com/pytorch/pytorch/pull/17259)).
* `nn.NLLLoss`: Fix crash when `ignore_index` is out-of-bounds on CPU.  ([17328](https://github.com/pytorch/pytorch/pull/17328)).
* `nn.Softmax`, `nn.LogSoftmax`: handle 0-element inputs.  ([17651](https://github.com/pytorch/pytorch/pull/17651)).
* `nn.CTCLoss`: correct error checking.  ([16269](https://github.com/pytorch/pytorch/pull/16269)).
* `nn.Conv`: better report convolution size mismatch.  ([17436](https://github.com/pytorch/pytorch/pull/17436)).
* `torch.nn.functional.cosine_similarity`: fix output sometimes returning result > 1.0.  ([18168](https://github.com/pytorch/pytorch/pull/18168)).
* `nn.parallel.data_parallel`: Fix handling of buffers that require_grad.  ([13352](https://github.com/pytorch/pytorch/pull/13352)).
* `nn.parallel.data_parallel`: would previously sometimes free tensors before all pending operations finished. ([18465](https://github.com/pytorch/pytorch/pull/18465)).
* `torch.distributed.broadcast`: fixed repeated calls leading to OOM. ([19219](https://github.com/pytorch/pytorch/pull/19219)).
* `torch.multiprocessing`: fix serialization of integer `nn.Parameters`.  ([18639](https://github.com/pytorch/pytorch/pull/18639)).
* `torch.multiprocessing`: Fix handling of `distributions` on CUDA.  ([16854](https://github.com/pytorch/pytorch/pull/16854)).
* `torch.nonzero`: Fix for 0-dimensional tensors on CUDA.  ([17406](https://github.com/pytorch/pytorch/pull/17406)).
* `torch.slogdet`: Fix `sign` requiring grad when `input` required grad.  ([16337](https://github.com/pytorch/pytorch/pull/16337)).
* `torch.cuda.Stream`: Properly restore stream on destination device when switching devices.  ([17439](https://github.com/pytorch/pytorch/pull/17439)).
* `torch.cuda.Stream`: Fixed synchronization issue when used with non-current device.  ([15689](https://github.com/pytorch/pytorch/pull/15689)).
* `torch.cuda.Stream`: properly change device in stream context manager.  ([16128](https://github.com/pytorch/pytorch/pull/16128)).
* `DataLoader`: fixed a hang when no data was read and the buffer size is smaller than the chunk size.  ([17409](https://github.com/pytorch/pytorch/pull/17409)).
* `DataLoader`: `_utils.collate.default_collate` now converts bool lists to byte Tensors, not integer tensors.
([14669](https://github.com/pytorch/pytorch/pull/14669)).
* `DataLoader`: ensure dataset is indexed by integers.  ([17649](https://github.com/pytorch/pytorch/pull/17649)).
* `torch.sparse.mm`:  Handle transposed dense tensors in backwards.  ([18737](https://github.com/pytorch/pytorch/pull/18737)).
* `torch.sparse.sum`: Fix parsing of `dim`.  ([16517](https://github.com/pytorch/pytorch/pull/16517)).
* `torch.sparse.mm` / `torch.sparse.addmm`: fix broadcasting and using uninitialized data.  ([16572](https://github.com/pytorch/pytorch/pull/16572)).
* `Tensor.to_sparse`: Fix for 0-dimensional tensors.  ([17406](https://github.com/pytorch/pytorch/pull/17406)).
* `SparseTensor`: fix add with non-contiguous `values` tensors.  ([18179](https://github.com/pytorch/pytorch/pull/18179)).
* Fix `compare_exchange_weak` in `weak_intrusive_ptr`.  ([16302](https://github.com/pytorch/pytorch/pull/16302)).
* `utils.model_zoo.load_url`: Fix race condition.  ([16578](https://github.com/pytorch/pytorch/pull/16578)).
* `utils.data.RandomSampler`: have `len` properly take into account `num_samples`.  ([15991](https://github.com/pytorch/pytorch/pull/15991)).
* `torch.distributions`:  Fix precision issue with expansion that prefers `probs` over `logits`.  ([18614](https://github.com/pytorch/pytorch/pull/18614)).
* `distributions.dirichlet.Dirichlet`: fixed an underflow issue.  ([17488](https://github.com/pytorch/pytorch/pull/17488)).
* `distributions.binomial.Binomial.log_prob`: fixed numerical stability issue.  ([15962](https://github.com/pytorch/pytorch/pull/15962)).
* `Caching Allocator`: Free all blocks with outstanding events on OOM-retry.  ([19222](https://github.com/pytorch/pytorch/pull/19222)).
* `torch.dtype`: fix pickling issue with Python 2.  ([18045](https://github.com/pytorch/pytorch/pull/18045)).
* `utils.data.DataLoader`: Fix SIGCHLD checking.  ([19421](https://github.com/pytorch/pytorch/pull/19421)).
* `optim.Optimizer`: Properly copy defaults.  ([19308](https://github.com/pytorch/pytorch/pull/19308)).
* `optim.lr_scheduler.CosineAnnealingLR`: Fix division-by-zero error.  ([19180](https://github.com/pytorch/pytorch/pull/19180)).
* `optim.lr_scheduler.ReduceLROnPlateau`: fix bug when the argument to `step` is reused outside the function.
([16697](https://github.com/pytorch/pytorch/pull/16697)).
* `cuDNN`: fix race condition with multiple threads calling into the same device.  ([15080](https://github.com/pytorch/pytorch/pull/15080)).
* `cuDNN`: Properly specify accumulation types.  ([16825](https://github.com/pytorch/pytorch/pull/16825)).
* `cuDNN`: Fix incorrectly selecting slower algorithms in certain cases.  ([15881](https://github.com/pytorch/pytorch/pull/15881)).
* `cuFFT`:  Properly handle CUDA contexts.  ([19300](https://github.com/pytorch/pytorch/pull/19300))
* Fix infinite loop in reduction functions when get_max_threads is nonzero but num_threads is 1.  ([15114](https://github.com/pytorch/pytorch/pull/15114)).
* Fix tensor printing bug with Python 2.  ([12732](https://github.com/pytorch/pytorch/pull/12732)).
* `MKLDNN`: fix thread safety.  ([17022](https://github.com/pytorch/pytorch/pull/17022)).
* **[JIT]** `floordiv`: Fix integer division and divide-by-zero semantics. ([15813](https://github.com/pytorch/pytorch/pull/15813)).
* **[JIT]** Fix bug in alias analysis that disabled optimizations even in models without mutation. ([18416](https://github.com/pytorch/pytorch/pull/18146)).
* **[JIT]** `ord()`: Fix handling of utf8 chars. ([19423](https://github.com/pytorch/pytorch/pull/19423)).
* **[JIT]** Fix error when too many parameters are passed to a fused CUDA kernel. ([18063](https://github.com/pytorch/pytorch/pull/18063)).
* **[JIT]** Fix bug where common subexpression elimination accidentally introduced aliasing to function outputs. ([19576](https://github.com/pytorch/pytorch/pull/19576)).
* **[JIT]** Fix infinite loop in `requires_grad` analysis pass. ([18361](https://github.com/pytorch/pytorch/pull/18361)).
* **[JIT]** Fix ordering of parameters in `rnn.py`. ([18198](https://github.com/pytorch/pytorch/pull/18198)).
* **[JIT]** Fix contiguous autodiff and AutoGradZero inconsistency ([18633](https://github.com/pytorch/pytorch/pull/18633)).
* **[JIT]** Fix error reporting in NVRTC use of the fuser. ([18327](https://github.com/pytorch/pytorch/pull/18327)).
* **[JIT]** Ensure GIL is acquired before doing module lookup on import. ([17135](https://github.com/pytorch/pytorch/pull/17135)).
* **[JIT]** Fix bug where `_unique_state_dict` could contain duplicate Tensors. ([18139](https://github.com/pytorch/pytorch/pull/18139)).
* **[C++]**: Fix module serialization issue where one submodule doesn't have any parameters, but its submodules do.  ([15033](https://github.com/pytorch/pytorch/pull/15033)).
* **[C++]**: Add `Stream` and `Event` APIs.  ([15937](https://github.com/pytorch/pytorch/pull/15937)).
* **[C++]**: Fix Module serialization incompatibility between Python and C++ with weight-less layers.  ([19740](https://github.com/pytorch/pytorch/pull/19740)).
* **[C++]**: Properly pass `extra_cuda_cflags` to C++ extensions on Windows.  ([18638](https://github.com/pytorch/pytorch/pull/18638)).
* **[C++]** Make SGD semantics match python.  ([15840](https://github.com/pytorch/pytorch/pull/15840)).
* **[C++]** `torch::nn::init::orthogonal_`: match Python API.  ([18915](https://github.com/pytorch/pytorch/pull/18915)).

Deprecations
* `torch.btrifact`: the deprecated `info` argument has been removed.  ([14935](https://github.com/pytorch/pytorch/pull/14935)).
* `torch.potrs` has been deprecated, use `torch.cholesky_solve` instead.  Note that `upper` defaults to `False`  for `torch.cholesky_solve`, and `True` for `torch.potrs`.  ([15334](https://github.com/pytorch/pytorch/pull/15334)).
* `torch.pstrf` is deprecated; use `torch.cholesky` instead.  Note that `upper` defaults to `False`  for `torch.cholesky`, and `True` for `torch.pstrf`.  ([17866](https://github.com/pytorch/pytorch/pull/17866)).
* `torch.potri` is deprecated; use `torch.cholesky_inverse` instead.  Note that `upper` defaults to `False`  for `torch.cholesky_inverse`, and `True` for `torch.potri`.  ([19498](https://github.com/pytorch/pytorch/pull/19498)).
* `torch.btrifact_with_info` has been deprecated; use `torch.lu` with `get_infos=True` instead.([18435](https://github.com/pytorch/pytorch/pull/18435)).
* `torch.btrifact` has been deprecated; use the new name `torch.lu` instead.  ([18435](https://github.com/pytorch/pytorch/pull/18435)).
* `torch.gesv` is deprecated; use the new name `torch.solve` instead.  ([18060](https://github.com/pytorch/pytorch/pull/18060)).
* `torch.trtrs` has been deprecated; use the new name `torch.triangular_solve` instead.  ([18213](https://github.com/pytorch/pytorch/pull/18213)).
* `torch.btriunpack` has been deprecated; use the new name `torch.lu_unpack` instead.  ([18529](https://github.com/pytorch/pytorch/pull/18529)).
* `torch.btrisolve` has been deprecated; use the new name `torch.lu_solve` instead.  ([18726](https://github.com/pytorch/pytorch/pull/18726)).
* **[C++]** `IntList` has been deprecated, use `IntArrayRef` instead, as it better describes the type and ownership semantics in C++.  ([16751](https://github.com/pytorch/pytorch/pull/16751)).
*  **[C++]** Dispatch macros with `Type` parameters, e.g. `AT_DISPATCH_ALL_TYPES(tensor.type(), ...`, are now deprecated; use `ScalarType` instead, e.g. `AT_DISPATCH_ALL_TYPES(tensor.scalar_type(), ...`.  ([17527](https://github.com/pytorch/pytorch/pull/17527), [17996](https://github.com/pytorch/pytorch/pull/17996)).
* **[C++]** the deprecated `variable_tensor_functions` have been removed.  ([15003](https://github.com/pytorch/pytorch/pull/15003)).

Performance

Highlights
* `nn.BatchNorm` CPU inference speed increased up to ~19x. ([19152](https://github.com/pytorch/pytorch/pull/19152)).
* `nn.AdaptiveAvgPool`: speed up common-case of size=1 output by ~30x.  ([17011](https://github.com/pytorch/pytorch/pull/17011)).
* `nn.EmbeddingBag` CPU performance increased by ~4x.  ([19329](https://github.com/pytorch/pytorch/pull/19329)).
* `Tensor.copy_`: sped up larger tensor copy ~2-3x, small regression in small tensor copy.  ([18618](https://github.com/pytorch/pytorch/pull/18618)).
* `torch.nonzero`: is now ~2x faster than numpy on CPU.  ([15190](https://github.com/pytorch/pytorch/pull/15190))
* Improve caching allocator for Pascal and newer GPUs; 10-20% better memory utilization on Mask-RCNN.  ([17120](https://github.com/pytorch/pytorch/pull/17120)).
* `reduction functions`: Speed up some large Tensor cases by 50-80%.  ([17428](https://github.com/pytorch/pytorch/pull/17428)).
* **[JIT]** Graph fuser: better fusion for backwards graphs in the presence of broadcasting. ([14957](https://github.com/pytorch/pytorch/pull/14957))
* **[JIT]** Graph fuser: `batch_norm` fusion for inference. ([15146](https://github.com/pytorch/pytorch/pull/15146))
* **[JIT]** Graph fuser: `layer_norm` fusion for inference. ([18266](https://github.com/pytorch/pytorch/pull/18266))


Other

* `torch.abs`, `torch.frac`, `torch.reciprocal`, and `torch.neg` have been vectorized and parallelized ([19041](https://github.com/pytorch/pytorch/pull/19041)).
* `torch.bmm`: CPU performance increased by 2x.  ([19338](https://github.com/pytorch/pytorch/pull/19338)).
* `torch.sort`: CUDA performance increased by ~2x.  ([19379](https://github.com/pytorch/pytorch/pull/19379)).
* `torch.cat` on CPU is now ~4x faster in the case where inputs are contiguous and `dim` != 0.  ([17032](https://github.com/pytorch/pytorch/pull/17032)).
* `torch.multinomial`: fixed a 2x performance regression.  ([17121](https://github.com/pytorch/pytorch/pull/17121)).
* `torch.empty` (and other factory functions): reduce overhead by 20-40%.  ([17565](https://github.com/pytorch/pytorch/pull/17565)).
* `torch.linspace` has been parallelized on CPU.  ([15320](https://github.com/pytorch/pytorch/pull/15320)).
* `torch.logspace` has been parallelized on CPU.  ([15438](https://github.com/pytorch/pytorch/pull/15438)).
* `torch.range` has been parallelized on CPU.  ([15484](https://github.com/pytorch/pytorch/pull/15484)).
* `torch.arange` has been parallelized on CPU.  ([15667](https://github.com/pytorch/pytorch/pull/15667)).
* `torch.load`: avoid unnecessary CPU-to-CUDA copy.  ([17297](https://github.com/pytorch/pytorch/pull/17297)).
* `reduction functions`: improve efficiency on CUDA.  ([16224](https://github.com/pytorch/pytorch/pull/16224), [17040](https://github.com/pytorch/pytorch/pull/17040)).
* Speed up some GEMM cases on CPU by up to 7x.  ([17730](https://github.com/pytorch/pytorch/pull/17730)).
* Tensor iterator loop unrolling.  ([17667](https://github.com/pytorch/pytorch/pull/17667)).
* `sparse/dense matrix multiply`: improve speed by ~5x.  ([16905](https://github.com/pytorch/pytorch/pull/16905)).
* `distributions.MultivariateNormal`: sped up.  ([17294](https://github.com/pytorch/pytorch/pull/17294)).
* **[JIT]** Graph fuser: pow scalar exponent / base autodiff, fusion ([19324](https://github.com/pytorch/pytorch/pull/19324))
* **[JIT]** Graph fuser: allow fusion of function float arguments. ([18087](https://github.com/pytorch/pytorch/pull/18087))
* **[JIT]** Shape analysis: specialize optional Tensor inputs to graphs. ([18360](https://github.com/pytorch/pytorch/pull/18360))
* **[JIT]** Shape analysis: various correctness improvements. ([18271](https://github.com/pytorch/pytorch/pull/18271))
* **[JIT]** Shape analysis: `aten::_convolution` now participates in shape analysis. ([16837](https://github.com/pytorch/pytorch/pull/16837))
* **[JIT]** Autodiff: coverage for ops used in maskrcnn & BERT. ([16689](https://github.com/pytorch/pytorch/pull/16689))
* **[JIT]** Autodiff: support for scalar comparison ops and `randlike`. ([14740](https://github.com/pytorch/pytorch/pull/14740))
* **[JIT]** Autodiff: support for `adaptive_avg_pool2d`. ([15459](https://github.com/pytorch/pytorch/pull/15459))
* **[JIT]** Autodiff: support for `erf` and `erfc`. ([15139](https://github.com/pytorch/pytorch/pull/15139))
* **[JIT]** Autodiff: support for `layernorm`. ([17702](https://github.com/pytorch/pytorch/pull/17702))
* **[JIT]** Autodiff: support for `tanh`. ([17816](https://github.com/pytorch/pytorch/pull/17816))
* **[JIT]** Autodiff: support for `matmul`/`dropout`. ([17523](https://github.com/pytorch/pytorch/pull/17523))
* **[JIT]** Autodiff: specialized CUDA impl for dropout. ([17756](https://github.com/pytorch/pytorch/pull/17756))
* **[JIT]** Constant folding: improved inlining of control flow. ([16244](https://github.com/pytorch/pytorch/pull/16244))

Documentation

* `Tensor.scatter_`: add documentation about `value` parameter.  ([17467](https://github.com/pytorch/pytorch/pull/17467)).
* `Tensor.unfold`: correctly document `dimension` parameter, not `dim`.  ([19020](https://github.com/pytorch/pytorch/pull/19020)).
* `Tensor.is_floating_point()` is now documented.  ([15704](https://github.com/pytorch/pytorch/pull/15704)).
* `torch.cholesky`: Fix broken `upper` example in documentation.  ([15215](https://github.com/pytorch/pytorch/pull/15215)).
* `torch.gesv`: document `out` parameter.  ([15649](https://github.com/pytorch/pytorch/pull/15649)).
* `torch.mul`: better explain elementwise multiplication.  ([15664](https://github.com/pytorch/pytorch/pull/15664)).
* `torch.eig`, `torch.symeig`: better explain backwards limitations.  ([15929](https://github.com/pytorch/pytorch/pull/15929)).
* `torch.ormqr`: fixed output specification.  ([15694](https://github.com/pytorch/pytorch/pull/15694)).
* `torch.from_numpy`: replaced usage with `torch.as_tensor` in documentation.  ([16587](https://github.com/pytorch/pytorch/pull/16587)).
* `torch.mvlgamma`: Fix the constant in the docs.  ([17045](https://github.com/pytorch/pytorch/pull/17045)).
* `torch.mode`: more precisely describe what is returned.  ([17069](https://github.com/pytorch/pytorch/pull/17069)).
* `torch.upsample`: documentation now matches `torch.interpolate`.  ([17134](https://github.com/pytorch/pytorch/pull/17134))
* `torch.arange`: correct `dtype` documentation.  ([18604](https://github.com/pytorch/pytorch/pull/18604))
* `torch.cumprod`: document `out` parameter.  ([19340](https://github.com/pytorch/pytorch/pull/19340)).
* `torch.nonzero`: document indices being returned lexicographically.  ([19539](https://github.com/pytorch/pytorch/pull/19539)).
* `torch.nn.functional.interpolate`: better explain `align_corners` parameter.  ([14806](https://github.com/pytorch/pytorch/pull/14806)).
* `torch.nn.functional.pad`: documentation has been made consistent with other functional ops.  ([15984](https://github.com/pytorch/pytorch/pull/15984)).
* `nn.functional.grid_sample`: clarify behavior of padding.  ([19754](https://github.com/pytorch/pytorch/pull/19754)).
* `nn.TripletMarginLoss`: correct type of `swap` parameter.  ([18115](https://github.com/pytorch/pytorch/pull/18115)).
* `nn.CrossEntropyLoss`: clarify `ignore_index` documentation.  ([18117](https://github.com/pytorch/pytorch/pull/18117)).
* `nn.CrossEntropyLoss`: the input format is more clearly explained.  ([15990](https://github.com/pytorch/pytorch/pull/15990)).
* `nn.CTCLoss`: Clarify a number of ambiguities.  ([18415](https://github.com/pytorch/pytorch/pull/18415)).
* `nn.BCEWithLogitsLoss`: add better explanation.  ([19212](https://github.com/pytorch/pytorch/pull/19212)).
* `nn.BCEWithLogitsLoss`: better explain positive samples.  ([17258](https://github.com/pytorch/pytorch/pull/17258)).
* `nn.ModuleList` / `nn.ParameterList`: update documentation.  ([17731](https://github.com/pytorch/pytorch/pull/17731)).
* `nn.Module.load_state_dict`: correct semantics of `strict`.  ([17618](https://github.com/pytorch/pytorch/pull/17618))
* `nn.parallel.DataParallel`: more accurately specify how different argument types are handled.  ([15993](https://github.com/pytorch/pytorch/pull/15993)).
* `nn.parallel.DistributedDataParallel`: Clarified batch size requirements.  ([16010](https://github.com/pytorch/pytorch/pull/16010)).
* `torch.distributed`: Document mixed-precision training.  ([15440](https://github.com/pytorch/pytorch/pull/15440)).
* `torch.multiprocessing`: Include example multiprocessing code.  ([16345](https://github.com/pytorch/pytorch/pull/16345)).
* `torch.autograd`: Better explain computing Jacobian-vector product.  ([15197](https://github.com/pytorch/pytorch/pull/15197)).
* `torch.cuda.get_rng_state`, `torch.cuda.set_rng_state`: document taking a `device` object.  ([14324](https://github.com/pytorch/pytorch/pull/14324)).
* `torch.device`: Fix example of passing `device` to tensor factory.  ([16839](https://github.com/pytorch/pytorch/pull/16839)).
* `DataLoader`: update documentation to describe how workers are managed.  ([18091](https://github.com/pytorch/pytorch/pull/18091)).
* Unified shape formats throughout the documentation.  ([15741](https://github.com/pytorch/pytorch/pull/15741)).
* Update documentation for `reduction` arguments to use non-deprecated format.  ([17300](https://github.com/pytorch/pytorch/pull/17300)).
* `mark_non_differentiable`: document correct semantics.  ([17891](https://github.com/pytorch/pytorch/pull/17891)).
* Warn about memory overlaps on inplace operations.  ([17576](https://github.com/pytorch/pytorch/pull/17576)).
* Fix a number of small issues with conv and pooling docstrings.  ([17052](https://github.com/pytorch/pytorch/pull/17052)).
* Fix a number of small issues with padding and activation docstrings.  ([17197](https://github.com/pytorch/pytorch/pull/17197)).
* **[C++]**: mention packed accessors in Tensor basics.  ([19464](https://github.com/pytorch/pytorch/pull/19464)).

ONNX

Exporting More Torch Operators to ONNX

* Export torch.isnan to ONNX ([17698](https://github.com/pytorch/pytorch/pull/17698)).
* Export torch.flatten to ONNX ([16240](https://github.com/pytorch/pytorch/pull/16240)).
* Export torch.where, torch.ceil, torch.floor to ONNX ([18571](https://github.com/pytorch/pytorch/pull/18571)).
* Export torch.narrow to ONNX ([17550](https://github.com/pytorch/pytorch/pull/17550)).
* Export torch.argmax and torch.argmin to ONNX ([17382](https://github.com/pytorch/pytorch/pull/17382), [18264](https://github.com/pytorch/pytorch/pull/18264), [18261](https://github.com/pytorch/pytorch/pull/18261)).
* Export adaptive_avg_pool1D, adaptive_avg_pool2D, adaptive_avg_pool3D, adaptive_max_pool1D, adaptive_max_pool2D, adaptive_max_pool3D to ONNX ([17412](https://github.com/pytorch/pytorch/pull/17412)).
* Export torch.nonzero to ONNX ([17036](https://github.com/pytorch/pytorch/pull/17036), [18047](https://github.com/pytorch/pytorch/pull/18047)).
* Export torch.erf to ONNX ([16106](https://github.com/pytorch/pytorch/pull/16106)).
* Export torch.split ([15092](https://github.com/pytorch/pytorch/pull/15092)).
* Export torch.lt, torch.gt, torch.le, torch.ge, torch.eq, torch.ne to ONNX ([15677](https://github.com/pytorch/pytorch/pull/15677)).
* Export torch.expand and torch.ne to ONNX ([15050](https://github.com/pytorch/pytorch/pull/15050)).
* Export torch.nn.LogSigmoid to ONNX ([14830](https://github.com/pytorch/pytorch/pull/14830)).
* Export torch.nn.RReLU to ONNX ([14781](https://github.com/pytorch/pytorch/pull/14781)).
* Export torch.reshape and torch.reshape_as to ONNX ([16632](https://github.com/pytorch/pytorch/pull/16632), [16971](https://github.com/pytorch/pytorch/pull/16971)).
* Replace use of ConstantLike with ConstantOfShape ([16095](https://github.com/pytorch/pytorch/pull/16095), [16214](https://github.com/pytorch/pytorch/pull/16214)).

Extending Existing Exporting Logic

* Enable dim support in torch.nn.Softmax's export ([18482](https://github.com/pytorch/pytorch/pull/18482)).
* Support exporting squeeze & unsqueeze with negative dim attribute ([19297](https://github.com/pytorch/pytorch/pull/19297)).
* Support exporting max_pool1d, max_pool2d, max_pool3d with indices ([16455](https://github.com/pytorch/pytorch/pull/16455)).
* Add dtype support in torch.logsoftmax and torch.softmax's export ([17672](https://github.com/pytorch/pytorch/pull/17672)).
* Support ceil_mode in max_pool1d, max_pool2d, max_pool3d, avg_pool1d, avg_pool2d, avg_pool3d's export ([16769](https://github.com/pytorch/pytorch/pull/16769)).

Optimizing Exported ONNX Graph

* Add constant folding in ONNX exporter ([18698](https://github.com/pytorch/pytorch/pull/18698)).
* Retain the parameter names in ONNX exporter ([17551](https://github.com/pytorch/pytorch/pull/17551)).
* Omit slice op if it is a non-op ([19155](https://github.com/pytorch/pytorch/pull/19155)).
* Add a flag to strip doc_string from exported ONNX models ([18882](https://github.com/pytorch/pytorch/pull/18882)).
* Omit torch.dropout if the model is in eval mode ([16547](https://github.com/pytorch/pytorch/pull/16547)).
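
These exporter options can be combined in a single `torch.onnx.export` call. The sketch below is only illustrative: the toy model is a placeholder, and the keyword names (`do_constant_folding`, `strip_doc_string`) are taken from the PRs listed above, so double-check them against your installed version.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).eval()
dummy = torch.randn(1, 3, 32, 32)

torch.onnx.export(
    model, dummy, "model.onnx",
    do_constant_folding=True,   # fold constant subgraphs in the exported graph
    strip_doc_string=True,      # drop doc_string metadata from the exported model
)
```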

Adding Utility Functions and Refactoring

* Remove unused arg f from _model_to_graph(). ([19647](https://github.com/pytorch/pytorch/pull/19647)).
* Add the support for stable ONNX opsets in exporter ([16068](https://github.com/pytorch/pytorch/pull/16068), [17419](https://github.com/pytorch/pytorch/pull/17419)).
* Set the default ONNX opset to the latest stable opset (i.e., 9) ([17736](https://github.com/pytorch/pytorch/pull/17736)).
* Add a utility function to check whether an ONNX export is in progress ([19050](https://github.com/pytorch/pytorch/pull/19050)).
* Refactoring serialization of ONNX initializers to be name-based ([17830](https://github.com/pytorch/pytorch/pull/17830)).
* Expose dim() on type and use it in ONNX symbolics ([15933](https://github.com/pytorch/pytorch/pull/15933)).
* Add scalar_type_to_pytorch_type dict in ONNX symbolic ([15965](https://github.com/pytorch/pytorch/pull/15965)).
* Add an assertion to check the number of the parameters passed to ONNX exporter ([18145](https://github.com/pytorch/pytorch/pull/18145)).

Bugfixes

* Fix a bug caused by mismatched operand types in rsub ([15707](https://github.com/pytorch/pytorch/pull/15707)).
* Fix support for list structures in the ONNX exporter ([19102](https://github.com/pytorch/pytorch/pull/19102)).
* Fix case for `activations` attribute in nn.RNN ONNX export. ([19368](https://github.com/pytorch/pytorch/pull/19368)).
* Minor fix for onnx ConstantOfShape export ([18199](https://github.com/pytorch/pytorch/pull/18199)).
* Fix the export of the reduction forms of torch.min and torch.max ([15241](https://github.com/pytorch/pytorch/pull/15241)).
* Fix ONNX export of logical ops to have the correct output datatype ([15185](https://github.com/pytorch/pytorch/pull/15185)).
* Fix typo in docstring ([18216](https://github.com/pytorch/pytorch/pull/18216)).

1.0.1

Note: our conda install commands have slightly changed. Version specifiers such as `cuda100` in `conda install pytorch cuda100 -c pytorch` have changed to `conda install pytorch cudatoolkit=10.0 -c pytorch`

Breaking Changes

There are no breaking changes in this release.

Bug Fixes

Serious

- Higher order gradients for CPU Convolutions have been fixed (regressed in 1.0.0 under MKL-DNN setting) 15686
- Correct gradients for non-contiguous weights in CPU Convolutions 16301
- Fix ReLU on CPU Integer Tensors by fixing vec256 inversions 15634
- Fix bincount for non-contiguous Tensors 15109
- Fix torch.norm on CPU for large Tensors 15602
- Fix eq_ to do equality on GPU (was doing greater-equal due to a typo) (15475)
- Workaround a CuDNN bug that gave wrong results in certain strided convolution gradient setups
  - blacklist fft algorithms for strided dgrad (16626)

Correctness

- Fix cuda native loss_ctc for varying input length (15798)
  - this avoids NaNs in variable length settings
- C++ Frontend: Fix serialization (15033)
  - Fixes a bug in (de-)serializing a hierarchy of submodules where one submodule doesn't have any parameters, but its submodules do
- Fix derivative for mvlgamma (15049)
- Fix numerical stability in log_prob for Gumbel distribution (15878)
- multinomial: fix detection and drawing of zero probability events (16075)


Crashes

- PyTorch binaries were [crashing on AWS Lambda](https://github.com/pytorch/pytorch/issues/15213) and a few other niche systems, stemming from CPUInfo handling certain warnings as errors. Updated CPUInfo with relevant fixes.
- MKL-DNN is now statically built, to avoid conflicts with system versions
- Allow ReadyQueue to handle empty tasks (15791)
  - Fixes a segfault with a DataParallel + Checkpoint neural network setting
- Avoid integer divide by zero error in index_put_ (14984)
- Fix for model inference crash on Win10 (15919) (16092)
- Use CUDAGuard when serializing Tensors:
  - Before this change, `torch.save` and `torch.load` would initialize the CUDA context on GPU 0 if it hadn't been initialized already, even if the serialized tensors are only on GPU 1.
- Fix error with handling scalars and __rpow__, for example `1 ** x`, where x is a PyTorch scalar (16687)
- Switch to CUDA implementation instead of CuDNN if batch size >= 65536 for affine_grid (16403)
  - CuDNN crashes when batch size >= 65536
- [Distributed] TCP init method race condition fix (15684)
- [Distributed] Fix a memory leak in Gloo's CPU backend
- [C++ Frontend] Fix LBFGS issue around using inplace ops (16167)
- [Hub] Fix github branch prefix v (15552)
- [Hub] url download bugfix for URLs served without Content-Length header

Performance

- LibTorch binaries now ship with CuDNN enabled. Without this change, many folks saw significant perf differences while using LibTorch vs PyTorch, this should be fixed now. [14976](https://github.com/pytorch/pytorch/pull/14976)
- Make btriunpack work for high dimensional batches and faster than before (15286)
- improve performance of unique with inverse indices (16145)
- Re-enable OpenMP in binaries (got disabled because of a CMake refactor)

Other

- create type hint stub files for module torch (16089)
  - This will restore auto-complete functionality in PyCharm, VSCode etc.
- Fix sum_to behavior with zero dimensions (15796)
- Match NumPy by considering NaNs to be larger than any number when sorting (15886)
- Fixes various error message / settings in dynamic weight GRU / LSTMs (15766)
- C++ Frontend: Make call operator on module holder call forward (15831)
- C++ Frontend: Add the normalize transform to the core library (15891)
- Fix bug in torch::load and unpack torch::optim::detail namespace (15926)
- Implements Batched upper triangular, lower triangular (15257)
- Add torch.roll to documentation (14880)
- (better errors) Add backend checks for batch norm (15955)


JIT

- Add better support for bools in the graph fuser (15057)
- Allow tracing with fork/wait (15184)
- improve script/no script save error (15321)
- Add self to Python printer reserved words (15318)
- Better error when torch.load-ing a JIT model (15578)
- fix select after chunk op (15672)
- Add script standard library documentation + cleanup (14912)

1.0



Windows support

PyTorch now officially supports Windows. We provide pre-compiled Conda binaries and pip wheels for Python 3.5 and 3.6.
PyTorch on Windows doesn't support `distributed` training and might be a tad bit slower than Linux / OSX because Visual Studio supports an older version of OpenMP.

As always, you can use the commands at http://pytorch.org to install PyTorch on Windows
We have an FAQ that answers most questions you might have around Windows here: http://pytorch.org/docs/stable/notes/windows.html


ONNX Improvements


New ONNX operators
- Support exporting `torch.max(input, dim)` and `torch.min(input, dim)` [6220](https://github.com/pytorch/pytorch/pull/6220)
- Add symbolic for `ReLU` to support exporting to ONNX [5759](https://github.com/pytorch/pytorch/pull/5759)
- Add `sum`, `prod`, `sqrt` and improve `log_softmax` [4579](https://github.com/pytorch/pytorch/pull/4579)
- Add ONNX support for `InstanceNorm` [4626](https://github.com/pytorch/pytorch/pull/4626)
- Add ONNX symbolic for `Elu` [3453](https://github.com/pytorch/pytorch/pull/3453)
- Add ONNX symbolic for `UpsamplingNearest2d` [3450](https://github.com/pytorch/pytorch/pull/3450)

Improvements
- Print source location when ONNX export fails for a node [5652](https://github.com/pytorch/pytorch/pull/5652)
- Export onnx protobuf bindings to python [6651](https://github.com/pytorch/pytorch/pull/6651)
- Support `output_padding` in `ConvTranspose` [4583](https://github.com/pytorch/pytorch/pull/4583)

Better RNN support
PyTorch can now export a subset of RNNs to ONNX [4409](https://github.com/pytorch/pytorch/pull/4409)

- Add Elman RNN export to ONNX [4613](https://github.com/pytorch/pytorch/pull/4613)
- Support batch-first in ONNX export of padded sequences [5360](https://github.com/pytorch/pytorch/pull/5360)
- Bidirectional Elman RNN export to ONNX [5120](https://github.com/pytorch/pytorch/pull/5120)
- Handle sequence lengths correctly when exporting RNNs to ONNX [4695](https://github.com/pytorch/pytorch/pull/4695)
- Support GRU export to ONNX [4390](https://github.com/pytorch/pytorch/pull/4390)

Bugfixes
- Fix a bug in ONNX symbolic of 3d average pooling [6101](https://github.com/pytorch/pytorch/pull/6101)
- Fix onnx export of replication/reflection pad [4263](https://github.com/pytorch/pytorch/pull/4263)


Miscellaneous improvements
* implement ``__dir__`` for Tensors, so that editors can auto-complete and query the possible fields of Tensors

* Add ``numpy()`` and ``from_numpy()`` to ``HalfTensor``
* Enable `TensorDataset` to have any number of input tensors.

* Add `padding_value` to `torch.nn.utils.rnn.pad_sequence`
* Add `total_length` option to `pack_padded_sequence`, which is useful when using `DataParallel`, as we can ensure that we have sequences of the same length.
* Improve numerical precision of `torch.arange`, making it consistent with `numpy.arange`
* `torch.load()` and `torch.save()` support arbitrary file-like object
* `torch.nn.functional.grid_sample` now supports 2D (spatial) and 3D (volumetric) inputs
* set python random seed in `DataLoader` workers, in order to improve experiment reproducibility

* Add `__delitem__` to `nn.Sequential`. Now one can delete arbitrary elements of a `nn.Sequential`.

For example:

python
model = nn.Sequential(nn.Linear(2, 2), nn.ReLU(), nn.Linear(2, 2))
del model[1]  # deletes nn.ReLU


* `ReduceLROnPlateau` is now serializable [5300](https://github.com/pytorch/pytorch/pull/5300)

* Add option to flush denormal numbers on CPU. [5294](https://github.com/pytorch/pytorch/pull/5294)
* PyTorch now exposes the gradients of conv1d, conv2d and conv3d with respect to the input and the weights [5408](https://github.com/pytorch/pytorch/pull/5408)
* Add support for calling `pack_padded_sequence` with either list or with a Tensor [5133](https://github.com/pytorch/pytorch/pull/5133)
- Support negative indexing for ``padding_idx`` in ``nn.Embedding`` [4496](https://github.com/pytorch/pytorch/pull/4496)
- Implement backward pass for ``pack_padded_sequence`` [4512](https://github.com/pytorch/pytorch/pull/4512)
- Add ``nn.utils.rnn.pad_sequence`` and ``nn.utils.rnn.pack_sequence`` to pad lists of variable length Tensors with ``0`` and to pack a list of variable length Tensors.
- Add ``torch.cuda.memory_cached``, ``torch.cuda.max_memory_cached``, ``torch.cuda.memory_allocated``, and ``torch.cuda.max_memory_allocated`` methods for checking CUDA memory usage [4511](https://github.com/pytorch/pytorch/pull/4511)
- Allow viewing on noncontiguous tensors if the new view size is compatible with the tensor's original size and stride. [4062](https://github.com/pytorch/pytorch/pull/4062)
- ``NLLLoss`` and ``CrossEntropyLoss`` now support more than 2 dimensions. [4654](https://github.com/pytorch/pytorch/pull/4654)

- Add an option to not show ``model_zoo`` download progress bar [4135](https://github.com/pytorch/pytorch/pull/4135)
- You can now assign modules to indices of ``nn.Sequential``. [4931](https://github.com/pytorch/pytorch/pull/4931)
- You can create tensors with a numpy ``np.longlong`` array [4367](https://github.com/pytorch/pytorch/pull/4367)
- Change the autograd execution order to use good heuristics. This greatly improves memory usage for large models. [4746](https://github.com/pytorch/pytorch/pull/4746)

- Add AMSgrad mode to ``Adam`` and ``SparseAdam`` optimizers. [4034](https://github.com/pytorch/pytorch/pull/4034)

- Better ``torch.autograd.profiler`` support for CUDA profiling using the ``cudaEvent`` API. [3734](https://github.com/pytorch/pytorch/pull/3734)

- ``torch.set_num_threads`` also sets the respective MKL option so you won't need to use an environment variable to control it. [4949](https://github.com/pytorch/pytorch/pull/4949)
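
To illustrate the `pad_sequence` / `pack_sequence` utilities and the `padding_value` option mentioned above, here is a minimal sketch (the sequence contents are arbitrary):

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_sequence

seqs = [torch.tensor([1, 2, 3]), torch.tensor([4, 5]), torch.tensor([6])]

padded = pad_sequence(seqs, batch_first=True, padding_value=0)
print(padded)                 # shape (3, 3); shorter rows are padded with 0

packed = pack_sequence(seqs)  # sequences here are already sorted by decreasing length
```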


Performance improvements

- Speed up CPU ``nn.EmbeddingBag``, making training overall 30% faster [5433](https://github.com/pytorch/pytorch/pull/5433)
- Move ``nn.MarginRankingLoss``, `nn.CosineEmbeddingLoss`, `nn.HingeEmbeddingLoss`, and `nn.TripletMarginLoss` from Python to our ATen backend, resulting in up to 3x performance gains in some cases.
[5346](https://github.com/pytorch/pytorch/pull/5346),  [5646](https://github.com/pytorch/pytorch/pull/5646), [5080](https://github.com/pytorch/pytorch/pull/5080), [5680](https://github.com/pytorch/pytorch/pull/5680)
- Implement ``pin_memory()`` as a NativeFunction [4094](https://github.com/pytorch/pytorch/pull/4094)
- Save ``self.numel()`` for backward computation instead of ``self`` to save memory [5747](https://github.com/pytorch/pytorch/pull/5747)
- Rearrange dimensions for pointwise operations for up to 10x better performance in one case. [4174](https://github.com/pytorch/pytorch/pull/4174)
- Vectorize `normal_` for a 5-6x speed up in a small case [4312](https://github.com/pytorch/pytorch/pull/4312)
- Allow usage of GPU Direct within PyTorch for the Broadcast operation [4183](https://github.com/pytorch/pytorch/pull/4183)
- Speed-up ``nn.Linear`` for the 3D input case [5279](https://github.com/pytorch/pytorch/pull/5279)
- Speed up `Conv3D` on the CPU by parallelizing ``vol2col`` and ``col2vol`` [4824](https://github.com/pytorch/pytorch/pull/4824)
- Add AVX2 implementation for sigmoid function, showing around 10x speedup [5010](https://github.com/pytorch/pytorch/pull/5010)
- Use fast integer division algorithm to avoid division ops inside kernels. [5054](https://github.com/pytorch/pytorch/pull/5054)
- Improve occupancy for CUDA random number generation [5710](https://github.com/pytorch/pytorch/pull/5710)
- Add optimization to norm for common norms [5722](https://github.com/pytorch/pytorch/pull/5722)
- Add a fast fused GLU backward [5782](https://github.com/pytorch/pytorch/pull/5782)
- Optimize unique sorting by using ``std::vector+sort`` instead of ``std::set``, giving up to 5x speedup. [5913](https://github.com/pytorch/pytorch/pull/5913)
- Speed up sum over a dimension [6026](https://github.com/pytorch/pytorch/pull/6026)
- Enable MKLDNN convolution forward and backward. [6062](https://github.com/pytorch/pytorch/pull/6062)
- Parallelize non-contiguous point-wise operations with OpenMP [2764](https://github.com/pytorch/pytorch/pull/2764)
- Add cudnn Tensor Core ops to RNNs for Volta [3409](https://github.com/pytorch/pytorch/pull/3409)
- Vectorize ``exp``, ``log``, ``sin``, ``cos`` [6078](https://github.com/pytorch/pytorch/pull/6078)
- Reuse intermediate results over multiple backwards grad_inputs [3526](https://github.com/pytorch/pytorch/pull/3526)

Distributed
- DistributedDataParallel: ~10% performance improvement for the NCCL backend, with mixed-precision support [5064](https://github.com/pytorch/pytorch/pull/5064)
- Slightly improve DistributedDataParallel (single-GPU binding) multi-process distributed training performance [4870](https://github.com/pytorch/pytorch/pull/4870)


Bug fixes

torch operators
- Improve ``torch.digamma`` precision near poles [6517](https://github.com/pytorch/pytorch/pull/6517)
- Fix incorrect behavior of ``Tensor.random_`` on negative inputs [6463](https://github.com/pytorch/pytorch/pull/6463)
- Fix undefined behavior in backward pass for ``tensor.permute(dims)`` with negative dims [5945](https://github.com/pytorch/pytorch/pull/5945)
- Fix integer overflow in ``torch.remainder`` operator (it would break with a divisor above ``2**48``) [5906](https://github.com/pytorch/pytorch/pull/5906)
- Fix memory leak in ``torch.bmm`` [5744](https://github.com/pytorch/pytorch/pull/5744)
- Make dimension checker of `scatter_add_` consistent with `scatter_`'s [5659](https://github.com/pytorch/pytorch/pull/5659)
- Fix CPU ``torch.multinomial`` with noncontiguous probability tensor input (previously, it would overwrite input data) [5093](https://github.com/pytorch/pytorch/pull/5093)
- Fix CUDA ``torch.multinomial`` using incorrect strides and being able to select zero-probability events. [5774](https://github.com/pytorch/pytorch/pull/5774), [5238](https://github.com/pytorch/pytorch/pull/5238)
- Support empty index tensor for ``index_select`` [3429](https://github.com/pytorch/pytorch/pull/3429)
- Support empty indices tensor in CUDA ``Tensor.put_`` [4486](https://github.com/pytorch/pytorch/pull/4486)
- Improve stability of ``torch.cat`` with empty tensors [3602](https://github.com/pytorch/pytorch/pull/3602), [5971](https://github.com/pytorch/pytorch/pull/5971), [5819](https://github.com/pytorch/pytorch/pull/5819)
- Fix ``torch.fft`` in the case where any of the input dimensions is not aligned [6118](https://github.com/pytorch/pytorch/pull/6118)
- Improve the CUDA btrifact error message [5644](https://github.com/pytorch/pytorch/pull/5644)
- Return zeros for eigenvector tensor when not requested in ``torch.symeig`` [3411](https://github.com/pytorch/pytorch/pull/3411)
- Fix ``torch.btrifact`` on tensors. [4318](https://github.com/pytorch/pytorch/pull/4318)
- Fix ``torch.pstrf`` on tensors. [4883](https://github.com/pytorch/pytorch/pull/4883)
- Fix memory leak in `torch.median` [6889](https://github.com/pytorch/pytorch/pull/6889)
- Fix SVD backward on non-square matrices when `some=False` [6870](https://github.com/pytorch/pytorch/pull/6870)

core
- Detect re-initialization of ``_C`` shared library that would often result in segfaults on exit [6232](https://github.com/pytorch/pytorch/pull/6232)
- Fix indexing with all zero ByteTensors [3926](https://github.com/pytorch/pytorch/pull/3926)
- Only allow dense floating-point types as the default tensor type. [5674](https://github.com/pytorch/pytorch/pull/5674)
- Initialize CUDA before setting CUDA tensor types as default to prevent crash [4788](https://github.com/pytorch/pytorch/pull/4788)
- Fix a bug where ``from_dlpack`` fails if CUDA is not initialized. [4182](https://github.com/pytorch/pytorch/pull/4182)
- Fix crash in creating a CUDA tensor with a numpy array [5850](https://github.com/pytorch/pytorch/pull/5850)
- Fix broken sharing of empty tensor in multiprocessing on some OSes [6229](https://github.com/pytorch/pytorch/pull/6229)

autograd
- Restore allow_unused functionality: throw error when differentiated input is unused or unreachable. [6553](https://github.com/pytorch/pytorch/pull/6553)
- Fix ``output_nr`` not being incremented correctly. This caused crashes in the backward pass of operations that don't ``requires_grad`` on some inputs. [4812](https://github.com/pytorch/pytorch/pull/4812)
- Fix nvprof parsing in the ``torch.autograd.profiler`` [5840](https://github.com/pytorch/pytorch/pull/5840)

nn layers
- Support only specifying size in certain dimension for adaptive pooling [3127](https://github.com/pytorch/pytorch/pull/3127)
- Fix reflection padding boundary checks to not cause invalid memory access [6438](https://github.com/pytorch/pytorch/pull/6438)
- Improve error messages for ``NLLLoss``. [5299](https://github.com/pytorch/pytorch/pull/5299), [6072](https://github.com/pytorch/pytorch/pull/6072)
- Fix ``kl_div`` backward on CUDA. Previously it would not respect ``gradOutput`` when computing ``gradInput``. [5814](https://github.com/pytorch/pytorch/pull/5814)
- Fix incorrect ``bias`` size assert for ``Linear`` [5992](https://github.com/pytorch/pytorch/pull/5992)
- Fix incorrect ``nn.functional.convNd`` and ``nn.functional.conv_transposeNd`` error message [5701](https://github.com/pytorch/pytorch/pull/5701)
- Check that shape for input and target matches instead of number of elements for some loss functions [5085](https://github.com/pytorch/pytorch/pull/5085)
- Fix ``torch.diag`` backward returning square grad with non-square input [4538](https://github.com/pytorch/pytorch/pull/4538)
- Fix convolution type mismatch error message [5815](https://github.com/pytorch/pytorch/pull/5815)
- Add ``align_corners`` option to linearly interpolating upsampling and make the default upsampling behavior more consistent with other frameworks [5927](https://github.com/pytorch/pytorch/pull/5927)
- Prevent numerical issues with ``poisson_nll_loss`` when log_input=False [3336](https://github.com/pytorch/pytorch/pull/3336)

CUDA
- Ensure convolution weights are contiguous to fix CUDA ``ConvTranspose`` double backward [4543](https://github.com/pytorch/pytorch/pull/4543)
- Fix CUDA double backwards [4460](https://github.com/pytorch/pytorch/pull/4460)

sparse
- Fix embedding with ``sparse=True`` [4686](https://github.com/pytorch/pytorch/pull/4686)
- Fix sparse embedding backward when input contains only ``padding_idx`` [6211](https://github.com/pytorch/pytorch/pull/6211)
- Handle copying empty sparse tensors to/from CPU, GPU. [5361](https://github.com/pytorch/pytorch/pull/5361)

dataloader
- Add argument checks to the  ``torch.utils.data.Sampler`` classes, fixing a bug where ``DataLoader`` tries to load the entire dataset on non-integer ``batch_size``. [6249](https://github.com/pytorch/pytorch/pull/6249)
- Set ``dataloader.batch_size = None`` when batch_sampler is given, fixing a bug where ``DataLoader`` would report ``batch_size`` as ``1``. [6108](https://github.com/pytorch/pytorch/pull/6108)
- Improve signal handling in ``DataLoader`` [4643](https://github.com/pytorch/pytorch/pull/4643)
- Ignore ``FileNotFoundError`` when shutting down [5380](https://github.com/pytorch/pytorch/pull/5380)
- Make preprocessing deterministic [4640](https://github.com/pytorch/pytorch/pull/4640)

optim
- Cast tensors when loading optimizer state dicts to improve usability [3658](https://github.com/pytorch/pytorch/pull/3658)
- List model parameters in deterministic order to improve stability of ``load_state_dict()`` [6031](https://github.com/pytorch/pytorch/pull/6031)
- Add parameter range checks for all optimizers [6000](https://github.com/pytorch/pytorch/pull/6000)
- Fix ``AMSGrad`` mode for ``SparseAdam`` [4314](https://github.com/pytorch/pytorch/pull/4314)

distributed and multi-gpu
- Fix a number of distributed training errors caused by a detach in place error [5829](https://github.com/pytorch/pytorch/pull/5829)
- Don't modify requires_grad when running DataParallel in no_grad mode [5880](https://github.com/pytorch/pytorch/pull/5880)
- Add GPU guard for ``broadcast_coalesce`` for Distributed Data Parallel stability [5655](https://github.com/pytorch/pytorch/pull/5655)

1.0.0

Table of Contents

* **Highlights**
* JIT
* Brand New Distributed Package
* C++ Frontend [API Unstable]
* Torch Hub
* **Breaking Changes**
* **Additional New Features**
* N-dimensional empty tensors
* New Operators
* New Distributions
* Sparse API Improvements
* Additions to existing Operators and Distributions
* **Bug Fixes**
* Serious
* Backwards Compatibility
* Correctness
* Error checking
* Miscellaneous
* **Other Improvements**
* **Deprecations**
* CPP Extensions
* **Performance**
* **Documentation Improvements**

Highlights

JIT

The JIT is a set of compiler tools for bridging the gap between research in PyTorch
and production. It allows for the creation of models that can run without a dependency on the Python interpreter and which can be optimized more aggressively. Using program annotations, existing models can be transformed into Torch Script, a subset of Python that PyTorch can run directly. Model code is still valid Python code and can be debugged with the standard Python toolchain. PyTorch 1.0 provides two ways in which you can make your existing code compatible with the JIT, using `torch.jit.trace` or `torch.jit.script`. Once annotated, Torch Script code can be aggressively optimized and it can be serialized for later use in our new C++ API, which doesn't depend on Python at all.

python
# Write in Python, run anywhere!
@torch.jit.script
def RNN(x, h, W_h, U_h, b_h):
    y = []
    for t in range(x.size(0)):
        h = torch.tanh(x[t] @ W_h + h @ U_h + b_h)
        y += [h]
    return torch.stack(y), h
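
The tracing path mentioned above records the operations executed on an example input. A minimal sketch (the module and example input are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
example = torch.randn(1, 4)

traced = torch.jit.trace(model, example)   # record the ops run on the example input
traced.save("linear.pt")                   # serialized; can later be loaded from the C++ API
```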


As an example, see a tutorial on [deploying a seq2seq model](https://pytorch.org/tutorials/beginner/deploy_seq2seq_hybrid_frontend_tutorial.html),
[loading an exported model from C++](https://pytorch.org/tutorials/advanced/cpp_export.html), or [browse the docs](https://pytorch.org/docs/jit.html).

Brand New Distributed Package

The [torch.distributed](https://pytorch.org/docs/master/distributed.html) package and [torch.nn.parallel.DistributedDataParallel](https://pytorch.org/docs/master/nn.htmltorch.nn.parallel.DistributedDataParallel) module are backed by a brand new re-designed distributed library.  The main highlights of the new library are:
* New `torch.distributed` is performance driven and operates entirely asynchronously for all backends: `Gloo`, `NCCL`, and `MPI`.
* Significant Distributed Data Parallel performance improvements especially for hosts with slower networks such as ethernet-based hosts
* Adds async support for all distributed collective operations in the [torch.distributed](https://pytorch.org/docs/master/distributed.html) package.
* Adds the following CPU ops in the Gloo backend: [send](https://pytorch.org/docs/master/distributed.htmltorch.distributed.send), [recv](https://pytorch.org/docs/master/distributed.htmltorch.distributed.recv), [reduce](https://pytorch.org/docs/master/distributed.htmltorch.distributed.reduce), [all_gather](https://pytorch.org/docs/master/distributed.htmltorch.distributed.all_gather), [gather](https://pytorch.org/docs/master/distributed.htmltorch.distributed.gather), [scatter](https://pytorch.org/docs/master/distributed.htmltorch.distributed.scatter)
* Adds [barrier](https://pytorch.org/docs/master/distributed.htmltorch.distributed.barrier) op in the NCCL backend
* Adds [new_group](https://pytorch.org/docs/master/distributed.htmltorch.distributed.new_group) support for the NCCL backend
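
A minimal usage sketch of the redesigned package described above (assumes the usual `env://` variables such as `MASTER_ADDR`, `MASTER_PORT`, `RANK`, and `WORLD_SIZE` are set by a launcher; the backend and tensor shape are illustrative):

```python
import torch
import torch.distributed as dist

dist.init_process_group(backend="gloo", init_method="env://")

t = torch.ones(4)
dist.all_reduce(t)     # after this, every rank holds the elementwise sum
dist.barrier()         # synchronize all ranks
print(t)
```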

C++ Frontend _**[API Unstable]**_.

The C++ frontend is a pure C++ interface to the PyTorch backend that follows the API and architecture of the established Python frontend. It is intended to enable research in high performance, low latency and bare metal C++ applications. It provides equivalents to `torch.nn`, `torch.optim`, `torch.data` and other components of the Python frontend. Here is a minimal side-by-side comparison of the two language frontends:

<p align="center">
<table align="center">
<tr><th>Python</th><th>C++</th></tr>
<tr valign="top">
<td><sub><pre lang="python">
import torch
<br>
model = torch.nn.Linear(5, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
prediction = model.forward(torch.randn(3, 5))
loss = torch.nn.functional.mse_loss(prediction, torch.ones(3, 1))
loss.backward()
optimizer.step()
</pre></sub></td>
<td><sub><pre lang="cpp">
#include &lt;torch/torch.h&gt;
<br>
torch::nn::Linear model(5, 1);

1.0rc1

**This is a pre-release preview, do not rely on the tag to have a fixed set of commits, or rely on the tag for anything practical / important**

Table of Contents

* [Highlights](highlights)
* [JIT](jit)
* [torch.distributed new "C10D" library](torchdistributed-new-c10d-library)
* [C++ Frontend [API Unstable]](c-frontend-api-unstable)
* [Breaking Changes](breaking-changes)
* [Additional New Features](additional-new-features)
* [N-dimensional empty tensors](n-dimensional-empty-tensors)
* [New Operators](new-operators)
* [New Distributions](new-distributions)
* [Additions to existing Operators and Distributions](additions-to-existing-operators-and-distributions)
* [Bug Fixes](bug-fixes)
* [Serious](serious)
* [Backwards Compatibility](backwards-compatibility)
* [Correctness](correctness)
* [Error checking](error-checking)
* [Miscellaneous](miscellaneous)
* [Other Improvements](other-improvements)
* [Deprecations](deprecations)
* [CPP Extensions](cpp-extensions)
* [Performance](performance)
* [Documentation Improvements](documentation-improvements)

Highlights

JIT

The JIT is a set of compiler tools for bridging the gap between research in PyTorch
and production. It includes a language called Torch Script (don't worry it is a subset of Python,
so you'll still be writing Python), and two ways in which you can make your existing code compatible with the JIT.
Torch Script code can be aggressively optimized and it can be serialized for later use in our new C++ API, which doesn't depend on Python at all.

python
# Write in Python, run anywhere!
@torch.jit.script
def RNN(x, h, W_h, U_h, b_h):
    y = []
    for t in range(x.size(0)):
        h = torch.tanh(x[t] @ W_h + h @ U_h + b_h)
        y += [h]
    return torch.stack(y), h


As an example, see a tutorial on [deploying a seq2seq model](https://pytorch.org/tutorials/beginner/deploy_seq2seq_hybrid_frontend_tutorial.html),
[loading an exported model from C++](https://pytorch.org/tutorials/advanced/cpp_export.html), or [browse the docs](https://pytorch.org/docs/master/jit.html).

torch.distributed new "C10D" library

The [torch.distributed](https://pytorch.org/docs/master/distributed.html) package and [torch.nn.parallel.DistributedDataParallel](https://pytorch.org/docs/master/nn.htmltorch.nn.parallel.DistributedDataParallel) module are backed by the new "C10D" library.  The main highlights of the new library are:
* C10D is performance driven and operates entirely asynchronously for all backends: `Gloo`, `NCCL`, and `MPI`.
* Significant Distributed Data Parallel performance improvements, especially for hosts with slower networks such as ethernet-based hosts
* Adds async support for all distributed collective operations in the [torch.distributed](https://pytorch.org/docs/master/distributed.html) package.
* Adds [send](https://pytorch.org/docs/master/distributed.htmltorch.distributed.send) and [recv](https://pytorch.org/docs/master/distributed.htmltorch.distributed.recv) support in the Gloo backend

C++ Frontend _**[API Unstable]**_.

The C++ frontend is a pure C++ interface to the PyTorch backend that follows the API and architecture of the established Python frontend. It is intended to enable research in high performance, low latency and bare metal C++ applications. It provides equivalents to `torch.nn`, `torch.optim`, `torch.data` and other components of the Python frontend. Here is a minimal side-by-side comparison of the two language frontends:

<p align="center">
<table align="center">
<tr><th>Python</th><th>C++</th></tr>
<tr valign="top">
<td><sub><pre lang="python">
import torch
<br>
model = torch.nn.Linear(5, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
prediction = model.forward(torch.randn(3, 5))
loss = torch.nn.functional.mse_loss(prediction, torch.ones(3, 1))
loss.backward()
optimizer.step()
</pre></sub></td>
<td><sub><pre lang="cpp">
#include &lt;torch/torch.h&gt;
<br>
torch::nn::Linear model(5, 1);



*Version 1.2:*


>>> torch.nn.BatchNorm2d(5).weight
Parameter containing:
tensor([1., 1., 1., 1., 1.], requires_grad=True)




A number of deprecated Linear Algebra operators have been removed ([22841](https://github.com/pytorch/pytorch/pull/22841))

| Removed        | Use Instead  |
| ------------- | ------------- |
| `btrifact`    | `lu` |
| `btrifact_with_info`      | `lu` with `get_infos=True`      |
| `btrisolve` | `lu_solve`     |
| `btriunpack` | `lu_unpack`    |
| `gesv` | `solve`     |
| `pstrf` | `cholesky`     |
| `potrf` | `cholesky`     |
| `potri` | `cholesky_inverse`     |
| `potrs` | `cholesky_solve`     |
| `trtrs` | `triangular_solve`     |


Sparse Tensors: Changing the sparsity of a Tensor through `.data` is no longer supported.  ([17072](https://github.com/pytorch/pytorch/pull/17072))


>>> x = torch.randn(2,3)
>>> x.data = torch.sparse_coo_tensor((2, 3))
RuntimeError: Attempted to call `variable.set_data(tensor)`,
but `variable` and  `tensor` have incompatible tensor type.




Sparse Tensors: in-place shape modifications of Dense Tensor Constructor Arguments will no longer modify the Sparse Tensor itself ([20614](https://github.com/pytorch/pytorch/pull/20614))

*Version 1.1:*


>>> i = torch.tensor([[0, 1]])
>>> v = torch.ones(2)
>>> s = torch.sparse_coo_tensor(i, v)
>>> i.resize_(1, 1)
>>> v.resize_(1)

>>> s.coalesce().indices().shape
torch.Size([1, 1])

>>> s.coalesce().values().shape
torch.Size([1])


Notice `indices()` and `values()` reflect the resized tensor shapes.


*Version 1.2:*


>>> i = torch.tensor([[0, 1]])
>>> v = torch.ones(2)
>>> s = torch.sparse_coo_tensor(i, v)
>>> i.resize_(1, 1)
>>> v.resize_(1)

>>> s.coalesce().indices().shape
torch.Size([1, 2])

>>> s.coalesce().values().shape
torch.Size([2])


Notice `indices()` and `values()` reflect the original tensor shapes.

Sparse Tensors: Accumulating dense gradients into a sparse `.grad` will no longer retain Python object identity.  ([17072](https://github.com/pytorch/pytorch/pull/17072))

*Version 1.1:*


>>> m = torch.nn.Embedding(10, 3, sparse=True)
>>> m(torch.tensor([[1,2,4,5],[4,3,2,9]])).sum().backward()
>>> assert m.weight.grad.layout == torch.sparse_coo
>>> m_weight_grad_saved = m.weight.grad

# accumulate dense gradient into sparse .grad, change sparsity
>>> m.weight.sum().backward()
>>> assert m.weight.grad.layout == torch.strided
# m_weight_grad_saved still refers to the .grad of m's weight
# even though the sparsity has changed
>>> assert id(m_weight_grad_saved) == id(m.weight.grad)


*Version 1.2:*


>>> m = torch.nn.Embedding(10, 3, sparse=True)
>>> m(torch.tensor([[1,2,4,5],[4,3,2,9]])).sum().backward()
>>> assert m.weight.grad.layout == torch.sparse_coo
>>> m_weight_grad_saved = m.weight.grad

# accumulate dense gradient into sparse .grad, change sparsity
>>> m.weight.sum().backward()
>>> assert m.weight.grad.layout == torch.strided
# m_weight_grad_saved NO LONGER refers to the .grad of m's weight
>>> assert id(m_weight_grad_saved) == id(m.weight.grad)
AssertionError




`nn.utils.convert_sync_batchnorm` has been replaced with `nn.SyncBatchNorm.convert_sync_batchnorm` ([18787](https://github.com/pytorch/pytorch/pull/18787))

Example of new usage:


>>> # Network with nn.BatchNorm layer
>>> module = torch.nn.Sequential(
>>>     torch.nn.Linear(20, 100),
>>>     torch.nn.BatchNorm1d(100)
>>> ).cuda()
>>> # creating process group (optional)
>>> process_group = torch.distributed.new_group(process_ids)
>>> sync_bn_module = torch.nn.SyncBatchNorm.convert_sync_batchnorm(module, process_group)


Error Checking: `torch.addcmul` and `torch.lerp` operators enforce stronger shape requirements on the output tensor (`out=` keyword argument) and do not allow output tensor to be resized if it is also used as one of the inputs.

*Version 1.1:*


>>> x=torch.zeros(1)
>>> torch.addcmul(x, x, torch.zeros(2,3), out=x)
tensor([[0., 0., 0.],
[0., 0., 0.]])


*Version 1.2:*


>>> x=torch.zeros(1)
>>> torch.addcmul(x, x, torch.zeros(2,3), out=x)
RuntimeError: output with shape [1] doesn't match the broadcast shape [2, 3]


If you run into this error, please ensure the `out` parameter is of the correct output shape (post-broadcasting).
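
For example, a call that satisfies the new check pre-allocates `out` with the post-broadcast shape (a minimal sketch):

```python
import torch

x = torch.zeros(1)
out = torch.empty(2, 3)                      # pre-allocated with the broadcast shape
torch.addcmul(x, x, torch.zeros(2, 3), out=out)
print(out)                                   # all zeros, shape (2, 3)
```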

Error Checking: Improved Variable version tracking ([20391](https://github.com/pytorch/pytorch/pull/20391), [22821](https://github.com/pytorch/pytorch/pull/22821), [21865](https://github.com/pytorch/pytorch/pull/21865))

PyTorch’s autograd system uses a version tracking mechanism to ensure that Tensors that are saved for backwards computations retain their correct values when the backward pass is computed (i.e. that they haven’t been updated in-place since they were saved).  See [In Place Correctness Checks](https://pytorch.org/docs/stable/notes/autograd.htmlin-place-correctness-checks) in the docs for more information.

In PyTorch 1.2 we have enhanced the version tracking in a number of cases, which may flag issues that were not caught previously.  There is now additional tracking through the `Variable()` constructor, the `nn.Parameter()` constructor, after setting `.data`, and via `nn.Module._apply` (internal API).

*Track changes through Variable constructor:*


>>> x = torch.ones(1, requires_grad=True)+1
>>> y = x*x

# do an in-place update through the Variable constructor
>>> torch.autograd.Variable(x).add_(1)
>>> y.backward()
RuntimeError: one of the variables needed for gradient computation has been modified
by an inplace operation: [torch.FloatTensor [1]] is at version 1; expected version 0
instead.


*Track changes on an nn.Parameter:*


>>> x = torch.ones(1)
>>> p = torch.nn.Parameter(x)
>>> y = p * p

# do an in-place update on a saved Parameter
>>> x.add_(1)
>>> y.sum().backward()
RuntimeError: one of the variables needed for gradient computation has been modified
by an inplace operation: [torch.FloatTensor [1]] is at version 1; expected version 0
instead.


*Track changes after setting `.data`:*


>>> x = torch.zeros(1, requires_grad=True)+1
>>> y = x * x
>>> x.data = torch.zeros(1, requires_grad=True)+1

>>> x.add_(1)
>>> y.backward()
RuntimeError: one of the variables needed for gradient computation has been modified
by an inplace operation: [torch.FloatTensor [1]], which is output 0 of AddBackward0,
is at version 1; expected version 0 instead.


[JIT] Python called from scripted modules must be `ignore`d

`torch.jit.script` now recursively compiles everything it finds in the original function, so if you had Python functions called from your scripted function or module, you must now explicitly `ignore` them. See [the new API guide](https://pytorch.org/docs/master/jit.htmlmigrating-to-pytorch-1-2-recursive-scripting-api) for more details.

*Version 1.1*


def my_unscriptable_python_fn():
    ...  # weird stuff

@torch.jit.script
def fn():
    # This gets inserted as a Python call, and only errors on `save()`.
    my_unscriptable_python_fn()


*Version 1.2*


@torch.jit.ignore  # this needs to be added ...
def my_unscriptable_python_fn():
    ...

@torch.jit.script
def fn():
    # ... or else recursive compilation will attempt to compile this call
    my_unscriptable_python_fn()


NOTE: This is also a change to the behavior of the `torch.jit.ignore` decorator. In version 1.1, `ignore` tells the compiler to omit compiling a function entirely, to mark Python functions that you know will not be called after export. In version 1.2, `ignore` tells the compiler to insert a call back to the Python interpreter instead of trying to compile the function.

To get the old behavior, use `torch.jit.ignore(drop_on_export=True)` (`torch.jit.ignore` with no arguments is equivalent to `torch.jit.ignore(drop_on_export=False)`).

[JIT] `optimize` for ScriptModules is now a context manager

Whether optimization passes are run is now a thread-local flag. This better reflects how optimization actually happens in the JIT (i.e. it is decided at runtime, not compilation time).

*Version 1.1*


@torch.jit.script(optimize=False)
def fn(inputs):
    ...

fn(inputs)


*Version 1.2*


@torch.jit.script
def fn(inputs):
    ...

with torch.jit.optimized_execution(False):
    fn(inputs)


[jit] `script::Module` is now a reference type

To better align with the [PyTorch C++ API philosophy](https://github.com/pytorch/pytorch/wiki/Writing-Python-in-cpp-(a-manifesto)), `script::Module` and `script::Method` are now reference types. Our APIs have been updated to use `script::Module` instead of `std::shared_ptr<script::Module>`.

*Version 1.1*


using torch::jit::script::Module;

std::shared_ptr<Module> m = torch::jit::load("my_model.py");
m->forward(...);


*Version 1.2*


using torch::jit::script::Module;

Module m = torch::jit::load("my_model.py");
m.forward(...);


[C++ only] mean() / sum() / prod() APIs have changed slightly ([21088](https://github.com/pytorch/pytorch/pull/21088))

*Version 1.1 API*:


Tensor sum(IntArrayRef dim, bool keepdim=false) const;
Tensor sum(IntArrayRef dim, ScalarType dtype) const;


*Version 1.2 API*:


Tensor sum(IntArrayRef dim, bool keepdim=false,
c10::optional<ScalarType> dtype=c10::nullopt) const;


that is, to override `dtype`, `keepdim` must now be provided.

Binary distribution and nightly changes

We have streamlined our conda and wheel binary distributions, so that it is easier than ever to install the version of PyTorch appropriate for your needs. The install instructions on https://pytorch.org/ have been updated, but if you have tooling to download and install PyTorch, here is a detailed description of the changes we made:

**Wheels now have local version identifiers.** Wheels that are for non-default CUDA configurations (the default CUDA version for this release is 10.0) now have local version identifiers like +cpu and +cu92. This means that, when installing, it is no longer necessary to specify a full wheel URL—just specify an appropriate version constraint like `torch==1.2.0+cu92`.

*Version 1.1 (for Python 3.7 on Linux only):*


pip install numpy
pip install https://download.pytorch.org/whl/cpu/torch-1.1.0-cp37-cp37m-linux_x86_64.whl


*Version 1.2 (works for all versions of Python, and both Linux and Mac):*


pip install torch==1.2.0+cpu -f https://download.pytorch.org/whl/torch_stable.html


**CPU-only binaries on conda can be selected with the cpuonly feature.** We’ve eliminated the pytorch-cpu conda package; instead, the CPU-only conda package can be enabled by installing the cpuonly metapackage. Similarly, there are no longer separate torchvision and torchvision-cpu packages; the cpuonly feature will ensure that the CPU version of torchvision is selected.

*Version 1.1:*


conda install -c pytorch pytorch-cpu


*Version 1.2:*


conda install -c pytorch pytorch cpuonly


**Conda nightlies now live in the pytorch-nightly channel and no longer have “-nightly” in their name.** We have added a new dedicated channel for nightlies called pytorch-nightly; all nightlies (pytorch, torchvision, torchaudio, etc.) will now be uploaded to this channel, but with the same name as their corresponding stable versions (unlike before, when we had a separate pytorch-nightly, torchvision-nightly, etc. packages.) This makes it more difficult to accidentally install a copy of the nightly and stable at the same time.

*Version 1.1:*


conda install -c pytorch pytorch-nightly


*Version 1.2:*


conda install -c pytorch-nightly pytorch


**Wheel nightlies no longer have -nightly in their name.** Similar to the changes we made in Conda, we no longer suffix wheel nightlies with “-nightly”, to make it harder to accidentally install a copy of nightly and stable at the same time.

*Version 1.1:*


pip install --pre torch_nightly -f https://download.pytorch.org/whl/nightly/torch_nightly.html


*Version 1.2:*


pip install --pre torch -f https://download.pytorch.org/whl/nightly/torch_nightly.html


New Features

Tensor Type Support

* `torch.bool`: added support for many operators (masking, comparison, arithmetic operators) to achieve feature parity with `torch.uint8`.  See the **Breaking Changes** section for details about how this could affect existing programs. ([21032](https://github.com/pytorch/pytorch/pull/21032), etc.)
* `torch.sparse.HalfTensor`: Added support for `torch.float16` sparse Tensors on both CPU and CUDA.  ([19695](https://github.com/pytorch/pytorch/pull/19695))
* `torch.bfloat16`: Added basic creation and serialization support for [Brain Floating Point Tensors](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format). ([21522](https://github.com/pytorch/pytorch/pull/21522), [21523](https://github.com/pytorch/pytorch/pull/21523), [21860](https://github.com/pytorch/pytorch/pull/21860), [22852](https://github.com/pytorch/pytorch/pull/22852))
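
A small sketch touching the new dtypes above (the tensor values are arbitrary):

```python
import torch

mask = torch.tensor([True, False, True])      # a torch.bool tensor
x = torch.arange(3.)
print(x[mask])                                # boolean masking -> tensor([0., 2.])

bf = torch.zeros(2, dtype=torch.bfloat16)     # basic bfloat16 creation...
torch.save(bf, "bf16.pt")                     # ...and serialization
```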

NN Package

* `nn.Transformer`: added implementation of Transformer from [Attention is All You Need](https://arxiv.org/abs/1706.03762). ([20170](https://github.com/pytorch/pytorch/pull/20170), [22588](https://github.com/pytorch/pytorch/pull/22588))
* `nn.Embedding`: support `float16` embeddings on CUDA.  ([19695](https://github.com/pytorch/pytorch/pull/19695))
* `nn.Flatten`: added a Module that performs `torch.flatten`. ([22245](https://github.com/pytorch/pytorch/pull/22245/))
* `nn.functional.gelu`: Added support for [Gaussian Error Linear Units](https://arxiv.org/abs/1606.08415). ([20665](https://github.com/pytorch/pytorch/pull/20665), [21237](https://github.com/pytorch/pytorch/pull/21237))
* `nn.Module hooks`: add ability to replace input/output via `forward_pre_hook` and `forward_hook`. ([22285](https://github.com/pytorch/pytorch/pull/22285))
* `nn.Module`: add `requires_grad_()` method for turning on/off `requires_grad` for Module parameters (a brief sketch follows this list). ([22576](https://github.com/pytorch/pytorch/pull/22576))
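
A brief sketch of a few of the `nn` additions above (the module sizes and the doubling hook are made-up examples):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 4, 3),
    nn.ReLU(),
    nn.Flatten(),                 # new nn.Flatten module
    nn.Linear(4 * 26 * 26, 10),
)

model.requires_grad_(False)       # new Module.requires_grad_() toggle: freeze all parameters

def double_input(module, inputs):
    # Returning a value from a forward_pre_hook now replaces the module input.
    return (inputs[0] * 2,)

handle = model.register_forward_pre_hook(double_input)
out = model(torch.randn(2, 1, 28, 28))   # shape (2, 10)
handle.remove()
```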

Operators

* `Tensor.to_sparse`: now supports autograd. ([20458](https://github.com/pytorch/pytorch/pull/20458))
* `Tensor.fill_diagonal_`: operator to fill the main diagonal of a Tensor. ([21892](https://github.com/pytorch/pytorch/pull/21892))
* `torch.qr`: supports autograd. ([21274](https://github.com/pytorch/pytorch/pull/21274))
* `torch.bitwise_not`: add operator for boolean/integer types.  Also have python `~` operator use this. ([22283](https://github.com/pytorch/pytorch/pull/22283), [22320](https://github.com/pytorch/pytorch/pull/22320))
* `torch.trapz`: integrate using the trapezoid rule; equivalent to [numpy.trapz](https://docs.scipy.org/doc/numpy/reference/generated/numpy.trapz.html). ([21610](https://github.com/pytorch/pytorch/pull/21610))
* `torch.var_mean` / `torch.std_mean`: compute variance and mean at the same time (a short sketch follows this list).  ([18731](https://github.com/pytorch/pytorch/pull/18731))
* `torch.utils.ThroughputBenchmark`: benchmark utility for measuring the throughput of PyTorch operators. ([20766](https://github.com/pytorch/pytorch/pull/20766)).
* `Logging`: lightweight at-most-once logging to record operators that are used (`c10::Logging`). ([20745](https://github.com/pytorch/pytorch/pull/20745))
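
A short sketch exercising a few of the operators listed above (values are illustrative only):

```python
import torch

y = torch.tensor([1., 2., 3.])
print(torch.trapz(y, dx=1.0))                 # trapezoid-rule integral -> tensor(4.)

var, mean = torch.var_mean(torch.randn(100))  # variance and mean in a single pass

m = torch.zeros(3, 3)
m.fill_diagonal_(1.0)                         # in-place write of the main diagonal

flags = torch.tensor([True, False])
print(~flags)                                 # ~ now dispatches to torch.bitwise_not
```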

Optim Package

* `optim.AdamW`: introduce AdamW optimizer from [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101). ([21250](https://github.com/pytorch/pytorch/pull/21250))
* `optim.LBFGS`: added support for strong Wolfe line search. ([8824](https://github.com/pytorch/pytorch/pull/8824))
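
A minimal sketch of the new AdamW optimizer (the model and hyperparameters are arbitrary):

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)
opt = optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)  # decoupled weight decay
loss = model(torch.randn(4, 10)).pow(2).mean()
loss.backward()
opt.step()
opt.zero_grad()
```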

Distributed Package

* `DistributedDataParallel`: support CPU modules.  ([20236](https://github.com/pytorch/pytorch/pull/20236))
* `DistributedDataParallel`: support sparse tensors. ([19146](https://github.com/pytorch/pytorch/pull/19146))
* `DistributedDataParallel`: support local gradient accumulation. ([21736](https://github.com/pytorch/pytorch/pull/21736))

IterableDataset

* `IterableDataset`: introduces a new type of Dataset designed for data read from a stream; a minimal sketch follows. ([19228](https://github.com/pytorch/pytorch/pull/19228))
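
A minimal sketch of a stream-style dataset; `CountingStream` is a made-up example class, not part of the release:

```python
import torch
from torch.utils.data import IterableDataset, DataLoader

class CountingStream(IterableDataset):
    """Yields items from __iter__ instead of indexing via __getitem__."""
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return iter(range(self.n))

loader = DataLoader(CountingStream(10), batch_size=4)
print([batch.tolist() for batch in loader])   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```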

Tensorboard Package

* TensorBoard support in PyTorch has improved and is no longer experimental!
* `SummaryWriter.flush`: now supported (see the example after this list). ([20607](https://github.com/pytorch/pytorch/pull/20607))
* `SummaryWriter.add_mesh`: add support for 3D point clouds. ([20413](https://github.com/pytorch/pytorch/pull/20413))
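
A short sketch of the TensorBoard integration, including the new `SummaryWriter.flush()`; it assumes the `tensorboard` package is installed and uses an arbitrary log directory:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/example")
for step in range(10):
    writer.add_scalar("loss", 1.0 / (step + 1), step)
writer.flush()   # push pending events to disk without closing the writer
writer.close()
```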

JIT Features

* Improved support for iterator infrastructure. TorchScript now supports looping through a `List`, `Tuple`, `Dict`, `Tensor`, `String` and you can also use `zip()`, `enumerate()`, and `for...in`. ([21801](https://github.com/pytorch/pytorch/pull/21801), [22006](https://github.com/pytorch/pytorch/pull/22006), [21990](https://github.com/pytorch/pytorch/pull/21990), [21985](https://github.com/pytorch/pytorch/pull/21985))
* Support `in` membership checks. ([21527](https://github.com/pytorch/pytorch/pull/21527))
* Improved support for strings and the string libraries. ([20826](https://github.com/pytorch/pytorch/pull/20826), [20188](https://github.com/pytorch/pytorch/pull/20188), [20761](https://github.com/pytorch/pytorch/pull/20761), [21656](https://github.com/pytorch/pytorch/pull/21656), [20617](https://github.com/pytorch/pytorch/pull/20617))
* Improved `math` support. ([20979](https://github.com/pytorch/pytorch/pull/20979), [19707](https://github.com/pytorch/pytorch/pull/19707), [21151](https://github.com/pytorch/pytorch/pull/21151), [21131](https://github.com/pytorch/pytorch/pull/21131), [21129](https://github.com/pytorch/pytorch/pull/21129), [21130](https://github.com/pytorch/pytorch/pull/21130), [21512](https://github.com/pytorch/pytorch/pull/21512), [21126](https://github.com/pytorch/pytorch/pull/21126), [21127](https://github.com/pytorch/pytorch/pull/21127), [21128](https://github.com/pytorch/pytorch/pull/21128))
* Support for various other Python builtin functions. ([21451](https://github.com/pytorch/pytorch/pull/21451))
* Support for `NamedTuple`. ([21428](https://github.com/pytorch/pytorch/pull/21428))
* All the rest of the `dict` methods. ([21979](https://github.com/pytorch/pytorch/pull/21979))
* `sorted()` keyword for lists and dicts. ([23274](https://github.com/pytorch/pytorch/pull/23274))
* Add support for breaks and continues. ([21692](https://github.com/pytorch/pytorch/pull/21692))
* Improved custom operator API with several bugfixes and new features. It now allows more primitive types, supports `torch::List`, `torch::Dict` and `torch::Optional`, and supports dispatch (i.e. registering a different function for CPU and CUDA for the same operator).
* Support `nn.GRU` in script. ([23266](https://github.com/pytorch/pytorch/pull/23266))
* Support `pack_padded_sequence` and `pad_packed_sequence`. ([23249](https://github.com/pytorch/pytorch/pull/23249))
* Support `torch._C._get_tracing_state` in TorchScript. ([23248](https://github.com/pytorch/pytorch/pull/23248))
* Support `torch.as_tensor` in TorchScript. ([23247](https://github.com/pytorch/pytorch/pull/23247))
* add support for recursive compilation on `Modules`. ([20708](https://github.com/pytorch/pytorch/pull/20708))
* add `all` builtin. ([20521](https://github.com/pytorch/pytorch/pull/20521))
* Add `Final[T]` annotated members to `__constants__`. ([21603](https://github.com/pytorch/pytorch/pull/21603))
* Add `save()` to scripted `Function`s. ([20386](https://github.com/pytorch/pytorch/pull/20386))
* Support for serializing class attributes. ([22953](https://github.com/pytorch/pytorch/pull/22953))
* Support for class annotations. ([21379](https://github.com/pytorch/pytorch/pull/21379))
* support Python 3.8 `Constant` node. ([22007](https://github.com/pytorch/pytorch/pull/22007))
* Support for type annotations instead of `torch.jit.annotate()`. ([21390](https://github.com/pytorch/pytorch/pull/21390))
* Support operator overloading for user-defined classes. ([20033](https://github.com/pytorch/pytorch/pull/20033))
* Support recursive `ModuleList` / `Sequential`. ([21306](https://github.com/pytorch/pytorch/pull/21306))
* Trace multiple methods in a single `Module`. ([19905](https://github.com/pytorch/pytorch/pull/19905))

Improvements

* `Tensor.pin_memory()`: only ask for context on current device. ([22229](https://github.com/pytorch/pytorch/pull/22229))
* `Tensor.view()`: suggest using `reshape()` instead of `contiguous()` when the input is non-contiguous. ([20968](https://github.com/pytorch/pytorch/pull/20968))
* `Tensor.numpy()`: throw `TypeError` instead of `ValueError` if the type isn’t supported. ([21608](https://github.com/pytorch/pytorch/pull/21608))
* `torch.norm`: add support for `p="nuc"` with `dim` specified. ([21022](https://github.com/pytorch/pytorch/pull/21022))
* `torch.qr`: support batching of input matrices. ([20689](https://github.com/pytorch/pytorch/pull/20689))
* `torch.qr`: support `some` parameter akin to NumPy's `mode` option. ([20689](https://github.com/pytorch/pytorch/pull/20689))
* `torch.det` / `torch.logdet` / `torch.slogdet`: added batching support. ([22909](https://github.com/pytorch/pytorch/pull/22909))
* `torch.cdist`: support batching. ([20934](https://github.com/pytorch/pytorch/pull/20934))
* `torch.symeig`: support batching. ([21858](https://github.com/pytorch/pytorch/pull/21858))
* `torch._dirichlet_grad`: support CUDA. ([21191](https://github.com/pytorch/pytorch/pull/21191))
* `torch.randperm`: support `torch.float16`. ([22102](https://github.com/pytorch/pytorch/pull/22102))
* `torch.Size` is now pickle-able in Python2. ([20952](https://github.com/pytorch/pytorch/pull/20952))
* `torch.tensor` / `torch.as_tensor`: infer device if input supports Numba’s [`__cuda_array_interface__`](https://numba.pydata.org/numba-doc/latest/cuda/cuda_array_interface.html). ([20584](https://github.com/pytorch/pytorch/pull/20584))
* `torch.isinf` / `torch.isfinite`: throw `TypeError` instead of `ValueError` when a non-tensor is passed in. ([20817](https://github.com/pytorch/pytorch/pull/20817))
* `nn.MultiheadAttention`: add functional support. ([20415](https://github.com/pytorch/pytorch/pull/20415))
* `nn.MultiheadAttention`: added support for key/value to have a different number of features. ([21288](https://github.com/pytorch/pytorch/pull/21288))
* `nn.MultiheadAttention`: allow static key/values. ([21288](https://github.com/pytorch/pytorch/pull/21288))
* `nn.Conv{1,2,3}D`: support `torch.int64` dtype in forward. ([20730](https://github.com/pytorch/pytorch/pull/20730), [22594](https://github.com/pytorch/pytorch/pull/22594))
* `nn.AvgPool{1,2,3}D`: support `torch.int64` dtype in forward. ([22433](https://github.com/pytorch/pytorch/pull/22433))
* `nn.Module`: make `_save_to_state_dict` overrideable. ([21933](https://github.com/pytorch/pytorch/pull/21933))
* `autograd`: Checkpointing of modules inside large fanout networks no longer hits a recursion error. ([22397](https://github.com/pytorch/pytorch/pull/22397))
* `autograd`: Track in-place changes of Tensors through `Module._apply` (internal API). ([21865](https://github.com/pytorch/pytorch/pull/21865))
* `autograd.profiler`: Add shape aggregation support. ([20035](https://github.com/pytorch/pytorch/pull/20035))
* `autograd.profiler`: Profile custom c10 ops. ([20175](https://github.com/pytorch/pytorch/pull/20175))
* `DataLoader`: support setting `batch_size=None` to disable automatic batching (collation) in `DataLoader` for easier bulk loading.  ([19228](https://github.com/pytorch/pytorch/pull/19228))
* `DataLoader`: add `multiprocessing_context` parameter. ([22990](https://github.com/pytorch/pytorch/pull/22990))
* `DataLoader`: added error detection for `worker_init_fn`. ([20150](https://github.com/pytorch/pytorch/pull/20150))
* `DataLoader`: Retry on `EINTR`. ([21723](https://github.com/pytorch/pytorch/pull/21723))
* `torch.cuda.set_rng_state` / `torch.cuda.get_rng_state`: accept string as `device` parameter. ([23448](https://github.com/pytorch/pytorch/pull/23448))
* `CUDA`: add warning when using Turing GPUs and CUDA <= 9000. ([21468](https://github.com/pytorch/pytorch/pull/21468))
* `CUDA`: warn on conditions that can trigger a cuBLAS 9.0 bug.  ([22034](https://github.com/pytorch/pytorch/pull/22034))
* `CPU`: Improve CPUAllocator OOM message. ([20618](https://github.com/pytorch/pytorch/pull/20618))
* `[memory_format]`: added support for `torch.empty`, `torch.empty_like`, `Tensor.contiguous()`, `Tensor.is_contiguous()` to specify / check the order in which dimensions are laid out in memory. ([20455](https://github.com/pytorch/pytorch/pull/20455), [20558](https://github.com/pytorch/pytorch/pull/20558))
* `distributions.MultivariateNormal`: fix precision matrix instability. ([21366](https://github.com/pytorch/pytorch/pull/21366))
* `distributions.transforms.SigmoidTransform`: fix numerical instability. ([19802](https://github.com/pytorch/pytorch/pull/19802))

Distributed Improvements

* `DistributedDataParallel`: Support DDP forward/backward calls even if no module parameter is used. ([19821](https://github.com/pytorch/pytorch/pull/19821))
* `DistributedDataParallel`: Only call into reducer if grad is enabled. ([19897](https://github.com/pytorch/pytorch/pull/19897))
* `DistributedDataParallel`: Only require finalizing DDP backward when gradients are actually computed; this allows an application to completely discard DDP outputs and move on to the next iteration. ([19901](https://github.com/pytorch/pytorch/pull/19901))
* `DistributedDataParallel`: Improve DDP backward reduction error messages. ([20586](https://github.com/pytorch/pytorch/pull/20586))
* `DistributedDataParallel`:  make DDP failure recoverable. ([21591](https://github.com/pytorch/pytorch/pull/21591))
* `DistributedDataParallel`:  Delay reduction of unused parameters until first autograd hook is called. ([22219](https://github.com/pytorch/pytorch/pull/22219))
* `c10d:` support tensors shared across processes. ([21449](https://github.com/pytorch/pytorch/pull/21449))
* `c10d:` `ProcessGroupMPI` Add device guard around MPI operations. ([22446](https://github.com/pytorch/pytorch/pull/22446))
* `utils.data.distributed.DistributedSampler`: Make shuffling optional. ([22479](https://github.com/pytorch/pytorch/pull/22479))

Tensorboard Improvements

* Usage of kwarg-only arguments has been removed. ([21786](https://github.com/pytorch/pytorch/pull/21786))

Numpy Compatibility Improvements

* `Tensor.T`: added NumPy-like support for reversing dimensions (see the example after this list). ([20598](https://github.com/pytorch/pytorch/pull/20598))
* `Tensor.ndim`: NumPy equivalent property for the number of dimensions. ([20565](https://github.com/pytorch/pytorch/pull/20565))
* `Tensor.nonzero`: added `as_tuple` argument (default `False`) that when `True`, will return a tuple of Tensors, which matches the behavior of [numpy.nonzero](https://docs.scipy.org/doc/numpy/reference/generated/numpy.nonzero.html). ([20293](https://github.com/pytorch/pytorch/pull/20293))
* `torch.dtype`: support passing in NumPy dtypes as arguments. ([21215](https://github.com/pytorch/pytorch/pull/21215))
* `torch.normal`: add `size` parameter when called with two floats. ([20545](https://github.com/pytorch/pytorch/pull/20545))
* `torch.where`: add one-argument overload that is an alias for Numpy-like `nonzero`. ([21986](https://github.com/pytorch/pytorch/pull/21986))
* support a number of argument name overrides, e.g. `axis` instead of `dim`. ([20451](https://github.com/pytorch/pytorch/pull/20451))
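
A combined sketch of a few of the NumPy-compatibility additions above (arbitrary data):

```python
import torch

x = torch.arange(6).reshape(2, 3)
print(x.T.shape, x.ndim)                      # torch.Size([3, 2]) 2
rows, cols = (x > 2).nonzero(as_tuple=True)   # numpy.nonzero-style tuple of index tensors
print(rows.tolist(), cols.tolist())           # [1, 1, 1] [0, 1, 2]
```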

JIT Improvements

* The original source code debug information is now saved with the model. If a model is saved and then loaded into another process, the loaded process can now print out error messages that point to the original source code. ([22177](https://github.com/pytorch/pytorch/pull/22177), [22178](https://github.com/pytorch/pytorch/pull/22178), [22179](https://github.com/pytorch/pytorch/pull/22179), [22180](https://github.com/pytorch/pytorch/pull/22180))
* Error message source range highlighting now includes filename, line number, and column number. ([21157](https://github.com/pytorch/pytorch/pull/21157))
* Better Constant Propagation through Tuples. ([22561](https://github.com/pytorch/pytorch/pull/22561))
* Add `start` and `step` parameters for `range` in TorchScript. ([20795](https://github.com/pytorch/pytorch/pull/20795))
* Support for threading options for TorchScript inference ([doc](https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html))
* Add `max_pool2d` to symbolic derivatives. ([19661](https://github.com/pytorch/pytorch/pull/19661))
* Optimize `matmul` memory usage for certain cases. ([23433](https://github.com/pytorch/pytorch/pull/23433))
* Avoid kernel launches for zero-sized tensor inputs. ([22790](https://github.com/pytorch/pytorch/pull/22790))
* Add support for steps (strides) in tensor slices. ([20929](https://github.com/pytorch/pytorch/pull/20929))
* Added error for classes that don't have an `__init__` function. ([21880](https://github.com/pytorch/pytorch/pull/21880))
* Allow classes to be used in their own methods. ([20106](https://github.com/pytorch/pytorch/pull/20106))
* Better error message when a variable is conditionally defined. ([20911](https://github.com/pytorch/pytorch/pull/20911))
* Consider contained types in alias analysis. ([21431](https://github.com/pytorch/pytorch/pull/21431))
* Convenience APIs for script objects. ([20226](https://github.com/pytorch/pytorch/pull/20226))
* Don't print backtrace for interpreter errors. ([20925](https://github.com/pytorch/pytorch/pull/20925))
* Improve error msg for missing attribute. ([20779](https://github.com/pytorch/pytorch/pull/20779))
* Improve error msg on inferred type. ([21058](https://github.com/pytorch/pytorch/pull/21058))
* Improve error msg on recursive class defs. ([21842](https://github.com/pytorch/pytorch/pull/21842))
* Include module names in recursive error stacks. ([22921](https://github.com/pytorch/pytorch/pull/22921))
* Improve recursive scripting error message. ([21841](https://github.com/pytorch/pytorch/pull/21841))
* Index into a tuple with non constant integer. ([20081](https://github.com/pytorch/pytorch/pull/20081))
* Let `ScriptModule` buffer attributes also cast device/type. ([19700](https://github.com/pytorch/pytorch/pull/19700))
* Lower batchmm to non-diff optimization. ([19987](https://github.com/pytorch/pytorch/pull/19987))
* Make `ScriptModule.training` an attribute instead of a parameter. ([21078](https://github.com/pytorch/pytorch/pull/21078))
* Make `strtod_c` compatible with different gcc abi. ([21293](https://github.com/pytorch/pytorch/pull/21293))
* make magic methods work with casts too. ([20654](https://github.com/pytorch/pytorch/pull/20654))
* Improve performance of alias analysis. ([20899](https://github.com/pytorch/pytorch/pull/20899))
* Print a warning if a type annotation prefix is invalid according to mypy. ([20884](https://github.com/pytorch/pytorch/pull/20884))
* schema_matching.cpp: improve error messages. ([21141](https://github.com/pytorch/pytorch/pull/21141))
* Resolve with closed over variables instead of stack frame. ([22270](https://github.com/pytorch/pytorch/pull/22270))
* Report errors through call stack. ([22280](https://github.com/pytorch/pytorch/pull/22280))
* Reduce number of stack manipulation instructions in interpreter. ([21240](https://github.com/pytorch/pytorch/pull/21240))

C++ API Improvements

* `nn::PoissonNLLLoss`: Added support. ([19316](https://github.com/pytorch/pytorch/pull/19316))
* `nn::Module`: added `replace_module` API to overwrite submodules in C++ Frontend. ([22546](https://github.com/pytorch/pytorch/pull/22546))
* `nn::Module::register_module` / `register_parameter` / `register_buffer`: make public. ([23196](https://github.com/pytorch/pytorch/pull/23196))
* `data::datasets::ChunkDataReader`: fix include headers and a vector issue. ([19485](https://github.com/pytorch/pytorch/pull/19485))
* `data::datasets::ChunkDataset`: add new `get_batch` method. ([21797](https://github.com/pytorch/pytorch/pull/21797))
* `data::datasets::ChunkDataset`: add checkpoint support. ([21889](https://github.com/pytorch/pytorch/pull/21889))
* `data::datasets::ChunkDataset`: add support for cross-chunk shuffling. ([22347](https://github.com/pytorch/pytorch/pull/22347))
* `data::datasets::ChunkDataset`: add sorting policy. ([23053](https://github.com/pytorch/pytorch/pull/23053))

MKLDNN Tensor Improvements

Add support for a number of operators on MKLDNN Tensors including:
* `Tensor.is_mkldnn`: ([22386](https://github.com/pytorch/pytorch/pull/22386))
* `Tensor.transpose()`: ([21943](https://github.com/pytorch/pytorch/pull/21943))
* `Tensor.zero_()`: ([20573](https://github.com/pytorch/pytorch/pull/20573))
* `torch.empty`: ([21184](https://github.com/pytorch/pytorch/pull/21184))
* `torch.mul`: ([20575](https://github.com/pytorch/pytorch/pull/20575))
* `nn.AdaptiveAvgPool{1,2,3}D`: ([19818](https://github.com/pytorch/pytorch/pull/19818))
* `nn.Sigmoid`: ([20820](https://github.com/pytorch/pytorch/pull/20820))
* `nn.Softmax`: ([21516](https://github.com/pytorch/pytorch/pull/21516))
* `nn.Module`: support saving/loading MKLDNN modules. ([20799](https://github.com/pytorch/pytorch/pull/20799))
* `nn.MaxPool{1,2,3}D`: support `ceil_mode`. ([21310](https://github.com/pytorch/pytorch/pull/21310))

Bug Fixes

* Indexing: fix advanced indexing where there are more than (2^31)-1 bytes in the output. ([20919](https://github.com/pytorch/pytorch/pull/20919))
* Indexing: fix indexing when there are more than 65535 elements in a non-indexing first dimension on CUDA. ([23123](https://github.com/pytorch/pytorch/pull/23123))
* Indexing: fix issue with slicing empty tensors. ([20914](https://github.com/pytorch/pytorch/pull/20914))
* `Tensor.index_copy_:` fix segfault by properly checking dimension is in range. ([21617](https://github.com/pytorch/pytorch/pull/21617))
* `Tensor.copy_`: Fix a bug where non-blocking was not being respected. ([20305](https://github.com/pytorch/pytorch/pull/20305))
* `Tensor.clone`: Fix an issue with MKLDNN tensors. ([20943](https://github.com/pytorch/pytorch/pull/20943))
* Tensor subclassing: give a proper error instead of crashing. ([20283](https://github.com/pytorch/pytorch/pull/20283))
* `torch.cat`:  Fix segfault with tensors that can't be indexed with 32-bit ints. ([21530](https://github.com/pytorch/pytorch/pull/21530))
* `torch.range` / `torch.linspace` / `torch.logspace`: properly respect the current `Stream`. ([21619](https://github.com/pytorch/pytorch/pull/21619))
* `torch.lu`: return the identity permutation instead of zeros when not using pivoting. ([22242](https://github.com/pytorch/pytorch/pull/22242))
* `torch.einsum`: Fix an issue where the backward pass would potentially be skipped. ([22111](https://github.com/pytorch/pytorch/pull/22111))
* `torch.cosh`: Fix an issue where `torch.cos` was instead calculated with `torch.double` dtype and vectorized instructions. ([20797](https://github.com/pytorch/pytorch/pull/20797))
* `torch.triu` / `torch.tril`: handle strides correctly for in-place versions. ([22730](https://github.com/pytorch/pytorch/pull/22730)).
* `torch.triu` / `torch.tril`: Fix handling of batches > 65535 on CUDA. ([21067](https://github.com/pytorch/pytorch/pull/21067))
* `torch.inverse` / `torch.solve` / `torch.cholesky_solve` /  `torch.triangular_solve`: Fix batch sizes > 65535 on CUDA. ([21689](https://github.com/pytorch/pytorch/pull/21689))
* `torch.histc`: return `dtype` is now the same as the input tensor on CUDA, matching CPU behavior. ([20369](https://github.com/pytorch/pytorch/pull/20369))
* `torch.histc`: properly return 1-dim tensor on CPU with 0-dim input and 1 bin. ([21497](https://github.com/pytorch/pytorch/pull/21497))
* `torch.randperm`: handle non-contiguous `out` parameter. ([23043](https://github.com/pytorch/pytorch/pull/23043))
* `torch.unique`: Fix empty tensor handling when `dim` is passed as an argument. ([19000](https://github.com/pytorch/pytorch/pull/19000))
* `torch.min` / `torch.max`: properly error on empty tensor inputs, as with CPU tensors. ([19612](https://github.com/pytorch/pytorch/pull/19612)).
* `CUDA`: fix launch parameters for reductions. ([22827](https://github.com/pytorch/pytorch/pull/22827)).
* `torch.hub`: fix an issue with `find_module`. ([20782](https://github.com/pytorch/pytorch/pull/20782))
* `autograd`: Fix a number of custom autograd `Function` corner cases by inverting the relationship between PyFunction and THPFunction. ([22983](https://github.com/pytorch/pytorch/pull/22983))
* `autograd`: give a "Trying to backward through the graph a second time" error instead of an internal assert when the buffers are a list of Tensors (with indexing). ([21533](https://github.com/pytorch/pytorch/pull/21533))
* `optim.lr_scheduler.CosineAnnealingLR`: rename from CosineAnnealingLr. ([23242](https://github.com/pytorch/pytorch/pull/23242))
* `distributions.Binomial`: Fix overflow of `log_prob` when `logits` is large. ([20679](https://github.com/pytorch/pytorch/pull/20679))
* `distributions.SigmoidTransform`: Fix numerical issues that could result in `inf` / `-inf` return values. ([20288](https://github.com/pytorch/pytorch/pull/20288))
* `distributions.Categorical.sample`: fix a view bug. ([23328](https://github.com/pytorch/pytorch/pull/23328))
* `CUDA`: Give proper error message for bad cuda forks. ([23322](https://github.com/pytorch/pytorch/pull/23322))
* `pickle`: Fix Unpickling error when loading multiple objects from a file. ([20270](https://github.com/pytorch/pytorch/pull/20270))
* `NCCL`: Fix race condition. ([23040](https://github.com/pytorch/pytorch/pull/23040))

torch.nn Bug Fixes

* `nn.Conv{1,2,3}D`: fix memory leak on MKLDNN code path. ([22392](https://github.com/pytorch/pytorch/pull/22392))
* `nn.Conv{1,2,3}D`: properly unpickle older pickled versions. ([21687](https://github.com/pytorch/pytorch/pull/21687))
* `nn.CTCLoss`: fix backward on CUDA when 2d target tensor is larger than `max_target_length`. ([20971](https://github.com/pytorch/pytorch/pull/20971))
* `nn.CTCLoss`: fix some numerical stability issues. ([21392](https://github.com/pytorch/pytorch/pull/21392))
* `nn.CTCLoss`: disable buggy non-deterministic CuDNN algorithm. ([22977](https://github.com/pytorch/pytorch/pull/22977))
* `nn.CTCLoss`: fixed empty target handling. ([21910](https://github.com/pytorch/pytorch/pull/21910), [23298](https://github.com/pytorch/pytorch/pull/23298))
* `nn.SyncBatchNorm`: fix syncing of running statistics when count size differs between GPUs. ([22248](https://github.com/pytorch/pytorch/pull/22248))
* `nn.SyncBatchNorm`: retain `requires_grad` value when converting from `nn.BatchNorm`. ([22569](https://github.com/pytorch/pytorch/pull/22569))
* `nn.SyncBatchNorm`: correctly handle `process_group` in `convert_sync_batchnorm`. ([19240](https://github.com/pytorch/pytorch/pull/19240))
* `nn.MultiheadAttention`: fix for `torch.float16` dtype. ([21658](https://github.com/pytorch/pytorch/pull/21658)).
* `nn.EmbeddingBag`: fix NaN output when input is empty. ([21400](https://github.com/pytorch/pytorch/pull/21400))
* `nn.Dropout`: fix python crash (with SIGFPE) when called on an empty cuda tensor. ([20541](https://github.com/pytorch/pytorch/pull/20541))
* `nn.MaxPool`: fix output size calculation in some corner cases. ([22304](https://github.com/pytorch/pytorch/pull/22304))
* `nn.MaxPool`: return valid indices if all entries are `-inf`. ([23161](https://github.com/pytorch/pytorch/pull/23161))
* `nn.Softmax`: respect the current Stream. ([22470](https://github.com/pytorch/pytorch/pull/22470))
* `nn.LogSoftmax`: fix numerical stability issues. ([21672](https://github.com/pytorch/pytorch/pull/21672))
* `nn.Module.load_state_dict`: break ref cycle. ([20397](https://github.com/pytorch/pytorch/pull/20397))
* `nn.Module`: fix loading in 32-bit environments. ([20900](https://github.com/pytorch/pytorch/pull/20900))
* `nn.utils.rnn.pack_padded_sequence`: Fix segfault on empty tensors. ([21461](https://github.com/pytorch/pytorch/pull/21461))
* `nn.utils.spectral_norm`: fix loading `state_dict` when `strict=False`. ([22545](https://github.com/pytorch/pytorch/pull/22545))
* `CuDNN`: Fix uninitialized PoolWindow on Windows. ([22405](https://github.com/pytorch/pytorch/pull/22405))

Distributed Bug fixes

* `nn.parallel.DataParallel`: fix error in `no_grad` mode. ([21262](https://github.com/pytorch/pytorch/pull/21262))
* `torch.distributed.all_gather`: fix errors for views and aliases. ([21490](https://github.com/pytorch/pytorch/pull/21490))
* `c10d`: fix collective communication errors on empty tensors. ([20658](https://github.com/pytorch/pytorch/pull/20658))

JIT Bug Fixes

* Fix specialized list from dict keys. ([23267](https://github.com/pytorch/pytorch/pull/23267))
* Switch keys to be sequential and stable in pickle serialization. ([23280](https://github.com/pytorch/pytorch/pull/23280))
* `deepCopy` also copies type information of lists. ([23271](https://github.com/pytorch/pytorch/pull/23271))
* `dictKeys` and `dictItems` ops on typed dicts return typed lists. ([23270](https://github.com/pytorch/pytorch/pull/23270))
* Fix pickler bug where it would not load if no tensors were saved. ([23263](https://github.com/pytorch/pytorch/pull/23263))
* Avoid multiple writes to files on export. ([21186](https://github.com/pytorch/pytorch/pull/21186))
* Better error msg for mismatched `dict` key type. ([22231](https://github.com/pytorch/pytorch/pull/22231))
* Better error msg for using Python `builtin_function_or_method`. ([22935](https://github.com/pytorch/pytorch/pull/22935))
* Better error msg in `__getstate__` to let a user know that ScriptModules can't be deep-copied at the moment. ([20885](https://github.com/pytorch/pytorch/pull/20885))
* Better error msg when seeing a unsupported builtin function. ([21068](https://github.com/pytorch/pytorch/pull/21068))
* `dropout` derivative should respect the `train` flag. ([20760](https://github.com/pytorch/pytorch/pull/20760))
* Fix `__constants__` for some nn modules. ([21071](https://github.com/pytorch/pytorch/pull/21071))
* Fix `ScriptModule.__dir__()`. ([22426](https://github.com/pytorch/pytorch/pull/22426))
* Fix 3x DenseNet compile time regression by restoring earlier-out tests in AliasDB::writesToAlias. ([21425](https://github.com/pytorch/pytorch/pull/21425))
* Fix a bug in loop unrolling. ([21239](https://github.com/pytorch/pytorch/pull/21239))
* Fix alias annotations for dict ops. ([22900](https://github.com/pytorch/pytorch/pull/22900))
* Fix inaccurate SourceRange reporting. ([21109](https://github.com/pytorch/pytorch/pull/21109))
* Fix broken indexing when using None and ellipses indexing together. ([22905](https://github.com/pytorch/pytorch/pull/22905))
* Fix bug in `CompilationUnit::define`. ([21886](https://github.com/pytorch/pytorch/pull/21886))
* Fix compilation order for class methods. ([20094](https://github.com/pytorch/pytorch/pull/20094))
* Fix dead code elimination over loops. ([22632](https://github.com/pytorch/pytorch/pull/22632))
* Fix dead code elimination in onnx export. ([22476](https://github.com/pytorch/pytorch/pull/22476))
* Fix incorrect default on `Graph::toString`. ([21370](https://github.com/pytorch/pytorch/pull/21370))
* Fix optional type promotion for classes. ([21593](https://github.com/pytorch/pytorch/pull/21593))
* Fix optional type unification. ([19813](https://github.com/pytorch/pytorch/pull/19813))
* Fix `NameError` with `PYTORCH_JIT=0`. ([20120](https://github.com/pytorch/pytorch/pull/20120))
* Fix overspecializing constants in compilation. ([22816](https://github.com/pytorch/pytorch/pull/22816))
* Fix `pow()` bug on overloads. ([20824](https://github.com/pytorch/pytorch/pull/20824))
* Fix recursive method compilation. ([21862](https://github.com/pytorch/pytorch/pull/21862))
* Fix reflection on weak modules, copy attributes. ([20190](https://github.com/pytorch/pytorch/pull/20190))
* Fix slow unpickling. ([21542](https://github.com/pytorch/pytorch/pull/21542))
* Fix input/output type mismatch. ([20829](https://github.com/pytorch/pytorch/pull/20829))
* Fix insert_guard for norm decomposition. ([19646](https://github.com/pytorch/pytorch/pull/19646))
* Fix Trace inlining of graphs with optional inputs. ([22686](https://github.com/pytorch/pytorch/pull/22686))
* Fix tracing bugs where using `1 - x` in C++ would cause the size of 1 to get hardcoded. ([20932](https://github.com/pytorch/pytorch/pull/20932))
* Fix tuple indexing bug. ([21521](https://github.com/pytorch/pytorch/pull/21521))
* Fix type hints for `None` constants. ([23029](https://github.com/pytorch/pytorch/pull/23029))
* Fix weak module `cuda()` `_flat_weights` bug. ([21107](https://github.com/pytorch/pytorch/pull/21107))
* Fix `WeakIValueEq`. ([21891](https://github.com/pytorch/pytorch/pull/21891))
* Fixed gcd to use 64 bit integers. ([21041](https://github.com/pytorch/pytorch/pull/21041))
* Fixed `list()` not making a copy. ([22093](https://github.com/pytorch/pytorch/pull/22093))
* Fix race condition on `Module::forward` method. ([21398](https://github.com/pytorch/pytorch/pull/21398))
* Made `a += b` for lists do an in place add. ([21896](https://github.com/pytorch/pytorch/pull/21896))
* Made `floor/ceil` return ints. ([21124](https://github.com/pytorch/pytorch/pull/21124))
* Fix out-of-memory on GPU caused by the `weak_script` decorators. ([20588](https://github.com/pytorch/pytorch/issues/20588))
* Override print when python is present. ([21625](https://github.com/pytorch/pytorch/pull/21625))
* Set `__file__` for `torch.ops`. ([21888](https://github.com/pytorch/pytorch/pull/21888))
* Set correct list type in pybind_utils. ([23188](https://github.com/pytorch/pytorch/pull/23188))

C++ Frontend bug fixes

* `nn::RNN`: Fix assertions in bidirectional RNN. ([22850](https://github.com/pytorch/pytorch/pull/22850)).
* `nn::MaxPool ` / ` nn::AvgPool`: expand incomplete kernel size, as in Python. ([22073](https://github.com/pytorch/pytorch/pull/22073), [22075](https://github.com/pytorch/pytorch/pull/22075))
* `Optim`: Fix memory leak when `weight_decay` is applied to `Adam`, `Adagrad`,  `RMSProp`. ([23125](https://github.com/pytorch/pytorch/pull/23125))
* `Optim::SGD`: fix memory leak with weight_decay. ([23007](https://github.com/pytorch/pytorch/pull/23007))
* `torch::autograd::Scatter` `/ torch::autograd::Gather`: Fix nullptr bug. ([20286](https://github.com/pytorch/pytorch/pull/20286))
* `torch::nn::parallel::data_parallel`: fix gradient computation error. ([20910](https://github.com/pytorch/pytorch/pull/20910))
* [C++ Extensions] Fix an issue when building multiple extensions in the same directory. ([20221](https://github.com/pytorch/pytorch/pull/20221))

Deprecations

**Masking via `torch.uint8` Tensors is now deprecated in favor of masking via `torch.bool` Tensors.**

See the **Breaking Changes** section for more details about `torch.bool` Tensors and comparison operators.

**`torch.masked_select`, `torch.masked_fill`, `torch.masked_scatter` now expect `torch.bool` masks rather than `torch.uint8`.**


```python
>>> a = torch.tensor([1, 2, 3])
>>> b = torch.tensor([3, 1, 2])

>>> a.masked_select(torch.tensor([0, 1, 1], dtype=torch.uint8))
UserWarning: masked_select received a mask with dtype torch.uint8,
this behavior is now deprecated, please use a mask with dtype torch.bool instead.
tensor([2, 3])

# instead use torch.bool
>>> a.masked_select(torch.tensor([False, True, True]))
tensor([2, 3])
```



**Comparison operators with `out=` parameters now expect `torch.bool` dtype rather than `torch.uint8`.**


```python
>>> a = torch.tensor([1, 2, 3])
>>> b = torch.tensor([3, 1, 2])
>>> res = torch.empty_like(a, dtype=torch.uint8)
>>> torch.gt(a, b, out=res)
UserWarning: torch.gt received 'out' parameter with dtype torch.uint8, this behavior
is now deprecated, please use 'out' parameter with dtype torch.bool instead.
tensor([0, 1, 1], dtype=torch.uint8)

# instead use torch.bool
>>> res = torch.empty_like(a, dtype=torch.bool)
>>> torch.gt(a, b, out=res)
tensor([False, True, True])
```




**Legacy `autograd.Function` (a `Function` without a static `forward` method) is now deprecated.**


```python
>>> class MyLegacyFunction(Function):
>>>     def forward(self, x):
>>>         return x
>>>
>>>     def backward(self, grad_output):
>>>         return grad_output
>>>
>>> MyLegacyFunction()(torch.randn((3,), requires_grad=True))
UserWarning: Legacy autograd function with non-static forward method is deprecated
and will be removed in 1.3. Please use new-style autograd function
with static forward method.

# instead use a new-style autograd Function
>>> class MyFunction(Function):
>>>     @staticmethod
>>>     def forward(ctx, x):
>>>         return x
>>>
>>>     @staticmethod
>>>     def backward(ctx, grad_output):
>>>         return grad_output
>>>
>>> MyFunction.apply(torch.randn((3,), requires_grad=True))
```


See the [torch.autograd.Function](https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function) documentation for more details.

`torch.gels`: has been renamed to `torch.lstsq`; `torch.gels` will work for this release but is now deprecated.  ([23460](https://github.com/pytorch/pytorch/pull/23460))
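
A minimal sketch of the rename, with random data (later releases eventually moved this functionality again, to `torch.linalg.lstsq`):

```python
import torch

A = torch.randn(5, 3)
b = torch.randn(5, 2)
solution, qr = torch.lstsq(b, A)   # same (B, A) argument order as the deprecated torch.gels
```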

Performance

* Advanced Indexing: significantly improve performance of advanced indexing backward. ([20557](https://github.com/pytorch/pytorch/pull/20557))
* `Tensor.copy_`: increase broadcasting CUDA copy performance by 25%. ([20685](https://github.com/pytorch/pytorch/pull/20685))
* `torch.matmul`: Optimize the case `A.ndim <= 2 && B.ndim >= 3`; shows up to 15x speedup. ([20448](https://github.com/pytorch/pytorch/pull/20448))
* `torch.bmm`: Improve performance by up to 3x for small cases on CPU by applying TensorAccessor. ([20266](https://github.com/pytorch/pytorch/pull/20266))
* `torch.inverse`: Move workspace query and allocation outside loop to improve performance by up to 5x. ([20904](https://github.com/pytorch/pytorch/pull/20904))
* `torch.topk`: Optimize CPU perf using parallel and partial sort, up to 6x improvement. ([22865](https://github.com/pytorch/pytorch/pull/22865))
* `torch.cdist`: Improve CPU perf by up to 10x for some cases. ([20605](https://github.com/pytorch/pytorch/pull/20605))
* `torch.normal`: Move `normal`, `normal_means`, `normal_stddevs`, and `normal_means_stddevs` to ATen, increasing performance by up to 3x. ([21287](https://github.com/pytorch/pytorch/pull/21287))
* `torch.bernoulli`: Speedup bernoulli_scalar_cuda_kernel with grid-stride loop, increasing performance by up to 2x. ([21300](https://github.com/pytorch/pytorch/pull/21300))
* `torch.coalesce`: Use `_sparse_coo_tensor_unsafe` in `coalesce` for up to 10x speedup. ([21214](https://github.com/pytorch/pytorch/pull/21214))
* `torch.sinh` / `torch.cosh`: Parallelize and vectorize on CPU. ([21115](https://github.com/pytorch/pytorch/pull/21115))
* `torch.lerp`: Vectorize on CPU. ([22038](https://github.com/pytorch/pytorch/pull/22038))
* `torch.eye`: Parallelize on CPU. ([21077](https://github.com/pytorch/pytorch/pull/21077))
* `torch.randperm`: Parallelize initialization in randperm on CPU. ([21529](https://github.com/pytorch/pytorch/pull/21529))
* Vectorization: Don't split 256-bit AVX2 load/store intrinsics. ([20609](https://github.com/pytorch/pytorch/pull/20609)).

Torch.NN Performance Improvements

* `nn.Softmax`: Add persistent CUDA kernels that increase performance 2-10x on small inputs. ([20827](https://github.com/pytorch/pytorch/pull/20827))
* `nn.Embedding` / `nn.EmbeddingBag`: Optimize CUDA kernel, increasing performance up to 2.7x. ([22016](https://github.com/pytorch/pytorch/pull/22016))
* `nn.Linear`: optimize BERT model perf by using mkldnn inner product. ([21851](https://github.com/pytorch/pytorch/pull/21851))
* `nn.Conv{1,2,3}D`: improve perf for depthwise convolutions in `torch.float16` on Volta and Turing GPUs. ([22302](https://github.com/pytorch/pytorch/pull/22302))
* `nn.RNN`: optimize on CPU by fusing matmul ops. ([22512](https://github.com/pytorch/pytorch/pull/22512))
* `nn.Upsample`: a number of significant perf improvements on CUDA. ([21879](https://github.com/pytorch/pytorch/pull/21879), [21694](https://github.com/pytorch/pytorch/pull/21694)).
* `nn.functional.layer_norm`: optimize a fast path for layer_norm, increasing perf by up to 4x on CPU. ([20345](https://github.com/pytorch/pytorch/pull/20345), [20883](https://github.com/pytorch/pytorch/pull/20883))

Documentation

* `torch.bool`: doc the Boolean tensor type. ([21601](https://github.com/pytorch/pytorch/pull/21601))
* `torch.as_strided`: add docs. ([22842](https://github.com/pytorch/pytorch/pull/22842))
* `torch.empty_strided`: add docs. ([23740](https://github.com/pytorch/pytorch/pull/23740))
* `torch.lerp`: clarify broadcasting requirements. ([23268](https://github.com/pytorch/pytorch/pull/23268))
* `torch.enable_grad` / `torch.no_grad` / `torch.set_grad_enabled`: clarify the interaction between these features. ([23310](https://github.com/pytorch/pytorch/pull/23310))
* `torch.autograd.grad_mode`: Document that no_grad is thread local. ([21755](https://github.com/pytorch/pytorch/pull/21755))
* `torch.multiprocessing`: Explain refcounting of CUDA tensors. ([19904](https://github.com/pytorch/pytorch/pull/19904))
* `torch.Tensor`: Add a warning about memory usage. ([20801](https://github.com/pytorch/pytorch/pull/20801))
* `torch.utils.data.Dataloader`: Document RNG state consumption. ([22540](https://github.com/pytorch/pytorch/pull/22540))
* `torch.optim.lr_scheduler.CyclicLR`: Clarify `base_momentum` and `max_momentum`. ([20880](https://github.com/pytorch/pytorch/pull/20880)).
* Document production environment features. ([23010](https://github.com/pytorch/pytorch/pull/23010))
* Add note about contributing recently released research. ([23513](https://github.com/pytorch/pytorch/pull/23513))
* Clarify performance implications of deterministic mode. ([21337](https://github.com/pytorch/pytorch/pull/21337))
* Update cuda pinned memory note to include `tensor.to`. ([20977](https://github.com/pytorch/pytorch/pull/20977))

Torch.NN Documentation

* `nn.functional / nn.init`: Break up NN in docs so they load faster. ([21291](https://github.com/pytorch/pytorch/pull/21291))
* `nn.functional.conv{1,2,3}d`: Remove `padding_mode`.  ([20891](https://github.com/pytorch/pytorch/pull/20891))
* `nn.functional.upsample` / `nn.functional.interpolate`: add note about overshooting with `mode='bicubic'`. ([23321](https://github.com/pytorch/pytorch/pull/23321))
* `nn.init.zeros_` / `nn.init.ones_`: add documentation. ([23145](https://github.com/pytorch/pytorch/pull/23145))
* `nn.MultiheadAttention`: Add documentation for `add_bias_kv`, `add_zero_attn`, and `attn_mask`. ([20071](https://github.com/pytorch/pytorch/pull/20071))
* `nn.MultiheadAttention`: Fix documentation for attention mask shape. ([20850](https://github.com/pytorch/pytorch/pull/20850))
* `nn.Softmax`: Fixed to specify dimension to prevent warning in 1.1.0. ([20310](https://github.com/pytorch/pytorch/pull/20310))

Contributor Documentation

* Updated web links on contribution_guide and governance documentation. ([21243](https://github.com/pytorch/pytorch/pull/21243))
* Improve documentation for publishing hub models. ([21307](https://github.com/pytorch/pytorch/pull/21307))
* Suggest a faster linker in the contributing guide. ([21334](https://github.com/pytorch/pytorch/pull/21334))
* Add CUDA C++11 and profiling notes to the contribution guide. ([21386](https://github.com/pytorch/pytorch/pull/21386))

Build Documentation

* Add magma for CUDA 10.1 to Windows docs. ([19914](https://github.com/pytorch/pytorch/pull/19914))
* Improve build-from-source instructions. ([20088](https://github.com/pytorch/pytorch/pull/20088))
* Add `ninja` to build instructions. ([20079](https://github.com/pytorch/pytorch/pull/20079))
* Update libtorch build docs. ([21150](https://github.com/pytorch/pytorch/pull/21150))

TensorBoard Documentation

* Tensorboard Documentation has been greatly improved!  Browse the latest version [here](https://pytorch.org/docs/stable/tensorboard.html).

Torch HUB Documentation

* Improve docs for publishing hub models. ([21307](https://github.com/pytorch/pytorch/pull/21307))
* Update docs of entry point in hub. ([21568](https://github.com/pytorch/pytorch/pull/21568))

ONNX


In PyTorch 1.2, we have added full support for ONNX Opsets 7, 8, 9 and 10 in the ONNX exporter, and we have also enhanced the constant folding pass to support Opset 10. The export of ScriptModules has better support. Additionally, users are now able to register their own symbolic functions to export custom ops, and to specify the dynamic dimensions of inputs during export.
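
The sketch below is an illustrative export call (the tiny model and the file name `model.onnx` are arbitrary choices): it pins the opset version and marks the batch dimension of the input as dynamic.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2)).eval()
dummy = torch.randn(1, 4)
torch.onnx.export(
    model, dummy, "model.onnx",
    opset_version=10,                      # one of the newly supported opsets
    input_names=["input"], output_names=["output"],
    dynamic_axes={"input": {0: "batch"}},  # batch dimension left dynamic
)
```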

Supporting More ONNX Opsets

* Add basic support for multiple ONNX Opsets and support for Opset 10. ([19294](https://github.com/pytorch/pytorch/pull/19294))
* Support ONNX Opset 7 and 8 in PyTorch ONNX Exporter. ([22421](https://github.com/pytorch/pytorch/pull/22421), [20036](https://github.com/pytorch/pytorch/pull/20036))
* Export `Dropout` for Opset 10. ([20710](https://github.com/pytorch/pytorch/pull/20710))
* Export `Slice` and `Flip` for Opset 10. ([20533](https://github.com/pytorch/pytorch/pull/20533))
* Export `Interpolate (Resize)` for Opset 10. ([21434](https://github.com/pytorch/pytorch/pull/21434))

Enhancing the Support for ScriptModule

* Support multiple outputs in ScriptModule in ONNX Exporter. ([20256](https://github.com/pytorch/pytorch/pull/20256))
* Support tensor factories in ScriptModule in ONNX Exporter. ([20255](https://github.com/pytorch/pytorch/pull/20255))
* Support tuples as inputs and outputs in ScriptModule. ([20784](https://github.com/pytorch/pytorch/pull/20784))

Exporting More Torch Operators to ONNX

* Export custom ops. ([21321](https://github.com/pytorch/pytorch/pull/21321))
* Export `torch.arange`. ([22601](https://github.com/pytorch/pytorch/pull/22601))
* Export `torch.masked_fill`. ([22521](https://github.com/pytorch/pytorch/pull/22521))
* Export `torch.floor`, `torch.ceil`, `torch.log2` and `prim::shape`. ([17895](https://github.com/pytorch/pytorch/pull/17895))
* Export `torch._dim_arange`. ([20078](https://github.com/pytorch/pytorch/pull/20078))
* Export `torch.randn_like`. ([20093](https://github.com/pytorch/pytorch/pull/20093))
* Export `torch._standard_gamma`. ([20126](https://github.com/pytorch/pytorch/pull/20126))
* Export `torch.topk`. ([21104](https://github.com/pytorch/pytorch/pull/21104))
* Export `__and__`, `__or__`. ([17894](https://github.com/pytorch/pytorch/pull/17894))
* Export `torch.sign`. ([20470](https://github.com/pytorch/pytorch/pull/20470))
* Export `torch.scatter`. ([18543](https://github.com/pytorch/pytorch/pull/18543))
* Export `torch.rand`. ([20559](https://github.com/pytorch/pytorch/pull/20559))
* Export `torch.gather`. ([21235](https://github.com/pytorch/pytorch/pull/21235))
* Export `torch.cosine_similarity`. ([21884](https://github.com/pytorch/pytorch/pull/21884))
* Export `torch.sum`. ([22240](https://github.com/pytorch/pytorch/pull/22240))
* Export `torch.logsumexp`. ([22306](https://github.com/pytorch/pytorch/pull/22306))
* Export `torch.layer_norm`. ([22265](https://github.com/pytorch/pytorch/pull/22265))

Extending Existing Exporting Logic

* Support `torch.min` and `torch.max` with dim. ([19689](https://github.com/pytorch/pytorch/pull/19689))
* Support `maxpool` with dilations. ([18721](https://github.com/pytorch/pytorch/pull/18721))
* Support `RNN` with `batch_first=True`. ([19766](https://github.com/pytorch/pytorch/pull/19766))
* Support `Upsample` with dynamic input. ([20116](https://github.com/pytorch/pytorch/pull/20116))
* Improve support for Loop export. ([20445](https://github.com/pytorch/pytorch/pull/20445))
* Enable `torch.full` with scalar parameters. ([21931](https://github.com/pytorch/pytorch/pull/21931))
* Added support for exporting models with variable length input/output to ONNX. ([20034](https://github.com/pytorch/pytorch/pull/20034))

Optimizing Exported ONNX Graph

* Support constant folding in Opset 10. ([22515](https://github.com/pytorch/pytorch/pull/22515))
* Support negative indexing for `Slice` in constant folding optimization. ([21811](https://github.com/pytorch/pytorch/pull/21811))

Bugfixes/Improvements

* Fix the shape of `PReLU` weight. ([21330](https://github.com/pytorch/pytorch/pull/21330))
* Fix the export for `torch.pixel_shuffle`. ([21486](https://github.com/pytorch/pytorch/pull/21486))
* Fix the export for `torch.full`. ([21669](https://github.com/pytorch/pytorch/pull/21669))
* Update logic for folding `onnx::Constant` nodes. ([20109](https://github.com/pytorch/pytorch/pull/20109))

0.13

named_parameters to filter out specific parameter types

Let's say that you want to add weight decay to all parameters of your model except for the biases. How do you get only the biases of your model?
We introduce [nn.Module.named_parameters](http://pytorch.org/docs/nn.html#torch.nn.Module.named_parameters) for this.
It joins `named_children` and `named_modules` in helping you filter specific attributes of models.

Example of filtering out the biases of a model and giving them a `weight_decay` of 0:

```python
import torch
import torch.nn as nn
import torch.optim as optim

m = nn.Sequential(
    nn.Linear(10, 20),
    nn.ReLU(),
    nn.Linear(20, 20),
    nn.ReLU(),
)
weights, biases = [], []
for name, p in m.named_parameters():
    if 'bias' in name:
        biases += [p]
    else:
        weights += [p]

optim.SGD([
    {'params': weights},
    {'params': biases, 'weight_decay': 0},
], lr=1e-2, momentum=0.9, weight_decay=1e-5)
```


Performance Improvements
------------------------
- `cumsum` and `cumprod` have been made significantly faster on the GPU via the use of thrust primitives where appropriate.
- `LSTMCell` and `GRUCell` are now significantly faster on the GPU via a fused kernel
- The default algorithm for CuDNN has been changed to `PRECOMP_GEMM`, which is a
much faster algorithm that takes a tiny bit of workspace. Previously it was
`IMPLICIT_GEMM`, which took zero workspace but was significantly slower.
- 5% to 10% improvement in data loader by collating batches directly into shared memory.
- SVD is now computed on the GPU via divide-and-conquer (sgesdd) which gives a 2x to 5x speedup.
- The commonly used function `expand` has been moved to C, to have better performance in smaller models.

Bug Fixes
---------
- Added contiguous checks on weight and bias for a large range of THNN functions
- make the range of `random_` correct when both lower and upper bound are specified
- `parallel_apply` now can take arguments that are unhashable
- Reshape `grad` correctly in the Dot function (inputs don't have to be 1D vectors...)
- Added `Variable.type_as`
- Unify argument names of `norm` and `renorm` to have `p=norm_type, dim=dim`
- `btrisolve` works on CPU doubles
- ipython autocomplete for torch.nn.Module fixed via implementing `__dir__`
- `device_ids` can now be `None` again in `F.data_parallel` and will use all available GPUs
- workaround cudnn bugs in BatchNorm (<5.1.10) and Dilation (6.0.20)
- Padding bugfix in Conv1d CPU
- `remainder` and `cremainder` are fixed for integer types
- fix memory leak in `btrisolve` and `getri`
- If an nn.Module's source can't be retrieved because of any exception,
handle serialization so that it is non-fatal
- `collate_fn` now retains the type of the numpy array
- `is_tensor` and `is_storage` are now fixed for old-style Python classes
- `torch.cat` now supports keyword arguments
- CUDA collectives supported coalescing, but the inputs were all assumed
to be of the same Tensor type. This is fixed.
- Fix a deadlock bug in autograd because of an underlying glibc bug in specific
linux distros (ArchLinux in particular)
- `abs` is now fixed for `char` and `short` cuda types
- fix `torch.diag` autograd when giving a dimension argument
- fix grouped convolution on CPU when `bias=False`
- expose `dilated` convolutions for `ConvTranspose*d`
- Fix a bug in `HingeEmbeddingLoss` where `margin` can now be specified via kwargs

Improved error messages
-----------------------
- Fix errors and messages when no CUDA devices are available.

0.5

```python
# we can now index with a mask that has fewer
# dimensions than the indexing tensor
c = a[mask, :5]
```


Fast Fourier Transform
- Add new FFT methods [5856](https://github.com/pytorch/pytorch/pull/5856)
- Add ``torch.stft`` (short time Fourier transform) and hann/hamming/bartlett window functions. [4095](https://github.com/pytorch/pytorch/pull/4095)
- Support arbitrary number of batch dimensions in *FFT [6528](https://github.com/pytorch/pytorch/pull/6528)

New and updated Torch operators

* Added `torch.log2` and `torch.log10` [6272](https://github.com/pytorch/pytorch/pull/6272)
* Added `torch.isnan` [5273](https://github.com/pytorch/pytorch/pull/5273)
* Add `torch.reshape`, which is similar to `numpy.reshape`. It is roughly equivalent to `tensor.contiguous().view()`, but avoids copying in certain cases [5575](https://github.com/pytorch/pytorch/pull/5575)
* Add CPU implementation of `torch.unique`, which outputs the unique elements of a Tensor [5503](https://github.com/pytorch/pytorch/pull/5503)
* Add `torch.det`, `torch.logdet` and `torch.slogdet`, for computing the (log-)determinant of square 2D tensors. For negative determinants, `torch.logdet` returns `nan`, while `torch.slogdet` returns the sign of the log-determinant and the log of the absolute value of the determinant. [3816](https://github.com/pytorch/pytorch/pull/3816) and [5393](https://github.com/pytorch/pytorch/pull/5393)
* Add `nn.functional.gumbel_softmax`, which lets you use the reparametrization trick for discrete variables [3341](https://github.com/pytorch/pytorch/pull/3341)
* Add `torch.take` and `Tensor.put_`. Those functions are equivalent to numpy.take and numpy.put, and are the base for full support of advanced indexing in PyTorch [3263](https://github.com/pytorch/pytorch/pull/3263)
* Add `torch.randint`, similar to `numpy.random.randint` [6136](https://github.com/pytorch/pytorch/pull/6136)
* Add `torch.diagonal` and `torch.diagflat`, similar to `numpy.diagonal` and `numpy.diagflat`. They are meant as a replacement for `torch.diag`, which handled both the cases of constructing a diagonal tensor as well as extracting the diagonal of a matrix [5622](https://github.com/pytorch/pytorch/pull/5622)
* Add `torch.einsum`, equivalent to `numpy.einsum`. einsum allows you to perform operations using Einstein's notation. [5503](https://github.com/pytorch/pytorch/pull/5503)
```python
a = torch.arange(0, 9).reshape(3, 3)
# the following transposes a
b = torch.einsum('ij->ji', (a,))
```

- Add ``torch.expm1``, a numerically stable ``exp(x)-1`` for small ``x``. [4350](https://github.com/pytorch/pytorch/pull/4350)
- Allow users to specify individual split sizes with ``torch.split`` [3837](https://github.com/pytorch/pytorch/pull/3837)
- Add ``torch.where(condition, tensor1, tensor2)`` that returns a tensor of elements selected from ``tensor1`` or ``tensor2`` based on ``condition``. [4259](https://github.com/pytorch/pytorch/pull/4259)
- Add ``Tensor.norm(dim)`` for sparse tensors. [4882](https://github.com/pytorch/pytorch/pull/4882)
- Implement ``torch.neg`` for all types. [4075](https://github.com/pytorch/pytorch/pull/4075)
- Implement gradient calculation for ``torch.trtrs``. [3972](https://github.com/pytorch/pytorch/pull/3972)
- Deprecate out-of-place ``Tensor.resize`` and ``Tensor.resize_as``. These have weird semantics and are hard to use correctly. Please use their in-place variants ``Tensor.resize_`` and ``Tensor.resize_as_``. [4886](https://github.com/pytorch/pytorch/pull/4886)

Rename `async` argument in ``.cuda()`` to `non_blocking`

The `async` keyword argument in conversion calls is now deprecated in PyTorch, and it has been replaced by `non_blocking`.  This was necessary because `async` will be a keyword in Python 3.7
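
A minimal sketch of the renamed argument (guarded so it only runs when a CUDA device is available):

```python
import torch

if torch.cuda.is_available():
    x = torch.rand(4, 4).pin_memory()
    y = x.cuda(non_blocking=True)   # previously spelled x.cuda(async=True)
```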


Neural Networks


A new autograd container that lets you trade compute for memory

The new `checkpoint` container allows you to only store a subset of the outputs necessary for backpropagation. If an output is missing (to save memory), the `checkpoint` container will recompute the intermediate outputs from the closest checkpoint, so that memory usage can be reduced (with an increase in computation time).
Here is an example:
```python
import torch
import torch.nn as nn

# input
input = torch.rand(1, 10)
# suppose we have a very deep model
layers = [nn.Linear(10, 10) for _ in range(1000)]
model = nn.Sequential(*layers)
output = model(input)
```

The above model uses a lot of memory, because it needs to keep the intermediate values of every operation for backpropagation. `checkpoint` lets you reduce the memory requirements:

```python
# create the input tensors and set requires_grad=True
# NOTE: requires_grad=True for the input is a current
# limitation of checkpointing. At least one of the
# model inputs should have requires_grad=True.
# If you don't do it, you might have empty gradients.
input = torch.rand(1, 10, requires_grad=True)
layers = [nn.Linear(10, 10) for _ in range(1000)]

# define functions that decide where we will checkpoint
# and store intermediate gradients. In this case,
# we will only store one intermediate gradient,
# in the middle of the model

def run_first_half(*args):
    x = args[0]
    for layer in layers[:500]:
        x = layer(x)
    return x

def run_second_half(*args):
    x = args[0]
    for layer in layers[500:-1]:
        x = layer(x)
    return x

# now use the new checkpoint functionality
from torch.utils.checkpoint import checkpoint

x = checkpoint(run_first_half, input)
x = checkpoint(run_second_half, x)
# the last layer needs to be run without checkpoint
x = layers[-1](x)
x.sum().backward()  # works!
```

For sequential modules (which can have arbitrary blocks inside), a helper function `checkpoint_sequential` is provided, which takes care of the most common use-cases:
```python
input = torch.rand(1, 10, requires_grad=True)
layers = [nn.Linear(10, 10) for _ in range(1000)]
model = nn.Sequential(*layers)

from torch.utils.checkpoint import checkpoint_sequential

# split in two blocks
num_segments = 2
x = checkpoint_sequential(model, num_segments, input)
x.sum().backward()  # works!
```


bottleneck - a tool to identify hotspots in your code

``torch.utils.bottleneck`` ([5216](https://github.com/pytorch/pytorch/pull/5216), [6425](https://github.com/pytorch/pytorch/pull/6425)) is a tool that can be used as an initial step for
debugging bottlenecks in your program. It summarizes runs of your script with
the Python profiler and PyTorch’s autograd profiler. See the [bottleneck docs](http://pytorch.org/docs/master/bottleneck.html) for more details.

reduce=False Losses
As of this release, all of our loss functions support the ``reduce`` keyword. Specifying ``reduce=False`` gives a Tensor per unit of loss instead of a single reduced loss. [4924](https://github.com/pytorch/pytorch/pull/4924), [5346](https://github.com/pytorch/pytorch/pull/5346), [5646](https://github.com/pytorch/pytorch/pull/5646), [4231](https://github.com/pytorch/pytorch/pull/4231), [4705](https://github.com/pytorch/pytorch/pull/4705),  [5680](https://github.com/pytorch/pytorch/pull/5680)
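
A minimal sketch of the `reduce` keyword (in later releases this was folded into `reduction='none'`):

```python
import torch
import torch.nn as nn

loss_fn = nn.MSELoss(reduce=False)
input = torch.randn(3, 4, requires_grad=True)
target = torch.randn(3, 4)
per_element = loss_fn(input, target)   # shape (3, 4): one loss value per element
print(per_element.shape)
```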


New modules and module improvements

* Add `DistributedDataParallelCPU`. This is similar to `DistributedDataParallel`, but with specific support for models running on the CPU (contrary to `DistributedDataParallel`, which targets GPU), and supports `mpi`, `gloo` and `tcp` backends [5919](https://github.com/pytorch/pytorch/pull/5919).
* Add [Group Normalization](https://arxiv.org/abs/1803.08494) (`nn.GroupNorm`), an alternative to batch normalization that doesn't suffer from the same issues as `BatchNorm` for small batch sizes
* Add [Layer Normalization](https://arxiv.org/abs/1607.06450) (``nn.LayerNorm``), an alternative for batch normalization often used in NLP tasks. [4922](https://github.com/pytorch/pytorch/pull/4922)
* Add Local Response Normalization (``nn.LocalResponseNorm``). [4922](https://github.com/pytorch/pytorch/pull/4922)
* `MaxPool3d` now supports double backwards. MaxPool3d and MaxUnpool3d now use indices consistent with the rest of the pooling layers. [5328](https://github.com/pytorch/pytorch/pull/5328)
* All loss functions now support a reduce argument to return a batch of losses. [264](https://github.com/pytorch/pytorch/issues/264)
* Add util to clip gradient value in torch.nn.utils.clip_grad and add param to He initialization scheme in `torch.nn.init`. [6173](https://github.com/pytorch/pytorch/pull/6173)
* Renamed ``torch.nn.init.*`` methods to have a trailing underscore, as they operate in-place, and deprecated the old versions [6093](https://github.com/pytorch/pytorch/pull/6093)
* Added support for returning dictionaries in `DataParallel` [6113](https://github.com/pytorch/pytorch/pull/6113)
* Added support for N-D tensors in `torch.nn.Bilinear` [5764](https://github.com/pytorch/pytorch/pull/5764)
* Add `Embedding.from_pretrained` factory. This allows initializing an `Embedding` layer with an existing tensor, bypassing the random initialization of its weights (see the sketch after this list).
* You can now slice ``nn.Sequential``, ``nn.ModuleList``, and ``nn.ParameterList`` [4491](https://github.com/pytorch/pytorch/pull/4491)
* Registered ``nn.Module`` integer parameters and buffers are now immune to ``module.float()``, ``module.double()``, and ``module.half()`` calls. [3820](https://github.com/pytorch/pytorch/pull/3820)
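Minimal sketches of two of the items above, `nn.GroupNorm` and `Embedding.from_pretrained` (shapes and sizes are illustrative):

```python
import torch
import torch.nn as nn

# GroupNorm: 4 groups over 8 channels; statistics are computed per group,
# so it behaves well even with a batch size of 1
gn = nn.GroupNorm(num_groups=4, num_channels=8)
y = gn(torch.randn(1, 8, 16, 16))

# Embedding.from_pretrained: build an embedding table from an existing tensor
weights = torch.randn(10, 3)
emb = nn.Embedding.from_pretrained(weights)
emb(torch.tensor([1, 4, 7]))  # rows 1, 4 and 7 of `weights`
```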

torch.distributions
`torch.distributions` has expanded to include 24 [basic probability distributions](http://pytorch.org/docs/0.4.0/distributions.html): `Bernoulli`, `Beta`, `Binomial`, `Categorical`, `Cauchy`, `Chi2`, `Dirichlet`, `Exponential`, `FisherSnedecor`, `Gamma`, `Geometric`, `Gumbel`, `Laplace`, `LogNormal`, `Multinomial`, `MultivariateNormal`, `Normal`, `OneHotCategorical`, `Pareto`, `Poisson`, `RelaxedBernoulli`, `RelaxedOneHotCategorical`, `StudentT`, and `Uniform`.

The [`Distribution`](http://pytorch.org/docs/0.4.0/distributions.htmldistribution) interface has expanded to include many methods including `.cdf()`, `.icdf()`, `.mean()`, `.variance()`, `.entropy()`, and `.perplexity()`. Distributions now split tensor dimensions into [`sample_shape`](http://pytorch.org/docs/0.4.0/distributions.htmltorch.distributions.distribution.Distribution.sample)+[`batch_shape`](http://pytorch.org/docs/0.4.0/distributions.htmltorch.distributions.distribution.Distribution.batch_shape)+[`event_shape`](http://pytorch.org/docs/0.4.0/distributions.htmltorch.distributions.distribution.Distribution.event_shape). Most continuous distributions now also implement a differentiable `.rsample()` method to compute [pathwise derivatives](http://pytorch.org/docs/0.4.0/distributions.htmlpathwise-derivative) aka the reparameterization trick (check `.has_rsample` for availability):
```python
>>> loc = torch.tensor(0., requires_grad=True)
>>> scale = torch.tensor(1., requires_grad=True)
>>> samples = Normal(loc, scale).rsample(sample_shape=(1000,))
>>> loss = (samples - 0.5).pow(4).mean()  # average over 1000 monte carlo samples
>>> grad(loss, [loc, scale])
(tensor(-7.5092), tensor(15.2704))
```

Most discrete distributions implement an [`.enumerate_support()`](http://pytorch.org/docs/0.4.0/distributions.htmltorch.distributions.distribution.Distribution.enumerate_support) method to make it easy to sum over all possible sample values (check `.has_enumerate_support` for availability).

[`kl_divergence`](http://pytorch.org/docs/0.4.0/distributions.htmlmodule-torch.distributions.kl) is defined for many pairs of distributions, e.g.
```python
>>> x = torch.tensor(1.0, requires_grad=True)
>>> kl = kl_divergence(Uniform(-x, x), Normal(0., 1.))
>>> grad(kl, [x])[0]
tensor(-0.6667)
```


Distribution Transforms
New distributions can be created by combining [`TransformedDistribution`](http://pytorch.org/docs/0.4.0/distributions.htmltransformeddistribution) with any number of [`Transform`](http://pytorch.org/docs/0.4.0/distributions.htmltorch.distributions.transforms.Transform) objects from the [`torch.distributions.transforms`](http://pytorch.org/docs/0.4.0/distributions.htmlmodule-torch.distributions.transforms) library, including: `ExpTransform`, `PowerTransform`, `SigmoidTransform`, `AbsTransform`, `AffineTransform`, `SoftmaxTransform`, `StickBreakingTransform`, `LowerCholeskyTransform`, and their inverses via the [`.inv`](http://pytorch.org/docs/0.4.0/distributions.htmltorch.distributions.transforms.Transform.inv) property.
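For example, a minimal sketch that builds a log-normal-style distribution from a `Normal` base and an `ExpTransform` (all values are illustrative):

```python
import torch
from torch.distributions import Normal, TransformedDistribution
from torch.distributions.transforms import ExpTransform

base = Normal(torch.tensor(0.), torch.tensor(1.))
log_normal = TransformedDistribution(base, [ExpTransform()])

samples = log_normal.sample((5,))
log_prob = log_normal.log_prob(samples)
```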

Distribution Constraints
Distributions provide metadata about the constraints of their `.support` and about their arguments (`.arg_constraints`). These [`Constraint`](http://pytorch.org/docs/0.4.0/distributions.html#module-torch.distributions.constraints) objects are registered with transforms using [`transform_to()` and `biject_to()`](http://pytorch.org/docs/0.4.0/distributions.html#module-torch.distributions.constraint_registry). Together, constraints and transforms make it easy to specify new distributions in a generic way:
```python
>>> scale = torch.tensor(1., requires_grad=True)
>>> p = Normal(0., scale)
>>> assert p.arg_constraints['scale'] == constraints.positive
>>> prior = TransformedDistribution(Normal(0., 1.),
...                                 transform_to(constraints.positive))
```

Constraints in the [`torch.distributions.constraints`](http://pytorch.org/docs/0.4.0/distributions.htmlmodule-torch.distributions.constraints) library include: `boolean`, `greater_than(lower_bound)`, `integer_interval(lower_bound, upper_bound)`, `interval(lower_bound, upper_bound)`, `lower_cholesky`, `lower_triangular`, `nonnegative_integer`, `positive`, `positive_definite`, `positive_integer`, `real`, `real_vector`, `simplex`, and `unit_interval`.

Distributed

Helper utility for launching Distributed Training jobs

We have added a utility function to help launch jobs in a distributed setup.
To launch a script that leverages `DistributedDataParallel` on either a single node or multiple nodes, you can use `torch.distributed.launch` as follows:

```
python -m torch.distributed.launch my_script.py --arg1 --arg2 --arg3
```


This utility simplifies day-to-day use of the `distributed` package.

You can read about its usage here: http://pytorch.org/docs/stable/distributed.html#launch-utility
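As a rough sketch, launching the same script on two nodes with 8 GPUs each might look like the following (hostname, port and GPU counts are placeholders):

```
# on node 0 (acting as the master at 192.168.1.1:1234)
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=2 --node_rank=0 \
       --master_addr="192.168.1.1" --master_port=1234 my_script.py --arg1

# on node 1
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=2 --node_rank=1 \
       --master_addr="192.168.1.1" --master_port=1234 my_script.py --arg1
```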

A new distributed backend based on NCCL 2.0

PyTorch now has a new distributed backend, which leverages NCCL 2.0 for maximum speed.
It also provides new APIs for collective operations on multiple GPUs.
You can enable the new backend via
```python
torch.distributed.init_process_group("nccl")
```

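A minimal sketch of a collective call once the process group is initialized (the init method, rank and world size below are placeholders):

```python
import torch
import torch.distributed as dist

dist.init_process_group("nccl", init_method="tcp://127.0.0.1:23456",
                        rank=0, world_size=1)

t = torch.ones(4).cuda()
dist.all_reduce(t)  # sums the tensor across all participating processes
```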

Other distributed improvements

- Coalesce many small broadcasts to improve performance [4978](https://github.com/pytorch/pytorch/pull/4978)
- Add mixed-precision support for distributed training [4891](https://github.com/pytorch/pytorch/pull/4891)
- Release NCCL distributed backend. Previously it was marked as ``experimental``. [4921](https://github.com/pytorch/pytorch/pull/4921)
- Enable Infiniband support for Gloo data channel with automatic IB device detection [4795](https://github.com/pytorch/pytorch/pull/4795)


C++ extensions
Previously, the official way of writing extensions using C or CUDA for custom modules was through the cffi extension. The drawback of this method was that it required a separate step for compiling the CUDA kernels, which could be a bit messy.

PyTorch now provides a better system for [writing your own C++ / CUDA extensions](http://pytorch.org/docs/master/cpp_extension.html). Example implementations using this new extension support can be found in the [pytorch/cpp_extensions](https://github.com/pytorch/extension-cpp) repo.

We provide two compilation modes:
- ahead-of-time compilation: you write a `setup.py` script using the new `CppExtension` or `CUDAExtension` classes, which extend `setuptools.Extension` (see the `setup.py` sketch after the example below);
- just-in-time compilation: you pass the list of C++ / CUDA files that you want to compile to `torch.utils.cpp_extension.load`, and it will compile them on the fly and cache the resulting libraries for you. Here is an example illustrating how easy it is to implement an extension:

In C++
```cpp
// my_implementation.cpp
#include <torch/torch.h>
#include <unordered_set>

// can use templates as well, but let's keep it simple
using scalar_t = float;

at::Tensor unique_float(at::Tensor input_) {
  // only works for floats
  AT_ASSERT(input_.type().scalarType() == at::ScalarType::Float, "input must be a float tensor");
  // and CPU tensors
  AT_ASSERT(!input_.type().is_cuda(), "input must be a CPU tensor");

  // make the input contiguous, to simplify the implementation
  at::Tensor input = input_.contiguous();

  // get the pointer that holds the data
  scalar_t* input_data = input.data<scalar_t>();
  // let's use a function from the std library to implement
  // the unique function
  std::unordered_set<scalar_t> set(input_data, input_data + input.numel());

  // create the output tensor, with size set.size()
  at::Tensor output = input.type().tensor({static_cast<int64_t>(set.size())});
  scalar_t* output_data = output.data<scalar_t>();
  // copy the content of the set to the output tensor
  std::copy(set.begin(), set.end(), output_data);

  return output;
}

// this defines the functions exposed to Python
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("unique_float", &unique_float, "Unique for float tensors");
}
```

And then in Python
```python
import torch
from torch.utils.cpp_extension import load as load_ext
# pass the source files; they will be compiled on the fly
# and will return a python module
_C = load_ext('my_unique_lib', sources=['my_implementation.cpp'])

# now we can use the functions implemented in C++
unique = _C.unique_float
```
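For the ahead-of-time mode, a minimal `setup.py` might look like the following sketch (the extension and file names simply reuse the example above):

```python
from setuptools import setup
from torch.utils.cpp_extension import CppExtension, BuildExtension

setup(
    name='my_unique_lib',
    ext_modules=[CppExtension('my_unique_lib', ['my_implementation.cpp'])],
    cmdclass={'build_ext': BuildExtension},
)
```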

0.4.1

Table of Contents

- Breaking Changes
- New Features
- Neural Networks
- Adaptive Softmax, Spectral Norm, etc.
- Operators
- torch.bincount, torch.as_tensor, ...
- torch.distributions
- Half Cauchy, Gamma Sampling, ...
- Other
- Automatic anomaly detection (detecting NaNs, etc.)
- Performance
- Faster CPU ops in a wide variety of cases
- Other improvements
- Bug Fixes
- Documentation Improvements

Breaking Changes

- [`torch.stft`](https://pytorch.org/docs/stable/torch.htmltorch.stft)  has changed its signature to be consistent with librosa https://github.com/pytorch/pytorch/pull/9497
+ Before: `stft(signal, frame_length, hop, fft_size=None, normalized=False, onesided=True, window=None, pad_end=0)`
+ After: `stft(input, n_fft, hop_length=None, win_length=None, window=None, center=True, pad_mode='reflect', normalized=False, onesided=True)`
+ `torch.stft` now also uses FFT internally and is much faster (see the sketch after this list).
- [`torch.slice`](https://pytorch.org/docs/stable/torch.htmltorch.slice)  is removed in favor of the tensor slicing notation https://github.com/pytorch/pytorch/pull/7924
- [`torch.arange`](https://pytorch.org/docs/stable/torch.htmltorch.arange)  now does dtype inference: any floating-point argument is inferred to be the default `dtype`; all integer arguments are inferred to be `int64`. https://github.com/pytorch/pytorch/pull/7016
- [`torch.nn.functional.embedding_bag`](https://pytorch.org/docs/stable/nn.html#torch.nn.functional.embedding_bag)'s old signature `embedding_bag(weight, input, ...)` is deprecated; `embedding_bag(input, weight, ...)` (consistent with `torch.nn.functional.embedding`) should be used instead
- [`torch.nn.functional.sigmoid`](https://pytorch.org/docs/stable/nn.htmltorch.nn.functional.sigmoid)  and [`torch.nn.functional.tanh`](https://pytorch.org/docs/stable/nn.htmltorch.nn.functional.tanh)  are deprecated in favor of [`torch.sigmoid`](https://pytorch.org/docs/stable/torch.htmltorch.sigmoid)  and [`torch.tanh`](https://pytorch.org/docs/stable/torch.htmltorch.tanh)  https://github.com/pytorch/pytorch/pull/8748
- Broadcast behavior changed in a (very rare) edge case: `[1] x [0]` now broadcasts to `[0]` (it used to be `[1]`) https://github.com/pytorch/pytorch/pull/9209
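A minimal sketch of a call with the new `torch.stft` signature (the signal length and FFT parameters are arbitrary):

```python
import torch

signal = torch.randn(2, 16000)   # (batch, samples)
window = torch.hann_window(400)
spec = torch.stft(signal, n_fft=400, hop_length=160,
                  win_length=400, window=window)
# real/imaginary pairs are stored in the last dimension of `spec`
```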

New Features

Neural Networks

- Adaptive Softmax [`nn.AdaptiveLogSoftmaxWithLoss`](https://pytorch.org/docs/stable/nn.htmladaptivelogsoftmaxwithloss) https://github.com/pytorch/pytorch/pull/5287

```python
>>> in_features = 1000
>>> n_classes = 200
>>> adaptive_softmax = nn.AdaptiveLogSoftmaxWithLoss(in_features, n_classes, cutoffs=[20, 100, 150])
>>> adaptive_softmax
AdaptiveLogSoftmaxWithLoss(
(head): Linear(in_features=1000, out_features=23, bias=False)
(tail): ModuleList(
(0): Sequential(
(0): Linear(in_features=1000, out_features=250, bias=False)
(1): Linear(in_features=250, out_features=80, bias=False)
)
(1): Sequential(
(0): Linear(in_features=1000, out_features=62, bias=False)
(1): Linear(in_features=62, out_features=50, bias=False)
)
(2): Sequential(
(0): Linear(in_features=1000, out_features=15, bias=False)
(1): Linear(in_features=15, out_features=50, bias=False)
)
)
)
>>> batch = 15
>>> input = torch.randn(batch, in_features)
>>> target = torch.randint(n_classes, (batch,), dtype=torch.long)
>>> # get the log probabilities of target given input, and mean negative log probability loss
>>> adaptive_softmax(input, target)
ASMoutput(output=tensor([-6.8270, -7.9465, -7.3479, -6.8511, -7.5613, -7.1154, -2.9478, -6.9885,
-7.7484, -7.9102, -7.1660, -8.2843, -7.7903, -8.4459, -7.2371],
grad_fn=<ThAddBackward>), loss=tensor(7.2112, grad_fn=<MeanBackward1>))
>>> # get the log probabilities of all targets given input as a (batch x n_classes) tensor
>>> adaptive_softmax.log_prob(input)
tensor([[-2.6533, -3.3957, -2.7069,  ..., -6.4749, -5.8867, -6.0611],
[-3.4209, -3.2695, -2.9728,  ..., -7.6664, -7.5946, -7.9606],
[-3.6789, -3.6317, -3.2098,  ..., -7.3722, -6.9006, -7.4314],
...,
[-3.3150, -4.0957, -3.4335,  ..., -7.9572, -8.4603, -8.2080],
[-3.8726, -3.7905, -4.3262,  ..., -8.0031, -7.8754, -8.7971],
[-3.6082, -3.1969, -3.2719,  ..., -6.9769, -6.3158, -7.0805]],
grad_fn=<CopySlices>)
>>> # predict: get the class that maximizes the log probability for each input
>>> adaptive_softmax.predict(input)
tensor([ 8,  6,  6, 16, 14, 16, 16,  9,  4,  7,  5,  7,  8, 14,  3])
```

- Add spectral normalization [`nn.utils.spectral_norm`](https://pytorch.org/docs/stable/nn.htmltorch.nn.utils.spectral_norm) https://github.com/pytorch/pytorch/pull/6929

```python
>>> # Usage is similar to weight_norm
>>> convT = nn.ConvTranspose2d(3, 64, kernel_size=3, padding=1)
>>> # Can specify the number of power iterations applied each time, or use the default (1)
>>> convT = nn.utils.spectral_norm(convT, n_power_iterations=2)
>>>
>>> # apply to every conv and conv-transpose module in a model
>>> def add_sn(m):
        for name, c in m.named_children():
            m.add_module(name, add_sn(c))
        if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
            return nn.utils.spectral_norm(m)
        else:
            return m

>>> my_model = add_sn(my_model)
```

- [`nn.ModuleDict`](https://pytorch.org/docs/stable/nn.htmltorch.nn.ModuleDict) and [`nn.ParameterDict`](https://pytorch.org/docs/stable/nn.htmltorch.nn.ParameterDict) containers https://github.com/pytorch/pytorch/pull/8463
- Add `nn.init.zeros_` and `nn.init.ones_` https://github.com/pytorch/pytorch/pull/7488
- Add sparse gradient option to pretrained embedding https://github.com/pytorch/pytorch/pull/7492
- Add max pooling support to [`nn.EmbeddingBag`](https://pytorch.org/docs/stable/nn.htmltorch.nn.EmbeddingBag) https://github.com/pytorch/pytorch/pull/5725
- Depthwise convolution support for MKLDNN https://github.com/pytorch/pytorch/pull/8782
- Add `nn.FeatureAlphaDropout` (featurewise Alpha Dropout layer) https://github.com/pytorch/pytorch/pull/9073

Operators

- [`torch.bincount`](https://pytorch.org/docs/stable/torch.htmltorch.bincount)  (count frequency of each value in an integral tensor) https://github.com/pytorch/pytorch/pull/6688

```python
>>> input = torch.randint(0, 8, (5,), dtype=torch.int64)
>>> weights = torch.linspace(0, 1, steps=5)
>>> input, weights
(tensor([4, 3, 6, 3, 4]),
 tensor([ 0.0000,  0.2500,  0.5000,  0.7500,  1.0000]))

>>> torch.bincount(input)
tensor([0, 0, 0, 2, 2, 0, 1])

>>> input.bincount(weights)
tensor([0.0000, 0.0000, 0.0000, 1.0000, 1.0000, 0.0000, 0.5000])
```

- [`torch.as_tensor`](https://pytorch.org/docs/stable/torch.htmltorch.as_tensor)  (similar to `torch.tensor` but never copies unless necessary) https://github.com/pytorch/pytorch/pull/7109

```python
>>> tensor = torch.randn(3, device='cpu', dtype=torch.float32)
>>> torch.as_tensor(tensor)                       # doesn't copy
>>> torch.as_tensor(tensor, dtype=torch.float64)  # copies due to incompatible dtype
>>> torch.as_tensor(tensor, device='cuda')        # copies due to incompatible device
>>> array = np.array([3, 4.5])
>>> torch.as_tensor(array)                        # doesn't copy, shares memory with the numpy array
>>> torch.as_tensor(array, device='cuda')         # copies due to incompatible device
```

- [`torch.randperm`](https://pytorch.org/docs/stable/torch.htmltorch.randperm)  for CUDA tensors https://github.com/pytorch/pytorch/pull/7606
- [`nn.HardShrink`](https://pytorch.org/docs/stable/nn.html?torch.nn.Hardshrink) for CUDA tensors https://github.com/pytorch/pytorch/pull/8117
- [`torch.flip`](https://pytorch.org/docs/stable/torch.htmltorch.flip)  (flips a tensor along specified dims) https://github.com/pytorch/pytorch/pull/7873
- [`torch.flatten`](https://pytorch.org/docs/stable/torch.html#torch.flatten)  (flattens a contiguous range of dims; see the sketch after this list) https://github.com/pytorch/pytorch/pull/8578
- [`torch.pinverse`](https://pytorch.org/docs/stable/torch.htmltorch.pinverse)  (computes svd-based pseudo-inverse) https://github.com/pytorch/pytorch/pull/9052
- [`torch.meshgrid`](https://pytorch.org/docs/stable/torch.htmltorch.meshgrid)  https://github.com/pytorch/pytorch/pull/8581
- [`torch.unique`](https://pytorch.org/docs/stable/torch.htmltorch.unique)  for CUDA tensors https://github.com/pytorch/pytorch/pull/8899
- [`torch.erfc`](https://pytorch.org/docs/stable/torch.htmltorch.erfc)  (complementary error function) https://github.com/pytorch/pytorch/pull/9366/files
- [`torch.isinf`](https://pytorch.org/docs/stable/torch.htmltorch.isinf)  and [`torch.isfinite`](https://pytorch.org/docs/stable/torch.htmltorch.isfinite)  https://github.com/pytorch/pytorch/pull/9169 https://github.com/pytorch/pytorch/pull/9487
- [`torch.reshape_as`](https://pytorch.org/docs/stable/torch.htmltorch.reshape_as)  https://github.com/pytorch/pytorch/pull/9452
- Support backward for target tensor in [`torch.nn.functional.kl_div`](https://pytorch.org/docs/stable/nn.htmltorch.nn.functional.kl_div)  https://github.com/pytorch/pytorch/pull/7839
- [`torch.logsumexp`](https://pytorch.org/docs/stable/torch.htmltorch.logsumexp)  https://github.com/pytorch/pytorch/pull/7254
- Add batched linear solver to `torch.gesv` https://github.com/pytorch/pytorch/pull/6100
- [`torch.sum`](https://pytorch.org/docs/stable/torch.htmltorch.sum)  now supports summing over multiple dimensions https://github.com/pytorch/pytorch/pull/6152/files
- [`torch.diagonal`](https://pytorch.org/docs/stable/torch.htmltorch.diagonal)  [`torch.diagflat`](https://pytorch.org/docs/stable/torch.htmltorch.diagflat)  to take arbitrary diagonals with numpy semantics https://github.com/pytorch/pytorch/pull/6718
- [`tensor.any`](https://pytorch.org/docs/stable/tensors.htmltorch.ByteTensor.any) and [`tensor.all`](https://pytorch.org/docs/stable/tensors.htmltorch.ByteTensor.all) on `ByteTensor` can now accept `dim` and `keepdim` arguments https://github.com/pytorch/pytorch/pull/4627
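Minimal sketches of a few of the operators above (values are illustrative):

```python
import torch

x = torch.arange(8).reshape(2, 4)

torch.flip(x, dims=[1])  # reverse each row (flip along the last dimension)
torch.flatten(x)         # 1-D tensor containing the 8 elements of x

torch.isfinite(torch.tensor([1., float('inf'), float('nan')]))  # byte mask of finite entries
```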

Distributions

- Half Cauchy and Half Normal https://github.com/pytorch/pytorch/pull/8411
- Gamma sampling for CUDA tensors https://github.com/pytorch/pytorch/pull/6855
- Allow vectorized counts in Binomial Distribution https://github.com/pytorch/pytorch/pull/6720

Misc

- Autograd automatic anomaly detection for `NaN` and errors occurring in backward. Two functions, [detect_anomaly](https://pytorch.org/docs/stable/autograd.html#torch.autograd.detect_anomaly) and [set_detect_anomaly](https://pytorch.org/docs/stable/autograd.html#torch.autograd.set_detect_anomaly), are provided for this (see the sketch after this list). https://github.com/pytorch/pytorch/pull/7677
- Support `reversed(torch.Tensor)` https://github.com/pytorch/pytorch/pull/9216
- Support `hash(torch.device)` https://github.com/pytorch/pytorch/pull/9246
- Support `gzip` in [`torch.load`](https://pytorch.org/docs/stable/torch.htmltorch.load)  https://github.com/pytorch/pytorch/pull/6490
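A minimal sketch of anomaly detection (the NaN below is injected deliberately):

```python
import torch

# under anomaly detection, any backward computation that produces NaN raises an
# error together with a traceback of the forward operation that created it
with torch.autograd.detect_anomaly():
    x = torch.tensor([1.0], requires_grad=True)
    y = (x * torch.tensor([float('nan')])).sum()
    y.backward()  # raises a RuntimeError pointing at the multiplication above
```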

Performance

- Accelerate bernoulli number generation on CPU https://github.com/pytorch/pytorch/pull/7171
- Enable cuFFT plan caching (80% speed-up in certain cases) https://github.com/pytorch/pytorch/pull/8344
- Fix unnecessary copying in `bernoulli_` https://github.com/pytorch/pytorch/pull/8682
- Fix unnecessary copying in `broadcast` https://github.com/pytorch/pytorch/pull/8222
- Speed-up multidim `sum` (2x~6x speed-up in certain cases) https://github.com/pytorch/pytorch/pull/8992
- Vectorize CPU `sigmoid` (>3x speed-up in most cases) https://github.com/pytorch/pytorch/pull/8612
- Optimize CPU `nn.LeakyReLU` and `nn.PReLU` (2x speed-up) https://github.com/pytorch/pytorch/pull/9206
- Vectorize `softmax` and `logsoftmax` (4.5x speed-up on single core and 1.8x on 10 threads) https://github.com/pytorch/pytorch/pull/7375
- Speed up `nn.init.sparse` (10-20x speed-up) https://github.com/pytorch/pytorch/pull/6899


Improvements

Tensor printing

- Tensor printing now includes `requires_grad` and `grad_fn` information https://github.com/pytorch/pytorch/pull/8211
- Improve number formatting in tensor print https://github.com/pytorch/pytorch/pull/7632
- Fix scale when printing some tensors https://github.com/pytorch/pytorch/pull/7189
- Speed up printing of large tensors https://github.com/pytorch/pytorch/pull/6876

Neural Networks

- `NaN` is now propagated through many activation functions https://github.com/pytorch/pytorch/pull/8033
- Add `non_blocking` option to nn.Module.to https://github.com/pytorch/pytorch/pull/7312
- Loss modules now allow target to require gradient https://github.com/pytorch/pytorch/pull/8460
- Add `pos_weight` argument to `nn.BCEWithLogitsLoss` https://github.com/pytorch/pytorch/pull/6856
- Support `grad_clip` for parameters on different devices https://github.com/pytorch/pytorch/pull/9302
- Remove the requirement that input sequences to `pad_sequence` have to be sorted https://github.com/pytorch/pytorch/pull/7928
- `stride` argument for `max_unpool1d`, `max_unpool2d`, `max_unpool3d` now defaults to `kernel_size` https://github.com/pytorch/pytorch/pull/7388
- Allow calling grad-mode context managers (e.g., `torch.no_grad`, `torch.enable_grad`) as decorators (see the sketch after this list) https://github.com/pytorch/pytorch/pull/7737
- `torch.optim.lr_scheduler._LRScheduler`'s `__getstate__` now includes optimizer info https://github.com/pytorch/pytorch/pull/7757
- Add support for accepting `Tensor` as input in `clip_grad_*` functions https://github.com/pytorch/pytorch/pull/7769
- Return `NaN` in `max_pool`/`adaptive_max_pool` for `NaN` inputs https://github.com/pytorch/pytorch/pull/7670
- `nn.EmbeddingBag` can now handle empty bags in all modes https://github.com/pytorch/pytorch/pull/7389
- `torch.optim.lr_scheduler.ReduceLROnPlateau` is now serializable https://github.com/pytorch/pytorch/pull/7201
- Allow only tensors of floating point dtype to require gradients https://github.com/pytorch/pytorch/pull/7034 and https://github.com/pytorch/pytorch/pull/7185
- Allow resetting of BatchNorm running stats and cumulative moving average https://github.com/pytorch/pytorch/pull/5766
- Set the gradient of `LP-Pool`ing to zero if the sum of all input elements to the power of p is zero https://github.com/pytorch/pytorch/pull/6766
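For example, a minimal sketch of the grad-mode decorator form mentioned above (the model and input are placeholders):

```python
import torch

@torch.no_grad()
def evaluate(model, x):
    return model(x)  # no autograd graph is built inside this call

model = torch.nn.Linear(4, 2)
out = evaluate(model, torch.randn(1, 4))
print(out.requires_grad)  # False
```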

Operators

- Add ellipses ('...') and diagonals (e.g. 'ii→i') to [`torch.einsum`](https://pytorch.org/docs/stable/torch.htmltorch.einsum)  https://github.com/pytorch/pytorch/pull/7173
- Add `to` method for `PackedSequence` https://github.com/pytorch/pytorch/pull/7319
- Add support for `__floordiv__` and `__rdiv__` for integral tensors https://github.com/pytorch/pytorch/pull/7245
- [`torch.clamp`](https://pytorch.org/docs/stable/torch.htmltorch.clamp)  now has subgradient 1 at min and max https://github.com/pytorch/pytorch/pull/7049
- [`torch.arange`](https://pytorch.org/docs/stable/torch.htmltorch.arange)  now uses NumPy-style type inference: https://github.com/pytorch/pytorch/pull/7016
- Support infinity norm properly in [`torch.norm`](https://pytorch.org/docs/stable/torch.htmltorch.norm)  and [`torch.renorm`](https://pytorch.org/docs/stable/torch.htmltorch.renorm)  https://github.com/pytorch/pytorch/pull/6969
- Allow passing an output tensor via the `out=` keyword argument in [`torch.dot`](https://pytorch.org/docs/stable/torch.html#torch.dot)  and [`torch.matmul`](https://pytorch.org/docs/stable/torch.html#torch.matmul)  https://github.com/pytorch/pytorch/pull/6961

Distributions

- Always enable grad when calculating `lazy_property` https://github.com/pytorch/pytorch/pull/7708

Sparse Tensor

- Add log1p for sparse tensor https://github.com/pytorch/pytorch/pull/8969
- Better support for adding zero-filled sparse tensors https://github.com/pytorch/pytorch/pull/7479

Data Parallel

- Allow modules that return scalars in `nn.DataParallel` https://github.com/pytorch/pytorch/pull/7973
- Allow `nn.parallel.parallel_apply` to take in a list/tuple of tensors https://github.com/pytorch/pytorch/pull/8047

Misc

- `torch.Size` can now accept PyTorch scalars https://github.com/pytorch/pytorch/pull/5676
- Move `torch.utils.data.dataset.random_split` to torch.utils.data.random_split, and `torch.utils.data.dataset.Subset` to `torch.utils.data.Subset` https://github.com/pytorch/pytorch/pull/7816
- Add serialization for `torch.device` https://github.com/pytorch/pytorch/pull/7713
- Allow copy.deepcopy of `torch.(int/float/...)*` dtype objects https://github.com/pytorch/pytorch/pull/7699
- [`torch.load`](https://pytorch.org/docs/stable/torch.htmltorch.load)  can now take a `torch.device` as map location https://github.com/pytorch/pytorch/pull/7339

Bug Fixes

- Fix [`nn.BCELoss`](https://pytorch.org/docs/stable/nn.html?torch.nn.BCELoss) sometimes returning negative results https://github.com/pytorch/pytorch/pull/8147
- Fix `tensor._indices` on scalar sparse tensor giving wrong result https://github.com/pytorch/pytorch/pull/8197
- Fix backward of `tensor.as_strided` not working properly when input has overlapping memory https://github.com/pytorch/pytorch/pull/8721
- Fix `x.pow(0)` gradient when x contains 0 https://github.com/pytorch/pytorch/pull/8945
- Fix CUDA [`torch.svd`](https://pytorch.org/docs/stable/torch.htmltorch.svd)  and [`torch.eig`](https://pytorch.org/docs/stable/torch.htmltorch.eig)  returning wrong results in certain cases https://github.com/pytorch/pytorch/pull/9082
- Fix `nn.MSELoss` having low precision https://github.com/pytorch/pytorch/pull/9287
- Fix segmentation fault when calling `torch.Tensor.grad_fn` https://github.com/pytorch/pytorch/pull/9292
- Fix [`torch.topk`](https://pytorch.org/docs/stable/torch.htmltorch.topk)  returning wrong results when input isn't contiguous https://github.com/pytorch/pytorch/pull/9441
- Fix segfault in convolution on CPU with large `inputs` / `dilation` https://github.com/pytorch/pytorch/pull/9274
- Fix `avg_pool2/3d` `count_include_pad` having default value `False` (should be `True`) https://github.com/pytorch/pytorch/pull/8645
- Fix `nn.EmbeddingBag`'s `max_norm` option https://github.com/pytorch/pytorch/pull/7959
- Fix returning scalar input in Python autograd function https://github.com/pytorch/pytorch/pull/7934
- Fix THCUNN `SpatialDepthwiseConvolution` assuming contiguity https://github.com/pytorch/pytorch/pull/7952
- Fix bug in seeding random module in `DataLoader` https://github.com/pytorch/pytorch/pull/7886
- Don't modify variables in-place for [`torch.einsum`](https://pytorch.org/docs/stable/torch.htmltorch.einsum)  https://github.com/pytorch/pytorch/pull/7765
- Make return uniform in lbfgs step https://github.com/pytorch/pytorch/pull/7586
- The return value of `uniform.cdf()` is now clamped to `[0..1]` https://github.com/pytorch/pytorch/pull/7538
- Fix advanced indexing with negative indices https://github.com/pytorch/pytorch/pull/7345
- `CUDAGenerator` will not initialize on the current device anymore, which will avoid unnecessary memory allocation on `GPU:0` https://github.com/pytorch/pytorch/pull/7392
- Fix `tensor.type(dtype)` not preserving device https://github.com/pytorch/pytorch/pull/7474
- Batch sampler should return the same results when used alone or in dataloader with `num_workers` > 0 https://github.com/pytorch/pytorch/pull/7265
- Fix broadcasting error in LogNormal, TransformedDistribution https://github.com/pytorch/pytorch/pull/7269
- Fix [`torch.max`](https://pytorch.org/docs/stable/torch.htmltorch.max) and [`torch.min`](https://pytorch.org/docs/stable/torch.htmltorch.min)  on CUDA in presence of `NaN` https://github.com/pytorch/pytorch/pull/7052
- Fix [`torch.tensor`](https://pytorch.org/docs/stable/torch.htmltorch.tensor) device-type calculation when used with CUDA https://github.com/pytorch/pytorch/pull/6995
- Fixed a missing `'='` in `nn.LPPoolNd` repr function https://github.com/pytorch/pytorch/pull/9629

Documentation

- Expose and document `torch.autograd.gradcheck` and `torch.autograd.gradgradcheck` https://github.com/pytorch/pytorch/pull/8166
- Document `tensor.scatter_add_` https://github.com/pytorch/pytorch/pull/9630
- Document variants of [`torch.add`](https://pytorch.org/docs/stable/torch.htmltorch.add) and `tensor.add_`, e.g. `tensor.add(value=1, other)` -> Tensor https://github.com/pytorch/pytorch/pull/9027
- Document [`torch.logsumexp`](https://pytorch.org/docs/stable/torch.htmltorch.logsumexp)  https://github.com/pytorch/pytorch/pull/8428
- Document [`torch.sparse_coo_tensor`](https://pytorch.org/docs/stable/torch.htmltorch.sparse_coo_tensor)  https://github.com/pytorch/pytorch/pull/8152
- Document [`torch.utils.data.dataset.random_split`](https://pytorch.org/docs/stable/data.html?torch.utils.data.random_split) https://github.com/pytorch/pytorch/pull/7676
- Document [`torch.nn.GroupNorm`](https://pytorch.org/docs/stable/nn.html?torch.nn.GroupNorm) https://github.com/pytorch/pytorch/pull/7086
- A lot of other various documentation improvements including RNNs, `ConvTransposeNd`, `Fold`/`Unfold`, `Embedding`/`EmbeddingBag`, Loss functions, etc.

0.4.0

PyTorch 0.4.0 release notes

Table of Contents

- Major Core Changes
- Tensor / Variable merged
- Zero-dimensional Tensors
- dtypes
- migration guide
- New Features
- Tensors
- Full support for advanced indexing
- Fast Fourier Transforms
- Neural Networks
- Trade-off memory for compute
- bottleneck - a tool to identify hotspots in your code
- torch.distributions
- 24 basic probability distributions
- Added cdf, variance, entropy, perplexity etc.
- Distributed Training
- Launcher utility for ease of use
- NCCL2 backend
- C++ Extensions
- Windows Support
- ONNX Improvements
- RNN support
- Performance improvements
- Bug fixes

Major Core changes

Here is a summary of the updates to the most important core features users will use daily.

**Major Changes and Potentially Breaking Changes:**
* ``Tensors`` and ``Variables`` have merged
* Some operations now return 0-dimensional (scalar) ``Tensors``
* Deprecation of the ``volatile`` flag

**Improvements:**
* ``dtypes``, ``devices``, and Numpy-style ``Tensor`` creation functions added
* Support for writing device-agnostic code


We wrote a [migration guide](http://pytorch.org/2018/04/22/0_4_0-migration-guide.html) that should help you transition your code to new APIs and style. Please read it if you have code in a previous version of PyTorch that you would like to migrate.

**Please read the [migration guide](http://pytorch.org/2018/04/22/0_4_0-migration-guide.html) if you have code in a previous version of PyTorch that you would like to migrate.**

The contents of this section (Major Core changes) are included in the [migration guide](http://pytorch.org/2018/04/22/0_4_0-migration-guide.html).

Merging [``Tensor``](http://pytorch.org/docs/0.4.0/tensors.html) and ``Variable`` classes

``torch.autograd.Variable`` and [``torch.Tensor``](http://pytorch.org/docs/0.4.0/tensors.html) are now the same class.  More precisely, [``torch.Tensor``](http://pytorch.org/docs/0.4.0/tensors.html) is capable of tracking history and behaves like the old ``Variable``; ``Variable`` wrapping continues to work as before but returns an object of type [``torch.Tensor``](http://pytorch.org/docs/0.4.0/tensors.html).  This means that you don't need the ``Variable`` wrapper everywhere in your code anymore.

The `type()` of a [``Tensor``](http://pytorch.org/docs/0.4.0/tensors.html) has changed

Note also that the ``type()`` of a Tensor no longer reflects the data type. Use ``isinstance()`` or ``x.type()`` instead:

```python
>>> x = torch.DoubleTensor([1, 1, 1])
>>> print(type(x))                            # was torch.DoubleTensor
<class 'torch.autograd.variable.Variable'>
>>> print(x.type())                           # OK: 'torch.DoubleTensor'
'torch.DoubleTensor'
>>> print(isinstance(x, torch.DoubleTensor))  # OK: True
True
```


When does [``autograd``](http://pytorch.org/docs/0.4.0/autograd.html) start tracking history now?

``requires_grad``, the central flag for [``autograd``](http://pytorch.org/docs/0.4.0/autograd.html), is now an attribute on ``Tensor``s. Let's see how this change manifests in code.

[``autograd``](http://pytorch.org/docs/0.4.0/autograd.html) uses the same rules previously used for ``Variable``s. It starts tracking history when any input ``Tensor`` of an operation has ``requires_grad=True``. For example,

```python
>>> x = torch.ones(1)  # create a tensor with requires_grad=False (default)
>>> x.requires_grad
False
>>> y = torch.ones(1)  # another tensor with requires_grad=False
>>> z = x + y
>>> # both inputs have requires_grad=False, so does the output
>>> z.requires_grad
False
>>> # then autograd won't track this computation. let's verify!
>>> z.backward()
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
>>>
>>> # now create a tensor with requires_grad=True
>>> w = torch.ones(1, requires_grad=True)
>>> w.requires_grad
True
>>> # add to the previous result that has requires_grad=False
>>> total = w + z
>>> # the total sum now requires grad!
>>> total.requires_grad
True
>>> # autograd can compute the gradients as well
>>> total.backward()
>>> w.grad
tensor([ 1.])
>>> # and no computation is wasted to compute gradients for x, y and z, which don't require grad
>>> z.grad == x.grad == y.grad == None
True
```


Manipulating ``requires_grad`` flag

Other than directly setting the attribute, you can change this flag **in-place** using [``my_tensor.requires_grad_(requires_grad=True)``](http://pytorch.org/docs/0.4.0/tensors.htmltorch.Tensor.requires_grad_), or, as in the above example, at creation time by passing it in as an argument (default is ``False``), e.g.,

```python
>>> existing_tensor.requires_grad_()
>>> existing_tensor.requires_grad
True
>>> my_tensor = torch.zeros(3, 4, requires_grad=True)
>>> my_tensor.requires_grad
True
```


What about ``.data``?

``.data`` was the primary way to get the underlying ``Tensor`` from a ``Variable``. After this merge, calling ``y = x.data`` still has similar semantics. So ``y`` will be a ``Tensor`` that shares the same data with ``x``, is unrelated with the computation history of ``x``, and has ``requires_grad=False``.

However, ``.data`` can be unsafe in some cases. Any changes on ``x.data`` wouldn't be tracked by ``autograd``, and the computed gradients would be incorrect if ``x`` is needed in a backward pass. A safer alternative is to use [``x.detach()``](http://pytorch.org/docs/0.4.0/autograd.htmltorch.Tensor.detach), which also returns a ``Tensor`` that shares data with ``requires_grad=False``, but will have its in-place changes reported by ``autograd`` if ``x`` is needed in backward.
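As a rough sketch of the difference (the values are illustrative):

```python
import torch

# .data: in-place changes are invisible to autograd, so gradients can be silently wrong
a = torch.ones(3, requires_grad=True)
out = a.sigmoid()
out.data.zero_()      # autograd doesn't know the saved output changed
out.sum().backward()  # runs, but a.grad is computed from the zeroed values

# .detach(): the in-place change is detected, and backward raises an error instead
b = torch.ones(3, requires_grad=True)
out = b.sigmoid()
out.detach().zero_()
# out.sum().backward()  # RuntimeError: a variable needed for gradient computation
#                       # has been modified by an inplace operation
```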


Some operations now return 0-dimensional (scalar) ``Tensors``

Previously, indexing into a ``Tensor`` vector (1-dimensional tensor) gave a Python number, but indexing into a ``Variable`` vector gave (inconsistently!) a vector of size ``(1,)``!  Similar behavior existed with reduction functions, i.e. `tensor.sum()` would return a Python number, but `variable.sum()` would return a vector of size `(1,)`.

Fortunately, this release introduces proper scalar (0-dimensional tensor) support in PyTorch!  Scalars can be created using the new `torch.tensor` function (which will be explained in more detail later; for now just think of it as the PyTorch equivalent of `numpy.array`).  Now you can do things like:

```python
>>> torch.tensor(3.1416)         # create a scalar directly
tensor(3.1416)
>>> torch.tensor(3.1416).size()  # scalar is 0-dimensional
torch.Size([])
>>> torch.tensor([3]).size()     # compare to a vector of size 1
torch.Size([1])
>>>
>>> vector = torch.arange(2, 6)  # this is a vector
>>> vector
tensor([ 2.,  3.,  4.,  5.])
>>> vector.size()
torch.Size([4])
>>> vector[3]                    # indexing into a vector gives a scalar
tensor(5.)
>>> vector[3].item()             # .item() gives the value as a Python number
5.0
```

0.3.1

Binaries

- Removed support for CUDA capability 3.0 and 5.0 (they still work for source builds for now, but the commitment to support this forward is removed)

0.3.0

Table of contents

- Breaking changes: removed `reinforce()`
- New features
- Unreduced losses
- A profiler for the autograd engine
- More functions support Higher order gradients
- New features in Optimizers
- New layers and nn functionality
- New Tensor functions and Features
- Other additions
- API changes
- Performance improvements
- Big reduction in framework overhead (helps small models)
- 4x to 256x faster Softmax/LogSoftmax
- More...
- Framework Interoperability
- DLPack Interoperability
- Model Exporter to ONNX (ship PyTorch to Caffe2, CoreML, CNTK, MXNet, Tensorflow)
- Bug Fixes (a lot of them)

Breaking changes

Stochastic functions, i.e. `Variable.reinforce()` were removed because of their limited functionality and broad performance implications. The motivation for stochastic functions was to avoid book-keeping of sampled values. In practice, users were still book-keeping in their code for various reasons. We constructed an alternative, equally effective API, but did not have a reasonable deprecation path to the new API. Hence this removal is a breaking change.

We introduce the [torch.distributions](http://pytorch.org/docs/0.3.0/distributions.html) package to replace Stochastic functions.

Your previous code typically looked like this:

```python
probs = policy_network(state)
action = probs.multinomial()
next_state, reward = env.step(action)
action.reinforce(reward)
action.backward()
```

This is the new equivalent code:

```python
probs = policy_network(state)
# NOTE: categorical is equivalent to what used to be called multinomial
m = torch.distributions.Categorical(probs)
action = m.sample()
next_state, reward = env.step(action)
loss = -m.log_prob(action) * reward
loss.backward()
```


New features

Unreduced losses

Some loss functions can now compute per-sample losses in a mini-batch:
- By default PyTorch sums losses over the mini-batch and returns a single scalar loss. This was limiting to users.
- Now, a subset of loss functions allow specifying `reduce=False` to return individual losses for each sample in the mini-batch
- Example: `loss = nn.CrossEntropyLoss(..., reduce=False)`
- Currently supported losses: `MSELoss`, `NLLLoss`, `NLLLoss2d`, `KLDivLoss`, `CrossEntropyLoss`, `SmoothL1Loss`, `L1Loss`
- More loss functions will be covered in the next release

An in-built Profiler in the autograd engine

We built a low-level profiler to help you identify bottlenecks in your models

Let us start with an example:


```python
>>> x = Variable(torch.randn(1, 1), requires_grad=True)
>>> with torch.autograd.profiler.profile() as prof:
...     y = x ** 2
...     y.backward()
>>> # NOTE: some columns were removed for brevity
... print(prof)
---------------------------------  ----------  ---------
Name                               CPU time    CUDA time
---------------------------------  ----------  ---------
PowConstant                        142.036us    0.000us
N5torch8autograd9GraphRootE         63.524us    0.000us
PowConstantBackward                184.228us    0.000us
MulConstant                         50.288us    0.000us
PowConstant                         28.439us    0.000us
Mul                                 20.154us    0.000us
N5torch8autograd14AccumulateGradE   13.790us    0.000us
N5torch8autograd5CloneE              4.088us    0.000us
```

The profiler works for both CPU and CUDA models.
For CUDA models, you have to run your python program with a special `nvprof` prefix. For example:


```
nvprof --profile-from-start off -o trace_name.prof -- python <your arguments>
```

In Python:
```python
>>> with torch.cuda.profiler.profile():
...     model(x)  # Warmup CUDA memory allocator and profiler
...     with torch.autograd.profiler.emit_nvtx():
...         model(x)
```


Then, you can load `trace_name.prof` in PyTorch and print a summary profile report.


```python
>>> prof = torch.autograd.profiler.load_nvprof('trace_name.prof')
>>> print(prof)
```


[Read additional documentation here](http://pytorch.org/docs/0.3.0/autograd.htmlprofiler)


Higher order gradients

Added higher-order gradients support for the following layers

- ConvTranspose, AvgPool1d, AvgPool2d, LPPool2d, AvgPool3d, MaxPool1d, MaxPool2d, AdaptiveMaxPool, AdaptiveAvgPool, FractionalMaxPool2d, MaxUnpool1d, MaxUnpool2d, nn.Upsample, ReplicationPad2d, ReplicationPad3d, ReflectionPad2d
- PReLU, HardTanh, L1Loss, SoftSign, ELU, RReLU, Hardshrink, Softplus, SoftShrink, LogSigmoid, Softmin, GLU
- MSELoss, SmoothL1Loss, KLDivLoss, HingeEmbeddingLoss, SoftMarginLoss, MarginRankingLoss, CrossEntropyLoss
- DataParallel

Optimizers

- [optim.SparseAdam](http://pytorch.org/docs/0.3.0/optim.htmltorch.optim.SparseAdam): Implements a lazy version of Adam algorithm suitable for sparse tensors.
- In this variant, only moments that show up in the gradient get updated, and only those portions of the gradient get applied to the parameters.
- Optimizers now have an [add_param_group](http://pytorch.org/docs/0.3.0/optim.htmltorch.optim.Optimizer.add_param_group) function that lets you add new parameter groups to an already constructed optimizer.
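For example, a minimal sketch of `add_param_group` (the modules and hyper-parameters are placeholders):

```python
import torch

base = torch.nn.Linear(10, 10)
extra = torch.nn.Linear(10, 2)

optimizer = torch.optim.SGD(base.parameters(), lr=0.1)
# later, register a newly created module's parameters with their own options
optimizer.add_param_group({'params': extra.parameters(), 'lr': 0.01})
```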

New layers and nn functionality

- Added AdaptiveMaxPool3d and AdaptiveAvgPool3d
- Added LPPool1d
- [F.pad](http://pytorch.org/docs/master/nn.htmltorch.nn.functional.pad) now has support for:
- 'reflection' and 'replication' padding on 1d, 2d, 3d signals (so 3D, 4D and 5D Tensors)
- constant padding on n-d signals
- nn.Upsample now works for 1D signals (i.e. B x C x L Tensors) in `nearest` and `linear` modes.
- [grid_sample](http://pytorch.org/docs/0.3.0/nn.html?highlight=grid_samplertorch.nn.functional.grid_sample) now allows padding with the border value via `padding_mode="border"`. `grid_sample` expects a grid in the range of `[-1, 1]`, and if the values are out of these bounds, padding with the value `0.0` is applied by default. However, in a lot of cases, using the border value (i.e. the nearest valid value) helps improve accuracy of the overall model.
- Introducing `nn.utils.parameters_to_vector` and `nn.utils.vector_to_parameters`
- `parameters_to_vector` takes `net.parameters()` and return a 1D vector that contains all the parameters
- `vector_to_parameters` takes a vector of flattened parameters and copies the values over to a network's parameters
- Convenient for some reinforcement learning algorithms, such as the cross-entropy method, TRPO etc., which need to pull all network parameters as one big vector, modify them, and put the modified vector back (see the sketch after this list).
- Allow user to not specify certain input dimensions for `AdaptivePool*d` and infer them at runtime.
- For example:
```python
# target output size of 10x7
m = nn.AdaptiveMaxPool2d((None, 7))
```

- DataParallel container on CPU is now a no-op (instead of erroring out)
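A minimal sketch of the parameter-vector helpers mentioned above (the network is a placeholder):

```python
import torch.nn as nn
from torch.nn.utils import parameters_to_vector, vector_to_parameters

net = nn.Linear(4, 2)

vec = parameters_to_vector(net.parameters())  # 1-D tensor holding all 10 parameters
vec = vec * 0.9                               # modify the flat vector in any way
vector_to_parameters(vec, net.parameters())   # copy the values back into the network
```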


New Tensor functions and features
- Introduced `torch.erf` and `torch.erfinv` that compute the error function and the inverse error function of each element in the Tensor.
- adds broadcasting support to bitwise operators
- Added `Tensor.put_` and `torch.take` similar to `numpy.take` and `numpy.put`.
- The take function allows you to linearly index into a tensor without viewing it as a 1D tensor
first. The output has the same shape as the indices.
- The put function copies value into a tensor also using linear indices.
- Differences from `numpy` equivalents:
- `numpy.take` has an optional axis argument, which behaves like `index_select`. This `axis` argument is not yet present.
- `numpy.put` repeats the values if necessary to make them as long as indices. This behavior is not yet replicated.
- add `zeros` and `zeros_like` for sparse Tensors.
- 1-element Tensors can now be cast to Python scalars. For example: `int(torch.Tensor([5]))` works now.

Other additions

- Added `torch.cuda.get_device_name` and `torch.cuda.get_device_capability` that do what the names say. Example:
```python
>>> torch.cuda.get_device_name(0)
'Quadro GP100'
>>> torch.cuda.get_device_capability(0)
(6, 0)
```

- If one sets `torch.backends.cudnn.deterministic = True`, then the CuDNN convolutions use deterministic algorithms
- `torch.cuda.get_rng_state_all` and `torch.cuda.set_rng_state_all` are introduced to let you save / load the state of the random number generator over all GPUs at once
- `torch.cuda.empty_cache()` frees the cached memory blocks in PyTorch's caching allocator. This is useful when having long-running ipython notebooks while sharing the GPU with other processes.


API changes

- `softmax` and `log_softmax` now take a `dim` argument that specifies the dimension in which slices are taken for the softmax operation. `dim` allows negative dimensions as well (`dim = -1` will be the last dimension)
- `torch.potrf` (Cholesky decomposition) is now differentiable and defined on `Variable`
- Remove all instances of `device_id` and replace it with `device`, to make things consistent
- `torch.autograd.grad` now allows you to specify inputs that are unused in the autograd graph if you use `allow_unused=True`
This is useful when using `torch.autograd.grad` in large graphs with lists of inputs / outputs.
For example:
```python
x, y = Variable(...), Variable(...)
torch.autograd.grad(x * 2, [x, y])                     # errors
torch.autograd.grad(x * 2, [x, y], allow_unused=True)  # works
```

- `pad_packed_sequence` now allows a `padding_value` argument that can be used instead of zero-padding
- `Dataset` now has a `+` operator (which uses `ConcatDataset`). You can do something like `MNIST(...) + FashionMNIST(...)` for example, and you will get a concatenated dataset containing samples from both.
- `torch.distributed.recv` allows Tensors to be received from any sender (hence, `src` is optional). `recv` returns the rank of the sender.
- adds `zero_()` to `Variable`
- `Variable.shape` returns the size of the Tensor (now made consistent with Tensor)
- `torch.version.cuda` specifies the CUDA version that PyTorch was compiled with
- Add a missing function `random_` for CUDA.
- torch.load and torch.save can now take a `pathlib.Path` object, which is a standard Python3 typed filepath object
- If you want to load a model's `state_dict` into another model (for example to fine-tune a pre-trained network), `load_state_dict` was strict on matching the key names of the parameters. Now we provide a `strict=False` option to `load_state_dict` where it only loads in parameters whose keys match, and ignores the other parameter keys (see the sketch after this list).
- added `nn.functional.embedding_bag` that is equivalent to `nn.EmbeddingBag`
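For example, a minimal sketch of the `strict=False` option (the models are placeholders; only keys present in both models are loaded):

```python
import torch.nn as nn

pretrained = nn.Sequential(nn.Linear(10, 10), nn.Linear(10, 10))
model = nn.Sequential(nn.Linear(10, 10))  # only the first layer is shared

# keys "1.weight" / "1.bias" have no match in `model` and are simply ignored
model.load_state_dict(pretrained.state_dict(), strict=False)
```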


Performance Improvements

- The overhead of `torch` functions on Variables was around 10 microseconds. This has been brought down to ~1.5 microseconds by moving most of the core autograd formulas into C++ using our ATen library. This speeds-up models that are very small, such as small LSTMs and other common models seen in NLP.
- softmax and log_softmax are now [4x to 256x faster](https://github.com/pytorch/pytorch/pull/3245issue-267805013) on the GPU after rewriting the gpu kernels
- 2.5x to 3x performance improvement of the distributed AllReduce (gloo backend) by enabling GPUDirect
- nn.Embedding's renorm option is much faster on the GPU. For embedding dimensions of `100k x 128` and a batch size of 1024, it is 33x faster.
- All pointwise ops now use OpenMP and get multi-core CPU benefits
- Added dedicated CUDA kernels for group convolutions where `groups == nInputPlane` (depthwise convolution). Speedups range from 5x to 1000x for tested layer sizes. See the [benchmark table](https://github.com/pytorch/pytorch/pull/3057issuecomment-336519873) for more details as well as [this table](https://github.com/pytorch/pytorch/pull/3265issue-268106225).
- Fixed `optim.SGD`'s memory usage for sparse gradients (for ex. `nn.Embedding(..., sparse=True)`), reducing the usage on a user-provided test script by 10x.
- Optional NNPack integration for faster CPU convolutions (not part of binaries)
- Reduce overhead of broadcasting if Tensors aren't broadcastable
- `torch.nn.utils.weight_norm` over the right-most dimensions is faster
- Backward of `torch.norm` is sped up by ~1.5x
- Improve the performance of `pack_padded_sequence`
- Add a single-argument version of `torch.arange`. For example `torch.arange(10)`

Framework Interoperability

DLPack Interoperability

[DLPack Tensors](https://github.com/dmlc/dlpack) are cross-framework Tensor formats. We now have `torch.utils.to_dlpack(x)` and `torch.utils.from_dlpack(x)` to convert between DLPack and torch Tensor formats. The conversion has zero memory copy and hence is very efficient.
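A minimal round-trip sketch (using the `torch.utils.dlpack` module as it is exposed in the Python package):

```python
import torch
from torch.utils.dlpack import to_dlpack, from_dlpack

x = torch.randn(3, 3)
capsule = to_dlpack(x)    # DLPack capsule that other frameworks can consume
y = from_dlpack(capsule)  # back to a torch Tensor, sharing the same memory
```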

Model exporter to ONNX

[ONNX](http://onnx.ai) is a common model interchange format that can be executed in Caffe2, CoreML, CNTK, MXNet, Tensorflow at the moment. PyTorch models that are ConvNet-like and RNN-like (static graphs) can now be shipped to the ONNX format.

- There is a new module [torch.onnx](http://pytorch.org/docs/0.3.0/onnx.html) which provides the API for exporting ONNX models (a minimal export sketch follows the list of supported operations below).

- The operations supported in this release are:
- add, sub (nonzero alpha not supported), mul, div, cat, mm, addmm, neg, tanh, sigmoid, mean, t, transpose, view, split, squeeze
- expand (only when used before a broadcasting ONNX operator; e.g., add)
- prelu (single weight shared among input channels not supported)
- threshold (non-zero threshold/non-zero value not supported)
- Conv, ConvTranspose, BatchNorm, MaxPool, RNN, Dropout, ConstantPadNd, Negate
- elu, leaky_relu, glu, softmax, log_softmax, avg_pool2d
- unfold (experimental support with ATen-Caffe2 integration)
- Embedding (no optional arguments supported)
- RNN
- FeatureDropout (training mode not supported)
- Index (constant integer and tuple indices supported)
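For reference, a minimal export sketch (the model, input shape and file name are placeholders):

```python
import torch
from torch.autograd import Variable

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())
dummy_input = Variable(torch.randn(1, 3, 224, 224))

# traces the model with the dummy input and writes the ONNX graph to disk
torch.onnx.export(model, dummy_input, "model.onnx", verbose=True)
```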

Usability Improvements

- More cogent error messages during indexing of Tensors / Variables
- Add proper error message for specifying dimension on a tensor with no dimensions
- better error messages for Conv*d input shape checking
- More user-friendly error messages for LongTensor indexing
- Better error messages and argument checking for Conv*d routines
- Trying to construct a Tensor from a Variable fails more appropriately
- If you are using a PyTorch binary with insufficient CUDA version, then a `warning` is printed to the user.
- Fixed incoherent error messages in `load_state_dict`
- Fix error message for type mismatches with sparse tensors

Bug fixes

torch

- Fix CUDA lazy initialization to not trigger on calls to `torch.manual_seed` (instead, the calls are queued and run when CUDA is initialized)

Tensor

- if `x` is 2D, `x[[0, 3],]` was needed to trigger advanced indexing. The trailing comma is no longer needed, and you can do `x[[0, 3]]`
- `x.sort(descending=True)` used to incorrectly fail for Tensors. Fixed a bug in the argument checking logic to allow this.
- Tensor constructors with numpy input: `torch.DoubleTensor(np.array([0,1,2], dtype=np.float32))`
- torch will now copy the contents of the array in a storage of appropriate type.
- If types match, it will share the underlying array (no-copy), with equivalent semantics to initializing a tensor with another tensor.
- On CUDA, `torch.cuda.FloatTensor(np.random.rand(10,2).astype(np.float32))` will now work by making a copy.
- `ones_like` and `zeros_like` now create Tensors on the same device as the original Tensor
- `torch.multinomial` on the CPU would reshape the input `prob_dist` in-place. Fixed this to make sure the `prob_dist` input's shape is unchanged after the call to `multinomial`
- `expand` and `expand_as` allow expanding an empty Tensor to another empty Tensor
- when `[..., None, ...]` was given (i.e. newaxis placement in indexing was specified), PyTorch had different behavior from NumPy. This is made consistent with NumPy in all cases.
- Fix exponential distribution implementation to never sample infinity - cuRAND returns numbers in (0, 1]
- torch.HalfTensor supports `numpy()` and `torch.from_numpy`
- Add additional size checking for `torch.scatter`
- fix `torch.tril` and `torch.triu` on the GPU for storage-offset Tensors (would return incorrect result).
- Fix a memory leak in CUDA qr decomposition
- Fix stream-awareness issues in THCUNN kernels
- Fix kwargs parsing in `torch.topk`
- Fixed `random_` on CPU (which previously had a max value of 2^32) for DoubleTensor and LongTensor
- Fix `ZeroDivisionError: float division by zero` when printing certain Tensors
- `torch.gels` when `m > n` had a truncation bug on the CPU and returned incorrect results. Fixed.
- Add a check in `tensor.numpy()` that ensures no positional arguments are passed
- Before a Tensor is moved to CUDA pinned memory, added a check to ensure that it is `contiguous`
- `any` and `all` work on empty Tensors on the cpu (previously errored out)
- Fix `symeig` on CUDA for large matrices. The bug is that not enough space was being allocated for the workspace, causing some undefined behavior.
- Improved the numerical stability of `torch.var` and `torch.std` by using Welford's algorithm
- The Random Number Generator returned `uniform` samples with inconsistent bounds (inconsistency in cpu implementation and running into a cublas bug).
- Now, all `uniform` sampled numbers will return within the bounds `[0, 1)`, across all types and devices
- Fix `torch.svd` to not segfault on large CUDA Tensors (fixed an overflow error in the magma bindings)
- Allows empty index Tensor for `index_select` (instead of erroring out)
- Previously when `eigenvector=False`, `symeig` returns some unknown value for the eigenvectors. Now we zero them out.

sparse

- Fix bug with 'coalesced' calculation in sparse 'cadd'
- Fixes `.type()` not converting indices tensor.
- Fixes sparse tensor coalesce on the GPU in corner cases


autograd

- Fixed crashes when calling backwards on leaf variable with requires_grad=False
- fix bug on Variable `type()` around non-default GPU input.
- when `torch.norm` returned `0.0`, the gradient was `NaN`. We now use the subgradient at `0.0`, so the gradient is `0.0`.
- Fix a correctness issue with advanced indexing and higher-order gradients
- `torch.prod`'s backward was failing on the GPU due to a type error, fixed.
- Advanced Indexing on Variables now allows the index to be a LongTensor backed Variable
- Variable.cuda() and Tensor.cuda() are consistent in kwargs options

optim

- `torch.optim.lr_scheduler` is now imported by default.

nn

- Returning a dictionary from a nn.Module's forward function is now supported (used to throw an error)
- When `register_buffer("foo", ...)` is called, and self.foo already exists, then instead of silently failing, now raises a `KeyError`
- Fixed loading of older checkpoints of RNN/LSTM which were missing `_data_ptrs` attributes.
- `nn.Embedding` had a hard error when using the `max_norm` option. This is fixed now.
- when using the `max_norm` option, the passed-in indices are written upon (by the underlying implementation). To fix this, pass a clone of the indices to the renorm kernel.
- `F.affine_grid` now can take non-contiguous inputs
- EmbeddingBag can accept both 1D and 2D inputs now.
- Workaround a CuDNN bug where batch sizes greater than 131070 fail in CuDNN BatchNorm
- fix nn.init.orthogonal to correctly return orthonormal vectors when rows < cols
- if BatchNorm has only `1` value per channel in total, raise an error in training mode.
- Make cuDNN bindings respect the current cuda stream (previously raised incoherent error)
- fix grid_sample backward when gradOutput is a zero-strided Tensor
- Fix a segmentation fault when reflection padding is out of Tensor bounds.
- If LogSoftmax has only 1 element, `-inf` was returned. Now this correctly returns `0.0`
- Fix pack_padded_sequence to accept inputs of arbitrary sizes (not just 3D inputs)
- Detect pointer aliasing in cuDNN RNN flatten_parameters and avoid that path.
- Fixed ELU higher order gradients when applied in-place
- Workaround a CuDNN RNN bug for half-precision
- Prevent numerical issues with `poisson_nll_loss` when `log_input=False` by adding a small epsilon

distributed and multi-gpu

- Allow kwargs-only inputs to DataParallel. This used to fail: `n = nn.DataParallel(Net()); out = n(input=i)`
- DistributedDataParallel calculates num_samples correctly in python2
- Fix the case of DistributedDataParallel when 1-GPU per process is used.
- Fixed DataParallel to specify GPUs that don't include GPU-0
- DistributedDataParallel no longer errors out on exit; the daemon flag is now set.
- Fix a bug in DistributedDataParallel in the case when model has no `buffers` (previously raised incoherent error)
- Fix `__get_state__` to be functional in `DistributedDataParallel` (was returning nothing)
- Fix a deadlock in the NCCL bindings when GIL and CudaFreeMutex were starving each other

Others

- `model_zoo.load_url` now first attempts to use the `requests` library if available, and then falls back to `urllib`
- Fix error when default_collate is passed a collection of `numpy.str_`

0.2.0

Here comes the next major release of PyTorch, just in time for ICML.  Install it today from our website http://pytorch.org
Package documentation for this release is available at [http://pytorch.org/docs/0.2.0/](http://pytorch.org/docs/0.2.0/)

We're introducing long-awaited features such as Broadcasting, Advanced Indexing, Higher-order gradients and finally: Distributed PyTorch.

**Due to introducing Broadcasting, the code behavior for certain broadcastable situations is different from behavior in 0.1.12. This might lead to silent bugs in your existing code. We've provided easy ways of identifying this ambiguous code in the *Important Breakages and Workarounds* section.**

Table of contents:
- Tensor Broadcasting (numpy-style)
- Advanced Indexing for Tensors and Variables
- Higher-order gradients
- Distributed PyTorch (multi-node training, etc.)
- Neural Network layers and features: SpatialTransformers, WeightNorm, EmbeddingBag, etc.
- New in torch and autograd: matmul, inverse, etc.
- Easier debugging, better error messages
- Bug Fixes
- **Important Breakages and Workarounds**

Tensor Broadcasting (numpy-style)

In short, if a PyTorch operation supports broadcasting, then its Tensor arguments can be automatically expanded to be of equal sizes (without making copies of the data).

PyTorch Broadcasting semantics [closely follow numpy-style broadcasting](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.htmlmodule-numpy.doc.broadcasting); if you are familiar with numpy broadcasting, things should just work as expected.

General Semantics

Two tensors are “broadcastable” if the following rules hold:
- Each tensor has at least one dimension.
- When iterating over the dimension sizes, starting at the trailing dimension, the dimension sizes must either be equal, one of them is 1, or one of them does not exist.

For Example:

python
>>> x=torch.FloatTensor(5,7,3)
>>> y=torch.FloatTensor(5,7,3)
# same shapes are always broadcastable (i.e. the above rules always hold)

# can line up trailing dimensions
>>> x=torch.FloatTensor(5,3,4,1)
>>> y=torch.FloatTensor(  3,1,1)

# x and y are broadcastable.
# 1st trailing dimension: both have size 1
# 2nd trailing dimension: y has size 1
# 3rd trailing dimension: x size == y size
# 4th trailing dimension: y dimension doesn't exist

# but:
>>> x=torch.FloatTensor(5,2,4,1)
>>> y=torch.FloatTensor(  3,1,1)
# x and y are not broadcastable, because in the 3rd trailing dimension 2 != 3


If two tensors x, y are "broadcastable", the resulting tensor size is calculated as follows:
- If the number of dimensions of x and y are not equal, prepend 1 to the dimensions of the tensor with fewer dimensions to make them equal length.
- Then, for each dimension size, the resulting dimension size is the max of the sizes of x and y along that dimension.

For Example:

python
# can line up trailing dimensions to make reading easier
>>> x=torch.FloatTensor(5,1,4,1)
>>> y=torch.FloatTensor(  3,1,1)
>>> (x+y).size()
torch.Size([5, 3, 4, 1])

# error case
>>> x=torch.FloatTensor(5,2,4,1)
>>> y=torch.FloatTensor(  3,1,1)
>>> (x+y).size()
RuntimeError: The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 1


More details [can be found on the PyTorch documentation site](http://pytorch.org/docs/0.2.0/notes/broadcasting.html).  Also, each torch function lists its broadcasting semantics in the documentation.

Advanced Indexing for Tensors and Variables

PyTorch now supports a subset of NumPy style [advanced indexing](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.htmladvanced-indexing). This allows users to select arbitrary indices at each dimension of the Tensor, including non-adjacent indices and duplicate indices, using the same `[]`-style operation. This allows for a more flexible indexing strategy without needing calls to PyTorch's `Index[Select, Add, ...]`  functions.

Let's look at some examples:

python
x = torch.Tensor(5, 5, 5)


**Pure Integer Array Indexing - specify arbitrary indices at each dimension**

python
x[[1, 2], [3, 2], [1, 0]]
# --> yields a 2-element Tensor (x[1][3][1], x[2][2][0])


**also supports broadcasting, duplicates**

python
x[[2, 3, 2], [0], [1]]
# --> yields a 3-element Tensor (x[2][0][1], x[3][0][1], x[2][0][1])


**arbitrary indexer shapes allowed**

python
x[[[1, 0], [0, 1]], [0], [1]].shape
# --> yields a 2x2 Tensor [[x[1][0][1], x[0][0][1]],
#                          [x[0][0][1], x[1][0][1]]]


**can use colon, ellipsis**

python
x[[0, 3], :, :]
x[[0, 3], ...]
# --> both yield a 2x5x5 Tensor [x[0], x[3]]


**also use Tensors to index!**

python
y = torch.LongTensor([0, 2, 4])
x[y, :, :]
# --> yields a 3x5x5 Tensor [x[0], x[2], x[4]]


**selection with less than ndim, note the use of comma**

python
x[[1, 3], ]
# --> yields a 2x5x5 Tensor [x[1], x[3]]


Higher order gradients

Now you can evaluate higher order differentials in PyTorch. For example, you can compute Hessian-Vector products, penalize the norm of the gradients of your model, implement Unrolled GANs and Improved WGANs, etc.

In the `0.2` release, we've enabled the ability to compute higher order gradients for all of the `torch.XXX` functions and the most popular `nn` layers. The rest will be covered in the next release.

Here's a short example that penalizes the norm of the weight gradients of a Resnet-18 model, so that the volume of weights is slow-changing.

python
import torch
from torchvision.models import resnet18
from torch.autograd import Variable

model = resnet18().cuda()
# any optimizer will do; defined here so the snippet is self-contained
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# dummy inputs for the example
input = Variable(torch.randn(2,3,224,224).cuda(), requires_grad=True)
target = Variable(torch.zeros(2).long().cuda())

# as usual
output = model(input)
loss = torch.nn.functional.nll_loss(output, target)

grad_params = torch.autograd.grad(loss, model.parameters(), create_graph=True)
# torch.autograd.grad does not accumulate the gradients into the .grad attributes
# It instead returns the gradients as Variable tuples.

# now compute the 2-norm of the grad_params
grad_norm = 0
for grad in grad_params:
    grad_norm += grad.pow(2).sum()
grad_norm = grad_norm.sqrt()

# take the gradients wrt grad_norm. backward() will accumulate
# the gradients into the .grad attributes
grad_norm.backward()

# do an optimization step
optimizer.step()


We see two new concepts here:

1. [torch.autograd.grad](http://pytorch.org/docs/master/autograd.htmltorch.autograd.grad) is a function that takes in [outputs, list of inputs (for which you want gradients)], and returns the gradients wrt. these inputs as a tuple, rather than accumulating the gradients into the `.grad` attributes. This is useful if you want to further operate on the gradients.
2. You can operate on the gradients, and call `backward()` on them.

The list of `nn` layers that support higher order gradients are:
- `AvgPool*d`, `BatchNorm*d`, `Conv*d`, `MaxPool1d,2d`, `Linear`, `Bilinear`
- `pad`, `ConstantPad2d`, `ZeroPad2d`, `LPPool2d`,  `PixelShuffle`
- `ReLU6`, `LeakyReLU`, `PReLU`, `Tanh`, `Tanhshrink`, `Threshold`, `Sigmoid`, `HardTanh`, `ELU`, `Softsign`, `SeLU`
- `L1Loss`, `NLLLoss`, `PoissonNLLLoss`, `LogSoftmax`, `Softmax2d`
The rest will be enabled in the next release.

To enable higher order gradients, we've introduced a new style of writing `autograd.Function` (the current/old style of writing functions is fully backward compatible). [You can read more about the new style of functions here](http://pytorch.org/docs/0.2.0/notes/extending.html).

Most of you don't write your own `autograd.Function`s; they are low-level primitives that introduce
new operations to the autograd engine, in which you specify the forward and backward calls.

Distributed PyTorch

We introduce the [torch.distributed](http://pytorch.org/docs/0.2.0/distributed.html) package that allows you to exchange Tensors among multiple machines. Using this package, you can scale your network training over multiple machines and larger mini-batches. For example, you are given the primitives to implement [Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour](https://arxiv.org/abs/1706.02677).

The `distributed` package follows an MPI-style programming model. This means that there are functions provided to you such as `send`, `recv`, `all_reduce` that will exchange Tensors among nodes (machines).

So that the machines can first identify each other and assign unique numbers (ranks) to each other, we provide simple initialization methods:
- shared file system (requires that all processes can access a single file system)
- IP multicast (requires that all processes are in the same network)
- environment variable (requires you to manually assign ranks and know an address of a node reachable from all processes)

Our package documentation contains more details on initialization and available backends, but here's an example of initializing using a multicast address:

python
import torch.distributed as dist

dist.init_process_group(backend='tcp',
                        init_method='tcp://[ff15:1e18:5d4c:4cf0:d02d:b659:53ba:b0a7]:23456',
                        world_size=4)

print('Hello from process {} (out of {})!'.format(
    dist.get_rank(), dist.get_world_size()))


This would print `Hello from process 2 (out of 4)!` on the 3rd machine.

The world size is the number of processes that will participate in the job. Each will be assigned a rank, which is a number between 0 and world_size - 1, unique within this job. It will serve as a process identifier and will be used instead of an address to, for example, specify to which process a tensor should be sent.

Here's a snippet that shows how simple point-to-point communication can be performed:

python
# All processes (receiving ones too!) need to have tensors of appropriate
# size preallocated.
x = torch.Tensor(10)
if dist.get_rank() == 0:
    x.normal_()
    # Send x to process with rank 1
    dist.send(x, dst=1)
else:  # rank == 1
    # Receive data from process with rank 0 and save result in x
    dist.recv(x, src=0)


Asynchronous p2p functions (`isend`, `irecv`) are available too.

However, some communication patterns appear so often that more efficient collective calls have been developed. They typically engage the whole process group and are much faster than naive algorithms using `send`/`recv`. One example is `all_reduce`:

python
x = torch.Tensor([dist.get_rank()])
# Add tensors from all processes such that they all receive the result.
# x is an input and output to this operation.
dist.all_reduce(x)


The distributed package is fairly low-level, so it allows you to implement more advanced algorithms and tailor the code to very specific purposes, but data-parallel training is such a common use case that we have created high-level helpers for it.

Hence, we've introduced `DistributedDataParallel`, which is meant to be a nearly drop-in replacement for nn.DataParallel.
Here's a code snippet demonstrating changes necessary to add it to existing training code:

python
# Wrap model in DistributedDataParallel (CUDA only for the moment)
model = torch.nn.parallel.DistributedDataParallel(model.cuda())

# Use a DistributedSampler to restrict each process to a distinct subset
# of the dataset.
train_dataset = ...
train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset)
train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=args.batch_size, num_workers=args.workers,
    pin_memory=True, sampler=train_sampler)

for epoch in range(args.num_epochs):
    # Use the .set_epoch() method to reshuffle the dataset partition at every epoch
    train_sampler.set_epoch(epoch)
    # training loop
    ...


You can see a fuller [Imagenet training example here](https://github.com/pytorch/examples/tree/master/imagenet)

New nn layers: SpatialTransformers, WeightNorm, EmbeddingBag, etc.

New features
- [forward_pre_hook](http://pytorch.org/docs/master/nn.htmltorch.nn.Module.register_forward_pre_hook) is introduced to execute user-specified closures right before a forward function is called.
- Convenient access to non-leaf gradients:
Currently, to access and inspect gradients of intermediate values, we have to use `hooks`. This is not convenient for doing simple inspections. Hence, we introduce `retain_grad`. It is best explained via an example:

python
input = Variable(torch.rand(1, 3), requires_grad=True)
h1 = input * 3
out = (h1 * h1).sum()

h1.retain_grad()
out.backward()

print(h1.grad)
# without calling retain_grad(), h1.grad is None

- DataParallel now supports dicts as inputs

New Layers

- Spatial Transformer Networks via `F.grid_sample` and `F.affine_grid`
- `nn.SeLU` and `nn.AlphaDropout` are introduced, from the paper: [Self-Normalizing Neural Networks](https://arxiv.org/abs/1706.02515)
- `nn.GLU` (Gated Linear Unit) is introduced from the paper [Convolutional Sequence to Sequence Learning](https://arxiv.org/abs/1705.03122)
- [Weight Normalization](https://arxiv.org/abs/1602.07868) is now implemented via [torch.utils.weight_norm](http://pytorch.org/docs/master/nn.htmltorch.nn.utils.weight_norm).
- You can now ignore specific target indices while computing `cross_entropy_loss` and `nll_loss` using the `ignore_index` argument. This is a cheap and useful way of implementing masking, where you can have a `mask` index that is ignored in computing the loss.
- `F.normalize` implements dimension-wise renormalization
- `F.upsample` and `nn.Upsample` consolidate multiple Upsampling layers into one function. It implements 2d and 3d bilinear/trilinear/nearest upsampling.
- `nn.EmbeddingBag`: When building bag-of-words models, doing an `Embedding` followed by `Sum` or `Mean` is common. For variable-length sequences, computing bags of embeddings involves masking. We provide a single `nn.EmbeddingBag` that is much more efficient and faster at computing bags of embeddings, especially for variable-length sequences (see the short sketch after this list).
- Numerically stable Binary Cross-Entropy loss via `bce_with_logits`
- A negative log-likelihood loss with Poisson distribution of the target via `PoissonNLLLoss`
- `cosine_similarity`: Returns cosine similarity between x1 and x2, computed along dim.
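
A minimal sketch of `nn.EmbeddingBag` (written against the present-day tensor API; the sizes, indices and `mode` below are illustrative assumptions):

python
import torch
from torch import nn

# two bags packed into one flat index tensor; offsets mark where each bag starts
bag = nn.EmbeddingBag(10, 4, mode='mean')   # 10 embeddings of dimension 4
words = torch.tensor([1, 2, 4, 5, 3])       # bag 0 = [1, 2, 4], bag 1 = [5, 3]
offsets = torch.tensor([0, 3])
out = bag(words, offsets)                   # shape (2, 4): one pooled vector per bag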

training utilities

*Learning Rate Schedulers:* [torch.optim.lr_scheduler](http://pytorch.org/docs/master/optim.htmlhow-to-adjust-learning-rate) provides several simple and more sophisticated methods to adjust the current learning rate. They are quite convenient while experimenting, and give a reasonable proxy for what you would likely want to do by hand (a short sketch follows the list below).

There are various strategies provided, which can be used depending on the appropriate situation, more can be read in the [package docs](http://pytorch.org/docs/master/optim.htmlhow-to-adjust-learning-rate):
- ReduceLROnPlateau, LambdaLR, StepLR, MultiStepLR, ExponentialLR
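
For instance, a minimal sketch of `StepLR` (the model, optimizer and hyperparameters below are hypothetical):

python
import torch
from torch import nn, optim
from torch.optim import lr_scheduler

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)
# decay the learning rate by 10x every 30 epochs
scheduler = lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(100):
    # call step() once per epoch (the recommended ordering relative to
    # optimizer.step() has changed across releases)
    scheduler.step()
    # ... train for one epoch ...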


*`ConcatDataset`*: a convenient dataset meta-class that can merge and concatenate two individual datasets.

New in torch and autograd

- All reduce functions such as `sum` and `mean` now default to squeezing the reduced dimension. For example `torch.sum(torch.randn(10, 20), 0)` returns a 1D Tensor.
- `x.shape`, similar to numpy. A convenience `property` that is equivalent to `x.size()`
- `torch.matmul`, similar to np.matmul
- bitwise and, or, xor, lshift, rshift
- autograd support for `inverse`, `gesv`, `cumprod`, `atan2`
- unbiased `var` and `std` now available via keyword argument option
- `torch.scatter_add`: like `torch.scatter`, except that when duplicate indices are encountered, the values are summed (see the short example after this list).
- `torch.median` behaves similarly to `torch.sum` when no arguments are given, i.e. it reduces all the dimensions and returns a single median value of the flattened Tensor.
- `masked_copy_` has been renamed to `masked_scatter_` (with a deprecation on `masked_copy_`)
- torch.manual_seed now seeds all CUDA devices as well
- You can now specify the random number generator object via keyword arguments `torch.rand(1000, generator=gen)`
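
A minimal sketch of a few of these additions, written against the present-day tensor API (values are illustrative):

python
import torch

x = torch.randn(3, 4)
y = torch.randn(4, 5)

print(x.shape)              # torch.Size([3, 4]), equivalent to x.size()
z = torch.matmul(x, y)      # (3, 4) @ (4, 5) -> (3, 5)

# scatter_add_: duplicate indices are summed instead of overwritten
src = torch.ones(5)
index = torch.tensor([0, 0, 1, 2, 2])
out = torch.zeros(3)
out.scatter_add_(0, index, src)
print(out)                  # tensor([2., 1., 2.])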

Bug-fixes and small improvements

- Now we emit an error when a Variable is converted to a bool. For example:


b = Variable(torch.zeros(1))
if b[0]:  # errors now


- Fix correctness bugs in qr decomposition on CUDA.
- Support for IBM PowerPC64 platform
- Check that the CuDNN version at compile-time is the same version at run-time.
- Improve error message in CUDA forked subprocess
- Faster transposed-copy on CPU
- Improve error messages in InstanceNorm
- Add more argument checking for various routines, especially BatchNorm and Convolution routines.
- Better error messages around shape reporting across the CPU backend.
- Support more than 8 GPUs per machine (work-around a CUDA p2p restriction)
- Improve error message when accessing attributes that don't exist
- t() of Variable consistent with Tensor
- prevent divide-by-zero when dropout p=1
- fix sharing of CUDA tensors on non-current devices
- when BN epsilon < allowed CuDNN value, fallback to THNN
- Fix thread-thrashing when using different numbers of threads for MKL and OMP
- improve memory usage when using CuDNN RNN
- Fix ZeroPad2d backwards with negative padding
- add dummy tensor.data property, to provide interpretable error message to users
- Fix in-place division for Python3
- Raise an error when calling `from_numpy` on a 0-dim array
- Empty Tensors don't error out when shared across multiprocessing
- fix baddbmm for expanded tensors
- Let parallel_apply accept arbitrary inputs
- keyword arguments in Tensor and Variable are now consistent
- fix torch.inverse when Magma is not available
- Add logical not operator for ByteTensor
- add device asserts in scatter/gather kernels

Important Breakages and Workarounds

As you've read, we've introduced two important changes that are not
backward compatible:
- Numpy-style Broadcasting
- Reduction functions such as `sum(1)` now default to `keepdim=False`

We provide different levels of Python warnings that you can enable to alert you if you are using deprecated behavior or if the behavior of your code has changed.

tl;dr
Here is a code snippet that you can add to the top of your scripts.
Adding this code will generate warnings highlighting incompatible code.

Fix your code to no longer generate warnings.

python
# insert this to the top of your scripts (usually main.py)
import sys, warnings, traceback, torch
def warn_with_traceback(message, category, filename, lineno, file=None, line=None):
    sys.stderr.write(warnings.formatwarning(message, category, filename, lineno, line))
    traceback.print_stack(sys._getframe(2))
warnings.showwarning = warn_with_traceback; warnings.simplefilter('always', UserWarning);
torch.utils.backcompat.broadcast_warning.enabled = True
torch.utils.backcompat.keepdim_warning.enabled = True

Once all warnings disappear, you can remove the code snippet.

More elaborately

Now, let us see the three incompatible changes with examples.

Using the (now deprecated) 1-dimensional view pointwise function

Prior versions of PyTorch allowed certain pointwise functions to execute on tensors with different shapes, as long as the number of elements in each tensor was equal.  The pointwise operation would then be carried out by viewing each tensor as 1-dimensional. PyTorch now supports broadcasting. The “1-dimensional” pointwise behavior is considered deprecated and will generate a Python warning in cases where tensors are not broadcastable, but have the same number of elements.

For example:

python
>>> torch.add(torch.ones(4), torch.ones(2,2))
__main__:1: UserWarning: self and other not broadcastable, but have the same
number of elements.  Falling back to deprecated pointwise behavior.
2
2
2
2
[torch.FloatTensor of size 4]


Broadcasting in code where it didn't happen before
The introduction of broadcasting can cause backwards incompatible changes in the case where two tensors do not have the same shape,
but are broadcastable and have the same number of elements.

For example:

python
>>> torch.add(torch.ones(4,1), torch.randn(4))


would previously produce a Tensor with size: `torch.Size([4,1])`,
but now produces a Tensor with size: `torch.Size([4,4])`.

In order to help identify cases in your code where backwards incompatibilities introduced by broadcasting may exist, you may set `torch.utils.backcompat.broadcast_warning.enabled` to `True`, which will generate a python warning in such cases.

For Example:

python
>>> torch.utils.backcompat.broadcast_warning.enabled=True
>>> torch.add(torch.ones(4,1), torch.ones(4))
__main__:1: UserWarning: self and other do not have the same shape, but are broadcastable, and have the same number of elements.

Note that this setting can trigger warnings for valid uses of broadcasting (including in library code), so you probably want to turn this warning off after migrating your code.

KeepDim=False for Reduction Functions

To get a warning when using a dimensional reduction function with the default keepdim argument, set `torch.utils.backcompat.keepdim_warning.enabled` to `True`.  For example:

python
>>> torch.sum(torch.ones(2,3), 1)
__main__:1: UserWarning: backwards compatibility: call to "sum" uses default value for keepdim which has changed default to False.  Consider passing as kwarg.
3
3
[torch.FloatTensor of size 2]


As with `torch.utils.backcompat.broadcast_warning.enabled`, this warning can trigger from valid code, so you most likely want to disable this warning after migrating your code.

Note also that using `keepdim=False` can cause your existing code to "just work" with broadcasting.  For example:

python
# behavior with (old) keepdim=True, causes accidental broadcast
>>> torch.add(torch.ones(4), torch.ones(4,4).sum(dim=1, keepdim=True))
5  5  5  5
5  5  5  5
5  5  5  5
5  5  5  5
[torch.FloatTensor of size 4x4]

# new behavior with keepdim=False is equivalent to the non-broadcasted result
>>> torch.add(torch.ones(4), torch.ones(4,4).sum(dim=1, keepdim=False))
5
5
5
5
[torch.FloatTensor of size 4]

0.1.12

API Changes
-----------
- `torch.range` is deprecated in favor of `torch.arange` which is consistent with numpy and python range.
- On sparse Tensors, `contiguous` is renamed to `coalesce` and `coalesce` is now made out-of-place.
(a reminder that the Sparse API is still experimental and evolving, so we don't provide backward compatibility).

New Features
------------

New layers and functions
- `torch.topk` is now supported for all CUDA types, not just `torch.cuda.FloatTensor`.
- Added a three-way ranking loss: [nn.TripletMarginLoss](http://pytorch.org/docs/nn.htmltripletmarginloss)
- Added per-instance normalization layers: [nn.InstanceNorm1d](http://pytorch.org/docs/nn.htmlinstancenorm1d), [nn.InstanceNorm2d](http://pytorch.org/docs/nn.htmlinstancenorm2d), [nn.InstanceNorm3d](http://pytorch.org/docs/nn.htmlinstancenorm3d)
Each channel is treated as an instance to normalize: mean subtraction and std division are performed per channel. This is useful when dealing with larger images and smaller mini-batches where BatchNorm-like effects are desired.
- `nn.ZeroPad2d` and `nn.ConstantPad2d` are added.
- `nn.Bilinear` is added, which computes `Y = X1 * W * X2 + b`

Negative dimension support for all functions
Every function that takes a dimension argument now also accepts negative dimensions.

A negative dimension will index the tensor from the last dimension.

For example:


x = torch.randn(10, 20, 30)
y = torch.mean(x, dim = -1)


Here, since `x` has 3 dimensions and `dim = -1`, the last dimension, i.e. `dim=2`, is picked for taking the mean.

The functions with dimension arguments are:

narrow, transpose, size, cat, chunk, gather, index_select, split, squeeze,
stack, unbind, unsqueeze, cumprod, cumsum, mean, median, mode, norm, prod, std,
sum, var, kthvalue, max, min, sort, topk, renorm,
index_add, index_copy, index_fill, scatter, select, unfold


CUDA support for Sparse Tensors, faster CPU sparse

Now a part of the `torch.sparse` API is also supported for `torch.cuda.sparse.*Tensor`.

Functions that are supported on CUDA are:

sparse_mask, to_dense, coalesce, transpose, spaddmm
spcadd, mul, div, cadd, csub, cmul


`nn.Embedding` now supports sparse even on CUDA (with the `sparse=True` flag) leveraging these sparse functions.

A new hybrid matrix-multiply operation, `hspmm`, multiplies a sparse matrix with a dense matrix and returns a matrix in the form of a hybrid tensor (i.e. 1 sparse dimension, 1 dense dimension).
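
A minimal sketch of `hspmm` (using the present-day `torch.sparse_coo_tensor` constructor for illustration; the shapes and values are arbitrary):

python
import torch

# a 3x4 sparse matrix with three nonzeros, times a 4x2 dense matrix
i = torch.tensor([[0, 1, 2],
                  [1, 2, 3]])            # coordinates of the nonzeros
v = torch.tensor([1.0, 2.0, 3.0])
s = torch.sparse_coo_tensor(i, v, (3, 4)).coalesce()
d = torch.randn(4, 2)

h = torch.hspmm(s, d)                    # hybrid result: 1 sparse dim, 1 dense dim
print(h.to_dense().shape)                # torch.Size([3, 2])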

Several of the CPU sparse functions have more efficient implementations.

In a quickly hacked up Embedding classifier training script by martinraison we see CUDA sparse performing as well as CUDA dense:
https://gist.github.com/martinraison/1e7c18c6f6eda87f1cb4995b0e6a22a5

Table of timings (seconds / batch):

_      | CPU  | CUDA
-------|------|------
Dense  | 10   | 0.86

0.1.11

Minor API Changes

- in `optim.Adamax`, the default learning rate and epsilon have been made
consistent with Lasagne, Keras and TF.
  - Previous: `(lr=1e-2, eps=1e-38)`
  - Current : `(lr=2e-3, eps=1e-8)`
- **Make `random_` range exclusive** (it used to be exclusive when only the upper bound was specified, and inclusive when both were given).
- `torch.cat` now **disallows concatenating along nonexistent dimensions**
(to make it consistent with numpy and Variable cat)
- `torch.utils.clip_grad_norm` now returns the total norm (say, for logging purposes).

Performance Improvements
- Reduce DataParallel overhead on >4 GPUs
- Improve broadcast/reduce performance by coalescing tensors
- `nn.Embedding`'s backward performance increased for batch sizes > 1024

New Features
**torch**
- Batch triangular factorization and solves have been interfaced (CPU and GPU) and
are available under `torch.btrifact` and `torch.btrisolve`. [See documentation
for usage](http://pytorch.org/docs/torch.htmltorch.btrifact)
- All RNG functions now have `generator` specifiable via a keyword argument
- `torch.mode` is now supported on the GPU via a high-performance kernel.

**autograd, nn and optim**
- CuDNN v6 integrated:
  - Faster Dilated Convolutions (and less memory hungry)
  - 1D FFT-based Convolutions
  - Significant performance improvement for Softmax layers
  - Speedups across many functions
  - Improved CuDNN error messages
  - We will integrate persistent RNNs in the next release
- `torch.trace`, `torch.cumsum`, `torch.cross` are now implemented in autograd
- `nll_loss` now supports Spatial inputs (i.e. 4d inputs BCHW) and computes
channel-wise cross-entropy.
- `nn.PReLU` now supports all dimensional Tensors, not just 1d and 2d.
- add `nn.PairwiseDistance` and `F.pairwise_distance` that compute batchwise
pairwise distance between two vectors.
- Adaptive Max and Average Pooling added for 1d, 2d inputs via
`nn.AdaptiveMaxPool1d`, `nn.AdaptiveAvgPool2d`, etc.
- RMSProp now has `momentum` and a `centered` option. If `centered` is True,
the gradient is normalized by an estimate of its variance. (Graves 2013)

**utils**
- `WeightedRandomSampler` has been added as a custom sampler for the DataLoader.
It samples elements from `[0,..,len(weights)-1]` with the given probabilities
and is useful for sampling from unbalanced datasets where some classes have
many more samples than others (a short sketch follows this list). [See the docs](http://pytorch.org/docs/data.html)
for more details
- DataLoader now allows returning of numpy arrays
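
A minimal sketch of `WeightedRandomSampler` (the toy dataset, class balance and import path below are illustrative assumptions):

python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# an imbalanced 2-class toy dataset: 90 samples of class 0, 10 of class 1
data = torch.randn(100, 3)
labels = torch.tensor([0] * 90 + [1] * 10)

# give every element an inverse-frequency weight so both classes are drawn equally often
class_weight = torch.tensor([1.0 / 90, 1.0 / 10])
weights = class_weight[labels]
sampler = WeightedRandomSampler(weights, num_samples=100, replacement=True)

loader = DataLoader(TensorDataset(data, labels), batch_size=10, sampler=sampler)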


Bug Fixes
*torch*
- When loading GPU checkpoints from disk with storage location remapping,
`torch.cuda` was still attempted to be imported. This is now fixed, and
you can load GPU checkpoints on machines with no GPUs or CUDA.
- Work around an OSX `fread` bug where loading checkpoints containing Tensors > 1GB
would give an error.
- Fixed a bug in `torch.cat`: it now does not accept a `reversed` object (it's not a `PySequence`).
For example:

l = [Variable(torch.ones(1,3)*i) for i in range(3)]
torch.cat(reversed(l), 0)  # errors now

- Fix a memory leak in `torch.from_numpy`
- GPU svd returned a larger matrix than expected in the `some` mode.
This is now fixed to match CPU behavior.
- Fix a bug in CPU max that was introduced in the previous release.

**autograd, nn and optim**
- Reassigning attributes in modules now works correctly.
This example used to not work correctly, `l.a` always remained `None`.
Now it works as one would expect:
python
l = nn.Linear(10, 20)
l.a = None
l.a = nn.Parameter(torch.randn(2))
# l.a is correctly updated

- Fix bug where adding a hook could replace an existing hook
- Fix `nn.Embedding` and `nn.CosineEmbeddingLoss` to work without
error on non-float CUDA (half, double)
- Fix a bug in `nn.Embedding` when the `max_norm` option was used. Some of the
indices were not respecting `max_norm` and this is fixed.
- Fix corner-case in `Variable`'s SetItem where gradient was of incorrect shape.
`x.grad` used to be of shape 20, because `y[1]` was of shape 20.

x = Variable(torch.randn(1, 20), requires_grad=True)
y = Variable(torch.zeros(10, 20))
y[1] = x

- Fix a segfault in Conv1d when input doesn't require grad.
- Assertions in `pack_padded_sequence` to check that sequence is of length > 0
- `torch.prod`'s autograd formulae were incorrect if the Tensor contained a 0. This
formula has been fixed.
- Variable `expand` and `expand_as` had incorrect dimension inference when using
broadcasting semantics. The formula has been fixed in these cases.
- Fix a size mismatch in `CosineEmbeddingLoss`. [See this issue](https://github.com/pytorch/pytorch/issues/1058) for more details.
- Fixed a bug in LBFGS that caused it to use uninitialized locals. [See issue](https://github.com/pytorch/pytorch/issues/1039)
- Add assertions for negative padding in `nn.Conv*` functions.
- Fix the stddev gradient formula for the stochastic function `normal`.

**other**
- Fix issue when returning strings from the DataLoader when `pin_memory=True`
- Binaries no longer depend on `libcudart.so` at runtime.

0.1.10

New Features
------------

Indexing and Broadcasting Improvements

- Add broadcasting semantics to `expand` / `expand_as`.
- Previously, `expand` had no ability to add new dimensions, and `unsqueeze`
had to be used to first create singleton dimensions before expansion.
- Now, singleton dimensions are automatically prepended to the shape of
the tensor if a matching dimension is found.
Here's an example:
python
x = torch.rand(5)
y = torch.rand(4, 8, 5)
z = x.expand_as(y)  # z is of shape (4, 8, 5)

x = torch.rand(1, 8, 1)
z = x.expand_as(y)  # z is of shape (4, 8, 5)

- Unsqueeze dimensions using None indexing
python
a = torch.randn(10)
b = a.unsqueeze(0)
b = a[None, :]      # equivalent operations

- Indexing with steps is supported (only positive steps)
python
In [1]: a = torch.randn(10)
In [2]: a
Out[2]:

0.1338
1.0789
1.2302
-1.3343
-0.4676
1.3511
-0.4374
-1.0611
-0.1528
-1.3994
[torch.FloatTensor of size 10]

In [3]: a[0:10:3]
Out[3]:

0.1338
-1.3343
-0.4374
-1.3994
[torch.FloatTensor of size 4]


Variable-length mini-batches in Recurrent Networks
`nn.RNN`, `nn.LSTM`, `nn.GRU` now support mini-batches where sequences are of variable
lengths.
You can pass an input of type [`PackedSequence`](http://pytorch.org/docs/nn.htmlpackedsequence)
into these layers.
A `PackedSequence` holds data and a list of sequence sizes of a packed sequence batch.
For example, a `PackedSequence` will hold an input mini-batch of such sequences:

a b c d e
a b c d e f g h
a b
a b c d

Here, each input row is of variable length.

You can construct a `PackedSequence` using the provided function
[`pack_padded_sequence`](http://pytorch.org/docs/nn.htmltorch.nn.utils.rnn.pack_padded_sequence)

`pack_padded_sequence` takes a `Variable` containing padded sequences, i.e. a `Tensor`
of `T x B x *`, where `B` is the size of the mini-batch, and each input is either of
length `T` or is padded to length `T`. It also takes a list of lengths of each input.
From these, it constructs a `PackedSequence`

For example, it will take [8, 5, 4, 2] and an input `8 x 4 x 128`
that corresponds to:

a b c d e f g h
a b c d e 0 0 0
a b c d 0 0 0 0
a b 0 0 0 0 0 0


The output of the RNN layers will also be a `PackedSequence`, which can then be inverted
back to a padded Tensor using the inverse function:
[`pad_packed_sequence`](http://pytorch.org/docs/nn.htmltorch.nn.utils.rnn.pad_packed_sequence)
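
Putting it together, a minimal sketch (the shapes, lengths and choice of GRU below are illustrative assumptions):

python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# a padded batch of 2 sequences (lengths 4 and 2), laid out as T x B x features
padded = torch.randn(4, 2, 8)
lengths = [4, 2]                                     # sorted in decreasing order

packed = pack_padded_sequence(padded, lengths)       # a PackedSequence
rnn = nn.GRU(input_size=8, hidden_size=16)
packed_out, h = rnn(packed)                          # output is also a PackedSequence
out, out_lengths = pad_packed_sequence(packed_out)   # back to a padded T x B x 16 Tensor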


Sparse Tensors (CPU)
Original goals:
- ability to propagate sparse updates in a network (e.g. for updating an embedding matrix)
- ability to efficiently compute "bag-of-words" sentence embeddings (e.g. weighted average of word embeddings)

Implemented features:
- enable backpropagation of sparse gradients without conversion to dense tensors. In most cases a runtime exception is thrown when mixing different gradient types for the same variable (a short sketch of sparse gradients follows this list)
- add some methods for `THSTensor`: `zero`, elementwise `add` and `mul`, scalar `mul` and `div`
- make `addcmul` method of `THTensor` compatible with sparse operands
- make `spmm` method accessible from Python as `dsmm`
- `sparse_mask` method on `THTensor`. This produces a sparse tensor from a dense tensor,
by using a sparse tensor as a mask. A value is only present in the output sparse
tensor if it also exists in the mask.
- update `optim.Adagrad` to use sparse updates when possible.
- **leave `Variable`'s gradient to `None` by default.**
This is because there is no canonical zero gradient anymore (it could be dense or
sparse, and if it is sparse we don't know how many dimensions are sparse)
- N-dimensional values for sparse tensors:
- Basically for things like applying sparse updates to embedding matrices, only the
first dimension (the one that corresponds to the word index) is sparse. The other
dimension is always dense (only whole embedding vectors are updated). An elegant
solution is to make the `values` tensor N-dimensional instead of 1-dimensional.
For an embedding matrix, the sparse gradient will have a `values` tensor of
size `nnz * embedding_size` instead of just `nnz`.
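
A minimal sketch of sparse gradients from an `Embedding` layer (written against the present-day tensor API; the sizes and indices are arbitrary):

python
import torch
from torch import nn

emb = nn.Embedding(1000, 16, sparse=True)
idx = torch.tensor([3, 7, 7, 42])
loss = emb(idx).sum()
loss.backward()

# only the touched rows carry gradient values
print(emb.weight.grad.is_sparse)   # True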

Common weight initialization methods for neural networks
By default, all `Linear` and `Conv` layers in PyTorch are initialized according to
a scheme proposed by [LeCun'98](http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf).

However, there are several other commonly used initialization methods.
We now support many other methods via `torch.nn.init`.
Supported methods include:
[`uniform`, `normal`, `constant`, `xavier_uniform`, `xavier_normal`, `kaiming_uniform`,
`kaiming_normal`, `orthogonal`, `sparse`](http://pytorch.org/docs/nn.htmltorch-nn-init)

Here's an example of using these initialization methods:
python
import math
from torch import nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(5, 10, (3, 3))
        nn.init.xavier_uniform(self.conv1.weight, gain=math.sqrt(2.0))
        nn.init.constant(self.conv1.bias, 0.1)

network = Net()


Other features
- Added a gradient checker utility `torch.autograd.gradcheck` that can
be used to check your implementations. Here's a small example:
python
from torch.autograd import Variable, gradcheck
inputs = Variable(torch.randn(4, 4), requires_grad=True)
gradcheck(lambda x: 2*x.diag(), (inputs,), eps=1e-3)

- Add a [clip_grad_norm](http://pytorch.org/docs/nn.htmltorch.nn.utils.clip_grad_norm) utility to easily clip gradients via constraints on their norms.
- Document `nn.ModuleList` and `nn.ParameterList` that are immensely useful when
storing a list of modules in a `Container`
- Optimizers have backward-compatibility for old checkpoints.
`__set_state__` and `__get_state__` introduced into optimizers.
- Add Nesterov momentum to `optim.SGD` via [`nesterov=True` kwarg](http://pytorch.org/docs/optim.htmltorch.optim.SGD)
- DataParallel supports multiple inputs and keyword args (which are also scattered)

m = nn.DataParallel(model)
# Now valid
m(x, y, option=z)

See the [documentation](http://pytorch.org/docs/nn.html?highlight=dataparalleltorch.nn.DataParallel) for exact behavior.
- DataLoader's `default_collate` now also supports numpy arrays
- Added `F.pad` that supports Constant, Reflection and Replication padding in a single
interface: [http://pytorch.org/docs/nn.htmlpad](http://pytorch.org/docs/nn.htmlpad)
- `train()` now optionally supports a boolean argument. For example `model.train(False)`
will set it to `eval` mode and `model.train(True)` sets it to `train` mode.
- Added a `DataLoader` sampler: `SubsetRandomSampler` that takes a list of indices
in its constructor and randomly samples from these indices. Useful when you
want to sample only a particular subset of your dataset.
- Transpose supports negative dimensions. For example:
python
a = torch.randn(2, 3)
b = a.transpose(0, 1)    # both are equivalent
b = a.transpose(-2, -1)  # both are equivalent


Performance Improvements
------------------------
- CPU Tensor backend gets faster
  - Explicit AVX, AVX2 and improved SSE intrinsics to speed up copy, fill, add, mul, div
  - Much improved speed for all apply and reduce operations to have better cache hits
  - Added OpenMP in TH_TENSOR_APPLY* operations
  - Overall, 2x to 10x+ faster on a lot of operations, closer to Numpy speeds
  - Runtime dispatch of intrinsics based on CPU features (easy to ship binaries)
- Serialization Improvements
  - Fixed bugs on serialization for Tensors > 2GB
  - 5x to 10x faster serialization (no longer Tarring Tensors)

Bug Fixes
---------
- Multi-GPU CuDNN RNN now has separate dropout descriptors per GPU
- NLLLoss2d has proper shape checks on GPU and stable sizeAverage formulation
- LogSoftmax2d has a more stable formula
- Fix prodall (prod without dim arguments) to not average
- Return correct number of gradients from cuDNN RNN
- NLLLoss2d has support for weights
- Fix Unpooling bug for MaxPool1d
- Fix Indexing when using only an ellipsis
python
x = torch.randn(2,2,2,2)
x[...]  # used to fail, fixed now.

- expose stateless methods (`torch.*` methods) for `torch.cuda.HalfTensor`
- Prevent creation of reference cycles (and hence improve memory usage) when
leaf variables are used in in-place operations.
- Fix gradient computation for the indexing operation in the case of sending in
`LongTensor`.
- Fix a reshaping bug in the grad_input of basic operations such as `+, -, *, /` etc.
This used to fail, but is fixed now:
python
x = Variable(torch.randn(4, 6), requires_grad=True)
b = Variable(torch.rand(12, 1) + 1e-2, requires_grad=True)
(x + b.mm(Variable(torch.rand(1, 2) + 1e-2))).sum().backward()

- Revert partial indexing with `LongTensor` to return to numpy-compatibility
- References to some Tensors in `BatchNorm` and `Conv` are now freed to improve
memory usage in certain situations. ResNet-152 finetuning with batch_size 16
used to consume the same amount of memory as batch 256 after this fix.
- Fix a bug where `requires_grad` was being propagated forward differently in
CPU mode and CUDA mode.
- Fix bugs in `torch.multinomial` on CUDA, where in rare cases the sampling
led to nonsensical values
- Allow backprop through CuDNN RNN in `eval()` mode.
- Support `np.int16` in conversion to `ShortTensor`
- Enable multithreading in MKL (was disabled previously due to a cmake bug).

Improved error messages
-----------------------
- Print a readable error message when arguments are on different GPUs
- Add better error message for conversion of CUDA tensors to numpy
- Add checks for reward type and size in StochasticFunction

0.1.9

Bug fixes:
- Major bugfix in CuDNN bindings for cases of non-contiguous grad-outputs
- also added better error checking and asserts to cudnn RNN and Conv
- Fixed serialization bugs when serializing Tensors > 2GB
- Enable half and double THNN backends
- RNNBase and Embedding fixed to be compatible with DataParallel
- Fix bug in torch.cat for multi-GPU settings
- Support bias=False in Conv3d
- Change behavior of `detach()` to actually remove the creator (previously was just detaching compute)

Features and performance
- Refactored autograd internals into python-agnostic C++ (662)
- view, unsqueeze and squeeze moved to C for superior performance
- Allow DataParallel to have tuple inputs
- Add a `torch.__version__` string.

0.1.8

A bugfix release with some small features:

New Features
- THPP now has CUDA Tensors
- autograd functions: repeat, var, std, renorm, comparison ops added.
- Merged an initial version of THD (distributed pytorch)
- Indexing support with LongTensor indices
- Add torch.unbind
- Add `ModuleList` and `ParameterList` to store lists of modules / params in an `nn.Module`

Bug and usability fixes
- Fix a bug in FFI utils
- Fix lua-reader for SpatialConvolution
- Fix backward contiguous check in BatchNorm
- Fix travis builds
- Pep8 enforced for the entire codebase
- CuDNN RNN non-contiguous fixes
- Remove circular references in some Autograd functions
- Add CUDA asserts to various kernels for out-of-bounds checks
- Fix non-contiguous bug in torch.cat
- Fix memory leak in Unpooling

API Changes
- nn.Billinear\* -> nn.Bilinear*
- Return indices as well in autograd for `torch.sort` and `torch.topk`
- `.set_index` -> `._set_index` (made private)
- `normal` and `log_normal` kwarg changed from `var` to `std`
- `Optimizer.state_dict` now has semantics matching `Module state_dict`

0.1.7

A bugfix release with some small features:

New Features
- LBFGS Optimizer added
- Add `state_dict` for optimizers for easy checkpointing
- Add differential upsampling modules for 2d (bilinear, nearest)

Bug and usability fixes
- Fix multi-GPU bugs in indexing
- Improve error messages for optimizer
- Fix bug in Conv1d
- Fix bug in Conv*d groups
- Add improved error messages for unsupported CuDNN codepaths
- fix bugs in CuDNN bindings
- Workaround bugs in CuDNN itself (batchnorm-backward, non-contiguous weights)
- Fix lua-reader's BatchNorm and Linear layers
- Fix some memory leaks
- Give fatal errors on Variable comparison
- Fix bug in ELU backward
- Fix index_select backward
- Fix BatchNorm backward in evaluate mode (workaround CuDNN bug)

API Changes
- Adadelta's `step_rate` is renamed to `lr`
- Adam's default learning rate is now the same as in LuaTorch

0.1.6

Our last release (v0.1.5) was on November 14th, 2016

We finished, froze and released (v0.1.6) on Jan 21st, 2017.

A lot has happened since 0.1.5.

Summary
- PyTorch public release on 18th Jan, 2017.
- An initial Model Zoo, several common Vision models can be initialized with pretrained weights downloaded from the zoo.
- All the 100+ torch.\* functions bar 3 (topk, mode and kthvalue) are GPU-ready, and there are performance improvements across the board for several existing ones.
- All relevant neural network modules are now CuDNN bound.
- Stochastic functions added to Autograd, for use in reinforcement learning
- A functional interface of the nn library is added
- GPU device initialization has been made lazy (improvement in CUDA initialization time on multi-GPU machines)
- Pinned memory support, and leveraging it in DataLoader
- Made error messages across the board more informative, especially around shape checks
- A rich set of examples and tutorials added to pytorch/examples and pytorch/tutorials
- API Reference at pytorch.org/docs
- Multiprocessing support for CUDA (Python3 only)
- An initial version of CPU Sparse Tensors is added and used in nn.Embedding(sparse=True). More to come on this side.
- Added a lua reader to load existing .t7 files with Torch models
- Various bug-fixes.
- Allow returning of changed gradients in hooks

API Changes
- `Conv*d` and `*Pool*d` layers now take a tuple of kernel sizes/strides/padding instead of `kh`/`kw`.
- `Unpooling*` layers have a changed API
- `Variable.grad` is now a `Variable` (was a `Tensor`)
- `nn.Container` is deprecated and merged into `nn.Module`. Replace all instances of `nn.Container` in your code with `nn.Module`
- `torch.cat` changed API to take an iterable of tensors, along with a dimension (previously varargs of Tensors). Also `torch.cat`'s default dimension is changed. It's been made an inverse transform for `torch.split` and `torch.chunk`.
- `Variable.no_grad` has been renamed to `Variable.detach`
- RMSProp's initialization of gradients changed from ones to zeros (485)
- Removed `cmin`, `cmax` and `cinv` (functionality of `cmin`, `cmax` split between `max`/`min` and `clamp`; `cinv` renamed to `reciprocal`)
- `register_hook` API changed, names are removed. See: https://github.com/pytorch/pytorch/pull/446
- `torch.*(..., out=Tensor)` is adopted for output arguments

Model Zoo

A model zoo has been started with several pre-trained vision models available such as AlexNet, ResNet50, etc. The download and usage of the models is seamless with a keyword argument.

python
import torchvision.models as models
models.alexnet(pretrained=True)


The models are hosted on Amazon S3, and we look forward to more models from the community.
Basic documentation is found here:

http://pytorch.org/docs/model_zoo.html

You can find specific models listed in the README of torchvision and torchtext

Stochastic Functions in Autograd

We introduced Stochastic functions that needed to be provided with a `reward` for their backward.
This feature was inspired by [Gradient Estimation Using Stochastic Computation Graphs by Schulman et. al.](https://arxiv.org/abs/1506.05254) and is helpful to implement reinforcement learning techniques.
Documentation is here: http://pytorch.org/docs/autograd.htmltorch.autograd.Variable.reinforce
A showcase of using these nodes is in the REINFORCE example: https://github.com/pytorch/examples/blob/master/reinforcement_learning/reinforce.pyL70

Functional interface to nn

PyTorch neural networks have so far been modeled around `nn.Module`. However, for most simple functions such as ReLU, using this is a bit cumbersome.
To simplify this, we've introduced a functional interface to nn, and modified the tutorials to use this API where appropriate.

For example:

python
import torch.nn as nn
import torch.nn.functional as F

# module style
relu = nn.ReLU()
y = relu(x)

# functional style
y = F.relu(x)


The functional style is convenient when using non-parametric and non-learnable functions.

Documentation for these functions is here: http://pytorch.org/docs/nn.htmltorch-nn-functional

Faster GPU code

The initialization of the GPU backend has been made lazy. This means that it will automatically be
imported and initialized when needed (and not before-hand). Doing this has improved startup times (especially for multi-GPU systems) and reduced boilerplate code.

We've also integrated support for pinned memory, which accelerates CPU to GPU transfers for specially marked buffers. Using this, we accelerated the multiprocessing data loaders.

A rich set of examples

With the help of some of you, we've added a rich set of examples from Image Super-resolution to Neural Machine Translation.
You can explore more here: https://github.com/pytorch/examples

API Reference and Notes

We've fleshed out a full API reference that is mostly complete at docs.pytorch.org
Contributions are welcome :)

We've also added notes such as CUDA Semantics, Extending PyTorch, etc.

Multiprocessing support for CUDA

Until now, Tensor sharing using multiprocessing only worked for CPU Tensors.
We've now enabled Tensor sharing for CUDA tensors when using Python 3.
You can read more notes here: http://pytorch.org/docs/notes/multiprocessing.html

Lua Reader

A "lua reader" has been integrated, that can load most LuaTorch .t7 files, including `nn` models.
nngraph models are not supported.

Example usage can be found here: https://discuss.pytorch.org/t/convert-import-torch-model-to-pytorch/37/2

0.1.5

What's new in Alpha-5?

Usability
- keyword arguments, improved indexing for all torch and autograd functions!
- Deterministic data loader even under multiple workers
- LAPACK bindings with full CUDA support via MAGMA
- Easier numpy2torch conversion with torch.from_numpy(x)
- A lot more documentation
- fully covered neural networks
- fully covered optim package
- partly covered torch documentation
- Tutorials:
- Increased depth, length and clarity of the tutorials

New Features and modules
- PyTorch Vision: a package to hold common dataloaders, transforms and utilities for images and videos
- Data loaders for: COCO (captioning and detection), Imagenet, CIFAR10/100, LSUN etc.
- Image Transforms: commonly used data augmentation transforms such as random-cropping, normalization
- Unit-tested
- Utilities: saving Tensors as images, creating grids of images from a mini-batch of tensors.
- Recurrent Neural Networks
- A complete and robust implementation of efficient Stacked LSTMs, RNNs, GRUs (bidirectional and otherwise)
- Seamlessly integrated CuDNN is used whenever possible for maximum performance
- A complete word-level language modeling example on the PennTreeBank dataset
- verification that the perplexity matches the reference Torch implementation
- an example of Generative Adversarial Networks:
- DCGAN example in < 250 lines (includes everything)
- Verified the results to match reference implementations
- Multi-GPU ready!
- A redesigned Optim package with the following optimization methods:
- SGD, AdaDelta, Adagrad, Adam, AdaMax, Averaged SGD, RProp, RMSProp
- Fully unit tested against their reference implementations
- Fully documented
- Improved Multi-GPU performance (and more is coming)
- Integrated NVIDIA NCCL for maximizing multi-GPU communication performance

Plans for Alpha-6
- docstrings support and finishing torch and autograd documentation
- Fully verifying the convergence of ResNet / Imagenet training
- More examples around:
- Reinforcement Learning / OpenAI Gym
- Object Detection
- Sequence to Sequence methods
- WaveNet / ByteNet
- More adversarial networks (text2image, etc.)
- More gains in performance, and fully flesh out CuDNN integration
- Half-precision training for GPUs
- A Lua-Torch model loader, and improved legacy.nn support
- Lua bridge, to call your existing lua code

Usability

Keyword arguments

All torch and autograd functions used to only support arguments in the correct order.
For example:

python
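# a hypothetical illustration: arguments can now be passed by name
# instead of relying on positional order
x = torch.randn(10)
y = torch.clamp(x, min=-0.5, max=0.5)   # equivalent to torch.clamp(x, -0.5, 0.5)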

0.1.4

Some interesting stats

On Resnets

Because we aggressively free and allocate resources, ResNets in PyTorch use less memory than torch-nn
- 4.4GB in PyTorch
- 6.5GB in Torch-nn
- 4.6GB in Torch-nn with a hacky sharing of gradinput buffers
- On 1-GPU, PyTorch speed is 10s of milliseconds faster than Torch-nn
- On 2-GPUs, PyTorch is the same speed as Torch-nn
- On 4-GPUs, PyTorch is about 10 to 20% slower, but that's because we have just finished implementing Multi-GPU support and we will be closing this perf gap over the next week.

FFI-based C extension

On a small benchmark of adding a constant to a 5x5 tensor at 1000 calls:
- LuaJIT FFI: 0.001 seconds
- Lua 5.2 FFI: 0.003 seconds
- PyTorch CFFI: 0.003 seconds
- Raw Python CFFI / CTypes: 0.001 seconds

What's new in Alpha-4?

Usability
- Two Tutorials, now located at: [https://github.com/pytorch/tutorials](https://github.com/pytorch/tutorials)
- Tutorial 1: [Introduction to PyTorch for former Torchies](https://github.com/pytorch/tutorials/blob/master/Introduction%20to%20PyTorch%20for%20former%20Torchies.ipynb)
- Tutorial 2: [Write your own C code that interfaces into PyTorch via FFI](https://github.com/pytorch/tutorials/blob/master/Creating%20Extensions%20using%20FFI.md)
- Examples:
- A full Imagenet / ResNet example is now located at: https://github.com/pytorch/examples/tree/master/imagenet
- it works! :)
- Has performant Multi-GPU support
- More improved error messages and shape checks across the board in pytorch, TH, THNN
- `torch.*` functions now don't use `CamelCase`, but use `underscore_case`. Example: `torch.index_add_`

New Features and modules
- Multi-GPU primitives
- A custom CUDA allocator to maximize autograd performance (backported to Torch too)
- More autograd functions. Now it's almost API complete for all differentiable `torch.*` functions.
- CuDNN Integration
- Multiprocess DataLoader in `torch.utils` (used in the imagenet example)
- Extensions API to interface to your C code simply via FFI
- [An example extension is provided here](https://github.com/pytorch/extension-ffi)

Plans for Alpha-5
- Revamping and rethinking the Checkpointing API
- Revamping the Optim API to support things like per-layer learning rates and optimizing non-weights (like in NeuralStyle)
- RNN Examples, initially for PennTreeBank language modeling
- Better RNN support in general, improved error messages, multi-GPU etc.
- NCCL Integration for improved multi-GPU performance (already implemented at https://github.com/pytorch/pytorch/pull/78 )
- Documentation / Reference manual for `torch.*` and `autograd`

Usability

Tutorials

We've added two tutorials to get you all started.
- Tutorial 1: [Introduction to PyTorch for former Torchies](https://github.com/pytorch/tutorials/blob/master/Introduction%20to%20PyTorch%20for%20former%20Torchies.ipynb)
- In this tutorial we cover the torch, autograd and nn packages from a perspective of former Torch users.
- Going through this tutorial should get you started. Let us know how we can improve it.
- Tutorial 2: [Write your own C code that interfaces into PyTorch via FFI](https://github.com/pytorch/tutorials/blob/master/Creating%20Extensions%20using%20FFI.md)
- In this tutorial, we showcase how you can call your own C code that takes torch tensors as inputs / outputs in a seamless way via FFI
- The tutorial showcases how you can write your own neural network Module that calls in C implementations

Examples

We've added a full imagenet example with ResNets that should be really suited towards “learning by example”.
It is located here: [https://github.com/pytorch/examples/tree/master/imagenet](https://github.com/pytorch/examples/tree/master/imagenet)
The data for the example has to be preprocessed for now in the same way as is specified in [fb.resnet.torch](https://github.com/facebook/fb.resnet.torch/blob/master/INSTALL.mddownload-the-imagenet-dataset)

The example has Multi-GPU support in a DataParallel fashion.

More improved error messages

We've gone through the TH and THNN C libraries and added much more intuitive error messages that report the mismatched shapes. We will continue to make improvements on this front.
If you have any unintuitive error messages that you encounter, please open an issue at https://github.com/pytorch/pytorch/issues

For example:

Old error message:


bad argument 2 to 'v' (3D or 4D (batch mode) tensor expected for input


New error message:


bad argument 2 to 'v' (3D or 4D (batch mode) tensor expected for input, but got: [100 x 100]


No more CamelCase for functions

All torch functions have been renamed from CamelCase to underscore_case.
indexAdd → index_add_
getRNGState → get_rng_state
etc.

New Features and modules

Multi-GPU primitives
- We've added efficient multi-GPU support in general for neural networks. Instead of building magic blocks that do opaque parallelization for you, we've broken them down into easy to use collectives.
- A pattern like DataParallel is implemented in terms of:
- replicate, scatter, gather, parallel_apply
- These are reusable collectives for implementing other multi-gpu patterns as well
- https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/__init__.pyL24-L38

Performance

With Multi-GPU, we naturally overlap data transfers with compute across the whole graph. This makes multi-GPU much more efficient, and is done in a way that does not interfere with the imperativeness / error reporting.

Another important note is that we now dispatch parallel modules via python threads, which makes the CUDA kernel launches in a breadth-first fashion, getting rid of obvious kernel launch latency bottlenecks.

Custom CUDA allocator to maximize autograd performance

In Torch, we had to write nn modules in a careful way to avoid cuda synchronization points which were a multi-GPU bottleneck and general performance bottleneck. This affected neural networks and autograd sometimes up to 2x in performance penalty.

In PyTorch (and Torch), Sam Gross has written a new Caching CUDA allocator that avoids cuda synchronization points while being really suited towards Tensor use-cases where we typically do short-term and long-term allocations of memory of the same tensor sizes.

This unblocks us from a lot of performance issues.

More autograd functions

The torch.\* API should now be pretty much ready for full autograd support (short of 3 functions).
Autograd has been enabled for all functions, with the exception of non-differentiable ones like torch.eq.

CuDNN Integration

We now fully integrate and support CuDNN version 5.1.3, and it is shipped in the binaries (just like CUDA), so you never have to worry about manually downloading and installing it from the NVIDIA website.

Generic Multiprocess DataLoader

We've added a flexible Data Loader that supports multiple data loading workers. This enables a lot of use-cases, and is first used in our Imagenet example.
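
A minimal usage sketch (the dataset below is a stand-in built from random tensors; real workloads would plug in something like the Imagenet dataset):

python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset: 100 random 10-dimensional samples with dummy labels.
dataset = TensorDataset(torch.randn(100, 10), torch.zeros(100))

# Four worker processes fetch and collate batches in the background.
loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=4)

for inputs, labels in loader:
    pass  # training step would go here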

C Extensions API

We added an easy to use extensions API and an example extension here:
https://github.com/pytorch/extension-ffi

You can call your C functions (that have TH*Tensor inputs / outputs and other fundamental types in the function signature) without writing any manual Python bindings.

One question you might have is what kind of call overhead these auto-generated FFI bindings have. The answer is “none”, as seen in the numbers at the beginning of the note.

The example extension also covers how you can define your autograd-ready nn module that calls your C function.

0.1.3

What's new?

Usability
- conda binaries for all Linux distributions (as old as RHEL 6 and Ubuntu 12.04); we are working on OSX and pip binaries.
- Now installing pytorch is as simple as:
- `conda install pytorch -c https://conda.anaconda.org/t/6N-MsQ4WZ7jo/soumith`
- it links against MKL, ships the CUDA and MAGMA runtimes with it, and just works
- Human-ready error messages
- Started working on documentation and an API Reference
- http://pytorch.org/api/0.1.3/en/
- Continuous integration with GPU support. Never have a broken master again
- https://build.pytorch.org/

New Features and modules
- The (new) neural network module now has 75% of the modules implemented (71 out of 93), and we are powering through the rest
- most of the modules in old-nn have been removed because we do not need Containers and many modules such as CAddTable are covered by Autograd
- autograd now supports all torch functions present in twitter-autograd and a lot more....
- Added Trainer and Dataset abstractions (like in TorchNet)

Plans for Alpha-4
- cudnn integration (and CUDA allocator).
- We have this implemented but are iterating over design https://github.com/pytorch/pytorch/pull/36
- Multi-GPU support in nn
- examples, examples, examples
- we will work on having examples across all domains (vision, NLP, RL, etc.)

Usability

Conda binaries for Linux

PyTorch will be shipped on Linux and OSX (and likely Windows) from day one, and we want the install process to be as simple and intuitive as possible.
We ship versioned binaries that do not require the user to install anything else (except an NVIDIA driver if you intend to use the GPU; not even CUDA is a dependency).

For now, to get started on Linux:

bash
conda install pytorch -c https://conda.anaconda.org/t/6N-MsQ4WZ7jo/soumith


We have built OSX binaries, but there are some small bugs on OSX; we'll fix them over the coming week.
We are working on “pip install” for non-Anaconda Python installs.

Human-ready error messages

We've gone through how we report type errors and dispatch errors, and made it easy for the user to understand what they did wrong. See this small example:

python
In [1]: import torch
In [2]: x = torch.FloatTensor(10)
In [3]: x.addmm(torch.ones(1), 1, 'str')
ValueError                                Traceback (most recent call last)
<ipython-input-3-90eb50ea2e35> in <module>()
----> 1 x.addmm(torch.ones(1), 1, 'str')

ValueError: addmm recieved an invalid combination of argument types - got (torch.DoubleTensor, int, str), but expected one of:
* (torch.FloatTensor mat1, torch.FloatTensor mat2)
* (float beta, torch.FloatTensor mat1, torch.FloatTensor mat2)
* (float beta, float alpha, torch.FloatTensor mat1, torch.FloatTensor mat2)


Continuous Builds with GPU support
- All pushes to the _master_ branch are fully built and unit tested
- All Pull Requests are fully built and unit tested
- On Titan-X GPUs in the NIMBIX cloud
- You can check out the build details at https://build.pytorch.org/

New Features and modules

Neural Network Modules
- Added fully functional and fully unit-tested nn modules and criterions for pretty much everything one would need for their current workflows.
- We have about 25% of the modules missing (mostly exotic and lightly used ones) but will get to those in the coming few days.
- nn modules have been renamed to be simplified in their naming. For example:
- SpatialConvolution → conv2d
- The new naming can be referenced at http://pytorch.org/api/0.1.3/en/ or via autocomplete.
- Full unit-test coverage for all implemented functions

Autograd
- We've added autograd support for almost all the torch functions (and operators like +, - etc.)
- We have all the functions implemented that are presented in twitter-autograd, and we have many more.
- At this point we have about 75 to 80% of them covered (ball park).
- Full unit-test coverage for all implemented functions

Trainer & Dataset classes

Trainer

We've added a TorchNet-style _Trainer_ class that provides a convenient abstraction:

python
trainer = Trainer(model, criterion, optimizer, dataset)
trainer.register_plugin(ProgressMonitor())
trainer.register_plugin(LossMonitor())
trainer.register_plugin(AccuracyMonitor())
trainer.register_plugin(Logger(['progress', 'accuracy', 'loss'], interval=(5, 'iterations')))
trainer.run(epochs=5)


progress: 180/60000 (0.30%)     accuracy: 0.00% (3.24%)         loss: 2.3051 (2.2116)
progress: 280/60000 (0.47%)     accuracy: 5.00% (4.84%)         loss: 2.3045 (2.2891)
progress: 380/60000 (0.63%)     accuracy: 25.00% (13.04%)       loss: 2.2974 (2.2992)


Dataset

The data loading is implemented using three abstractions:
- _DataSource_ - a simple object that defines indexing and checking length. Indexing returns a tuple of (sample, label)
- _Sampler_ - an object that defines the data ordering. It has to be iterable, and its iterator should return a sequence of indices in the [0, len(data_source) - 1] interval. The end of the iterator indicates the completion of an epoch.
- _Dataset_ - an object which wraps a DataSource and a Sampler. Defines all the data loading logic (e.g. all the multiprocessing code).

The Dataset will accept a list of transforms (such as image augmentation), which are run on the data before it is returned.
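
A rough, purely illustrative sketch of how these pieces fit together; the class names and bodies below are made up for illustration and are not the actual alpha API:

python
import torch

# Illustrative DataSource: indexing returns a (sample, label) tuple, len() works.
class RandomVectorSource(object):
    def __init__(self, n):
        self.samples = [(torch.randn(10), i % 2) for i in range(n)]

    def __getitem__(self, index):
        return self.samples[index]

    def __len__(self):
        return len(self.samples)

# Illustrative Sampler: iterating yields indices into the data source;
# exhausting the iterator marks the end of an epoch.
class SequentialSampler(object):
    def __init__(self, data_source):
        self.data_source = data_source

    def __iter__(self):
        return iter(range(len(self.data_source)))

source = RandomVectorSource(8)
for index in SequentialSampler(source):
    sample, label = source[index]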

0.1.2

What's new?

We've
- built seamless support for multiprocessing with Tensor sharing
- changed the API of the optim engine
- added a complete Hook system for nn and autograd
- added in-place ops to autograd and more neural network modules to nn

Multiprocessing with Tensor sharing

In Torch, or in general, one uses "threads" to build parallel data loaders, as well as to do Hogwild training.
Threads are powerful, as one can share Tensors between threads.
This allows you to:
- transfer data between threads efficiently, with zero memory-copy and serialization overhead.
- share tensors among threads for parameter sharing models

Sharing Tensors among threads is very useful when you do Hogwild training, i.e. if you want to train several models in parallel, but want to share their underlying parameters.
This is often used in non-ConvNet models, like training word embeddings, RL-for-games, etc.

With Python, one cannot use threads because of a few technical issues.
Python has what is called [Global Interpreter Lock](https://wiki.python.org/moin/GlobalInterpreterLock), which does not allow threads to concurrently execute python code.

Hence, the most pythonic way to use multiple CPU cores is [multiprocessing](http://docs.python.org/2/library/multiprocessing.html)

We made PyTorch integrate seamlessly with Python multiprocessing.
This involved solving some complex technical problems to make this an air-tight solution; more can be read [in this in-depth technical discussion](http://github.com/pytorch/pytorch/wiki/Multiprocessing-Technical-Notes).

What this means for you as the end-user is that you can simply use multiprocessing in this way:

python
# loaders.py
# Functions from this file run in the workers

def fill(queue):
    while True:
        tensor = queue.get()
        tensor.fill_(10)
        queue.put(tensor)

def fill_pool(tensor):
    tensor.fill_(10)


python
# Example 1: Using multiple persistent processes and a Queue
# process.py

import torch
import torch.multiprocessing as multiprocessing
from loaders import fill

# torch.multiprocessing.Queue automatically moves Tensor data to shared memory,
# so the main process and the workers share the data
queue = multiprocessing.Queue()
buffers = [torch.Tensor(2, 2) for i in range(4)]
for b in buffers:
    queue.put(b)

processes = [multiprocessing.Process(target=fill, args=(queue,)) for i in range(10)]
for p in processes:
    p.start()


python
# Example 2: Using a process pool
# pool.py

import torch
from torch.multiprocessing import Pool
from loaders import fill_pool

tensors = [torch.Tensor(2, 2) for i in range(100)]
pool = Pool(10)
pool.map(fill_pool, tensors)


Optim's API changes

Optimizer's step function now accepts a closure that should return a loss variable (similar to `legacy.optim`).

We've realized that to keep Optim flexible for multiple methods, like SGD with Nesterov momentum, Conjugate Gradient, LBFGS, etc., we need the input to optim to be a function that evaluates the model.
This is necessary because several optimization methods re-evaluate the function multiple times at different parameters.
To come to this necessary API change, we took into account complicated scenarios like Dynamic RNNs and complex ConvNet models with dynamic branching.

So the API now looks like this:

python
optimizer = optim.SGD(model, lr=1e-3, momentum=0.9)
input, target = ...
optimizer.step(lambda: criterion(model(input), target))  # sufficient for simple models


To simplify things at the user end for simple or specific common models, we will introduce a Trainer class, that will take a (dataset, model, optim) triple and train the model. This trainer class is planned for alpha-3.

A complete Hook system for nn and autograd

Accessing intermediate values during the forward pass is straightforward, but during backward the buffers can rapidly change their content (for example: when doing in-place optimizations).

If you want access to the gradients at a particular op or layer inside your model, you use the hook system.
Hooks can be attached to variables or to modules and are called as soon as the gradient is available:

python
# Example in autograd
a, b, c = [Variable(torch.Tensor(5, 5)) for i in range(3)]

def print_norm(grad):
    print(grad.norm(2))

y = b * c + a
y.register_hook(print_norm)

z = y * y - b
z.backward(torch.ones(5, 5))

# Example in nn
model = ...

def inspect_forward(module, input, output):
    ...

model.conv2.register_forward_hook(inspect_forward)

def inspect_backward(module, grad_input, grad_output):
    ...

model.conv2.register_backward_hook(inspect_backward)


We definitely look forward to comments about the hook system. Let us know what you think.

Added in-place ops to autograd and more neural network modules to nn
- As part of porting fb.resnet.torch, we've added AveragePool2d and fixed BatchNorm2d
- Now, autograd fully supports in-place operations, with in-place variables immediately marked as dirty.
To illustrate this, let's look at a small example

python
x = Variable(torch.ones(5, 5))
y = Variable(torch.ones(5, 5) * 4)

z = x * y
q = z * y
r = z + y
z.add_(y)
# z is the last expression, so this should succeed
z.backward(torch.ones(5, 5))

# r doesn't use z in its backward, so it should succeed
r.backward(torch.ones(5, 5))

# however, q needs z in its backward, but z has now been
# marked as dirty (because it was used in an in-place operation);
# this line will hence raise an error
q.backward(torch.ones(5, 5))


Plans for alpha-3
- Unit tests for multiprocessing
- Add more nn modules and autograd functions ( we're porting fb.resnet.torch )
- New CUDA memory allocator (non-synchronizing CUDA tensors allocations)
- We've made progress on this, but it is not complete yet
- Trainer and Dataset classes
- Continuous builds for CUDA (using Nimbix)
- Binary packages (nightly and versioned)

0.1.1

It's been a week since pytorch alpha-0.
We're excited to now present alpha-1 :)

What's new?

We've built a working and unit-tested version of the new nn and autograd packages (torch.nn, torch.autograd) along with a basic draft optim package (torch.optim). The old packages will continue to be available at torch.legacy.*

We've also built fully working serialization (torch.save / torch.load) with features one expects out of the box, like tensor sharing staying intact.

At this point, you can play around with things and get a feel of the new design.

There's an MNIST example at https://github.com/pytorch/examples

A concern raised about pytorch was that Python is a slow language.

It turns out that the MNIST example runs in exactly the same amount of time per epoch in both pytorch and (lua)Torch, and we haven't done any optimizations in the pytorch code yet.

Another notable thing is that pytorch uses 1500MB of system memory vs (lua)Torch's 2300MB. This is before we've added any in-place optimizations into pytorch. The design of the new nn allows us to add seamless memory optimizations without needing the user to mark things as in-place or out-of-place, which will bring further memory savings to pytorch.

More verbosely:

torch.nn

We've published an early version of the new nn package.
There are only a few modules right now, but we'll be adding more soon.

There are a couple of advantages over the old package:
- Modules no longer hold temporary buffers and short-lived state. This allows you to use the same module multiple times in the forward pass, and the gradients will be summed automatically. For example, see how we use the same nn.ReLU object multiple times here: https://github.com/pytorch/examples/blob/master/mnist/main.py#L43
- There's no longer any need for rigid container modules. Your model is defined by your code. You can select a completely different path across your model just by adding a number of `if`s; any crazy branching schemes inside your model are allowed by design (see the sketch after this list).
- It's fully compatible with autograd. Instead of using `nn.Add` or `nn.Index` you can just write this in your model definition: `y = module1(x_1)[0] + module2(x_2)`.
- You can register both forward and backward hooks at each module, which allow you to inspect the intermediate outputs and gradients flowing through the network and the graph.
- [Not Yet Implemented] Safe in-place operations. Tensors used in in-place operations are marked as dirty, and trying to use them in any way raises an error.
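
A small sketch of the points above, using arbitrary layer sizes (the `Net` class and the `use_second_layer` flag are illustrative, not part of the library):

python
import torch
import torch.nn as nn
from torch.autograd import Variable

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(10, 10)
        self.fc2 = nn.Linear(10, 10)
        self.relu = nn.ReLU()          # one module, used twice below

    def forward(self, x, use_second_layer=True):
        x = self.relu(self.fc1(x))
        if use_second_layer:           # arbitrary branching is allowed
            x = self.relu(self.fc2(x)) # gradients through the shared ReLU are summed
        return x

output = Net()(Variable(torch.randn(4, 10)))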

torch.autograd

Autograd is at the core of pytorch. Enabling it is just a matter of wrapping your tensors in `Variable` objects before starting the computation (`x = Variable(x)`). Then, when you have your output, you can either call `y.backward()` if it's a scalar, or provide a gradient w.r.t. the variable as an argument (`y.backward(grad_output)`). Gradients w.r.t. variables are then available in their `.grad` attributes. Please note that only gradients of leaf variables (i.e. those created by the user) are computed. If you want to access gradients of intermediate values, you'll have to use the hook system.

If you don't want to compute gradient for some variables, you can even mark them in a constructor with `requires_grad=False`, and they will be optimized out from the backward pass.
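
A minimal sketch of that workflow (shapes are arbitrary, and the `requires_grad` flags are written out explicitly to match the description above):

python
import torch
from torch.autograd import Variable

x = Variable(torch.ones(3), requires_grad=True)       # leaf variable; its gradient will be computed
w = Variable(torch.ones(3) * 2, requires_grad=False)  # optimized out of the backward pass
y = (x * w).sum()                                     # scalar output
y.backward()                                          # no grad_output needed for a scalar
print(x.grad)                                         # dy/dx equals w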

torch.optim

_Please note that this api is still a bit experimental, and is likely to undergo changes soon._

optim has a different, more object-oriented API. First, you create an optimizer object: `optimizer = optim.sgd(model, lr=1e-3, momentum=0.9)`. If you don't want to merge the model and criterion into a single object, it's also possible to pass a tuple of `(model, criterion)` as the first argument to the constructor. Then, in your training loop, you just call `loss = optimizer.step(input)` (in the case of a separate model and criterion, `input` should be a tuple of `(input, target)`). This accumulates all the gradients and performs a single optimization step on the parameters.

Serialization

Tensors have supported the `pickle` protocol since the beginning of the alpha, but pickle can't handle storage/data sharing properly and requires all the data to be copied before serialization.
We've created `torch.load` and `torch.save`, that have the same interface and solve both of these problems.
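
A small sketch of the sharing guarantee (the file name is arbitrary):

python
import torch

t = torch.ones(3)
view = t[:2]                          # `view` shares storage with `t`
torch.save((t, view), 'shared.pt')

t2, view2 = torch.load('shared.pt')   # the sharing survives the round-trip
view2.fill_(7)
print(t2)                             # the first two entries are now 7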

Tensor operators

Thanks to bart we've added support for the `@` operator for matrix multiplication, and changed `*` to elementwise multiplication.
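
A quick illustration (shapes are arbitrary):

python
import torch

a = torch.randn(2, 3)
b = torch.randn(3, 4)

c = a @ b     # matrix multiplication: (2x3) @ (3x4) -> (2x4)
d = a * a     # elementwise multiplication, same shape as a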

Plans for alpha-2:
- Hook system for nn and autograd (for accessing intermediate values)
- More nn modules, autograd options, and optim algorithms
- Inter-process sharing of tensors (for multiprocess data loading or hogwild training)
- New CUDA memory allocator (non-synchronizing CUDA tensors allocations)

0.1

Passing long lists of positional arguments is often unreadable, especially for LAPACK usage, where one declares booleans such as upper=True

Now, one can simply do:

python
torch.clamp(x, min=-0.1, max=0.1)


We've also implemented ellipsis indexing, similar to NumPy.
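
A quick illustration (shapes are arbitrary):

python
import torch

x = torch.randn(2, 3, 4)

y = x[..., 0]   # same as x[:, :, 0]; shape (2, 3)
z = x[0, ...]   # same as x[0]; shape (3, 4)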

Deterministic Data Loader

The data loader now generates indices on the main process and regardless of how many workers you use,
the order of data loading will remain consistent if you use the same random seed.

Fully tested LAPACK bindings

Unit tests on both the CPU and CUDA side.
On the CPU, we ship with MKL integration, and on the GPU, LAPACK is powered by MAGMA.

Documentation

We are at a stage where we have converged to stable APIs.
Hence, documentation is going at a rapid pace, and we have covered:
- nn
- optim
- part of torch / Tensors

As always, you can check out the documentation here: [pytorch.org/api/latest/en/](http://pytorch.org/api/latest/en/)

Tutorials

We added one new tutorial: **[Creating extensions using numpy and scipy](https://github.com/pytorch/tutorials/blob/master/Creating%20extensions%20using%20numpy%20and%20scipy.ipynb)**
- This covers the case where you want to quickly write some modules of your neural network using familiar SciPy tools, like scipy.sparse for example.

We also improved the existing tutorials to cover more of the basics.

New Features and modules

PyTorch Vision

A one-stop repository for all of your image (and soon, video) needs, whether that be data loaders, common neural network definitions (such as alexnet, inception, resnet, etc.) or data augmentation routines.
Our plan is to put some serious engineering firepower into this module, with GPU loaders and augmentation routines, especially for video processing. Contributions welcome :)

So far, we have:

Data loaders
- COCO (Captioning and Detection) (https://github.com/pytorch/vision#coco)
- LSUN Classification (https://github.com/pytorch/vision#lsun)
- ImageFolder (https://github.com/pytorch/vision#imagefolder)
- Imagenet-12 (https://github.com/pytorch/vision#imagenet-12)
- CIFAR10 and CIFAR100 (https://github.com/pytorch/vision#cifar)

All the data loaders are fully documented, and share a basic interface.
They are fully compatible with torch.utils.data.DataLoader, so fetching can be parallelized.
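
A small sketch of that shared interface, assuming the torchvision package is installed (the directory path is hypothetical and must contain one sub-folder per class):

python
import torchvision.datasets as datasets
from torch.utils.data import DataLoader

# The folder layout and path below are hypothetical.
train_set = datasets.ImageFolder('/path/to/train')
train_loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)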

Common Image Transforms
- Converters from PIL Image to Torch Tensors
- Random Cropping, Scaling, and Normalization transforms (see the sketch after this list)
- Unit tested
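
A sketch of composing these transforms, assuming the torchvision package (the crop size and normalization statistics below are the commonly used ImageNet values, not something mandated by the package):

python
import torchvision.transforms as transforms

preprocess = transforms.Compose([
    transforms.RandomCrop(224),                        # random spatial crop
    transforms.ToTensor(),                             # PIL Image -> FloatTensor in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # per-channel normalization
                         std=[0.229, 0.224, 0.225]),
])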

**The Imagenet example has been updated to use this package**

Recurrent Neural Networks

One of the biggest strengths of PyTorch's new design is the ability to seamlessly share weights and build recurrent nets.
We've emphasized this, and also deeply integrated CuDNN in a way that, as a user, you do not notice a thing while getting its full power and speed.

nn.RNN, nn.LSTM and nn.GRU are the stacked recurrent-net modules you would want to use, and for generally crazy research we've also provided implementations of the individual cells: nn.LSTMCell and nn.GRUCell

A fully tested and verified example is provided in https://github.com/pytorch/examples/tree/master/word_language_model
This example does word-level language modeling on the PennTreeBank dataset.
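
A minimal sketch of the stacked-module API (all dimensions are arbitrary):

python
import torch
import torch.nn as nn
from torch.autograd import Variable

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)   # 2-layer stacked LSTM

x = Variable(torch.randn(5, 3, 10))     # (seq_len, batch, input_size)
h0 = Variable(torch.zeros(2, 3, 20))    # (num_layers, batch, hidden_size)
c0 = Variable(torch.zeros(2, 3, 20))

output, (hn, cn) = lstm(x, (h0, c0))    # output: (seq_len, batch, hidden_size)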

Adversarial Networks

A concise example of Generative Adversarial Networks for image generation is provided, integrating multiple datasets (showcasing the power of the vision package).
The example is < 250 lines of code and gives a lot more clarity on how to use PyTorch.
Multiple data-loader threads, checkpointing, saving generated images to disk, and much more are showcased.

A stable and fleshed out Optim package

It took us some time to design a good and stable Optim API, but now we have converged to a clean design.
The Optim package is fully Multi-GPU and Multi-device ready out of the box.
Now we've implemented and unit tested the following algorithms:
- SGD, AdaDelta, Adagrad, Adam, AdaMax, Averaged SGD, RProp, RMSProp

Setting per-layer learning rates, or optimizing only part of your neural network, is now trivial.
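
For example, per-parameter-group options let different parts of a model use different learning rates (the two linear layers below are just stand-ins for parts of a real network):

python
import torch.nn as nn
import torch.optim as optim

features = nn.Linear(10, 10)     # stand-in for a feature extractor
classifier = nn.Linear(10, 2)    # stand-in for a classifier head

optimizer = optim.SGD([
    {'params': features.parameters()},                # uses the default lr below
    {'params': classifier.parameters(), 'lr': 1e-2},  # per-group override
], lr=1e-3, momentum=0.9)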

It is fully documented here: http://pytorch.org/api/latest/en/torch-optim
Its usage can be seen in both the DCGAN and Imagenet examples.

Improved Multi-GPU performance (and more is coming)

We've improved the Multi-GPU performance since alpha-4, and we are close to squeezing out full performance.
We are working closely with NVIDIA to squeeze out the last drops of performance and make PyTorch future-proof for the P100 and new cards.