Torchvision

Latest version: v0.18.0

0.5.0

This release brings several new additions to torchvision that improve support for deployment. Most notably, all models in torchvision are torchscript-compatible, and can be exported to ONNX. Additionally, a few classification models have quantized weights.

**Note: this is the last version of torchvision that officially supports Python 2.**

Breaking changes

Updated KeypointRCNN pre-trained weights

The pre-trained weights for keypointrcnn_resnet50_fpn have been updated and now correspond to the results reported in the documentation. The previous weights corresponded to an intermediate training checkpoint. (1609)

Corrected the implementation for MNASNet

The previous implementation contained a bug that affected all MNASNet variants other than mnasnet1_0: the first few layers were not scaled by the width multiplier, although they should have been, along with all the other layers. We now provide a new checkpoint for mnasnet0_5, which gives 32.17 top1 error. (1224)
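
To illustrate what scaling every layer by the width multiplier means, here is a minimal pure-Python sketch. The helper names and the base channel counts are illustrative assumptions, not taken from the torchvision source; the only point is that the first layers are scaled together with the later ones.

```python
def round_to_multiple_of(value, divisor=8):
    """Round `value` to the nearest multiple of `divisor`, never below `divisor`."""
    return max(divisor, int(value + divisor / 2) // divisor * divisor)


def scale_widths(base_widths, width_mult):
    """Scale every channel count, including the first layers, by `width_mult`."""
    return [round_to_multiple_of(w * width_mult) for w in base_widths]


# Hypothetical base channel counts for a width multiplier of 1.0.
BASE_WIDTHS = [32, 16, 24, 40, 80]

half_widths = scale_widths(BASE_WIDTHS, 0.5)
print(half_widths)
```

With a multiplier of 1.0 the widths are unchanged, which is consistent with mnasnet1_0 being the only variant the bug did not affect.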

Highlights

TorchScript support for all models

All models in torchvision have native support for torchscript, for both training and testing. This includes complex models such as DeepLabV3, Mask R-CNN and Keypoint R-CNN.
Using torchscript with torchvision models is easy:
```python
import torch
import torchvision

# get a pre-trained model
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

# convert to torchscript
model_script = torch.jit.script(model)
model_script.eval()

# compute predictions
predictions = model_script([torch.rand(3, 300, 300)])
```

**Warning: the return type for the scripted version of Faster R-CNN, Mask R-CNN and Keypoint R-CNN is different from its eager counterpart, and it always returns a tuple of losses, detections. This discrepancy will be addressed in a future release.**
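
Until the discrepancy is addressed, calling code may want to accept both output conventions. The following is a minimal sketch, assuming the scripted model returns a `(losses, detections)` tuple and the eager model in eval mode returns only the list of detections; `split_detection_output` is a hypothetical helper, and the values below are stand-ins rather than real model outputs.

```python
def split_detection_output(output):
    """Return (losses, detections) for either calling convention.

    Assumption: a scripted model returns a (losses, detections) tuple,
    while an eager model in eval mode returns only the list of detections.
    """
    if isinstance(output, tuple):
        losses, detections = output
        return losses, detections
    return {}, output


# Stand-in values that mimic the two output shapes (no model is run here).
scripted_like = ({}, [{"boxes": [], "labels": [], "scores": []}])
eager_like = [{"boxes": [], "labels": [], "scores": []}]

_, detections_a = split_detection_output(scripted_like)
_, detections_b = split_detection_output(eager_like)
print(detections_a == detections_b)
```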

ONNX

All models in torchvision can now be exported to ONNX for deployment. This includes models such as Mask R-CNN.
```python
import torch
import torchvision

# get a pre-trained model
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()
inputs = [torch.rand(3, 300, 300)]
predictions = model(inputs)

# convert to ONNX
torch.onnx.export(model, inputs, "model.onnx",
                  do_constant_folding=True,
                  opset_version=11  # opset_version 11 required for Mask R-CNN
                  )
```

**Warning: for Faster R-CNN / Mask R-CNN / Keypoint R-CNN, the current exported model is dependent on the input shape during export. As such, once the model has been exported to ONNX, make sure that all images fed to it have the same shape as the one used for the export. This behavior will be made more general in a future release.**
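
One simple way to respect this constraint is to validate shapes before running the exported model. This is a minimal sketch with a hypothetical helper, `check_export_shape`; the export shape matches the `(3, 300, 300)` dummy input used in the example above.

```python
def check_export_shape(image_shape, export_shape):
    """Raise if an image does not match the shape used at export time."""
    if tuple(image_shape) != tuple(export_shape):
        raise ValueError(
            f"expected input of shape {tuple(export_shape)}, got {tuple(image_shape)}"
        )


EXPORT_SHAPE = (3, 300, 300)  # shape of the dummy input used for the export

# Passes silently: the shape matches the one used for the export.
check_export_shape((3, 300, 300), EXPORT_SHAPE)
```

Images of any other shape would need to be resized or padded to `EXPORT_SHAPE` before being fed to the exported model.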

Quantized models

torchvision now provides quantized models for ResNet, ResNext, MobileNetV2, GoogleNet, InceptionV3 and ShuffleNetV2, as well as reference scripts for quantizing your own model in [references/classification/train_quantization.py](https://github.com/pytorch/vision/blob/master/references/classification/train_quantization.py). A pre-trained quantized model can be obtained with a few lines of code:
```python
import torch
import torchvision

model = torchvision.models.quantization.mobilenet_v2(pretrained=True, quantize=True)
model.eval()

# run the model with quantized inputs and weights
out = model(torch.rand(1, 3, 224, 224))
```

We provide pre-trained quantized weights for the following models:

| Model | Acc@1 | Acc@5 |
| --- | --- | --- |

0.4.2

This minor release introduces an optimized `video_reader` backend for torchvision. It is implemented in C++, and uses FFmpeg internally.

The new `video_reader` backend can be up to 6 times faster compared to the `pyav` backend.
- When decoding all video/audio frames in the video, the new `video_reader` is 1.2x - 6x faster depending on the codec and video length.
- When decoding a fixed number of video frames (e.g. [4, 8, 16, 32, 64, 128]), `video_reader` runs equally fast for small values (i.e. [4, 8, 16]) and runs up to 3x faster for large values (e.g. [32, 64, 128]).

Using the optimized video backend

Switching to the new backend can be done via the `torchvision.set_video_backend('video_reader')` function. By default, we use a backend built on top of [PyAV](https://github.com/mikeboers/PyAV).

Due to packaging issues with FFmpeg, in order to use the `video_reader` backend one first needs to have `ffmpeg` available on the system, and then compile torchvision from source using the instructions from https://github.com/pytorch/vision#installation

Deprecations
In torchvision 0.4.0, the `read_video` and `read_video_timestamps` functions used `pts` relative to the video stream. This could lead to unaligned video-audio being returned in some cases.

torchvision now allows specifying a `pts_unit` argument in those functions. The default value is `'pts'` (with the same behavior as before), and the user can now specify `pts_unit='sec'`, which produces consistently aligned results for both video and audio. The `'pts'` value is deprecated for now, and kept for backwards-compatibility.

In the next release, the default value of `pts_unit` will change to `'sec'`, so that calling `read_video` without specifying `pts_unit` returns consistently aligned audio-video results. This will require users to update their `VideoClips` checkpoints, which used to store the information in `pts` by default.
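
The following pure-Python sketch shows why timestamps expressed in seconds line up across streams while raw `pts` values need not. It assumes each stream counts time in its own time base; the time bases and the helper `pts_to_seconds` are hypothetical and are not part of the torchvision API.

```python
from fractions import Fraction


def pts_to_seconds(pts, time_base):
    """Convert a presentation timestamp to seconds using the stream's time base."""
    return pts * time_base


# Hypothetical time bases: each stream counts time in its own unit.
VIDEO_TIME_BASE = Fraction(1, 15360)
AUDIO_TIME_BASE = Fraction(1, 44100)

# The same instant (2 seconds) has different pts values in each stream...
video_pts = 2 * 15360
audio_pts = 2 * 44100

# ...but both map to the same value once expressed in seconds.
print(float(pts_to_seconds(video_pts, VIDEO_TIME_BASE)))
print(float(pts_to_seconds(audio_pts, AUDIO_TIME_BASE)))
```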

Changelog
- [video reader] inception commit (1303) 31fad34
- Expose frame-rate and cache to video datasets (1356) 85ffd93
- Expose num_workers in VideoClips (1359) 02a8c0a
- Fix randomresized params flaky (1282) 7c9bbf5
- Video transforms (1353) 64917bc
- add _backend argument to init() of class VideoClips (1363) 7874374
- Video clips workers (1369) 0982395
- modified code of io.read_video and io.read_video_timestamps to interpret pts values in seconds (1331) 17e355f
- add metadata to video dataset classes. bug fix. more robustness (1376) 49b01e3
- move sampler into TV core. Update UniformClipSampler (1408) f0d3daa
- remove hardcoded video extension in kinetics400 dataset (1418) 929c81d
- Fix hmdb51 and ucf101 typo (1420) b13931a
- fix a bug related to audio_end_pts (1431) 1258bb7
- expose more io api (1423) e48b958
- Make video transforms private (1429) 79daca1
- extend video reader to support fast video probing (1437) ed5b2dc
- Better handle corrupted videos (1463) da89dad
- Temporary fix to remove ffmpeg from build time (1475) ed04dee
- fix a bug when video decoding fails and empty frames are returned (1506) 2804c12
- extend DistributedSampler to support group_size (1512) 355e9d2
- Unify video backend (1514) 97b53f9
- Unify video metadata in VideoClips (1527) 7d509c5
- Fixed compute_clips docstring (1543) b438d32

0.4.1

This minor release provides binaries compatible with PyTorch 1.3.

Compared to version 0.4.0, it contains a single bugfix for `HMDB51` and `UCF101` datasets, fixed in https://github.com/pytorch/vision/pull/1240

0.4.0

This release adds support for video models and datasets, and brings several improvements.

**Note**: torchvision 0.4 requires PyTorch 1.2 or newer

Highlights

Video and IO

Video is now a first-class citizen in torchvision. The 0.4 release includes:

* efficient IO primitives for reading and writing video files
* Kinetics-400, HMDB51 and UCF101 datasets for action recognition, which are compatible with `torch.utils.data.DataLoader`
* Pre-trained models for action recognition, trained on Kinetics-400
* Training and evaluation scripts for reproducing the training results.

Writing your own video dataset is easy. We provide a utility class `VideoClips` that simplifies the task of enumerating all possible clips of fixed size in a list of video files by creating an index of all clips in a set of videos. It additionally allows specifying a fixed frame-rate for the videos.

```python
from torchvision.datasets.video_utils import VideoClips

class MyVideoDataset(object):
    def __init__(self, video_paths):
        # index every 16-frame clip in the given videos, resampled to 15 fps
        self.video_clips = VideoClips(video_paths,
                                      clip_length_in_frames=16,
                                      frames_between_clips=1,
                                      frame_rate=15)

    def __getitem__(self, idx):
        video, audio, info, video_idx = self.video_clips.get_clip(idx)
        return video, audio

    def __len__(self):
        return self.video_clips.num_clips()
```
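
To make the idea of "an index of all clips in a set of videos" concrete, here is a pure-Python sketch of one way such an index could work. It is not the `VideoClips` implementation; the helper names and the frame counts are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def clip_starts(num_frames, clip_length, step):
    """Start indices of all full-length clips in a video of `num_frames` frames."""
    return list(range(0, num_frames - clip_length + 1, step))


def build_index(frame_counts, clip_length, step):
    """Per-video clip starts plus cumulative clip counts for global lookup."""
    starts = [clip_starts(n, clip_length, step) for n in frame_counts]
    cumulative = list(accumulate(len(s) for s in starts))
    return starts, cumulative


def locate(idx, starts, cumulative):
    """Map a global clip index to (video_idx, first_frame_of_clip)."""
    video_idx = bisect_right(cumulative, idx)
    offset = idx - (cumulative[video_idx - 1] if video_idx > 0 else 0)
    return video_idx, starts[video_idx][offset]


# Three hypothetical videos with 20, 10 and 18 frames.
starts, cumulative = build_index([20, 10, 18], clip_length=16, step=1)
print(cumulative)
print(locate(5, starts, cumulative))
```

In this sketch, a video shorter than the clip length contributes no clips, and the cumulative counts let a single integer index address any clip in any video.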

We provide pre-trained models for action recognition, trained on Kinetics-400, which reproduce the results from the original papers in which they were first introduced, as well as the corresponding training scripts.

|model |clip acc@1 |
|--- |--- |
|r3d_18 |52.748 |
|mc3_18 |53.898 |
|r2plus1d_18 |57.498 |

Bugfixes

* change aspect ratio calculation formula in `references/detection` (1194)
* bug fixes in ImageNet (1149)
* fix save_image when height or width equals 1 (1059)
* Fix STL10 `__repr__` (969)
* Fix wrong behavior of `GeneralizedRCNNTransform` in Python2. (960)

Datasets

New

* Add USPS dataset (961)(1117)
* Added support for the QMNIST dataset (995)
* Add HMDB51 and UCF101 datasets (1156)
* Add Kinetics400 dataset (1077)

Improvements

* Miscellaneous dataset fixes (1174)
* Standardize str argument verification in datasets (1167)
* Always pass `transform` and `target_transform` to abstract dataset (1126)
* Remove duplicate transform assignment in FakeDataset (1125)
* Automatic extraction for Cityscapes Dataset (1066) (1068)
* Use joint transform in Cityscapes (1024)(1045)
* CelebA: track attr names, support split="all", code cleanup (1008)
* Add folds option to STL10 (914)

Models

New

* Add pretrained Wide ResNet (912)
* Memory efficient densenet (1003) (1090)
* Implementation of the MNASNet family of models (829)(1043)(1092)
* Add VideoModelZoo models (1130)

Improvements

* Fix resnet fpn backbone for resnet18 and resnet34 (1147)
* Add checks to `roi_heads` in detection module (1091)
* Make shallow copy of input list in `GeneralizedRCNNTransform` (1085)(1111)(1084)
* Make MobileNetV2 number of channel divisible by 8 (1005)
* typo fix: ouput -> output in Inception and GoogleNet (1034)
* Remove empty proposals from the RPN (1026)
* Remove empty boxes before NMS (1019)
* Reduce code duplication in segmentation models (1009)
* allow user to define residual settings in MobileNetV2 (965)
* Use `flatten` instead of `view` (1134)

Documentation

* Consistency in detection box format (1110)
* Fix Mask R-CNN docs (1089)
* Add paper references to VGG and Resnet variants (1088)
* Doc, Test Fixes in `Normalize` (1063)
* Add transforms doc to more datasets (1038)
* Corrected typo: 5 to 0.5 (1041)
* Update doc for `torchvision.transforms.functional.perspective` (1017)
* Improve documentation for `fillcolor` option in `RandomAffine` (994)
* Fix `COCO_INSTANCE_CATEGORY_NAMES` (991)
* Added models information to documentation. (985)
* Add missing import in `faster_rcnn.py` documentation (979)
* Improve `make_grid` docs (964)

Tests

* Add test for SVHN (1086)
* Add tests for Cityscapes Dataset (1079)
* Update CI to Python 3.6 (1044)
* Make `test_save_image` more robust (1037)
* Add a generic test for the datasets (1015)
* moved fakedata generation to separate module (1014)
* Create imagenet fakedata on-the-fly (1012)
* Minor test refactorings (1011)
* Add test for CIFAR10(0) (1010)
* Mock MNIST download for less flaky tests (1004)
* Add test for ImageNet (976)(1006)
* Add tests for datasets (966)

Transforms

New

* Add Random Erasing for image augmentation (909) (1060) (1087) (1095)

Improvements

* Allowing 'F' mode for 1 channel FloatTensor in `ToPILImage` (1100)
* Add shear parallel to y-axis (1070)
* fix error message in `to_tensor` (1000)
* Fix TypeError in `RandomResizedCrop.get_params` (1036)
* Fix `normalize` for different `dtype` than `float32` (1021)

Ops

* Renamed `vision.h` files to `vision_cpu.h` and `vision_cuda.h` (1051)(1052)
* Optimize `nms_cuda` by avoiding extra `torch.cat` call (945)

Reference scripts

* Expose data-path in the detection reference scripts (1109)
* Make `utils.py` work with pytorch-cpu (1023)
* Add mixed precision training with Apex (972)(1124)
* Add reference code for similarity learning (1101)

Build

* Add windows build steps and wheel build scripts (998)
* add packaging scripts (996)
* Allow forcing GPU build with `FORCE_CUDA=1` (927)

Misc

* Misc lint fixes (1020)
* Reraise error on failed downloading (1013)
* add more hub models (974)
* make C extension lazy-import (971)

0.3.0

This release brings several new features to torchvision, including models for semantic segmentation, object detection, instance segmentation and person keypoint detection, and custom C++ / CUDA ops specific to computer vision.

**Note: torchvision 0.3 requires PyTorch 1.1 or newer**

Highlights

Reference training / evaluation scripts

We now provide under the `references/` folder scripts for training and evaluation of the following tasks: classification, semantic segmentation, object detection, instance segmentation and person keypoint detection.
Their purpose is twofold:

* serve as a log of how to train a specific model.
* provide baseline training and evaluation scripts to bootstrap research

They all have an entry-point `train.py` which performs both training and evaluation for a particular task. Other helper files, specific to each training script, are also present in the folder, and they might get integrated into the torchvision library in the future.

We expect users to copy-paste and modify those reference scripts and use them for their own needs.

TorchVision Ops

TorchVision now contains custom C++ / CUDA operators in `torchvision.ops`. Those operators are specific to computer vision, and make it easier to build object detection models.
Those operators currently do not support PyTorch script mode, but support for it is planned for future releases.

List of supported ops

* `roi_pool` (and the module version `RoIPool`)
* `roi_align` (and the module version `RoIAlign`)
* `nms`, for non-maximum suppression of bounding boxes
* `box_iou`, for computing the intersection over union metric between two sets of bounding boxes

All the other ops present in `torchvision.ops` and its subfolders are experimental, in particular:

* `FeaturePyramidNetwork` is a module that adds a FPN on top of a module that returns a set of feature maps.
* `MultiScaleRoIAlign` is a wrapper around `roi_align` that works with multiple feature map scales

Here are a few examples of using torchvision ops:
```python
import torch
import torchvision

# create 10 random boxes
boxes = torch.rand(10, 4) * 100
# they need to be in [x0, y0, x1, y1] format
boxes[:, 2:] += boxes[:, :2]
# create a random image
image = torch.rand(1, 3, 200, 200)
# extract regions in `image` defined in `boxes`, rescaling
# them to have a size of 3x3
pooled_regions = torchvision.ops.roi_align(image, [boxes], output_size=(3, 3))
# check the size
print(pooled_regions.shape)
# torch.Size([10, 3, 3, 3])

# or compute the intersection over union between
# all pairs of boxes
print(torchvision.ops.box_iou(boxes, boxes).shape)
# torch.Size([10, 10])
```
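
For readers unfamiliar with the quantities that `box_iou` and `nms` deal with, here is a pure-Python sketch of intersection over union and greedy non-maximum suppression on boxes in `[x0, y0, x1, y1]` format. It is a reference illustration only, not the C++ / CUDA implementation shipped in `torchvision.ops`; the function names and the sample boxes are hypothetical.

```python
def box_area(box):
    """Area of a box given as [x0, y0, x1, y1]."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a, b):
    """Intersection over union of two [x0, y0, x1, y1] boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Keep the highest-scoring boxes, dropping any that overlap a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, iou_threshold=0.5))
```

Here the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box does not overlap anything and is kept.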

Models for more tasks

The 0.3 release of torchvision includes pre-trained models for tasks other than image classification on ImageNet.
We include two new categories of models: region-based models, like Faster R-CNN, and dense pixelwise prediction models, like DeepLabV3.

Object Detection, Instance Segmentation and Person Keypoint Detection models

**Warning: The API is currently experimental and might change in future versions of torchvision**

The 0.3 release contains pre-trained models for Faster R-CNN, Mask R-CNN and Keypoint R-CNN, all of them using ResNet-50 backbone with FPN.
They have been trained on COCO train2017 following the reference scripts in `references/`, and give the following results on COCO val2017

Network | box AP | mask AP | keypoint AP
-- | -- | -- | --

0.2.2

This version introduces several improvements and fixes.

Support for arbitrary input sizes for models

It is now possible to feed larger images than 224x224 into the models in torchvision.
We added an adaptive pooling just before the classifier, which adapts the size of the feature maps before the last layer, allowing for larger input images.
Relevant PRs: 744 747 746 672 643
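
As an illustration of why adaptive pooling removes the fixed-size constraint, here is a one-dimensional pure-Python sketch: whatever the input length, the output has the requested size. The bin-boundary rule used here is an assumption made for this sketch, and `adaptive_avg_pool_1d` is a hypothetical helper; the models themselves pool two-dimensional feature maps.

```python
def adaptive_avg_pool_1d(values, output_size):
    """Average `values` into `output_size` bins, whatever the input length."""
    n = len(values)
    pooled = []
    for i in range(output_size):
        start = (i * n) // output_size
        # integer ceiling of (i + 1) * n / output_size
        end = ((i + 1) * n + output_size - 1) // output_size
        window = values[start:end]
        pooled.append(sum(window) / len(window))
    return pooled


# Inputs of different lengths are reduced to the same fixed output size.
print(adaptive_avg_pool_1d([1, 2, 3, 4], 2))
print(adaptive_avg_pool_1d([1, 2, 3, 4, 5, 6], 2))
```

Because the classifier always receives a feature vector of the same size, the layers before it can process larger inputs.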

Bugfixes

* Fix invalid argument error when using lsun method in windows (508)
* Fix FashionMNIST loading MNIST (640)
* Fix inception v3 input transform for trace & onnx (621)

Datasets

* Add support for webp and tiff images in ImageFolder 736 724
* Add K-MNIST dataset 687
* Add Cityscapes dataset 695 725 739 700
* Add Flickr 8k and 30k datasets 674
* Add VOCDetection and VOCSegmentation datasets 663
* Add SBU Captioned Photo Dataset (665)
* Updated URLs for EMNIST 726
* MNIST and FashionMNIST now have their own 'raw' and 'processed' folder 601
* Add metadata to some datasets (501)

Improvements

* Allow RandomCrop to crop in the padded region 564
* ColorJitter now supports min/max values 548
* Generalize resnet to use block.extension 487
* Move area calculation out of for loop in RandomResizedCrop 641
* Add option to zero-init the residual branch in resnet (498)
* Improve error messages in to_pil_image 673
* Added the option of converting to tensor for numpy arrays having only two dimensions in to_tensor (686)
* Optimize _find_classes in DatasetFolder via scandir in Python3 (559)
* Add padding_mode to RandomCrop (489 512)
* Make DatasetFolder more generic (527)
* Add in-place option to normalize (699)
* Add Hamming and Box interpolations to transforms.py (693)
* Added the support of 2-channel Image modes such as 'LA' and adding a mode in 4 channel modes (688)
* Improve support for 'P' image mode in pad (683)
* Make torchvision depend on pillow-simd if already installed (522)
* Make tests run faster (745)
* Add support for non-square crops in RandomResizedCrop (715)

Breaking changes

* `save_image` now rounds to the nearest integer 754
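
A small sketch of how rounding can change saved pixel values. It assumes, for illustration only, that the alternative to rounding is truncation when mapping floats in `[0, 1]` to 8-bit values; both helpers are hypothetical and are not the torchvision implementation.

```python
def to_uint8_truncate(x):
    """Map a float in [0, 1] to 0..255 by truncation."""
    return int(min(max(x, 0.0), 1.0) * 255)


def to_uint8_round(x):
    """Map a float in [0, 1] to 0..255 by rounding to the nearest integer."""
    return int(min(max(x, 0.0), 1.0) * 255 + 0.5)


value = 0.5  # 0.5 * 255 = 127.5
print(to_uint8_truncate(value), to_uint8_round(value))
```

Under that assumption, some pixels differ by one intensity level between the two conversions, which is why the change is listed as breaking.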

Misc

* Added code coverage to travis 703
* Add downloads and docs badge to README (702)
* Add progress to download_url 497 524 535
* Replace 'residual' with 'identity' in resnet.py (679)
* Consistency changes in the models
* Refactored MNIST and CIFAR to have data and target fields 578 594
* Update torchvision to newer versions of PyTorch
* Relax assertion in `transforms.Lambda.__init__` (637)
* Cast MNIST target to int (605)
* Change default target type of FakedDataset to long (581)
* Improve docs of functional transforms (602)
* Docstring improvements
* Add is_image_file to folder_dataset (507)
* Add deprecation warning in MNIST train[test]_labels[data] (742)
* Mention TORCH_MODEL_ZOO in models documentation. (624)
* Add scipy as a dependency to setup.py (675)
* Added size information for inception v3 (719)
