PyTorch Support
We want to make our toolkit agnostic to deep learning frameworks so that it is available to everyone. Starting with this release, we support PyTorch. All PyTorch code and models live in the `torch` folder inside `gluoncv`, arranged in the same hierarchy as before: `model`, `data`, `nn` and `utils`. The `model` folder contains our model zoo with model definitions, the `data` folder contains dataset definitions and dataloaders, `nn` defines new operators, and `utils` provides utility functions for model training, evaluation and visualization.
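The layout described above looks roughly like this (the inline comments summarize each folder's role):

```text
gluoncv/
└── torch/
    ├── model/   # model zoo and model definitions
    ├── data/    # dataset definitions and dataloaders
    ├── nn/      # new operators
    └── utils/   # training, evaluation and visualization helpers
```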
To get started, you can find [installation instructions](https://cv.gluon.ai/install.html), the [model zoo](https://cv.gluon.ai/model_zoo/index.html) and [tutorials](https://cv.gluon.ai/tutorials_torch/index.html) on our website. To make our toolkit easier to use and customize, we provide model definitions separately for each method, without heavy abstraction or modularization. This way, you can experiment with each model without jumping across multiple files, and you can modify an individual model implementation without affecting other models. At the same time, we adopt `yaml` for easier configuration. We strive to make our toolkit user-friendly for students and researchers.
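As a sketch of what such a `yaml` configuration might look like (the key names here are illustrative, not necessarily the exact keys GluonCV uses):

```yaml
# Hypothetical training config; consult the GluonCV docs for the real schema.
CONFIG:
  DATA:
    DATASET: kinetics400
    CLIP_LEN: 32
  TRAIN:
    BATCH_SIZE: 8
    LR: 0.01
    EPOCHS: 100
```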
Video Action Recognition PyTorch Model Zoo
We provide **46** PyTorch models for video action recognition, including better I3D models, the more recent TPN family, faster training (DDP support and multi-grid training) and Kinetics700 pretrained weights. Finetuning and feature extraction have never been easier.
Details of our model zoo can be found [here](https://cv.gluon.ai/model_zoo/action_recognition.html). In terms of models, we cover TSN, I3D, I3D_slow, R2+1D, Non-local, CSN and TPN. In terms of datasets, we cover Kinetics400, Kinetics700 and Something-Something-V2. All of our models match or exceed the numbers reported in the original papers.
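To illustrate one of the core ideas behind models like TSN, here is a minimal sketch of TSN-style sparse segment sampling: the video is split into a few uniform segments and one frame index is drawn from each. This is a simplified illustration, not GluonCV's exact implementation.

```python
import random

def tsn_sample_indices(num_frames, num_segments, random_shift=False):
    """Split a video of `num_frames` frames into `num_segments` uniform
    segments and pick one frame index per segment (TSN-style sampling).
    With random_shift=True (training), the index is jittered within the
    segment; otherwise (testing) the segment center is used."""
    seg_len = num_frames / num_segments
    indices = []
    for k in range(num_segments):
        start = int(seg_len * k)
        end = max(start, int(seg_len * (k + 1)) - 1)
        if random_shift:
            indices.append(random.randint(start, end))  # jittered sample
        else:
            indices.append((start + end) // 2)          # center sample
    return indices

print(tsn_sample_indices(300, 3))  # -> [49, 149, 249]
```

Sparse sampling like this is what lets TSN cover a long video with only a handful of frames per clip.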
We provide several tutorials to get you started, including [how to make predictions using a pretrained model](https://cv.gluon.ai/build/examples_torch_action_recognition/demo_i3d_kinetics400.html), [how to extract video features from a pretrained model](https://cv.gluon.ai/build/examples_torch_action_recognition/extract_feat.html), [how to finetune a model on your dataset](https://cv.gluon.ai/build/examples_torch_action_recognition/finetune_custom.html), [how to measure a model's flops/speed](https://cv.gluon.ai/build/examples_torch_action_recognition/speed.html), and [how to use our DDP framework](https://cv.gluon.ai/build/examples_torch_action_recognition/ddp_pytorch.html).
Since video models are slow to train (due to slow I/O and large models), we also support DistributedDataParallel (DDP) training and [multi-grid training](https://arxiv.org/abs/1912.00998). DDP provides a roughly 2x speedup and multi-grid training a 3-4x speedup; combining the two significantly shortens the training process. Both techniques are provided as helper functions: you can simply add your model definition to GluonCV (a single Python file like [this](https://github.com/dmlc/gluon-cv/blob/master/gluoncv/torch/model_zoo/action_recognition/i3d_resnet.py)) and enjoy the speedup brought by our framework. More details are in this [tutorial](https://cv.gluon.ai/build/examples_torch_action_recognition/ddp_pytorch.html).
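The intuition behind multi-grid training is that early stages of a long cycle use smaller space-time grids, which allows proportionally larger batches at roughly constant compute per iteration. The sketch below computes such a schedule; the cycle factors and rounding are an illustrative simplification of the paper's long cycle, not GluonCV's actual helper.

```python
def long_cycle_schedule(base_batch, base_t, base_s):
    """Return (batch_size, temporal_len, spatial_size) tuples for one long
    cycle of multi-grid training. Per-iteration compute scales roughly as
    t * s^2, so the batch size grows by the inverse of that factor."""
    # (temporal factor, spatial factor) per long-cycle stage, small to large
    factors = [(0.25, 0.5), (0.5, 0.707), (1.0, 1.0)]
    schedule = []
    for tf, sf in factors:
        t = max(1, int(base_t * tf))
        s = int(base_s * sf) // 8 * 8  # keep spatial size a multiple of 8
        scale = (base_t * base_s * base_s) / (t * s * s)
        schedule.append((int(base_batch * scale), t, s))
    return schedule

print(long_cycle_schedule(8, 16, 224))
```

For a base clip of 16 frames at 224x224 with batch size 8, the first stage runs 4-frame 112x112 clips with a 16x larger batch, which is where the wall-clock savings come from.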
Bug fixes and Improvements
- Refactored table in csv form (#1465)
- Added DeepLab ResNeSt200 pretrained weights (#1456)
- StyleGAN training instructions (#1446)
- More settings for Monodepth2 and bug fixes (#1459, #1472)
- Fixed RCNN target generator (#1508)
- Revised DANet (#1507)
- Added a new docker image ready for GluonCV applications and development (#1474)
Acknowledgement
Special thanks to @Arthurlxy, @ECHO960, @zhreshold and @yinweisu for their support in this release. Thanks to @coocoo90 for contributing the CSN and R2+1D models, and thanks to all other contributors for bug fixes and improvements.