GluonCV

Latest version: v0.10.5.post0



0.10.0

- a simpler and better custom dataset loading experience with pandas `DataFrame` visualization. Compared with the obsolete code-based dataset composition, it lets you load arbitrary datasets faster and more reliably.

- a one-liner `fit` function with YAML configuration file support

- built-in HPO support for effortless hyper-parameter tuning

gluoncv.auto

This release includes a new module, `gluoncv.auto`, which gives you access to high-level APIs such as `data`, `estimators` and `tasks`.

gluoncv.auto.data

The `auto.data` module is designed to load arbitrary web datasets you find on the internet, such as Kaggle competition datasets.
You may refer to this [tutorial](https://cv.gluon.ai/build/examples_auto_module/demo_auto_data.html) or check out the fully compatible [d8 dataset](http://preview.d2l.ai/d8/main/image_classification/getting_started.html) for loading custom datasets.
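As a minimal sketch of the workflow (assuming the `from_folders` and `show_images` helpers shown in the linked tutorials; the dataset URL is just an example archive):

```python
from gluoncv.auto.data.dataset import ImageClassificationDataset

# Download an archive and split it into DataFrame-backed datasets.
train, val, test = ImageClassificationDataset.from_folders(
    'https://autogluon.s3.amazonaws.com/datasets/shopee-iet.zip')

print(train.head())   # plain pandas access to the image/label columns
train.show_images()   # grid visualization of sample images
```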

Loading data:
<img src="https://user-images.githubusercontent.com/3307514/109356038-02c50c00-7835-11eb-83ba-2dfc070b4ef6.png" height="150">

The dataset uses internal `DataFrame` storage for easier access and analysis:

<img src="https://user-images.githubusercontent.com/3307514/109357043-6bf94f00-7836-11eb-9db1-7628218fba43.png" height="300">


Visualization:

<img src="https://user-images.githubusercontent.com/3307514/109357126-8cc1a480-7836-11eb-88f5-071c9a817aac.png" height="150">

Similarly, for object detection:

<img src="https://user-images.githubusercontent.com/3307514/109356385-7b2bcd00-7835-11eb-86e3-8fabc6557464.png" height="200">

gluoncv.auto.estimators

In this release, we packaged the following high-level estimators for training and prediction in image classification and object detection.

- gluoncv.auto.estimators.ImageClassificationEstimator
- gluoncv.auto.estimators.SSDEstimator
- gluoncv.auto.estimators.CenterNetEstimator
- gluoncv.auto.estimators.FasterRCNNEstimator
- gluoncv.auto.estimators.YOLOv3Estimator

Highlighted usages

- `fit` function (see the combined sketch after this list):

![image](https://user-images.githubusercontent.com/3307514/109357783-a57e8a00-7837-11eb-9c99-f2a2fac99112.png)

- `predict`, `predict_proba` (for image classification), `predict_feature` (for image classification)

<img src="https://user-images.githubusercontent.com/3307514/109358005-ed9dac80-7837-11eb-811c-7690981066ab.png" height="40">
<img src="https://user-images.githubusercontent.com/3307514/109358236-4a00cc00-7838-11eb-8256-2a6e6f51fab8.png" height="150">

- `save` and `load`.
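Taken together, a minimal end-to-end sketch (the config keys, file names, and image path below are illustrative placeholders, and `train_data`/`val_data` are the DataFrame-backed datasets from `gluoncv.auto.data`):

```python
from gluoncv.auto.estimators import SSDEstimator

# Configure with a plain dict; a YAML file with the same structure also works.
config = {'train': {'epochs': 10, 'batch_size': 16}}  # illustrative keys only
estimator = SSDEstimator(config)

# One-liner training.
estimator.fit(train_data, val_data)

# Inference; classification estimators additionally expose
# predict_proba (class probabilities) and predict_feature (embeddings).
detections = estimator.predict('test_image.jpg')

# Serialize the fitted estimator and restore it later.
estimator.save('ssd.pkl')
restored = SSDEstimator.load('ssd.pkl')
```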

You may visit the tutorial website for more detailed [examples](https://cv.gluon.ai/build/examples_auto_module/train_image_classifier_basic.html).

gluoncv.auto.tasks

In this release, the following auto tasks are supported and have been extensively tested on many datasets to ensure HPO performance:

- gluoncv.auto.tasks.ImageClassification
- gluoncv.auto.tasks.ObjectDetection

Compared with the plain algorithm-specific estimators, the auto tasks provide identical APIs and functionality but let you `fit` with hyper-parameter optimization (HPO) under a specified `num_trials` and `time_limit`. For object detection, multiple algorithms (e.g., SSDEstimator and FasterRCNNEstimator) can be tuned as a categorical search space.

The tutorial is available [here](https://cv.gluon.ai/build/examples_auto_module/demo_auto_detection.html).
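As a rough sketch of the task-level API (the config keys follow the text above but should be checked against the tutorial; `train_data` and `val_data` come from `gluoncv.auto.data`):

```python
from gluoncv.auto.tasks import ObjectDetection

# Budgeted search over hyper-parameters and, for detection, over algorithms
# (e.g. SSDEstimator vs. FasterRCNNEstimator as a categorical choice).
task = ObjectDetection({'num_trials': 5, 'time_limit': 3600})

detector = task.fit(train_data, val_data)  # best estimator found in budget
print(detector.predict('street.jpg'))      # hypothetical test image
```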

Bug fixes and improvements

- Improved training speed for the Mask R-CNN script (1595, 1609)
- Fixed an issue in the classification dataset (1599)
- Fixed a batch-size issue in Mask R-CNN validation during training (1594)
- Fixed an OS directory issue for the model zoo folder (1591)
- Improved CI stability (1581)

0.9.0

PyTorch Support
We want to make our toolkit agnostic to deep learning frameworks so that it is available for everyone. Starting from this release, we support PyTorch. All PyTorch code and models live under the `torch` folder inside `gluoncv`, arranged in the same hierarchy as before: `model`, `data`, `nn` and `utils`. The `model` folder contains our model zoo with model definitions, `data` contains dataset definitions and data loaders, `nn` defines new operators, and `utils` provides utility functions to help with model training, evaluation and visualization.

To get started, you can find [installation instructions](https://cv.gluon.ai/install.html), the [model zoo](https://cv.gluon.ai/model_zoo/index.html) and [tutorials](https://cv.gluon.ai/tutorials_torch/index.html) on our website. To make our toolkit easier to use and customize, we provide model definitions separately for each method, without extreme abstraction and modularization. In this manner, you can play with each model without jumping across multiple files, and you can modify an individual model implementation without affecting other models. At the same time, we adopt YAML for easier configuration. We strive to make our toolkit more user-friendly for students and researchers.
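For example, building a model from a YAML configuration looks roughly like this (following the pattern in the tutorials; the YAML file name is a placeholder for one of the per-model configs shipped with the model zoo):

```python
from gluoncv.torch.engine.config import get_cfg_defaults
from gluoncv.torch.model_zoo import get_model

cfg = get_cfg_defaults()                                 # default config node
cfg.merge_from_file('i3d_resnet50_v1_kinetics400.yaml')  # per-model YAML
model = get_model(cfg)                                   # build from config
model.eval()
```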


Video Action Recognition PyTorch Model Zoo
We have **46** PyTorch models for video action recognition, with better I3D models, the more recent TPN family, faster training (DDP support and multi-grid) and K700 pretrained weights. Finetuning and feature extraction have never been easier.

Details of our model zoo can be found [here](https://cv.gluon.ai/model_zoo/action_recognition.html). In terms of models, we cover TSN, I3D, I3D_slow, R2+1D, Non-local, CSN and TPN. In terms of datasets, we cover Kinetics400, Kinetics700 and Something-something-v2. All of our models have similar or better performance compared to the numbers reported in the original papers.

We provide several tutorials to get you started, including [how to make predictions using a pretrained model](https://cv.gluon.ai/build/examples_torch_action_recognition/demo_i3d_kinetics400.html), [how to extract video features from a pretrained model](https://cv.gluon.ai/build/examples_torch_action_recognition/extract_feat.html), [how to finetune a model on your dataset](https://cv.gluon.ai/build/examples_torch_action_recognition/finetune_custom.html), [how to measure a model's flops/speed](https://cv.gluon.ai/build/examples_torch_action_recognition/speed.html), and [how to use our DDP framework](https://cv.gluon.ai/build/examples_torch_action_recognition/ddp_pytorch.html).

Since video models are slow to train (due to slow I/O and large models), we also support DistributedDataParallel (DDP) training and [multi-grid training](https://arxiv.org/abs/1912.00998). DDP can provide about a 2x speed-up and multi-grid training a 3-4x speed-up; combining the two significantly shortens the training process. Both techniques are provided as helper functions: you can easily add your model definition to GluonCV (a single Python file like [this](https://github.com/dmlc/gluon-cv/blob/master/gluoncv/torch/model_zoo/action_recognition/i3d_resnet.py)) and enjoy the speed brought by our framework. More details can be found in this [tutorial](https://cv.gluon.ai/build/examples_torch_action_recognition/ddp_pytorch.html).
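GluonCV wraps this in helper functions, but the underlying mechanism is standard PyTorch `DistributedDataParallel`; a bare-bones sketch of that mechanism (not GluonCV's exact helpers) looks like:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_ddp(model: torch.nn.Module, local_rank: int) -> DDP:
    """Wrap a model for one-process-per-GPU training
    (launched e.g. via torch.distributed.launch)."""
    dist.init_process_group(backend='nccl')  # NCCL backend for GPU collectives
    torch.cuda.set_device(local_rank)
    model = model.cuda(local_rank)
    # Gradients are averaged across processes after every backward pass.
    return DDP(model, device_ids=[local_rank])
```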

Bug fixes and Improvements

- Refactored the table in CSV form (1465)
- Added DeepLab ResNeSt200 pretrained weights (1456)
- Added StyleGAN training instructions (1446)
- Added more settings for Monodepth2 and fixed a bug (1459, 1472)
- Fixed the RCNN target generator (1508)
- Revised DANet (1507)
- Added a new Docker image ready for GluonCV applications and development (1474)

Acknowledgement
Special thanks to Arthurlxy, ECHO960, zhreshold and yinweisu for their support in this release. Thanks to coocoo90 for contributing the CSN and R2+1D models, and thanks to our other contributors for the bug fixes and improvements.

0.8.0

Monodepth2 (thanks KuangHaofei)

We provide GluonCV implementation of [Monodepth2](https://arxiv.org/abs/1806.01260) and the results are fully reproducible. To try out on your own images, please see our [demo tutorial](https://gluon-cv.mxnet.io/build/examples_depth/demo_monodepth2.html). To train a Monodepth2 model on your own dataset, please see our [dive deep tutorial](https://gluon-cv.mxnet.io/build/examples_depth/train_monodepth2.html).
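A quick inference sketch with the pretrained weights listed below (assuming the standard `gluoncv.model_zoo.get_model` entry point; the real preprocessing is covered in the demo tutorial, so a random tensor stands in for the input here):

```python
import mxnet as mx
from gluoncv.model_zoo import get_model

model = get_model('monodepth2_resnet18_kitti_stereo_640x192', pretrained=True)

# The model expects a normalized 1x3x192x640 image tensor.
img = mx.nd.random.uniform(shape=(1, 3, 192, 640))  # stand-in input
outputs = model.predict(img)                        # disparity predictions
```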

The following table shows its performance on the KITTI dataset.
| Name | Modality | Resolution | Abs. Rel. Error | delta < 1.25 | Hashtag |
| -- | -- | -- | -- | -- | -- |
| monodepth2_resnet18_kitti_stereo_640x192 | Stereo | 640x192 | 0.114 | 0.856 | 92871317 |

![](https://github.com/nianticlabs/monodepth2/raw/master/assets/teaser.gif)

More Semantic Segmentation Models (thanks xdeng7 and ytian8)

We include two new semantic segmentation models in this release: [DANet](https://arxiv.org/abs/1809.02983) and [FastSCNN](https://arxiv.org/abs/1902.04502).

The following table shows their performance on the Cityscapes validation set.
| Model | Pre-Trained Dataset | Dataset | pixAcc | mIoU |
|---------------------------|--------|-----|--------|-------|
| danet_resnet50_citys | ImageNet | Cityscapes | 96.3 | 78.5 |
| danet_resnet101_citys | ImageNet | Cityscapes | 96.5 | 80.1 |
| fastscnn_citys | - | Cityscapes | 95.1 | 72.3 |

Our FastSCNN is an improved version based on a [recent paper](https://arxiv.org/abs/2004.14960) that uses semi-supervised learning. To the best of our knowledge, `72.3` mIoU is the highest score reported for a FastSCNN implementation, making it one of the best real-time semantic segmentation models.
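Both models are exposed through the model zoo under the names in the table above; a quick-start sketch (assuming the `get_model`/`predict` pattern from the segmentation tutorials, with a random tensor standing in for a preprocessed image):

```python
import mxnet as mx
from gluoncv.model_zoo import get_model

model = get_model('fastscnn_citys', pretrained=True)  # name from the table

img = mx.nd.random.uniform(shape=(1, 3, 480, 480))  # stand-in NCHW input
output = model.predict(img)                         # per-class logits
pred = mx.nd.argmax(output, 1)                      # per-pixel class labels
```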

StyleGAN (thanks xdeng7)

![](https://github.com/dmlc/gluon-cv/blob/master/scripts/gan/stylegan/sample.jpg?raw=true)

A GluonCV implementation of [StyleGAN](https://arxiv.org/abs/1812.04948) "A Style-Based Generator Architecture for Generative Adversarial Networks": https://github.com/dmlc/gluon-cv/tree/master/scripts/gan/stylegan

Bug fixes and Improvements

- Officially deprecated Python 2 support; the minimum required Python version is 3.6 (1399)
- Fixed the Faster R-CNN training script (1249)
- Allowed SRGAN to be hybridized (1281)
- Fixed the market1501 dataset (1227)
- Added the VisDrone dataset (1267)
- Improved the video action recognition `train.py` (1339)
- Added a Jetson object detection tutorial (1346)
- Improved the guide for contributing new algorithms to GluonCV (1354)
- Fixed the required `amp` parameter in the `ForwardBackwardTask` class (1404)

0.7

Image Classification

GluonCV now provides state-of-the-art image classification backbones that can be used by various downstream tasks. Our ResNeSt outperforms EfficientNet in the accuracy-speed trade-off, as shown in the figures below. You can now swap ResNeSt into your research or product for an immediate performance improvement. Check out the details in our paper: [ResNeSt: Split-Attention Networks](https://arxiv.org/pdf/2004.08955.pdf).

Here is a comparison between ResNeSt and EfficientNet. The average latency is computed using a single V100 on a p3dn.24xlarge machine with a batch size of 16.

<img width="514" alt="resnest_vs_efficientnet" src="https://user-images.githubusercontent.com/4907789/79623404-f6c6d480-80d0-11ea-84e5-fbbf4c1558a2.png">

Model | Input size | Top-1 acc (%) | Avg latency (ms) | Release
-- | -- | -- | -- | --
SENet_154 | 224x224 | 81.26 | 5.07 | previous
ResNeSt50 | 224x224 | 81.13 | 1.78 | v0.7
ResNeSt101 | 256x256 | 82.83 | 3.43 | v0.7
ResNeSt200 | 320x320 | 83.90 | 9.49 | v0.7
ResNeSt269 | 416x416 | **84.54** | 19.50 | v0.7
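Swapping ResNeSt into an existing pipeline is a one-line model-zoo change; a sketch (the lowercase zoo names mirroring the table, e.g. `resnest50`, are assumed):

```python
import mxnet as mx
from gluoncv.model_zoo import get_model

net = get_model('resnest50', pretrained=True)  # assumed zoo name

img = mx.nd.random.uniform(shape=(1, 3, 224, 224))  # stand-in 224x224 input
logits = net(img)
print(logits.shape)  # (1, 1000) ImageNet class scores
```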

Object Detection

We add two new ResNeSt-based Faster R-CNN models. Note that our models are trained using a 2x learning rate schedule instead of the 1x schedule used in our paper. The two new models are 2-4% higher in COCO mAP than our previous best model, `faster_rcnn_fpn_resnet101_v1d_coco`. Notably, our ResNeSt-50 based model has a 4.1% higher mAP than our previous ResNet-101 based model.

Model | Backbone | mAP | Release
-- | -- | -- | --
Faster R-CNN | ResNet-101 | 40.8 | previous
Faster R-CNN | ResNeSt-50 | 42.7 | v0.7
Faster R-CNN | ResNeSt-101 | **44.9** | v0.7

Semantic Segmentation

We add ResNeSt-50 and ResNeSt-101 based DeepLabV3 models for the semantic segmentation task on the ADE20K dataset. Our new models are 1-2.8% higher in mIoU than our previous best. Similar to our detection results, the ResNeSt-50 based model performs better than the ResNet-101 based one. DeepLabV3 with a ResNeSt-101 backbone achieves **a new state-of-the-art of 46.9 mIoU** on the ADE20K validation set, outperforming the previous best by more than 1%.

Model | Backbone | Pixel accuracy | mIoU | Release
-- | -- | -- | -- | --
DeepLabV3 | ResNet-101 | 81.1 | 44.1 | previous
DeepLabV3 | ResNeSt-50 | 81.2 | 45.1 | v0.7
DeepLabV3 | ResNeSt-101 | 82.1 | **46.9** | v0.7


Bug fixes and Improvements

* Instructions for achieving 25.7-minute Mask R-CNN training.
* Fixed export for R-CNNs.
