GluonCV

Latest version: v0.10.5.post0


Highlights in v0.3 compared with v0.2:

| Model | Metric | v0.2 | v0.3 | Reference |
|---|---|---|---|---|
| [Mask-RCNN](https://gluon-cv.mxnet.io/model_zoo/segmentation.html#instance-segmentation) | mask AP on COCO | N/A | **33.1%** | 32.8% ([Detectron](https://github.com/facebookresearch/Detectron)) |


Interactive visualizations for pre-trained models

For [image classification](https://gluon-cv.mxnet.io/model_zoo/classification.html):

<a href="https://gluon-cv.mxnet.io/model_zoo/classification.html"><img src="https://user-images.githubusercontent.com/3307514/47051128-ca3aa100-d157-11e8-8b50-08841c8cdf5f.png" width="400px" /></a>

and for [object detection](https://gluon-cv.mxnet.io/model_zoo/detection.html):

<a href="https://gluon-cv.mxnet.io/model_zoo/detection.html"><img src="https://user-images.githubusercontent.com/421857/47048450-4d0b2e00-d14f-11e8-9338-bb20bb69655b.png" width="400px"/></a>

Deploy without Python

All models are hybridizable and can be deployed without Python. See the [tutorials](https://github.com/dmlc/gluon-cv/tree/master/scripts/deployment/cpp-inference) for deploying these models in C++.
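For example, a hybridized model can be exported to a symbol/params pair that the C++ API loads directly. A minimal sketch (`resnet50_v1b` is just an example; any model zoo name works the same way):

```python
import mxnet as mx
from gluoncv import model_zoo

net = model_zoo.get_model('resnet50_v1b', pretrained=True)
net.hybridize()

# One forward pass builds the symbolic graph; export then writes
# 'resnet50_v1b-symbol.json' and 'resnet50_v1b-0000.params',
# which C++ inference can load without any Python.
net(mx.nd.zeros((1, 3, 224, 224)))
net.export('resnet50_v1b')
```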


New Models with Training Scripts

DenseNet, DarkNet, SqueezeNet for [image classification](https://gluon-cv.mxnet.io/model_zoo/classification.html#imagenet)

We now provide a broader range of model families that are suitable for out-of-the-box usage and various research purposes.


[YoloV3](https://gluon-cv.mxnet.io/model_zoo/detection.html#id44) for object detection

Significantly more accurate than the original paper: we get 37.0% mAP on COCO versus the original [paper](https://pjreddie.com/media/files/papers/YOLOv3.pdf)'s 33.0%. The techniques we used will be included in a paper to be released later.
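A minimal inference sketch with the COCO-pretrained model (the image path is a placeholder):

```python
from matplotlib import pyplot as plt
from gluoncv import model_zoo, data, utils

net = model_zoo.get_model('yolo3_darknet53_coco', pretrained=True)

# The YOLO preset loader resizes and normalizes the image.
x, img = data.transforms.presets.yolo.load_test('street.jpg', short=512)
class_IDs, scores, bounding_boxes = net(x)

ax = utils.viz.plot_bbox(img, bounding_boxes[0], scores[0], class_IDs[0],
                         class_names=net.classes)
plt.show()
```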

[Mask-RCNN](https://gluon-cv.mxnet.io/model_zoo/segmentation.html#instance-segmentation) for instance segmentation

Accuracy now matches Caffe2 Detectron without FPN, e.g. 38.3% box AP and 33.1% mask AP on COCO with ResNet50.

FPN support will come in future versions.
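A minimal inference sketch following the model zoo demo (the image path is a placeholder):

```python
from matplotlib import pyplot as plt
from gluoncv import model_zoo, data, utils

net = model_zoo.get_model('mask_rcnn_resnet50_v1b_coco', pretrained=True)

x, orig_img = data.transforms.presets.rcnn.load_test('street.jpg')
ids, scores, bboxes, masks = [xx[0].asnumpy() for xx in net(x)]

# Upscale the low-resolution mask outputs to the original image size,
# then paint masks and boxes onto the image.
width, height = orig_img.shape[1], orig_img.shape[0]
masks = utils.viz.expand_mask(masks, bboxes, (width, height), scores)
orig_img = utils.viz.plot_mask(orig_img, masks)
ax = utils.viz.plot_bbox(orig_img, bboxes, scores, ids, class_names=net.classes)
plt.show()
```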

[DeepLabV3](https://gluon-cv.mxnet.io/model_zoo/segmentation.html#semantic-segmentation) for semantic segmentation.

Slightly more accurate than the original paper: we get 86.7% mIoU on Pascal VOC versus the original paper's 85.7%.
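A minimal segmentation sketch (the image path is a placeholder):

```python
import mxnet as mx
from mxnet import image
from gluoncv import model_zoo
from gluoncv.data.transforms.presets.segmentation import test_transform

img = image.imread('street.jpg')
img = test_transform(img, ctx=mx.cpu())

model = model_zoo.get_model('deeplab_resnet101_voc', pretrained=True)
output = model.predict(img)

# Per-pixel class IDs over the 21 Pascal VOC classes.
predict = mx.nd.squeeze(mx.nd.argmax(output, 1)).asnumpy()
```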

WGAN

Reproduced [WGAN](https://github.com/dmlc/gluon-cv/tree/master/scripts/gan/wgan) with ResNet

Person Re-identification

Provides a [baseline model](https://github.com/dmlc/gluon-cv/tree/master/scripts/re-id/baseline) that achieves a best rank-1 score of 93.1 on the Market1501 dataset.

Enhanced Models with Better Accuracy

[Faster R-CNN](https://gluon-cv.mxnet.io/model_zoo/detection.html#id37)

* Improved Pascal VOC model accuracy: mAP improves to 78.3% from the previous version's 77.9%. VOC models with 80%+ mAP will be released with the tech paper.
* Added models trained on the COCO dataset.
* The ResNet50 model now achieves 37.0 mAP, outperforming Caffe2 Detectron without FPN (36.5 mAP).
* The ResNet101 model achieves 40.1 mAP, outperforming Caffe2 Detectron with FPN (39.8 mAP).
* FPN support will come in future versions.

[ResNet](https://gluon-cv.mxnet.io/model_zoo/classification.html#resnet), [MobileNet](https://gluon-cv.mxnet.io/model_zoo/classification.html#mobilenet), [DarkNet](https://gluon-cv.mxnet.io/model_zoo/classification.html#others), [Inception](https://gluon-cv.mxnet.io/model_zoo/classification.html#others) for image classification

* Significantly improved accuracy for some models. For example, ResNet50_v1b now reaches 78.3% versus the previous version's 77.07%.
* Added models trained with mixup and distillation. For example, ResNet50_v1d has 3 variants: ResNet50_v1d_distill (78.67%), ResNet50_v1d_mixup (79.16%), and ResNet50_v1d_mixup_distill (79.29%). A minimal mixup sketch follows this list.
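Mixup blends pairs of training images and their one-hot labels with a Beta-distributed weight. A minimal sketch of the idea (not the exact training script; `alpha=0.2` is a typical choice):

```python
import numpy as np
import mxnet as mx

def mixup_batch(data, labels, alpha=0.2):
    """Mix a batch with a shuffled copy of itself; labels must be one-hot."""
    lam = np.random.beta(alpha, alpha)
    index = mx.nd.array(np.random.permutation(data.shape[0]))
    mixed_data = lam * data + (1 - lam) * data.take(index)
    mixed_labels = lam * labels + (1 - lam) * labels.take(index)
    return mixed_data, mixed_labels
```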

[Semantic Segmentation](https://gluon-cv.mxnet.io/model_zoo/segmentation.html#semantic-segmentation)

* Synchronized Batch Normalization training (a minimal sketch follows this list).
* Added the Cityscapes dataset and pretrained models.
* Added training details for reproducing state-of-the-art results on Pascal VOC, and provided COCO pre-trained models for VOC.
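For Synchronized Batch Normalization, MXNet provides a cross-device variant in `mxnet.gluon.contrib.nn` that aggregates batch statistics over all GPUs each iteration. A minimal sketch of swapping it into a block (layer sizes are illustrative, not the exact training setup):

```python
from mxnet.gluon import nn
from mxnet.gluon.contrib.nn import SyncBatchNorm

num_gpus = 4  # batch statistics are synchronized across these devices

net = nn.HybridSequential()
net.add(nn.Conv2D(64, kernel_size=3, padding=1),
        # Drop-in replacement for nn.BatchNorm.
        SyncBatchNorm(num_devices=num_gpus),
        nn.Activation('relu'))
```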



New application: Human Pose Estimation

https://gluon-cv.mxnet.io/model_zoo/pose.html

![sphx_glr_demo_simple_pose_001](https://user-images.githubusercontent.com/3307514/54579196-98a45b00-49bf-11e9-9257-0a91b6240575.png)

Human Pose Estimation in GluonCV is a complete application set, including model definitions, training scripts, and useful loss and metric functions. We also include pre-trained models and usage tutorials; a minimal inference sketch follows the results table below.

| Model | OKS AP | OKS AP (with flip) |
|--------------------------------------------------|----------------|--------------------|
| simple_pose_resnet18_v1b | 66.3/89.2/73.4 | 68.4/90.3/75.7 |
| simple_pose_resnet18_v1b | 52.8/83.6/57.9 | 54.5/84.8/60.3 |
| simple_pose_resnet50_v1b | 71.0/91.2/78.6 | 72.2/92.2/79.9 |
| simple_pose_resnet50_v1d | 71.6/91.3/78.7 | 73.3/92.4/80.8 |
| simple_pose_resnet101_v1b | 72.4/92.2/79.8 | 73.7/92.3/81.1 |
| simple_pose_resnet101_v1d | 73.0/92.2/80.8 | 74.2/92.4/82.0 |
| simple_pose_resnet152_v1b | 72.4/92.1/79.6 | 74.2/92.3/82.1 |
| simple_pose_resnet152_v1d | 73.4/92.3/80.7 | 74.6/93.4/82.1 |
| simple_pose_resnet152_v1d | 74.8/92.3/82.0 | 76.1/92.4/83.2 |
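A minimal top-down inference sketch from the pose tutorials: detect persons first, then run the keypoint network on the detected crops (the image path is a placeholder):

```python
from gluoncv import model_zoo, data
from gluoncv.data.transforms.pose import detector_to_simple_pose, heatmap_to_coord

detector = model_zoo.get_model('yolo3_mobilenet1.0_coco', pretrained=True)
pose_net = model_zoo.get_model('simple_pose_resnet18_v1b', pretrained=True)

# Pose estimation only needs the 'person' class from the detector.
detector.reset_class(['person'], reuse_weights=['person'])

x, img = data.transforms.presets.yolo.load_test('soccer.jpg', short=512)
class_IDs, scores, bounding_boxs = detector(x)

# Crop detected persons, predict heatmaps, and decode keypoint coordinates.
pose_input, upscale_bbox = detector_to_simple_pose(img, class_IDs, scores, bounding_boxs)
predicted_heatmap = pose_net(pose_input)
pred_coords, confidence = heatmap_to_coord(predicted_heatmap, upscale_bbox)
```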


Feature Pyramid Network for Faster/Mask-RCNN

| Model | bbox/seg mAP | Caffe bbox/seg |
|--------------------------------------------------|----------------|--------------------|


New application: Video Action Recognition

https://gluon-cv.mxnet.io/model_zoo/action_recognition.html

![](https://raw.githubusercontent.com/bryanyzhu/tiny-ucf101/master/action_basketball_anno.gif)

Video Action Recognition in GluonCV is a complete application set, including model definitions, training scripts, and useful loss and metric functions. We also include pre-trained models and usage tutorials.

| Model | Pre-Trained Dataset | Clip Length | Num of Segments | Metric | Dataset | Accuracy |
|---------------------------|--------|-----|--------|-----|---|-----|
| vgg16_ucf101 | ImageNet | 1 | 1 | Top-1 | UCF101 | 81.5 |
| vgg16_ucf101 | ImageNet | 1 | 3 | Top-1 | UCF101 | 83.4 |
| inceptionv3_ucf101 | ImageNet | 1 | 1 | Top-1 | UCF101 | 85.6 |
| inceptionv3_ucf101 | ImageNet | 1 | 3 | Top-1 | UCF101 | 88.1 |
| inceptionv3_kinetics400 | ImageNet | 1 | 3 | Top-1 | Kinetics400 | 72.5 |

Tutorials on preparing the UCF101 and Kinetics400 datasets: https://gluon-cv.mxnet.io/build/examples_datasets/ucf101.html and https://gluon-cv.mxnet.io/build/examples_datasets/kinetics400.html .

A demo of using a pre-trained model to predict human actions: https://gluon-cv.mxnet.io/build/examples_action_recognition/demo_ucf101.html.

A tutorial on training your own action recognition model: https://gluon-cv.mxnet.io/build/examples_action_recognition/dive_deep_ucf101.html.
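A single-frame inference sketch: the official demo uses GluonCV's video transforms, while this sketch applies equivalent standard ImageNet transforms to one extracted frame (the frame path is a placeholder):

```python
import mxnet as mx
from mxnet.gluon.data.vision import transforms
from gluoncv import model_zoo

net = model_zoo.get_model('vgg16_ucf101', nclass=101, pretrained=True)

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

frame = mx.image.imread('frame.jpg')       # one frame extracted from a video
x = transform(frame).expand_dims(axis=0)
pred = net(x)                              # scores over the 101 UCF101 classes
class_id = int(mx.nd.argmax(pred, axis=1).asscalar())
```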

More state-of-the-art models (I3D, SlowFast, etc.) are coming in the next release. Stay tuned.

New model: AlphaPose

https://gluon-cv.mxnet.io/model_zoo/pose.html#alphapose

![](https://raw.githubusercontent.com/MVIG-SJTU/AlphaPose/master/doc/pose.gif)

| Model | Dataset |OKS AP | OKS AP (with flip) |
|---------------------------|---|----|-------|
| alpha_pose_resnet101_v1b_coco | COCO Keypoint | 74.2/91.6/80.7 | 76.7/92.6/82.9 |

The demo for using the pre-trained AlphaPose model: https://gluon-cv.mxnet.io/build/examples_pose/demo_alpha_pose.html.

New model: MobileNetV3

https://gluon-cv.mxnet.io/model_zoo/classification.html#mobilenet

![](https://raw.githubusercontent.com/bryanyzhu/tiny-ucf101/master/mobilenetv3.jpg)

| Model | Dataset | Top-1 | Top-5 | Top-1 (original paper) |
|---------------------|------|-------|-------|--------|
| MobileNetV3_Large  | ImageNet | 75.3 | 92.3 | 75.2 |
| MobileNetV3_Small  | ImageNet | 67.7 | 87.5 | 67.4 |


New model: Semantic Segmentation VPLR

https://gluon-cv.mxnet.io/model_zoo/segmentation.html#cityscapes-dataset

![](https://raw.githubusercontent.com/bryanyzhu/tiny-ucf101/master/vplr_lossy2.gif)

| Model | Pre-Trained Dataset | Dataset | mIoU | iIoU |
|---------------------------|--------|-----|--------|-------|
| deeplab_v3b_plus_wideresnet_citys | ImageNet, Mapillary Vista | Cityscapes | 83.5 | 64.4 |

[Improving Semantic Segmentation via Video Propagation and Label Relaxation](https://arxiv.org/pdf/1812.01593.pdf) has been ported to GluonCV. It is a state-of-the-art method on several driving-scene semantic segmentation benchmarks (Cityscapes, CamVid and KITTI), and it generalizes well to other scenes.

New model: More Int8 quantized models

https://gluon-cv.mxnet.io/build/examples_deployment/int8_inference.html
CPU performance below is benchmarked on an AWS EC2 C5.12xlarge instance with 24 physical cores.
**Note that you will need a nightly build of MXNet to properly use these new features.**

![](https://user-images.githubusercontent.com/34727741/64021961-a9105280-cb67-11e9-989e-76a29e58530d.png)

Model | Dataset | Batch Size | C5.12xlarge FP32 | C5.12xlarge INT8 | Speedup | FP32 Acc | INT8 Acc
-- | -- | -- | -- | -- | -- | -- | --
FCN_resnet101 | VOC | 1 | 5.46 | 26.33 | 4.82 | 97.97% | 98.00%
PSP_resnet101 | VOC | 1 | 3.96 | 10.63 | 2.68 | 98.46% | 98.45%
Deeplab_resnet101 | VOC | 1 | 4.17 | 13.35 | 3.20 | 98.36% | 98.34%
FCN_resnet101 | COCO | 1 | 5.19 | 26.22 | 5.05 | 91.28% | 90.96%
PSP_resnet101 | COCO | 1 | 3.94 | 10.60 | 2.69 | 91.82% | 91.88%
Deeplab_resnet101 | COCO | 1 | 4.15 | 13.56 | 3.27 | 91.86% | 91.98%

For segmentation models, the accuracy metric is pixAcc. Usage of an int8 quantized model is identical to a standard GluonCV model; simply use the `_int8` suffix.

Bug fixes and Improvements

- RCNN models added automatic mixed precision and Horovod integration, with close to 4x improvement in training throughput on 8 V100 GPUs.
- RCNN models added support for multiple images per device.

Int8 quantized models

![](https://user-images.githubusercontent.com/17897736/54540947-dc08c480-49d3-11e9-9a0d-a97d44f9792c.png)

Model | Dataset | Batch Size | FP32 Throughput | INT8 Throughput | Speedup | FP32 Acc | INT8 Acc
-- | -- | -- | -- | -- | -- | -- | --
ssd_300_vgg16_atrous_voc* | VOC | 224 | 21.55 | 31.47 | 1.46 | 77.4 | 77.46
ssd_512_vgg16_atrous_voc* | VOC | 224 | 7.63 | 11.69 | 1.53 | 78.41 | 78.39
ssd_512_resnet50_v1_voc* | VOC | 224 | 17.81 | 34.55 | 1.94 | 80.21 | 80.16

**\*nms_thresh=0.45, nms_topk=200**

Usage of an `int8` quantized model is identical to a standard GluonCV model; simply use the `_int8` suffix.
For example, `resnet50_v1_int8` is the `int8` quantized version of `resnet50_v1`.
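A minimal sketch; the quantized variant loads through the same `get_model` call (a nightly MXNet build is required, as noted above):

```python
import mxnet as mx
from gluoncv import model_zoo

# Same API as the fp32 model; only the name changes.
net = model_zoo.get_model('resnet50_v1_int8', pretrained=True)

x = mx.nd.random.uniform(shape=(1, 3, 224, 224))
print(net(x).shape)  # (1, 1000) ImageNet logits, as with resnet50_v1
```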

Pruned ResNet

https://gluon-cv.mxnet.io/model_zoo/classification.html#pruned-resnet

Pruning channels of convolution layers is a very effective way to reduce model redundancy, with the aim of speeding up inference without sacrificing significant accuracy. GluonCV 0.4 includes several ResNets pruned from the original GluonCV SoTA ImageNet ResNets.

| Model | Top-1 | Top-5 | Hashtag | Speedup (to original ResNet) |
|-------------------|-------|-------|----------|------------------------------|
| resnet18_v1b_0.89 | 67.2 | 87.45 | 54f7742b | 2x |
| resnet50_v1d_0.86 | 78.02 | 93.82 | a230c33f | 1.68x |
| resnet50_v1d_0.48 | 74.66 | 92.34 | 0d3e69bb | 3.3x |
| resnet50_v1d_0.37 | 70.71 | 89.74 | 9982ae49 | 5.01x |
| resnet50_v1d_0.11 | 63.22 | 84.79 | 6a25eece | 8.78x |
| resnet101_v1d_0.76 | 79.46 | 94.69 | a872796b | 1.8x |
| resnet101_v1d_0.73 | 78.89 | 94.48 | 712fccb1 | 2.02x |

Scripts for pruning ResNets will be released in the future.
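The pruned variants load through the standard API; a minimal sketch using a name from the table above:

```python
import mxnet as mx
from gluoncv import model_zoo

# The suffix identifies the pruned variant listed in the table.
net = model_zoo.get_model('resnet50_v1d_0.37', pretrained=True)

x = mx.nd.random.uniform(shape=(1, 3, 224, 224))
print(net(x).shape)  # (1, 1000), same interface as the unpruned model
```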

More GANs (thanks @husonchen)

SRGAN

![](https://github.com/dmlc/gluon-cv/blob/master/scripts/gan/srgan/pred.png?raw=true)

A GluonCV reproduction of SRGAN from "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network": https://github.com/dmlc/gluon-cv/tree/master/scripts/gan/srgan

CycleGAN

![teaser](https://user-images.githubusercontent.com/3307514/54579701-efab2f80-49c1-11e9-8a90-e9170f21dc8a.jpg)

Image-to-Image translation reproduced in GluonCV: https://github.com/dmlc/gluon-cv/tree/master/scripts/gan/cycle_gan

Residual Attention Network (thanks @PistonY)

![figure2](https://user-images.githubusercontent.com/3307514/54580045-83c9c680-49c3-11e9-9f44-b2f40d337bb0.png)
