
GluonCV

| Model | Metric | 0.2 | 0.3 | Reference |
| ----- | ------ | --- | --- | --------- |
| [Mask-RCNN](https://gluon-cv.mxnet.io/model_zoo/segmentation.html#instance-segmentation) | mask AP on COCO | N/A | **33.1%** | 32.8% ([Detectron](https://github.com/facebookresearch/Detectron)) |


Interactive visualizations for pre-trained models

For [image classification](https://gluon-cv.mxnet.io/model_zoo/classification.html):

<a href="https://gluon-cv.mxnet.io/model_zoo/classification.html"><img src="https://user-images.githubusercontent.com/3307514/47051128-ca3aa100-d157-11e8-8b50-08841c8cdf5f.png" width="400px" /></a>

and for [object detection](https://gluon-cv.mxnet.io/model_zoo/detection.html):

<a href="https://gluon-cv.mxnet.io/model_zoo/detection.html"><img src="https://user-images.githubusercontent.com/421857/47048450-4d0b2e00-d14f-11e8-9338-bb20bb69655b.png" width="400px"/></a>

Deploy without Python

All models are hybridizable and can be deployed without Python. See the [tutorials](https://github.com/dmlc/gluon-cv/tree/master/scripts/deployment/cpp-inference) for deploying these models in C++.


New Models with Training Scripts

DenseNet, DarkNet, SqueezeNet for [image classification](https://gluon-cv.mxnet.io/model_zoo/classification.html#imagenet)

We now provide a broader range of model families suited to out-of-the-box usage and various research purposes.


[YoloV3](https://gluon-cv.mxnet.io/model_zoo/detection.html#id44) for object detection

Significantly more accurate than the original paper: we get 37.0% mAP on COCO versus the original [paper](https://pjreddie.com/media/files/papers/YOLOv3.pdf)'s 33.0%. The techniques we used will be described in a paper to be released later.

[Mask-RCNN](https://gluon-cv.mxnet.io/model_zoo/segmentation.html#instance-segmentation) for instance segmentation

Accuracy now matches Caffe2 Detectron without FPN, e.g. 38.3% box AP and 33.1% mask AP on COCO with ResNet50.

FPN support will come in future versions.

[DeepLabV3](https://gluon-cv.mxnet.io/model_zoo/segmentation.html#semantic-segmentation) for semantic segmentation

Slightly more accurate than the original paper: we get 86.7% mIoU on Pascal VOC versus the original paper's 85.7%.
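
To make the metric concrete, here is a toy sketch of how mIoU is computed from a confusion matrix. This is illustrative only (the 2-class counts are made up), not GluonCV's metric implementation:

```python
# Minimal sketch: mean Intersection-over-Union (mIoU) from a confusion
# matrix, the metric reported for semantic segmentation.
# conf[i][j] counts pixels of true class i predicted as class j.

def mean_iou(conf):
    num_classes = len(conf)
    ious = []
    for c in range(num_classes):
        tp = conf[c][c]
        fp = sum(conf[r][c] for r in range(num_classes)) - tp
        fn = sum(conf[c]) - tp
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and truth
            ious.append(tp / denom)
    return sum(ious) / len(ious)

# Toy 2-class example (not real VOC numbers).
conf = [[8, 2],
        [1, 9]]
print(round(mean_iou(conf), 4))  # 0.7386
```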

WGAN

Reproduced [WGAN](https://github.com/dmlc/gluon-cv/tree/master/scripts/gan/wgan) with ResNet

Person Re-identification

Provides a [baseline model](https://github.com/dmlc/gluon-cv/tree/master/scripts/re-id/baseline) that achieves a best rank-1 score of 93.1 on the Market1501 dataset.
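
For context, the rank-1 score counts how often the nearest gallery embedding shares the query's identity. A toy sketch with made-up distances and IDs (not the baseline's evaluation code):

```python
# Minimal sketch of the rank-1 metric used in person re-identification:
# for each query, the match is correct if its closest gallery embedding
# has the same identity.

def rank1(dist, query_ids, gallery_ids):
    correct = 0
    for q, row in enumerate(dist):
        best = min(range(len(row)), key=row.__getitem__)  # nearest gallery index
        if gallery_ids[best] == query_ids[q]:
            correct += 1
    return correct / len(dist)

dist = [[0.2, 0.9, 0.5],   # query 0 vs 3 gallery images
        [0.8, 0.1, 0.7]]   # query 1
print(rank1(dist, query_ids=[0, 1], gallery_ids=[0, 1, 2]))  # 1.0
```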

Enhanced Models with Better Accuracy

[Faster R-CNN](https://gluon-cv.mxnet.io/model_zoo/detection.html#id37)

* Improved Pascal VOC model accuracy: mAP improves to 78.3% from the previous version's 77.9%. VOC models with 80%+ mAP will be released with the tech paper.
* Added models trained on the COCO dataset.
* The ResNet50 model achieves 37.0 mAP, outperforming Caffe2 Detectron without FPN (36.5 mAP).
* The ResNet101 model achieves 40.1 mAP, outperforming Caffe2 Detectron with FPN (39.8 mAP).
* FPN support will come in future versions.
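
The mAP numbers above rest on IoU matching between predicted and ground-truth boxes (COCO mAP averages over IoU thresholds 0.5 to 0.95). A minimal sketch of the overlap computation, not GluonCV's implementation:

```python
# Intersection-over-Union between two boxes given as (xmin, ymin, xmax, ymax).

def box_iou(a, b):
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ≈ 0.1429
```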

[ResNet](https://gluon-cv.mxnet.io/model_zoo/classification.html#resnet), [MobileNet](https://gluon-cv.mxnet.io/model_zoo/classification.html#mobilenet), [DarkNet](https://gluon-cv.mxnet.io/model_zoo/classification.html#others), [Inception](https://gluon-cv.mxnet.io/model_zoo/classification.html#others) for image classification

* Significantly improved accuracy for some models. For example, ResNet50_v1b now reaches 78.3% versus the previous version's 77.07%.
* Added models trained with mixup and distillation. For example, ResNet50_v1d has 3 versions: ResNet50_v1d_distill (78.67%), ResNet50_v1d_mixup (79.16%), ResNet50_v1d_mixup_distill (79.29%).
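
Mixup blends pairs of samples and their one-hot labels with a Beta-distributed weight. A pure-Python toy sketch of the idea (real training mixes image tensors, not lists):

```python
# Toy sketch of mixup augmentation, one of the tricks behind the
# _mixup model variants listed above.
import random

def mixup(x1, y1, x2, y2, alpha=0.2):
    lam = random.betavariate(alpha, alpha)  # mixing weight in (0, 1)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y

random.seed(0)
x, y = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
# The mixed label remains a valid distribution: its entries sum to ~1.
```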

[Semantic Segmentation](https://gluon-cv.mxnet.io/model_zoo/segmentation.html#semantic-segmentation)

* Synchronized Batch Normalization training.
* Added Cityscapes dataset and pretrained models.
* Added training details for reproducing state-of-the-art results on Pascal VOC, and provided COCO pre-trained models for VOC.



New application: Human Pose Estimation

https://gluon-cv.mxnet.io/model_zoo/pose.html

![sphx_glr_demo_simple_pose_001](https://user-images.githubusercontent.com/3307514/54579196-98a45b00-49bf-11e9-9257-0a91b6240575.png)

Human Pose Estimation in GluonCV is a complete application set, including model definitions, training scripts, and useful loss and metric functions. We also include some pre-trained models and usage tutorials.

| Model                                            | OKS AP         | OKS AP (with flip) |
|--------------------------------------------------|----------------|--------------------|
| simple_pose_resnet18_v1b                     | 66.3/89.2/73.4 | 68.4/90.3/75.7     |
| simple_pose_resnet18_v1b    | 52.8/83.6/57.9 | 54.5/84.8/60.3     |
| simple_pose_resnet50_v1b                     | 71.0/91.2/78.6 | 72.2/92.2/79.9     |
| simple_pose_resnet50_v1d                   | 71.6/91.3/78.7 | 73.3/92.4/80.8     |
| simple_pose_resnet101_v1b                   | 72.4/92.2/79.8 | 73.7/92.3/81.1     |
| simple_pose_resnet101_v1d                  | 73.0/92.2/80.8 | 74.2/92.4/82.0     |
| simple_pose_resnet152_v1b                   | 72.4/92.1/79.6 | 74.2/92.3/82.1     |
| simple_pose_resnet152_v1d                   | 73.4/92.3/80.7 | 74.6/93.4/82.1     |
| simple_pose_resnet152_v1d  | 74.8/92.3/82.0 | 76.1/92.4/83.2     |
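
The OKS AP columns are based on Object Keypoint Similarity, a Gaussian of keypoint distance scaled by object size and a per-keypoint constant. A toy per-object OKS sketch (ignoring visibility flags; the distances and constants are made up):

```python
# Toy Object Keypoint Similarity: exp(-d^2 / (2 * s^2 * k^2)) averaged
# over keypoints, where s is the object scale and k a per-keypoint constant.
import math

def oks(dists, s, ks):
    sims = [math.exp(-d * d / (2 * s * s * k * k)) for d, k in zip(dists, ks)]
    return sum(sims) / len(sims)

# Perfectly localized keypoints give an OKS of 1.0.
print(oks([0.0, 0.0, 0.0], s=1.0, ks=[0.1, 0.1, 0.1]))  # 1.0
```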


Feature Pyramid Network for Faster/Mask-RCNN


| Model | Dataset | Batch Size | C5.18x FP32 | C5.18x INT8 | Speedup | FP32 Acc | INT8 Acc |
| -- | -- | -- | -- | -- | -- | -- | -- |
| ssd_300_vgg16_atrous_voc* | VOC | 224 | 21.55 | 31.47 | 1.46 | 77.4 | 77.46 |
| ssd_512_vgg16_atrous_voc* | VOC | 224 | 7.63 | 11.69 | 1.53 | 78.41 | 78.39 |
| ssd_512_resnet50_v1_voc* | VOC | 224 | 17.81 | 34.55 | 1.94 | 80.21 | 80.16 |

**\*nms_thresh=0.45, nms_topk=200**

![](https://user-images.githubusercontent.com/17897736/54540947-dc08c480-49d3-11e9-9a0d-a97d44f9792c.png)

Usage of `int8` quantized models is identical to standard GluonCV models; simply use the `_int8` suffix.
For example, `resnet50_v1_int8` is the `int8` quantized version of `resnet50_v1`.
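
The real int8 path runs Intel MKL-DNN/VNNI kernels; the toy below only illustrates what symmetric per-tensor int8 quantization does to values, and is not GluonCV code:

```python
# Map floats to the int8 range [-127, 127] with a per-tensor scale,
# then dequantize to see the (small) rounding error.

def quantize_int8(xs):
    m = max(abs(v) for v in xs)
    scale = m / 127.0
    q = [int(round(v * 127.0 / m)) for v in xs]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

xs = [0.5, -1.0, 0.25]
q, scale = quantize_int8(xs)
back = dequantize(q, scale)
print(q)  # [64, -127, 32]
```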

Pruned ResNet

https://gluon-cv.mxnet.io/model_zoo/classification.html#pruned-resnet

Pruning the channels of convolution layers is a very effective way to reduce model redundancy, speeding up inference without sacrificing significant accuracy. GluonCV 0.4 includes several pruned ResNets derived from the original GluonCV SoTA ImageNet ResNets.

| Model             | Top-1 | Top-5 | Hashtag  | Speedup (to original ResNet) |
|-------------------|-------|-------|----------|------------------------------|
| resnet18_v1b_0.89 | 67.2  | 87.45 | 54f7742b | 2x                           |
| resnet50_v1d_0.86 | 78.02 | 93.82 | a230c33f | 1.68x                        |
|  resnet50_v1d_0.48    | 74.66 | 92.34 | 0d3e69bb | 3.3x  |
|  resnet50_v1d_0.37    | 70.71 | 89.74 | 9982ae49 | 5.01x    |
|  resnet50_v1d_0.11    | 63.22 | 84.79 | 6a25eece | 8.78x      |
|  resnet101_v1d_0.76   | 79.46 | 94.69 | a872796b | 1.8x          |
|   resnet101_v1d_0.73   | 78.89 | 94.48 | 712fccb1 | 2.02x     |

Scripts for pruning ResNets will be released in the future.
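
A back-of-the-envelope view of where the speedups in the table come from: a convolution layer's cost scales with `in_channels * out_channels`, so keeping a fraction `r` of the channels on both sides cuts that layer's FLOPs to roughly `r^2`. Illustrative numbers, not the actual pruned architectures:

```python
# Rough FLOPs count for a conv layer and the effect of channel pruning.

def conv_flops(c_in, c_out, k, h, w):
    # multiply-accumulates for a k x k conv over an h x w output map
    return c_in * c_out * k * k * h * w

full = conv_flops(256, 256, 3, 14, 14)
pruned = conv_flops(int(256 * 0.5), int(256 * 0.5), 3, 14, 14)
print(full / pruned)  # 4.0  (keeping half the channels quarters the cost)
```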

More GANs (thanks husonchen)

SRGAN

![](https://github.com/dmlc/gluon-cv/blob/master/scripts/gan/srgan/pred.png?raw=true)

A GluonCV implementation of the SRGAN from "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network": https://github.com/dmlc/gluon-cv/tree/master/scripts/gan/srgan

CycleGAN

![teaser](https://user-images.githubusercontent.com/3307514/54579701-efab2f80-49c1-11e9-8a90-e9170f21dc8a.jpg)

Image-to-Image translation reproduced in GluonCV: https://github.com/dmlc/gluon-cv/tree/master/scripts/gan/cycle_gan

Residual Attention Network (thanks PistonY)


Int8 Quantization with Intel Deep Learning Boost

GluonCV is now integrated with Intel's Vector Neural Network Instructions (VNNI) to accelerate model inference.
**Note that you will need a capable Intel Skylake CPU to see a proper speedup.**

| Model | Dataset | Batch Size | C5.18x FP32 | C5.18x INT8 | Speedup | FP32 Acc | INT8 Acc |
| -- | -- | -- | -- | -- | -- | -- | -- |
| resnet50_v1 | ImageNet | 128 | 122.02 | 276.72 | 2.27 | 77.21%/93.55% | 76.86%/93.46% |


Bug fixes and Improvements

- All ResNet definitions in GluonCV now support Synchronized BatchNorm
- Pretrained object detection models now support `reset_class` to reuse partial category knowledge, so some tasks may no longer require fine-tuning: https://gluon-cv.mxnet.io/build/examples_detection/skip_fintune.html#sphx-glr-build-examples-detection-skip-fintune-py
- Fixed some dataloader issues (requires MXNet >= 1.4.0)
- Fixed some segmentation models that would not hybridize
- Fixed random NaN problems in some detection models (requires the latest MXNet nightly build, >= 20190315)
- Various other minor bug fixes


0.4

| Model                     | Metric | 0.4 |
|---------------------------|--------|-----|
| simple_pose_resnet152_v1b |  OKS AP*   |   74.2  |
| simple_pose_resnet50_v1b |    OKS AP*    |  72.2   |
| ResNext50_32x4d           |  ImageNet Top-1  |  79.32   |
| ResNext101_64x4d          |  ImageNet Top-1   |  80.69  |
| SE_ResNext101_32x4d       |  ImageNet Top-1   |  79.95   |
| SE_ResNext101_64x4d       |  ImageNet Top-1  |  81.01   |

0.4.0

Highlights

0.3

Highlights

Added 5 new algorithms and updated 38 pre-trained models with improved accuracy.
A comparison of 7 selected models:

| Model               | Metric                | 0.2    | 0.3    | Reference                                                    |
| ------------------- | --------------------- | ------ | ------ | ------------------------------------------------------------ |
| [ResNet-50](https://gluon-cv.mxnet.io/model_zoo/classification.html#resnet)        | top-1 acc on ImageNet | 77.07% | **79.15%** | 75.3% ([Caffe impl](https://github.com/KaimingHe/deep-residual-networks)) |
| [ResNet-101](https://gluon-cv.mxnet.io/model_zoo/classification.html#resnet)       | top-1 acc on ImageNet | 78.81% | **80.51%** | 76.4% ([Caffe impl](https://github.com/KaimingHe/deep-residual-networks)) |
| [MobileNet 1.0](https://gluon-cv.mxnet.io/model_zoo/classification.html#mobilenet) | top-1 acc on ImageNet | N/A    | **73.28%** | 70.9% ([TensorFlow impl](https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md)) |
| [Faster-RCNN](https://gluon-cv.mxnet.io/model_zoo/detection.html#id37)             | mAP on COCO           | N/A    | **40.1%**  | 39.6% ([Detectron](https://github.com/facebookresearch/Detectron)) |
| [Yolo-v3](https://gluon-cv.mxnet.io/model_zoo/detection.html#id44)                 | mAP on COCO           | N/A    | **37.0%**  | 33.0% ([paper](https://pjreddie.com/media/files/papers/YOLOv3.pdf)) |

0.3.0


      

0.2

Image Classification
Highlight: [Much more accurate pre-trained ResNet models on ImageNet classification](https://gluon-cv.mxnet.io/model_zoo/index.html#image-classification)

These high-accuracy models have also been updated in the [Gluon Model Zoo](https://mxnet.incubator.apache.org/api/python/gluon/model_zoo.html).

- ResNet50 v1b achieves over 77% accuracy, ResNet101 v1b 78.8%, and ResNet152 v1b over 79%.
- Training with large batch sizes and the float16 data type
- Speeding up training with the ImageRecordIter interface
- [ResNeXt for ImageNet and CIFAR10 classification](resnext)
- SE-ResNet(v1b) for ImageNet

Object Detection
Highlight: Faster-RCNN model with training/testing scripts

- Faster-RCNN
  - RPN (region proposal network)
  - Region Proposal
  - ROI Align operator

- Train SSD on COCO dataset

Semantic Segmentation
Highlight: PSPNet for Semantic Segmentation
- PSPNet
- [ResNetV1b for ImageNet classification and Semantic Segmentation](resnetv1b)
- Network `dilation` is an option
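
To see what the `dilation` option buys: a dilated convolution spreads its taps apart, covering the span of a larger dense kernel without extra parameters or downsampling. The formula `k_eff = d * (k - 1) + 1` is the standard effective-kernel-size relation; the helper below is a small illustration, not GluonCV API:

```python
# Effective span of a k x k convolution kernel with dilation d.

def effective_kernel(k, dilation):
    return dilation * (k - 1) + 1

print(effective_kernel(3, 1))  # 3
print(effective_kernel(3, 2))  # 5  (same span as a dense 5x5 kernel)
```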

Datasets
Added the following datasets and usage tutorials
- MS COCO
- ADE20k

[New Pre-trained Models in GluonCV](https://gluon-cv.mxnet.io/model_zoo/index.html)

- cifar_resnext29_16x64d
- resnet{18|34|50|101}_v1b
- ssd_512_mobilenet1.0_voc
- faster_rcnn_resnet50_v2a_voc
- ssd_300_vgg16_atrous_coco
- ssd_512_vgg16_atrous_coco
- ssd_512_resnet50_v1_coco
- psp_resnet50_ade

Breaking changes
- Rename `DilatedResnetV0` to `ResNetV1b`

0.2.0

Gluon CV Toolkit v0.2 Release Notes

**Note: This release relies on some features of MXNet 1.3.0. You can get early access to these features by installing a nightly build of MXNet.**

You can update mxnet with pip:

```bash
pip install mxnet --upgrade --pre
# or, for CUDA 9.0 GPUs:
pip install mxnet-cu90 --upgrade --pre
```

0.1

Gluon CV Toolkit v0.1 Release Notes

GluonCV provides implementations of state-of-the-art (SOTA) deep learning algorithms in computer vision. It is designed to help engineers, researchers, and students quickly prototype products, validate new ideas, and learn about computer vision.

Table of Contents
- New Features
  - Tutorials
    - Image Classification (CIFAR + ImageNet demo + divedeep)
    - Object Detection (SSD demo + train + divedeep)
    - Semantic Segmentation (FCN demo + train)
  - Model Zoo
    - ResNet on ImageNet and CIFAR-10
    - SSD on VOC
    - FCN on VOC
    - Dilated ResNet
  - Training Scripts
    - Image Classification: train ResNet on ImageNet and CIFAR-10, including Mix-Up training
    - Object Detection: train SSD on PASCAL VOC
    - Semantic Segmentation: train FCN on PASCAL VOC
  - Util functions
    - Image Visualization
      - plot_image
      - get_color_pallete for segmentation
    - Bounding Box Visualization
      - plot_bbox
    - Training Helpers
      - PolyLRScheduler
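
A PolyLRScheduler implements the standard "poly" learning-rate policy used in the segmentation training scripts: the rate decays from `base_lr` toward zero as `base_lr * (1 - iter/max_iter)^power`. The function below is an illustrative sketch of that formula, not the scheduler's exact API:

```python
# "Poly" learning-rate policy, commonly used with power=0.9 for
# semantic segmentation training.

def poly_lr(base_lr, cur_iter, max_iter, power=0.9):
    return base_lr * (1 - cur_iter / max_iter) ** power

print(poly_lr(0.01, 0, 100))    # 0.01 (start of training)
print(poly_lr(0.01, 100, 100))  # 0.0  (end of training)
```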