How is the support for quantization in MXNet?

yxchng · April 9, 2019, 8:22am

How is the support for quantization in MXNet? I am trying to find a better framework than TensorFlow Lite because its support is abysmal.

Are the quantization methods available here (https://github.com/apache/incubator-mxnet/tree/master/example/quantization) only for research and experimentation purposes? Or do they lead to an actual speed increase when deployed on mobile? There is no timing comparison, so I presume they are just for research and experimentation?

ThomasDelteil · April 9, 2019, 11:45pm

Hi @yxchng,

For quantization:

There is good support for fp16 (https://mxnet.incubator.apache.org/versions/master/faq/float16.html), and work is under way to enable NVIDIA Automatic Mixed Precision (https://developer.nvidia.com/automatic-mixed-precision) in the next releases.
There is some support for int8 quantization. Have a look at the gluon-cv tutorial for int8 inference on CPU with MKL-DNN (https://gluon-cv.mxnet.io/build/examples_deployment/int8_inference.html), which shows up to a 2.7x speedup with minimal accuracy loss.
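
To make the int8 idea concrete, here is a minimal pure-Python sketch of symmetric per-tensor quantization: floats are mapped to integers in [-127, 127] with a single scale factor, then mapped back. This only illustrates the arithmetic and is not MXNet's implementation; the function names `quantize_int8` and `dequantize_int8` are hypothetical, and real pipelines typically add calibration on sample data to choose the scale.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization (illustrative, not MXNet's API)."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Map int8 values back to approximate floats."""
    return [q * scale for q in quantized]


weights = [0.02, -1.27, 0.64, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# The round-trip error per value is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

The speedup comes from running the heavy operators on 8-bit integers instead of 32-bit floats, which only pays off when the target hardware and backend have fast int8 kernels.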

For faster performance, you can also look at pruning. We released some pruned ResNet models in the gluon-cv model zoo in the latest release (https://gluon-cv.mxnet.io/model_zoo/classification.html#pruned-resnet), where you can see up to an 8x speedup with some accuracy loss.
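
As a rough illustration of what pruning means, the sketch below drops the convolution filters with the smallest L1 norm. This is an assumption for illustration only: the thread does not say which pruning method was used for the gluon-cv models, and `prune_filters` is a hypothetical helper, not a gluon-cv function.

```python
def prune_filters(filters, keep_ratio):
    """Keep the filters with the largest L1 norm (illustrative sketch)."""
    n_keep = max(1, int(len(filters) * keep_ratio))
    # L1 norm of each filter: sum of absolute weights.
    norms = [sum(abs(w) for w in f) for f in filters]
    # Rank filter indices from largest to smallest norm.
    ranked = sorted(range(len(filters)), key=lambda i: norms[i], reverse=True)
    # Preserve the original ordering of the surviving filters.
    kept = sorted(ranked[:n_keep])
    return [filters[i] for i in kept], kept


# Four toy "filters", each flattened to three weights.
filters = [
    [0.9, -0.8, 0.7],
    [0.01, 0.02, -0.01],
    [0.5, 0.4, -0.6],
    [0.0, 0.03, 0.0],
]
pruned, kept = prune_filters(filters, keep_ratio=0.5)
print(kept, pruned)
```

Because whole filters are removed, the resulting layer is simply smaller, so the saving does not depend on special hardware support. In practice the network is usually fine-tuned afterwards to recover accuracy.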

yxchng · April 10, 2019, 12:10am

What about performance on mobile rather than PC? @ThomasDelteil

ThomasDelteil · April 10, 2019, 12:15am

Pruning will improve performance on both PC and mobile. As far as I know, quantization support that improves performance on mobile (that is, on anything other than Intel CPUs or NVIDIA GPUs) is not available at the moment. However, I would recommend looking at a deep learning compiler like TVM (https://tvm.ai/about) for compiling your model to your specific platform (an ARM CPU, for example).

ThomasDelteil · April 17, 2019, 8:10am

You can read about int8 quantization for Intel CPUs here: https://medium.com/apache-mxnet/model-quantization-for-production-level-neural-network-inference-f54462ebba05

yxchng · April 17, 2019, 9:15am

@ThomasDelteil Thanks. I am actually only concerned with the performance on mobile. There is no point in optimizing performance for PC because compute is easily available.
