How is the support for quantization in MXNet?

#1

How is the support for quantization in MXNet? I am trying to find a better framework than TensorLite because its support is abysmal.

Are the quantization methods available here (https://github.com/apache/incubator-mxnet/tree/master/example/quantization) only for research and experimentation purposes, or will they lead to an actual speed increase when deployed on mobile? No timing comparison is given, so I presume they are just for research and experimentation?

#2

Hi @yxchng,

For quantization:

For faster performance, you can also look at pruning. We released some pruned ResNet models in the gluon-cv model zoo in the latest release: https://gluon-cv.mxnet.io/model_zoo/classification.html#pruned-resnet. You can see up to 8x speedup with some accuracy loss.

#3

What about performance on mobile, not PC? @ThomasDelteil

#4

Pruning will lead to performance improvements on both PC and mobile. As far as I know, quantization support for mobile performance improvement (that is, on anything other than Intel CPUs or NVIDIA GPUs) is not available at the moment. However, I would recommend looking at a deep learning compiler like TVM (https://tvm.ai/about) to compile your model for your specific platform (ARM CPUs, for example).

#5

You can read about int8 quantization for Intel CPUs here: https://medium.com/apache-mxnet/model-quantization-for-production-level-neural-network-inference-f54462ebba05
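
For intuition about what int8 quantization does to the numbers, here is a minimal sketch in plain Python. It is not MXNet's implementation or API. It assumes a simple symmetric per-tensor scheme, and the helper names `quantize_int8` and `dequantize` are made up for illustration.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization (illustrative, not MXNet code).

    Maps floats to integers in [-127, 127] using a single scale factor.
    """
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the int8 representation."""
    return [x * scale for x in q]


weights = [0.5, -1.27, 0.0, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q, scale, restored)
```

Storing the weights as 8-bit integers only gives a speedup when the target hardware and runtime have fast int8 kernels, which is why the gain depends on the platform.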

#6

@ThomasDelteil Thanks. I am actually only concerned with the performance on mobile. There is no point in optimizing performance for PC because compute is easily available.