Many thanks for raising this issue. Could you provide a few more details about how you added quantisation? And to confirm, you're seeing inference time for a single sample double when you add quantisation? What changed from when you had a 2x speedup with quantisation? Or does it have very high variance?
Yes, I got a 2x speedup on a single run; but it might just be that on the first run the resources were not yet loaded, while the quantized run had everything ready to go.
In more recent tests, the int8 quantized model takes twice the run time of the fp32 model. If you need a model for testing, I can upload one for comparison, including the original fp32 model and the int8 quantized one.
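To rule out the cold-start effect mentioned above, the timing comparison should discard warm-up runs and report a robust statistic such as the median. A minimal sketch (framework-agnostic; the lambda below is a hypothetical stand-in for an actual inference call like `model(x)` or `session.run(...)`):

```python
import time

def benchmark(fn, warmup=5, runs=20):
    """Call fn several times untimed (warm-up), then return the
    median latency in seconds over the timed runs."""
    for _ in range(warmup):  # excludes one-off loading/JIT costs
        fn()
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    times.sort()
    return times[len(times) // 2]  # median, robust to outliers

# Hypothetical workload standing in for model inference.
median_s = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"median latency: {median_s * 1e3:.3f} ms")
```

Running both the fp32 and int8 models through the same harness would show whether the 2x slowdown persists once warm-up is excluded.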
Can you please share a quantized model for testing?
For reasons unknown to me, inference with the quantized MobileNet model takes 4 times longer than with the standard model.