Forward pass (for one image) is quite slow with MXNet 0.11.0


I trained a DenseNet from scratch (net = ) and I am satisfied with the @topk accuracy on the test set. However, when I measure the time for one forward pass (one image of size (224,224)), I cannot get below about 200 ms/image. My measurements show that the .asnumpy() call accounts for a significant part of that time when doing the predictions. As I understand it, NDArray computations are asynchronous, so .asnumpy() includes waiting for the computation to finish.

My question is whether these 200 ms are far off for an Intel® Xeon® CPU E5-2620 v3 @ 2.40GHz, or whether this is the performance I can expect. Does anyone have experience with this?


You probably need to warm up the engine first and then average the inference time over many runs.
If you still get similar results, check the build, e.g. which BLAS it was compiled against (OpenBLAS, MKL…).
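
A minimal sketch of the warm-up-and-average idea, using only the Python standard library. The helper name time_forward and its parameters are made up for illustration. It assumes the callable you pass in blocks until the result is ready, which for MXNet means it should end with something like .asnumpy() so the asynchronous work is included in the measurement.

```python
import time


def time_forward(forward, warmup=10, runs=100):
    """Time a blocking callable and return (mean_ms, min_ms).

    `forward` is a hypothetical zero-argument callable that runs one
    forward pass and waits for the result, e.g. (assuming `net` and a
    single-image batch `x` already exist):
        lambda: net(x).asnumpy()
    """
    # Warm-up runs are executed but not timed, so one-off costs such as
    # memory allocation and lazy initialization do not skew the result.
    for _ in range(warmup):
        forward()

    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        forward()
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    return sum(timings_ms) / len(timings_ms), min(timings_ms)
```

If the mean after warm-up is still close to 200 ms, the first call was not the problem and the build is the next thing to look at.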


Agreed, MKL will speed it up a lot!