I trained a DenseNet from scratch (net = mx.gluon.model_zoo.vision.densenet121(classes=53)) and I am satisfied with the top-k accuracy on the test set. However, when I measure the time for one forward pass on a single image of size (224, 224), I cannot get below about 200 ms/image. My measurements show that most of this time is spent in the call to .asnumpy() when doing the predictions. As I understand it, NDArray computations are asynchronous and .asnumpy() blocks until the computation has finished, so the time I see there presumably includes the forward pass itself.
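To make clear what is being measured, here is a minimal sketch of a timing harness that separates warm-up from the timed runs. It is not my exact benchmark code: the helper names time_forward and summarize, and the warm-up and run counts, are made up for illustration. The only requirement is that the callable passed in blocks until the result is actually available, because otherwise an asynchronous call returns almost immediately and the measured time is meaningless.

```python
import statistics
import time


def time_forward(forward, n_warmup=3, n_runs=20):
    """Time a blocking forward() call and return per-call latencies in ms.

    forward must not return before the result is ready; with an
    asynchronous engine that means it has to include a synchronization
    point. The helper name and default counts are illustrative only.
    """
    # Warm-up calls absorb one-off costs (lazy initialization, memory
    # allocation, graph construction) so they do not distort the numbers.
    for _ in range(n_warmup):
        forward()

    latencies_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        forward()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Reduce a list of latencies to min, median and max (all in ms)."""
    return {
        "min_ms": min(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
    }
```

With MXNet this could be called as time_forward(lambda: net(x).wait_to_read()), where x is an NDArray of shape (1, 3, 224, 224). wait_to_read() blocks in the same way .asnumpy() does, but without copying the result into a NumPy array, so the timing covers only the forward pass.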
My question is whether 200 ms is far off for an Intel® Xeon® CPU E5-2620 v3 @ 2.40GHz, or whether this is the performance I should expect. Does anyone have experience with this?