gluon/image_classification.py: img/sec speeds up when metric update and reset are turned off

I am tinkering with the example/gluon/image_classification.py script from the Apache incubator-mxnet GitHub repository. When I run the following command:

python image_classification.py --model resnet50_v1 --dataset dummy --epochs 1 --log-interval 1 --batch-size 64 --gpu 0

I get about 323.0 img/sec on a Tesla V100 GPU.

If I run the same command with the metric computations turned off (lines 212 and 229), I get about 1400 img/sec.

Why am I seeing this speedup? Why does turning off the metric computations result in such a massive increase in img/sec?

Environment:
Miniconda environment with mxnet-cu101, cuda/10.1, cudnn/7.4, Python 2.7.15
MXNet was installed with pip
Tests were run on a Tesla V100 GPU

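The speedup is expected. metric.update converts the MXNet NDArrays to NumPy arrays by calling asnumpy(), which is a blocking call. MXNet normally queues operations asynchronously, so the Python loop can run ahead of the GPU; asnumpy() makes it wait on every batch until the pending computation has finished and the result has been copied to the host. That per-batch synchronization is what causes the slowdown. You can see the call in the code here: https://mxnet.incubator.apache.org/_modules/mxnet/metric.html#CustomMetric.update There is also a related GitHub issue: https://github.com/apache/incubator-mxnet/issues/9571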
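To illustrate the effect of a blocking call inside an otherwise asynchronous loop, here is a toy simulation that uses only the Python standard library. It is not MXNet code: fake_gpu_step and measure_loop are hypothetical names, a single-worker thread pool stands in for the device queue, and fut.result() plays the role that asnumpy() plays in metric.update. Treat it as a rough sketch of the mechanism, not a model of the real engine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_gpu_step(seconds):
    """Hypothetical stand-in for one training step running on the device."""
    time.sleep(seconds)
    return 1.0  # pretend this is a loss or prediction


def measure_loop(num_batches, step_seconds, block_each_batch):
    """Return (loop_time, total_time) in seconds.

    loop_time  : time the Python loop needs to go through all batches
    total_time : time until all queued work has actually finished
    """
    start = time.perf_counter()
    # One worker thread acts as a single asynchronous device queue.
    with ThreadPoolExecutor(max_workers=1) as engine:
        for _ in range(num_batches):
            fut = engine.submit(fake_gpu_step, step_seconds)
            if block_each_batch:
                # Analogous to asnumpy(): wait until the result is ready.
                fut.result()
        loop_time = time.perf_counter() - start
        # Leaving the "with" block waits for any work still in the queue.
    total_time = time.perf_counter() - start
    return loop_time, total_time


if __name__ == "__main__":
    for blocking in (True, False):
        loop_t, total_t = measure_loop(10, 0.02, blocking)
        print("blocking=%s loop=%.3fs total=%.3fs" % (blocking, loop_t, total_t))
```

In this sketch the loop with the blocking call takes roughly as long as the work itself, while the loop without it finishes almost immediately even though the queued work still takes the same total time. A timer placed inside the loop therefore reports very different numbers in the two cases.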