Improve the speed of evaluating the metric


As far as I know, the metrics in MXNet are implemented in Python, which is somewhat slow because they rely on NumPy operations on the CPU (the NDArray usually has to be converted to a NumPy array first).
I would like to know whether they could be implemented on the GPU or in C++ instead.


Yes, that’s correct. You can track that issue here:

In my experience, you can get a large speedup by computing a non-blocking accuracy on the GPU. Depending on your metric, this can be straightforward (accuracy) or more complicated.

Here is an example for accuracy. Only the return statement is blocking. Beware, though, that if your test set is large you could get a memory allocation error: the “copy to GPU” instructions are enqueued on the backend with no upstream dependency, so they can run ahead of the computation and fill GPU memory.

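import mxnet as mx
from mxnet import nd

ctx = mx.gpu()

def evaluate_accuracy(data_iterator, net):
    # Accumulators live on the GPU so that updating them never blocks the host.
    metric = nd.zeros((1,), ctx=ctx)
    num_instance = nd.zeros((1,), ctx=ctx)
    for data, label in data_iterator:
        data = data.as_in_context(ctx)
        # argmax returns float32, so cast the labels to match before comparing.
        label = label.as_in_context(ctx).astype('float32')
        output = net(data)
        predictions = nd.argmax(output, axis=1)
        metric += (predictions == label).sum()
        num_instance += data.shape[0]
    # asscalar() copies to the host; this is the only blocking call.
    return float(metric.asscalar()) / float(num_instance.asscalar())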