Hello MXNet'ers!

I am trying to adapt the single-GPU example https://gluon.mxnet.io/chapter04_convolutional-neural-networks/cnn-gluon.html to run on a multi-GPU machine. I am stuck on properly defining the evaluate_accuracy function. My goal is to split each batch of test data across the GPUs once and then evaluate accuracy without repeatedly loading the test dataset. Here is what I have so far.
Define the context:

    ctx = [mx.gpu(i) for i in range(num_gpus)]
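For what it's worth, my understanding is that gluon.utils.split_and_load (used below) slices a batch evenly along axis 0, one shard per context, and copies each shard to its device. The partitioning step alone can be sketched in plain NumPy like this (names here are hypothetical; the real function also handles the device copies and has an even_split option for uneven batches):

```python
import numpy as np

def split_batch(batch, num_devices):
    """Split a batch evenly along axis 0, one shard per device
    (a sketch of the slicing that split_and_load performs)."""
    if len(batch) % num_devices != 0:
        raise ValueError("batch size must be divisible by num_devices")
    return np.split(batch, num_devices, axis=0)

batch = np.arange(8 * 3).reshape(8, 3)   # a batch of 8 samples, 3 features each
shards = split_batch(batch, 4)           # 4 shards of 2 samples each
print([s.shape for s in shards])         # [(2, 3), (2, 3), (2, 3), (2, 3)]
```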
Load the test data and split each batch across the GPUs:

    def transform(data, label):
        return nd.transpose(data.astype(np.float32), (2, 0, 1)) / 255, label.astype(np.float32)

    test_data = gluon.data.DataLoader(
        gluon.data.vision.MNIST(train=False, transform=transform),
        batch_size, shuffle=False, num_workers=4)

    for data, label in test_data:
        data_list = gluon.utils.split_and_load(data, ctx)
        label_list = gluon.utils.split_and_load(label, ctx)

Score with the trained net and try to get accuracy numbers from each GPU:
    acc = [mx.metric.Accuracy() for i in range(num_gpus)]
    for i, (data, label) in enumerate(zip(data_list, label_list)):
        data = data.as_in_context(mx.gpu(i))
        label = label.as_in_context(mx.gpu(i))
        predictions = nd.argmax(net(data), axis=1)
        acc[i].update(preds=predictions, labels=label)
        acc[i].get()[1]
    print(acc[0].get()[1], acc[1].get()[1], acc[2].get()[1], acc[3].get()[1])

Output:

    1.0 1.0 1.0 1.0
I don't like that I have to compute the predictions sequentially, and I'm not sure my for loop is entirely correct. I'd appreciate any insights.
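For context on the semantics I'm relying on: my understanding is that mx.metric.Accuracy.update() adds to running counts, so one metric per GPU can be updated inside the batch loop and read once at the end. A plain-NumPy sketch of that accumulation pattern (all names below are hypothetical stand-ins for net, test_data, and the real metric):

```python
import numpy as np

class RunningAccuracy:
    """Sketch of the running-count accumulation that mx.metric.Accuracy
    performs: update() adds to counts, get() reads the running ratio."""
    def __init__(self):
        self.correct = 0
        self.total = 0
    def update(self, preds, labels):
        self.correct += int((preds == labels).sum())
        self.total += labels.shape[0]
    def get(self):
        return self.correct / self.total

num_gpus = 4
acc = [RunningAccuracy() for _ in range(num_gpus)]
rng = np.random.default_rng(0)

for _ in range(3):                       # stand-in for: for data, label in test_data
    labels = rng.integers(0, 10, size=8)
    preds = labels.copy()
    preds[0] = (preds[0] + 1) % 10       # inject one wrong prediction per batch
    # split the batch across the "GPUs" and update each shard's metric
    for i, (p, l) in enumerate(zip(np.split(preds, num_gpus),
                                   np.split(labels, num_gpus))):
        acc[i].update(p, l)

print([a.get() for a in acc])            # shard 0 saw all the errors: [0.5, 1.0, 1.0, 1.0]
```

Here shard 0 gets 1 wrong out of 2 samples in each of the 3 batches, so its running accuracy is 3/6 = 0.5 while the other shards stay at 1.0; averaging the per-shard ratios weighted by their totals would give the overall accuracy.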