Cannot predict with context = gpu()

I’ve trained a simple net with batch normalization before the activations, but when I try to predict with context = gpu() I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/ndarray/ndarray.py", line 1972, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/base.py", line 252, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [15:34:32] src/operator/nn/./cudnn/cudnn_batch_norm-inl.h:157: Check failed: e == CUDNN_STATUS_SUCCESS (9 vs. 0) cuDNN: CUDNN_STATUS_NOT_SUPPORTED

Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x36161a) [0x7fcc1f92461a]
[bt] (1) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x361c31) [0x7fcc1f924c31]
[bt] (2) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x335345f) [0x7fcc2291645f]
[bt] (3) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x33571b7) [0x7fcc2291a1b7]
[bt] (4) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2a6fe6f) [0x7fcc22032e6f]
[bt] (5) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2a76aec) [0x7fcc22039aec]
[bt] (6) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2a55354) [0x7fcc22018354]
[bt] (7) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2a59623) [0x7fcc2201c623]
[bt] (8) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2a59876) [0x7fcc2201c876]
[bt] (9) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2a55a64) [0x7fcc22018a64]

I’m calling the model with:
dataiter = mx.io.NDArrayIter(data=data, label=label, batch_size=data.shape[0])
model = mx.module.Module.load(prefix=modelPrefix, epoch=epoch, context=mx.gpu())
model.bind(data_shapes=dataiter.provide_data, label_shapes=dataiter.provide_label)
preds = model.predict(dataiter)

This error points to a use of cuDNN that the operator implementation does not support. Could you simplify your network to work out which operator is causing it?

Do you mean work backwards, pruning layers? I assumed it was one of the batch_norms because of the “src/operator/nn/./cudnn/cudnn_batch_norm-inl.h:157” in the error output.

Doesn’t it seem odd that I trained this on an Amazon Deep Learning AMI using a GPU with no problems, but with the same setup predict fails?

Sorry, I didn’t read the error carefully enough. You’re right that the error appears to come from batch_norm. Do the data sizes or batch sizes differ between training and prediction?

Yes, they do. The model is a deep matrix factorization, so I predict by iterating over the (row, column) coordinates.

Without access to your code, I can only hypothesize that the change in batch size is causing this issue. Have you tried keeping the batch size the same as in training, or at least not equal to 1? (Not as a solution, but as a debugging step.)
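
To make that debugging step concrete, here is a minimal, framework-agnostic sketch of predicting in fixed-size batches. It is only an illustration under assumptions: predict_in_fixed_batches and dummy_predict are hypothetical names that do not come from MXNet, the batch size of 64 is a placeholder for whatever was used in training, and dummy_predict stands in for the real forward pass. The idea is that the model always sees the same batch shape; the last batch is padded by repeating its final row, and the padded outputs are thrown away.

```python
import numpy as np


def predict_in_fixed_batches(predict_fn, data, batch_size=64):
    """Call predict_fn on batches of exactly batch_size rows.

    The final batch is padded by repeating its last row, and the outputs
    for the padded rows are dropped before the results are concatenated.
    """
    data = np.asarray(data)
    n = data.shape[0]
    if n == 0:
        raise ValueError("data must contain at least one row")

    outputs = []
    for start in range(0, n, batch_size):
        batch = data[start:start + batch_size]
        pad = batch_size - batch.shape[0]
        if pad > 0:
            # Repeat the last real row so the batch shape stays constant.
            filler = np.repeat(batch[-1:], pad, axis=0)
            batch = np.concatenate([batch, filler], axis=0)
        out = np.asarray(predict_fn(batch))
        # Keep only the outputs that correspond to real rows.
        outputs.append(out[:batch_size - pad])
    return np.concatenate(outputs, axis=0)


def dummy_predict(batch):
    # Hypothetical stand-in for the model: it insists on the training batch size.
    assert batch.shape[0] == 64
    return batch.sum(axis=1)


# 200 (row, column) coordinate pairs, which is not a multiple of 64.
coords = np.arange(200 * 2, dtype=np.float32).reshape(200, 2)
preds = predict_in_fixed_batches(dummy_predict, coords, batch_size=64)
print(preds.shape)
```

If the error disappears when every batch has the training batch size, that supports the batch-size hypothesis; if it persists, the cause is probably elsewhere.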

I am having the same issue. It happens when the model was trained with a batch size of 64 and I try to predict with a batch size equal to the number of input samples.