I wrote a program using MXNet that runs fine on CPU but throws the following error on GPU:
  File "/home/ubuntu/.local/lib/python3.5/site-packages/mxnet/_ctypes/ndarray.py", line 92, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "/home/ubuntu/.local/lib/python3.5/site-packages/mxnet/base.py", line 146, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [17:42:11] src/operator/./cudnn_convolution-inl.h:392: Check failed: e == CUDNN_STATUS_SUCCESS (3 vs. 0) cuDNN: CUDNN_STATUS_BAD_PARAM
Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/.local/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x272c4c) [0x7ff356fbac4c]
[bt] (1) /home/ubuntu/.local/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x360d2d2) [0x7ff35a3552d2]
[bt] (2) /home/ubuntu/.local/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x36019cd) [0x7ff35a3499cd]
[bt] (3) /home/ubuntu/.local/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x2374fe4) [0x7ff3590bcfe4]
[bt] (4) /home/ubuntu/.local/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x2300218) [0x7ff359048218]
[bt] (5) /home/ubuntu/.local/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x341955) [0x7ff357089955]
[bt] (6) /home/ubuntu/.local/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x21395c8) [0x7ff358e815c8]
[bt] (7) /home/ubuntu/.local/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x213dfd8) [0x7ff358e85fd8]
[bt] (8) /home/ubuntu/.local/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x2094541) [0x7ff358ddc541]
[bt] (9) /home/ubuntu/.local/lib/python3.5/site-packages/mxnet/libmxnet.so(MXImperativeInvokeEx+0x63) [0x7ff358ddc8e3]
Here is my output from nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   45C    P0    74W / 149W |      0MiB / 11439MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
I found a similar issue on GitHub: https://github.com/NVIDIA/DIGITS/issues/258
I’m using Python 3.5, CUDA 8.0, and MXNet 0.12.0 on Ubuntu 16.04 LTS. Could anyone point out how to get around this error? Thanks.
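In case it helps anyone trying to reproduce this, here is a minimal sketch of how the Python-side versions listed above can be collected. `describe_environment` is a hypothetical helper name of my own, not part of MXNet, and the sketch assumes the package exposes a `__version__` attribute. It does not report the CUDA or cuDNN versions, which have to be checked separately.

```python
import importlib
import platform


def describe_environment(packages=("mxnet",)):
    """Return version strings for Python, the OS, and the given packages.

    A package that cannot be imported is reported as None.
    """
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
    }
    for name in packages:
        try:
            module = importlib.import_module(name)
        except (ImportError, OSError):
            # Package missing, or its native library failed to load.
            info[name] = None
            continue
        # Assumption: the package exposes __version__; fall back otherwise.
        info[name] = getattr(module, "__version__", "unknown")
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_environment().items()):
        print("{}: {}".format(key, value))
```

Running the script prints one `name: version` line per entry, which can be pasted into a report alongside the nvidia-smi output.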