CUDA: Unspecified launch failure

During training, my program suddenly failed, raising the following error:

INFO:root:Epoch[18] Batch [17568]	Speed: 50177.37 samples/sec	SumMetric=706.347520
INFO:root:Epoch[18] Batch [18056]	Speed: 52638.94 samples/sec	SumMetric=704.075347
INFO:root:Epoch[18] Batch [18544]	Speed: 52356.88 samples/sec	SumMetric=709.324801
[09:51:00] /home/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:308: [09:51:00] /home/travis/build/dmlc/mxnet-distro/mxnet-build/mshadow/mshadow/./tensor_gpu-inl.h:69: Check failed: e == cudaSuccess CUDA: unspecified launch failure

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x23e38c) [0x7f2ba0f3138c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x29789c8) [0x7f2ba366b9c8]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x295abe6) [0x7f2ba364dbe6]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x170be60) [0x7f2ba23fee60]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x330c23) [0x7f2ba1023c23]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x15775ad) [0x7f2ba226a5ad]
[bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x157d763) [0x7f2ba2270763]
[bt] (7) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x157d966) [0x7f2ba2270966]
[bt] (8) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x1578b4b) [0x7f2ba226bb4b]
[bt] (9) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f2c53c5ec80]

[09:51:00] /home/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:308: [09:51:00] /home/travis/build/dmlc/mxnet-distro/mxnet-build/mshadow/mshadow/./stream_gpu-inl.h:62: Check failed: e == cudaSuccess CUDA: unspecified launch failure

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x23e38c) [0x7f2ba0f3138c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x15aa688) [0x7f2ba229d688]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x173135e) [0x7f2ba242435e]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x17146ec) [0x7f2ba24076ec]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x330c23) [0x7f2ba1023c23]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x15775ad) [0x7f2ba226a5ad]
[bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x157d763) [0x7f2ba2270763]
[bt] (7) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x157d966) [0x7f2ba2270966]
[bt] (8) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x1578b4b) [0x7f2ba226bb4b]
[bt] (9) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f2c53c5ec80]

[09:51:00] /home/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:308: [09:51:00] /home/travis/build/dmlc/mxnet-distro/mxnet-build/mshadow/mshadow/./stream_gpu-inl.h:62: Check failed: e == cudaSuccess CUDA: unspecified launch failure

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x23e38c) [0x7f2ba0f3138c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x15aa688) [0x7f2ba229d688]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x1595218) [0x7f2ba2288218]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x15775ad) [0x7f2ba226a5ad]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x157b6e3) [0x7f2ba226e6e3]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x157b8e6) [0x7f2ba226e8e6]
[bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x1578b4b) [0x7f2ba226bb4b]
[bt] (7) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f2c53c5ec80]
[bt] (8) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f2c58c506ba]
[bt] (9) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f2c589863dd]

[09:51:00] /home/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:308: [09:51:00] src/engine/./threaded_engine.h:347: [09:51:00] /home/travis/build/dmlc/mxnet-distro/mxnet-build/mshadow/mshadow/./stream_gpu-inl.h:62: Check failed: e == cudaSuccess CUDA: unspecified launch failure

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x23e38c) [0x7f2ba0f3138c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x15aa688) [0x7f2ba229d688]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x1595218) [0x7f2ba2288218]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x15775ad) [0x7f2ba226a5ad]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x157b6e3) [0x7f2ba226e6e3]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x157b8e6) [0x7f2ba226e8e6]
[bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x1578b4b) [0x7f2ba226bb4b]
[bt] (7) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f2c53c5ec80]
[bt] (8) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f2c58c506ba]
[bt] (9) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f2c589863dd]

A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 8 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x23e38c) [0x7f2ba0f3138c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x1577854) [0x7f2ba226a854]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x157b6e3) [0x7f2ba226e6e3]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x157b8e6) [0x7f2ba226e8e6]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x1578b4b) [0x7f2ba226bb4b]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f2c53c5ec80]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f2c58c506ba]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f2c589863dd]

terminate called after throwing an instance of 'dmlc::Error'
  what():  [09:51:00] src/engine/./threaded_engine.h:347: [09:51:00] /home/travis/build/dmlc/mxnet-distro/mxnet-build/mshadow/mshadow/./stream_gpu-inl.h:62: Check failed: e == cudaSuccess CUDA: unspecified launch failure

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x23e38c) [0x7f2ba0f3138c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x15aa688) [0x7f2ba229d688]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x1595218) [0x7f2ba2288218]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x15775ad) [0x7f2ba226a5ad]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x157b6e3) [0x7f2ba226e6e3]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x157b8e6) [0x7f2ba226e8e6]
[bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x1578b4b) [0x7f2ba226bb4b]
[bt] (7) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f2c53c5ec80]
[bt] (8) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f2c58c506ba]
[bt] (9) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f2c589863dd]

A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 8 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x23e38c) [0x7f2ba0f3138c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x1577854) [0x7f2ba226a854]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x157b6e3) [0x7f2ba226e6e3]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x157b8e6) [0x7f2ba226e8e6]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x1578b4b) [0x7f2ba226bb4b]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f2c53c5ec80]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f2c58c506ba]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f2c589863dd]

Have you figured out what happened here?

The current conjecture is that it was the transpose operator not being registered. Regardless, shouldn't there be a clearer debug message? I haven't seen this error since moving to MXNet 0.12.0.
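
In the meantime, the hint in the fatal-error message can be followed to get a synchronous backtrace. A minimal sketch (the engine type has to be set before mxnet is imported; the script name in the comment is just a placeholder):

import os
# Force the synchronous NaiveEngine so the failing operator shows up in the
# backtrace directly instead of inside an asynchronous engine thread.
os.environ['MXNET_ENGINE_TYPE'] = 'NaiveEngine'

import mxnet as mx
# ...then run the training script as usual, ideally under gdb, e.g.
#   gdb --args python your_training_script.py
# Remember to unset MXNET_ENGINE_TYPE after debugging.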

@madjam This occurred again with MXNet 0.11.1; I think that somewhat confirms what the issue was.

Actually, I managed to narrow down the issue.

If I run

pip3 list
mxnet (0.12.0, /home/ubuntu/dev-repos/mxnet/python)
mxnet-cu80 (0.11.0)

I think I'm getting this error because when I install from source (using pip), the existing mxnet-cu80 install isn't upgraded. Maybe that's it? I still get this error periodically. @madjam @leopd @reminisce
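
A quick way to check which install Python actually picks up (a small sketch; the expected values are just what my setup would show):

import mxnet as mx
# If the source build is active this should print 0.12.0 and a path under
# ~/dev-repos/mxnet/python; if it prints 0.11.0, the pip mxnet-cu80 package
# is still the one being loaded.
print(mx.__version__)
print(mx.__file__)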

Here’s the entire error:

[01:24:15] /home/ubuntu/dev-repos/mxnet/dmlc-core/include/dmlc/./logging.h:308: [01:24:15] /home/ubuntu/dev-repos/mxnet/mshadow/mshadow/./tensor_gpu-inl.h:69: Check failed: e == cudaSuccess CUDA: unspecified launch failure

Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f5d7626fb5c]
[bt] (1) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN7mshadow4CopyINS_3cpuENS_3gpuELi1EfEEvNS_6TensorIT_XT1_ET2_EENS3_IT0_XT1_ES5_EE14cudaMemcpyKindPNS_6StreamIS2_EE+0x1f8) [0x7f5d796877e8]
[bt] (2) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet7ndarray4CopyIN7mshadow3gpuENS2_3cpuEEEvRKNS_5TBlobEPS5_NS_7ContextES9_NS_10RunContextE+0x2f4e) [0x7f5d79666f2e]
[bt] (3) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(+0x27f5051) [0x7f5d78335051]
[bt] (4) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvN5mxnet10RunContextENS0_6engine18CallbackOnCompleteEEZNS0_6Engine8PushSyncESt8functionIFvS1_EENS0_7ContextERKSt6vectorIPNS2_3VarESaISC_EESG_NS0_10FnPropertyEiPKcEUlS1_S3_E_E9_M_invokeERKSt9_Any_dataOS1_OS3_+0x4b) [0x7f5d78184c2b]
[bt] (5) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x93) [0x7f5d786c2903]
[bt] (6) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE0_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5_+0x13b) [0x7f5d786ca89b]
[bt] (7) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4a) [0x7f5d786c4d0a]
[bt] (8) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f5d91a3cc80]
[bt] (9) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f5d973316ba]

[01:24:15] /home/ubuntu/dev-repos/mxnet/dmlc-core/include/dmlc/./logging.h:308: [01:24:15] /home/ubuntu/dev-repos/mxnet/mshadow/mshadow/./stream_gpu-inl.h:62: Check failed: e == cudaSuccess CUDA: unspecified launch failure

Stack trace returned 9 entries:
[bt] (0) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f5d7626fb5c]
[bt] (1) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN7mshadow6StreamINS_3gpuEE4WaitEv+0xd8) [0x7f5d782cb418]
[bt] (2) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(+0x2c06bb0) [0x7f5d78746bb0]
[bt] (3) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x93) [0x7f5d786c2903]
[bt] (4) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5_+0x13b) [0x7f5d786cacfb]
[bt] (5) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4a) [0x7f5d786c4d0a]
[bt] (6) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f5d91a3cc80]
[bt] (7) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f5d973316ba]
[bt] (8) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5d970673dd]

[01:24:15] /home/ubuntu/dev-repos/mxnet/dmlc-core/include/dmlc/./logging.h:308: [01:24:15] src/engine/./threaded_engine.h:370: [01:24:15] /home/ubuntu/dev-repos/mxnet/mshadow/mshadow/./tensor_gpu-inl.h:69: Check failed: e == cudaSuccess CUDA: unspecified launch failure

Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f5d7626fb5c]
[bt] (1) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN7mshadow4CopyINS_3cpuENS_3gpuELi1EfEEvNS_6TensorIT_XT1_ET2_EENS3_IT0_XT1_ES5_EE14cudaMemcpyKindPNS_6StreamIS2_EE+0x1f8) [0x7f5d796877e8]
[bt] (2) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet7ndarray4CopyIN7mshadow3gpuENS2_3cpuEEEvRKNS_5TBlobEPS5_NS_7ContextES9_NS_10RunContextE+0x2f4e) [0x7f5d79666f2e]
[bt] (3) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(+0x27f5051) [0x7f5d78335051]
[bt] (4) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvN5mxnet10RunContextENS0_6engine18CallbackOnCompleteEEZNS0_6Engine8PushSyncESt8functionIFvS1_EENS0_7ContextERKSt6vectorIPNS2_3VarESaISC_EESG_NS0_10FnPropertyEiPKcEUlS1_S3_E_E9_M_invokeERKSt9_Any_dataOS1_OS3_+0x4b) [0x7f5d78184c2b]
[bt] (5) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x93) [0x7f5d786c2903]
[bt] (6) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE0_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5_+0x13b) [0x7f5d786ca89b]
[bt] (7) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4a) [0x7f5d786c4d0a]
[bt] (8) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f5d91a3cc80]
[bt] (9) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f5d973316ba]

A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 7 entries:
[bt] (0) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f5d7626fb5c]
[bt] (1) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x332) [0x7f5d786c2ba2]
[bt] (2) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE0_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5_+0x13b) [0x7f5d786ca89b]
[bt] (3) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4a) [0x7f5d786c4d0a]
[bt] (4) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f5d91a3cc80]
[bt] (5) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f5d973316ba]
[bt] (6) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5d970673dd]

terminate called after throwing an instance of 'dmlc::Error'
  what():  [01:24:15] src/engine/./threaded_engine.h:370: [01:24:15] /home/ubuntu/dev-repos/mxnet/mshadow/mshadow/./tensor_gpu-inl.h:69: Check failed: e == cudaSuccess CUDA: unspecified launch failure

Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f5d7626fb5c]
[bt] (1) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN7mshadow4CopyINS_3cpuENS_3gpuELi1EfEEvNS_6TensorIT_XT1_ET2_EENS3_IT0_XT1_ES5_EE14cudaMemcpyKindPNS_6StreamIS2_EE+0x1f8) [0x7f5d796877e8]
[bt] (2) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet7ndarray4CopyIN7mshadow3gpuENS2_3cpuEEEvRKNS_5TBlobEPS5_NS_7ContextES9_NS_10RunContextE+0x2f4e) [0x7f5d79666f2e]
[bt] (3) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(+0x27f5051) [0x7f5d78335051]
[bt] (4) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvN5mxnet10RunContextENS0_6engine18CallbackOnCompleteEEZNS0_6Engine8PushSyncESt8functionIFvS1_EENS0_7ContextERKSt6vectorIPNS2_3VarESaISC_EESG_NS0_10FnPropertyEiPKcEUlS1_S3_E_E9_M_invokeERKSt9_Any_dataOS1_OS3_+0x4b) [0x7f5d78184c2b]
[bt] (5) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x93) [0x7f5d786c2903]
[bt] (6) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE0_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5_+0x13b) [0x7f5d786ca89b]
[bt] (7) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4a) [0x7f5d786c4d0a]
[bt] (8) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f5d91a3cc80]
[bt] (9) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f5d973316ba]

A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 7 entries:
[bt] (0) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f5d7626fb5c]
[bt] (1) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x332) [0x7f5d786c2ba2]
[bt] (2) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE0_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5_+0x13b) [0x7f5d786ca89b]
[bt] (3) /home/ubuntu/dev-repos/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4a) [0x7f5d786c4d0a]
[bt] (4) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f5d91a3cc80]
[bt] (5) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f5d973316ba]
[bt] (6) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5d970673dd]

Very likely an invalid driver problem. You can test it with a minimal script and see if the GPU works well.
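
Something like the following is usually enough as a sanity check (a minimal sketch; adjust the GPU id to whatever you train on):

import mxnet as mx

ctx = mx.gpu(0)                  # the device used for training
a = mx.nd.ones((1024, 1024), ctx=ctx)
b = mx.nd.ones((1024, 1024), ctx=ctx)
c = mx.nd.dot(a, b)              # launches a GPU kernel
mx.nd.waitall()                  # forces the async engine to surface any pending CUDA error
print(c.asnumpy()[0, 0])         # copies back to host; fails here if the device is unhealthy

If this also fails, it points to the driver/CUDA setup rather than the model.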

I doubt it. Other experiments seem to run fine?

Can you post a minimal script to reproduce the error?

Not externally, I'm afraid. The error is "random" in the sense that I cannot deterministically reproduce it.

@zhreshold Do you have a way to check this? It's still happening with Python 3 and MXNet 0.12.0.

Same problem with CUDA 8.0, MXNet 1.0, and Windows Server 2016. I have tested train_cifar10.py and it works well, but when I use a complex model including mx.sym.{arange, slice, tile, take, concat, …}, it throws an unspecified launch failure. The strange thing is that it appears after a different number of iterations each time, which looks random. It would be helpful if someone could provide suggestions on how to debug such errors. @zhreshold
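
One thing I plan to try is exercising the suspected operators one at a time, roughly like this (just a sketch; the shapes are made up and not from my real model):

import mxnet as mx

ctx = mx.gpu(0)
x = mx.nd.ones((4, 8), ctx=ctx)
idx = mx.nd.array([0, 2, 3], ctx=ctx)

ops = {
    'arange': lambda: mx.nd.arange(0, 16, ctx=ctx),
    'slice':  lambda: mx.nd.slice(x, begin=(0, 0), end=(2, 4)),
    'tile':   lambda: mx.nd.tile(x, reps=(2, 1)),
    'take':   lambda: mx.nd.take(x, idx),
    'concat': lambda: mx.nd.concat(x, x, dim=0),
}
for name, fn in ops.items():
    out = fn()
    mx.nd.waitall()  # surfaces a CUDA error from this specific operator, if any
    print(name, out.shape)

Since the crash only appears after many iterations, though, it may need to be run in a loop for a while to trigger anything.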

I have the same problem with MXNet 1.1, running train_imagenet.py from the examples. It works with MXNet 1.0 for resnet50 and resnext50, and with MXNet 1.1 for resnet50, but not for resnext50.
(Windows 10 x64 1709, GTX 1070, driver 390.65, CUDA 8.0, and later CUDA 9.1 with the same result)

d:\work\cpp\mxnet\example\image-classification>python train_imagenet.py --model-prefix=netC/resnext50 --network=resnext --num-classes=12 --num-examples=15048 --gpus=0 --batch-size=32 --num-epochs=100 --data-train=d:/work/cpp/mnist/new/cpeople_train.rec --data-train-idx=d:/work/cpp/mnist/new/cpeople_train.idx --data-val=d:/work/cpp/mnist/new/cpeople_val.rec --data-val-idx=d:/work/cpp/mnist/new/cpeople_val.idx
INFO:root:start with arguments Namespace(batch_size=32, benchmark=0, data_nthreads=4, data_train='d:/work/cpp/mnist/new/cpeople_train.rec', data_train_idx='d:/work/cpp/mnist/new/cpeople_train.idx', data_val='d:/work/cpp/mnist/new/cpeople_val.rec', data_val_idx='d:/work/cpp/mnist/new/cpeople_val.idx', disp_batches=20, dtype='float32', gc_threshold=0.5, gc_type='none', gpus='0', image_shape='3,224,224', initializer='default', kv_store='device', load_epoch=None, lr=0.1, lr_factor=0.1, lr_step_epochs='30,60', macrobatch_size=0, max_random_aspect_ratio=0.25, max_random_h=36, max_random_l=50, max_random_rotate_angle=10, max_random_s=50, max_random_scale=1, max_random_shear_ratio=0.1, min_random_scale=1, model_prefix='netC/resnext50', mom=0.9, monitor=0, network='resnext', num_classes=12, num_epochs=100, num_examples=15048, num_layers=50, optimizer='sgd', pad_size=0, random_crop=1, random_mirror=1, rgb_mean='123.68,116.779,103.939', test_io=0, top_k=0, warmup_epochs=5, warmup_strategy='linear', wd=0.0001)
[13:05:46] D:\work\cpp\mxnet\src\io\iter_image_recordio_2.cc:170: ImageRecordIOParser2: d:/work/cpp/mnist/new/cpeople_train.rec, use 1 threads for decoding..
[13:05:49] D:\work\cpp\mxnet\src\io\iter_image_recordio_2.cc:170: ImageRecordIOParser2: d:/work/cpp/mnist/new/cpeople_val.rec, use 1 threads for decoding..
[13:05:52] d:\work\cpp\mxnet\src\operator\nn\cudnn\./cudnn_algoreg-inl.h:107: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
INFO:root:Epoch[0] Batch [20]   Speed: 66.71 samples/sec        accuracy=0.180060
...
INFO:root:Epoch[1] Batch [320]  Speed: 66.28 samples/sec        accuracy=0.401562
Traceback (most recent call last):
  File "train_imagenet.py", line 58, in <module>
    fit.fit(args, sym, data.get_rec_iter)
  File "d:\work\cpp\mxnet\example\image-classification\common\fit.py", line 285, in fit
    monitor=monitor)
  File "C:\Anaconda3\lib\site-packages\mxnet-1.1.0-py3.6.egg\mxnet\module\base_module.py", line 496, in fit
    self.update_metric(eval_metric, data_batch.label)
  File "C:\Anaconda3\lib\site-packages\mxnet-1.1.0-py3.6.egg\mxnet\module\module.py", line 749, in update_metric
    self._exec_group.update_metric(eval_metric, labels)
  File "C:\Anaconda3\lib\site-packages\mxnet-1.1.0-py3.6.egg\mxnet\module\executor_group.py", line 616, in update_metric
    eval_metric.update_dict(labels_, preds)
  File "C:\Anaconda3\lib\site-packages\mxnet-1.1.0-py3.6.egg\mxnet\metric.py", line 280, in update_dict
    metric.update_dict(labels, preds)
  File "C:\Anaconda3\lib\site-packages\mxnet-1.1.0-py3.6.egg\mxnet\metric.py", line 108, in update_dict
    self.update(label, pred)
  File "C:\Anaconda3\lib\site-packages\mxnet-1.1.0-py3.6.egg\mxnet\metric.py", line 394, in update
    pred_label = pred_label.asnumpy().astype('int32')
  File "C:\Anaconda3\lib\site-packages\mxnet-1.1.0-py3.6.egg\mxnet\ndarray\ndarray.py", line 1801, in asnumpy
    ctypes.c_size_t(data.size)))
  File "C:\Anaconda3\lib\site-packages\mxnet-1.1.0-py3.6.egg\mxnet\base.py", line 148, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [13:12:40] d:\work\cpp\mxnet\mshadow\mshadow\./stream_gpu-inl.h:62: Check failed: e == cudaSuccess CUDA: unknown error

@zhreshold This seems to be a prevalent issue; any clues?

I have this problem with versions 1.0.1 and 1.1.0, but 1.0.0 is OK, no matter whether CUDA 8 or 9.1.
Maybe mshadow is to blame? I have downgraded my MXNet to 1.0.0 for now.

Maybe there is more than one process occupying CUDA resources? For any unspecified or unknown CUDA error, there's not much we can help diagnose without a log and reproducible code.

How exactly do we get you a log? If you message me, I can give you the code that produces the error.

Dhruv

:frowning: Also getting it: MXNet 1.3, Windows 10, NVIDIA GTX 1060, 9.1 drivers.

C:\Users\johnl\Documents\GitHub\gluon-cv\scripts\detection\yolo>python train_yolo3.py --no-random-shape --gpus 0 --batch-size 8
[00:28:05] c:\jenkins\workspace\mxnet-tag\mxnet\src\operator\nn\mkldnn\mkldnn_base.cc:74: Allocate 22151168 bytes with malloc directly
[00:28:05] c:\jenkins\workspace\mxnet-tag\mxnet\src\operator\nn\mkldnn\mkldnn_base.cc:74: Allocate 73728 bytes with malloc directly
INFO:root:Namespace(batch_size=8, data_shape=416, dataset='voc', epochs=200, gpus='0', label_smooth=False, log_interval=100, lr=0.001, lr_decay=0.1, lr_decay_epoch='160,180', lr_decay_period=0, lr_mode='step', mixup=False, momentum=0.9, network='darknet53', no_mixup_epochs=20, no_random_shape=True, no_wd=False, num_samples=16551, num_workers=4, resume='', save_interval=10, save_prefix='yolo3_darknet53_voc', seed=233, start_epoch=0, syncbn=False, val_interval=1, warmup_epochs=0, warmup_lr=0.0, wd=0.0005)
INFO:root:Start training from [Epoch 0]
[00:28:50] c:\jenkins\workspace\mxnet-tag\mxnet\src\operator\nn\cudnn\./cudnn_algoreg-inl.h:109: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
INFO:root:[Epoch 0][Batch 99], LR: 1.00E-03, Speed: 11.693 samples/sec, ObjLoss=8100.137, BoxCenterLoss=8.024, BoxScaleLoss=9.633, ClassLoss=41.770
INFO:root:[Epoch 0][Batch 199], LR: 1.00E-03, Speed: 11.171 samples/sec, ObjLoss=8101.524, BoxCenterLoss=8.041, BoxScaleLoss=9.467, ClassLoss=42.181
INFO:root:[Epoch 0][Batch 299], LR: 1.00E-03, Speed: 11.553 samples/sec, ObjLoss=8099.886, BoxCenterLoss=8.120, BoxScaleLoss=9.518, ClassLoss=42.519
INFO:root:[Epoch 0][Batch 399], LR: 1.00E-03, Speed: 11.438 samples/sec, ObjLoss=8098.142, BoxCenterLoss=7.978, BoxScaleLoss=9.441, ClassLoss=41.729
INFO:root:[Epoch 0][Batch 499], LR: 1.00E-03, Speed: 11.467 samples/sec, ObjLoss=8097.496, BoxCenterLoss=8.178, BoxScaleLoss=9.660, ClassLoss=42.627
INFO:root:[Epoch 0][Batch 599], LR: 1.00E-03, Speed: 11.465 samples/sec, ObjLoss=8096.822, BoxCenterLoss=8.176, BoxScaleLoss=9.637, ClassLoss=42.616
INFO:root:[Epoch 0][Batch 699], LR: 1.00E-03, Speed: 11.627 samples/sec, ObjLoss=8096.308, BoxCenterLoss=8.173, BoxScaleLoss=9.642, ClassLoss=42.540
INFO:root:[Epoch 0][Batch 799], LR: 1.00E-03, Speed: 11.603 samples/sec, ObjLoss=8095.712, BoxCenterLoss=8.163, BoxScaleLoss=9.626, ClassLoss=42.456
INFO:root:[Epoch 0][Batch 899], LR: 1.00E-03, Speed: 11.248 samples/sec, ObjLoss=8095.576, BoxCenterLoss=8.118, BoxScaleLoss=9.581, ClassLoss=42.351
INFO:root:[Epoch 0][Batch 999], LR: 1.00E-03, Speed: 11.079 samples/sec, ObjLoss=8094.974, BoxCenterLoss=8.088, BoxScaleLoss=9.544, ClassLoss=42.225
INFO:root:[Epoch 0][Batch 1099], LR: 1.00E-03, Speed: 11.410 samples/sec, ObjLoss=8094.584, BoxCenterLoss=8.040, BoxScaleLoss=9.503, ClassLoss=42.030
INFO:root:[Epoch 0][Batch 1199], LR: 1.00E-03, Speed: 11.199 samples/sec, ObjLoss=8094.232, BoxCenterLoss=8.042, BoxScaleLoss=9.496, ClassLoss=41.979
INFO:root:[Epoch 0][Batch 1299], LR: 1.00E-03, Speed: 11.185 samples/sec, ObjLoss=8093.770, BoxCenterLoss=7.998, BoxScaleLoss=9.456, ClassLoss=41.770
INFO:root:[Epoch 0][Batch 1399], LR: 1.00E-03, Speed: 10.628 samples/sec, ObjLoss=8093.314, BoxCenterLoss=7.992, BoxScaleLoss=9.447, ClassLoss=41.751
INFO:root:[Epoch 0][Batch 1499], LR: 1.00E-03, Speed: 11.262 samples/sec, ObjLoss=8092.719, BoxCenterLoss=8.008, BoxScaleLoss=9.460, ClassLoss=41.830
INFO:root:[Epoch 0][Batch 1599], LR: 1.00E-03, Speed: 11.407 samples/sec, ObjLoss=8092.281, BoxCenterLoss=8.008, BoxScaleLoss=9.469, ClassLoss=41.886
INFO:root:[Epoch 0][Batch 1699], LR: 1.00E-03, Speed: 10.847 samples/sec, ObjLoss=8091.926, BoxCenterLoss=7.978, BoxScaleLoss=9.447, ClassLoss=41.755
INFO:root:[Epoch 0][Batch 1799], LR: 1.00E-03, Speed: 11.366 samples/sec, ObjLoss=8091.376, BoxCenterLoss=7.978, BoxScaleLoss=9.435, ClassLoss=41.740
INFO:root:[Epoch 0][Batch 1899], LR: 1.00E-03, Speed: 10.905 samples/sec, ObjLoss=8090.827, BoxCenterLoss=7.980, BoxScaleLoss=9.428, ClassLoss=41.762
INFO:root:[Epoch 0][Batch 1999], LR: 1.00E-03, Speed: 11.290 samples/sec, ObjLoss=8090.339, BoxCenterLoss=7.982, BoxScaleLoss=9.426, ClassLoss=41.803
INFO:root:[Epoch 0] Training cost: 1503.565, ObjLoss=8089.980, BoxCenterLoss=7.989, BoxScaleLoss=9.429, ClassLoss=41.820
Traceback (most recent call last):
  File "train_yolo3.py", line 330, in <module>
    train(net, train_data, val_data, eval_metric, ctx, args)
  File "train_yolo3.py", line 287, in train
    map_name, mean_ap = validate(net, val_data, ctx, eval_metric)
  File "train_yolo3.py", line 174, in validate
    eval_metric.update(det_bboxes, det_ids, det_scores, gt_bboxes, gt_ids, gt_difficults)
  File "C:\Users\johnl\AppData\Local\Programs\Python\Python36\lib\site-packages\gluoncv\utils\metrics\voc_detection.py", line 107, in update
    gt_bboxes, gt_labels, gt_difficults]]):
  File "C:\Users\johnl\AppData\Local\Programs\Python\Python36\lib\site-packages\gluoncv\utils\metrics\voc_detection.py", line 106, in <listcomp>
    *[as_numpy(x) for x in [pred_bboxes, pred_labels, pred_scores,
  File "C:\Users\johnl\AppData\Local\Programs\Python\Python36\lib\site-packages\gluoncv\utils\metrics\voc_detection.py", line 95, in as_numpy
    out = [x.asnumpy() if isinstance(x, mx.nd.NDArray) else x for x in a]
  File "C:\Users\johnl\AppData\Local\Programs\Python\Python36\lib\site-packages\gluoncv\utils\metrics\voc_detection.py", line 95, in <listcomp>
    out = [x.asnumpy() if isinstance(x, mx.nd.NDArray) else x for x in a]
  File "C:\Users\johnl\AppData\Local\Programs\Python\Python36\lib\site-packages\mxnet\ndarray\ndarray.py", line 1972, in asnumpy
    ctypes.c_size_t(data.size)))
  File "C:\Users\johnl\AppData\Local\Programs\Python\Python36\lib\site-packages\mxnet\base.py", line 252, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [00:54:12] c:\jenkins\workspace\mxnet-tag\mxnet\3rdparty\mshadow\mshadow\./stream_gpu-inl.h:62: Check failed: e == cudaSuccess CUDA: unspecified launch failure

Hello?
Did you solve your problem?
I got the same error.

INFO:root:[Epoch 13][Batch 499], LR: 5.00E-04, Speed: 4.210 samples/sec, ObjLoss=19.960, BoxCenterLoss=8.583, BoxScaleLoss=3.446, ClassLoss=1.989
INFO:root:[Epoch 13][Batch 599], LR: 5.00E-04, Speed: 4.186 samples/sec, ObjLoss=19.913, BoxCenterLoss=8.583, BoxScaleLoss=3.443, ClassLoss=1.987
INFO:root:[Epoch 13][Batch 699], LR: 5.00E-04, Speed: 3.053 samples/sec, ObjLoss=19.862, BoxCenterLoss=8.580, BoxScaleLoss=3.439, ClassLoss=1.983
Traceback (most recent call last):
  File "trainModel.py", line 361, in <module>
    train(net, train_data, val_data, eval_metric, ctx, args)
  File "trainModel.py", line 292, in train
    obj_metrics.update(0, obj_losses)
  File "/root/mxnet/python/mxnet/metric.py", line 1636, in update
    loss = ndarray.sum(pred).asscalar()
  File "/root/mxnet/python/mxnet/ndarray/ndarray.py", line 2114, in asscalar
    return self.asnumpy()[0]
  File "/root/mxnet/python/mxnet/ndarray/ndarray.py", line 2096, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/root/mxnet/python/mxnet/base.py", line 253, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [09:40:27] /root/mxnet/3rdparty/mshadow/mshadow/./stream_gpu-inl.h:62: Check failed: e == cudaSuccess: CUDA: unspecified launch failure
Stack trace:

Please help me.

What’s the model you are training? Do you have a link to the code?