Error: no kernel image is available for execution on the device


#1

Hello!

Still working on the MXNet 1.1.0 build for the Jetson TX2. The build now works, but I can't use the GPU (running on the CPU works).

I can run the CUDA9 workload samples from Nvidia.

A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Does anyone know what this error means?


#2

Hi @zukoo, this is just a generic error message meaning that something has gone wrong in the MXNet engine. Because the front-end language (e.g. Python) makes asynchronous calls to the engine, the true error message hasn't propagated up for you to see.

Set the MXNET_ENGINE_TYPE environment variable to NaiveEngine as suggested and you will likely see the true error. You can set it temporarily for a single run using MXNET_ENGINE_TYPE=NaiveEngine python your_mxnet_file.py. Remember to unset it (or change it back) when you're done debugging, as NaiveEngine makes the script run much slower.
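If you'd rather set the variable from inside the script than on the command line, a minimal sketch (the environment variable is read when MXNet initializes, so set it before `import mxnet`; `your_mxnet_file` here is a stand-in for your own code):

```python
import os

# Force all engine operations to run synchronously so the real error
# surfaces at the call site (e.g. under gdb or pdb).
os.environ["MXNET_ENGINE_TYPE"] = "NaiveEngine"

# import mxnet as mx  # import only AFTER the variable is set

# ... run the failing code here ...

# Remove the variable once debugging is done; NaiveEngine is slow.
os.environ.pop("MXNET_ENGINE_TYPE", None)
```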


#3

Hi!

Thanks for your answer, @thomelane. I do see more debug output now.

It seems like the error is coming from symbol/symbol.py:

RuntimeError: simple_bind error. Arguments:
data: (1, 3, 512, 512)

It looks like it has an issue with the data format, but somehow it all works fine (and slowly) on the CPU.

The full log:

[04:13:19] src/nnvm/legacy_json_util.cc:190: Loading symbol saved by previous version v0.10.1. Attempting to upgrade...
[04:13:19] src/nnvm/legacy_json_util.cc:198: Symbol successfully upgraded!
[04:13:19] src/engine/engine.cc:55: MXNet start using engine: NaiveEngine
Traceback (most recent call last):
  File "demo.py", line 121, in <module>
    ctx, len(class_names), args.nms_thresh, args.force_nms)
  File "demo.py", line 38, in get_detector
    detector = Detector(net, prefix, epoch, data_shape, mean_pixels, ctx=ctx)
  File "/home/nvidia/Demos/mxnet-ssd/detect/detector.py", line 39, in __init__
    self.mod.bind(data_shapes=[('data', (batch_size, 3, data_shape, data_shape))])
  File "/usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/module/module.py", line 429, in bind
    state_names=self._state_names)
  File "/usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/module/executor_group.py", line 264, in __init__
    self.bind_exec(data_shapes, label_shapes, shared_group)
  File "/usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/module/executor_group.py", line 360, in bind_exec
    shared_group))
  File "/usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/module/executor_group.py", line 638, in _bind_ith_exec
    shared_buffer=shared_data_arrays, **input_shapes)
  File "/usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/symbol/symbol.py", line 1518, in simple_bind
    raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments:
data: (1, 3, 512, 512)
[04:13:22] /home/nvidia/apache-mxnet-src-1.1.0-incubating/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh:110: Check failed: err == cudaSuccess (48 vs. 0) Name: MapPlanKernel ErrStr:no kernel image is available for execution on the device

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x58) [0x7f9f68b3a0]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x44) [0x7f9f68be9c]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(void mshadow::cuda::MapPlan<mshadow::sv::saveto, mshadow::Tensor<mshadow::gpu, 2, float>, mshadow::expr::ScalarExp<float>, float>(mshadow::expr::Plan<mshadow::Tensor<mshadow::gpu, 2, float>, float>, mshadow::expr::Plan<mshadow::expr::ScalarExp<float>, float> const&, mshadow::Shape<2>, CUstream_st*)+0x1b4) [0x7fa1f23064]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(void mxnet::ndarray::Eval<mshadow::gpu>(float const&, mxnet::TBlob*, mxnet::RunContext)+0x1b8) [0x7fa205e0f0]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(+0x23f5c04) [0x7fa1189c04]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(std::_Function_handler<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete), mxnet::Engine::PushSync(std::function<void (mxnet::RunContext)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*)::{lambda(mxnet::RunContext, mxnet::engine::CallbackOnComplete)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&, mxnet::engine::CallbackOnComplete&&)+0x54) [0x7fa10d2c04]
[bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(mxnet::engine::NaiveEngine::PushAsync(std::function<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*)+0x3a8) [0x7fa14d31b0]
[bt] (7) /usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(mxnet::Engine::PushSync(std::function<void (mxnet::RunContext)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*)+0xf8) [0x7fa10d2fa0]
[bt] (8) /usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(mxnet::SetValueOp(float const&, mxnet::NDArray*)+0x128) [0x7fa1186140]
[bt] (9) /usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(mxnet::NDArray::operator=(float)+0x24) [0x7fa11863c4]


[04:13:22] src/engine/naive_engine.cc:55: Engine shutdown

#4

So it runs fine on the CPU, but throws this error on the GPU? If so, you might not have a CUDA-compiled build of MXNet. Did you install with pip install mxnet-cu90 (or similar, depending on your CUDA version)?


#5

Yes, I only get the error when I use mx.gpu(0), not with mx.cpu().
I can run CUDA apps from Nvidia's C++ inference demo.

I built MXNet on the device itself (downloaded the 1.1 release from GitHub), using:
make -j 4 USE_F16C=0 USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1
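For what it's worth, "no kernel image is available for execution on the device" typically means the compiled libmxnet.so contains no GPU code for the device's compute capability, and the Jetson TX2 (Pascal) is compute capability 6.2. A hedged sketch of the same build with the architecture passed explicitly via CUDA_ARCH (the exact variable name depends on the MXNet version's Makefile, so treat this as an assumption to verify against your source tree):

```shell
make -j 4 USE_F16C=0 USE_OPENCV=1 USE_BLAS=openblas \
     USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1 \
     CUDA_ARCH="-gencode arch=compute_62,code=sm_62"
```

If the Makefile's default CUDA_ARCH list omits sm_62, the resulting binary would run on desktop GPUs but fail on the TX2 with exactly this error.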


#6

Was a solution ever found for this issue? I am getting the same error when trying to use the GPU after building MXNet 1.3 on my Nvidia Jetson TX2 (the CPU also works fine, but is slow).


#7

Has this problem been solved? I've run into the same issue; could you give me some suggestions? Thanks very much!