Error: no kernel image is available for execution on the device

Hello!

Still working on the MXNet 1.1.0 build for the Jetson TX2. The build now works, but I can’t use the GPU (the CPU works fine).

I can run the CUDA 9 sample workloads from Nvidia.

A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Does anyone know what this error means?

Hi @zukoo, this is just a generic error message that means something has gone wrong in the MXNet engine; because the front-end language (e.g. Python) makes asynchronous calls to the engine, the true error message hasn’t propagated up for you to see.

Set the MXNET_ENGINE_TYPE variable to NaiveEngine as suggested and you will likely see the true error. You can set the environment variable temporarily for a single script by running MXNET_ENGINE_TYPE=NaiveEngine python your_mxnet_file.py. Remember to unset the variable (or change it back) when you’re done debugging, as the naive engine makes the script run much slower.
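For example, a one-off debugging run could look something like this (a sketch only; your_mxnet_file.py stands in for your own script, and the gdb prefix is optional if you just want the real error message printed):

# run once with the synchronous engine so the real error surfaces at the failing call
MXNET_ENGINE_TYPE=NaiveEngine gdb --args python your_mxnet_file.py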

1 Like

Hi!

Thanks for your answer @thomelane, I do see more debug output now.

It seems like the error is coming from symbol/symbol.py:

RuntimeError: simple_bind error. Arguments:
data: (1, 3, 512, 512)

It looks like it has an issue with the data format, but somehow it all works fine (and slowly) on the CPU.

The full log:

[04:13:19] src/nnvm/legacy_json_util.cc:190: Loading symbol saved by previous version v0.10.1. Attempting to upgrade...
[04:13:19] src/nnvm/legacy_json_util.cc:198: Symbol successfully upgraded!
[04:13:19] src/engine/engine.cc:55: MXNet start using engine: NaiveEngine
Traceback (most recent call last):
  File "demo.py", line 121, in <module>
    ctx, len(class_names), args.nms_thresh, args.force_nms)
  File "demo.py", line 38, in get_detector
    detector = Detector(net, prefix, epoch, data_shape, mean_pixels, ctx=ctx)
  File "/home/nvidia/Demos/mxnet-ssd/detect/detector.py", line 39, in __init__
    self.mod.bind(data_shapes=[('data', (batch_size, 3, data_shape, data_shape))])
  File "/usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/module/module.py", line 429, in bind
    state_names=self._state_names)
  File "/usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/module/executor_group.py", line 264, in __init__
    self.bind_exec(data_shapes, label_shapes, shared_group)
  File "/usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/module/executor_group.py", line 360, in bind_exec
    shared_group))
  File "/usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/module/executor_group.py", line 638, in _bind_ith_exec
    shared_buffer=shared_data_arrays, **input_shapes)
  File "/usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/symbol/symbol.py", line 1518, in simple_bind
    raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments:
data: (1, 3, 512, 512)
[04:13:22] /home/nvidia/apache-mxnet-src-1.1.0-incubating/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh:110: Check failed: err == cudaSuccess (48 vs. 0) Name: MapPlanKernel ErrStr:no kernel image is available for execution on the device

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x58) [0x7f9f68b3a0]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x44) [0x7f9f68be9c]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(void mshadow::cuda::MapPlan<mshadow::sv::saveto, mshadow::Tensor<mshadow::gpu, 2, float>, mshadow::expr::ScalarExp<float>, float>(mshadow::expr::Plan<mshadow::Tensor<mshadow::gpu, 2, float>, float>, mshadow::expr::Plan<mshadow::expr::ScalarExp<float>, float> const&, mshadow::Shape<2>, CUstream_st*)+0x1b4) [0x7fa1f23064]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(void mxnet::ndarray::Eval<mshadow::gpu>(float const&, mxnet::TBlob*, mxnet::RunContext)+0x1b8) [0x7fa205e0f0]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(+0x23f5c04) [0x7fa1189c04]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(std::_Function_handler<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete), mxnet::Engine::PushSync(std::function<void (mxnet::RunContext)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*)::{lambda(mxnet::RunContext, mxnet::engine::CallbackOnComplete)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&, mxnet::engine::CallbackOnComplete&&)+0x54) [0x7fa10d2c04]
[bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(mxnet::engine::NaiveEngine::PushAsync(std::function<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*)+0x3a8) [0x7fa14d31b0]
[bt] (7) /usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(mxnet::Engine::PushSync(std::function<void (mxnet::RunContext)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*)+0xf8) [0x7fa10d2fa0]
[bt] (8) /usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(mxnet::SetValueOp(float const&, mxnet::NDArray*)+0x128) [0x7fa1186140]
[bt] (9) /usr/local/lib/python2.7/dist-packages/mxnet-1.1.0-py2.7.egg/mxnet/libmxnet.so(mxnet::NDArray::operator=(float)+0x24) [0x7fa11863c4]


[04:13:22] src/engine/naive_engine.cc:55: Engine shutdown

So on the CPU it runs fine, and on the GPU it throws this error? If so, you might not have the CUDA-compiled version of MXNet. Did you install it with pip install mxnet-cu90 (or similar, depending on your CUDA version)?
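A quick way to check is a one-line smoke test on the GPU context; a CPU-only build, or one compiled for the wrong GPU architecture, will raise an MXNetError here (sketch only, run it with whichever Python environment has MXNet installed):

# allocate a tiny NDArray on the GPU and force the computation to finish
python -c "import mxnet as mx; mx.nd.ones((1,), ctx=mx.gpu(0)).wait_to_read(); print('GPU OK')"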

Yes, I only get the error when I use mx.gpu(0), not with mx.cpu().
I can run CUDA apps from Nvidia’s C++ inference demo.

I built MXNet on the device itself (I downloaded the 1.1 release from GitHub) using:
make -j 4 USE_F16C=0 USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1

Was a solution ever found for this issue? I am getting the same error when trying to use the GPU after building MXNet 1.3 on my Nvidia Jetson TX2 (the CPU also works fine, but it is slow).

Have you solved the problem? I ran into the same issue; could you give me some suggestions? Thanks very much!

As of today (May 27th, 2019) I am facing this error too.
I will attempt to reinstall everything from an earlier MXNet version.
I actually got the GPU to work by downgrading to MXNet 1.2.1.
When I installed 1.2.1, I figured it would work because its predecessor, 1.1.0, was the version that brought MXNet to the Jetson TX2, and 1.2.1 was the last release whose README.md updated the Jetson notes, so I went with it.
I will attempt to install it on a fresh JetPack 4.2 installation on my TX2, alongside GluonCV.
If I succeed I will post the result here.

I can confirm the same error on a Jetson Xavier:

RuntimeError: simple_bind error. Arguments:
data: (1, 3, 12, 12)
[07:01:51] /home/ubuntu/setup/incubator-mxnet/3rdparty/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh:110: Check failed: err == cudaSuccess (48 vs. 0) Name: MapPlanKernel ErrStr:no kernel image is available for execution on the device

Such a build worked on older Jetsons for us, but not on the Xavier. We built MXNet directly on the Xavier and configured it to use CUDA architecture 72 (compute capability 7.2), as supported by the device.

I encountered that issue too when building for the TX2; the fix was to enable the right architecture in the Makefile.

I believe this is the line: https://github.com/apache/incubator-mxnet/blob/master/Makefile#L417
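For the TX2 that means making sure compute capability 6.2 (sm_62) ends up in the architecture list. As a sketch, assuming your Makefile version accepts a CUDA_ARCH override on the command line (otherwise edit the KNOWN_CUDA_ARCHS line linked above), the build command from earlier in this thread would become something like:

# explicitly target the TX2's GPU so the right kernel image gets embedded in libmxnet.so
make -j 4 USE_F16C=0 USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1 CUDA_ARCH="-gencode arch=compute_62,code=sm_62"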

You said setting the architecture to 72 didn’t work for the Xavier?

Sorry, pipenv had not updated the .so library. It started to work after I replaced the libmxnet.so in ~/.local/share/virtualenvs by hand. KNOWN_CUDA_ARCHS := 72 fixes the problem.
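In case anyone else hits the stale-library problem: you can check which copy of MXNet your interpreter actually loads, and which GPU architectures are baked into that libmxnet.so, with something like the following (the library path is just a placeholder; cuobjdump ships with the CUDA toolkit and its exact output format may vary):

# find the mxnet package your interpreter imports
python -c "import mxnet as mx; print(mx.__file__)"
# list the embedded cubins of the corresponding libmxnet.so; a Xavier build should show sm_72 entries
cuobjdump --list-elf /path/to/libmxnet.so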

1 Like