Deserializing Gluon Symbol in C++

I save the symbol and parameters of a hybridized model in Python as follows:

net.hybridize()    
out_sym = net(mx.sym.Variable("data"))
out_sym.save(sym_path)
net.collect_params().save(params_path)

I can also load the symbol and params in Python again (although there might be a more elegant way to do this) with:

sym = mx.sym.load(sym_path)
params = mx.nd.load(params_path)
mod = mx.mod.Module(sym, label_names=[])
mod.bind(data_shapes=[('data', data.shape)])
mod.init_params()
(args, auxs) = mod.get_params()
mod.set_params(params, auxs)
mod.forward(mx.io.DataBatch(data=[data]))
out = mod.get_outputs()[0]

However, I failed to load and use the symbol and params in C++. I implemented the following code based on the examples in the repository:

// Load the network structure and parameters
mxnet::cpp::Context ctx_gpu(mxnet::cpp::kGPU, 0);
mxnet::cpp::Symbol net = mxnet::cpp::Symbol::Load(model_path);
std::map<std::string, mxnet::cpp::NDArray> params;
mxnet::cpp::NDArray::Load(param_path, nullptr, &params);
std::map<std::string, mxnet::cpp::NDArray> args_map;
std::map<std::string, mxnet::cpp::NDArray> aux_map;
for (const auto &k : params) {
  if (k.first.substr(0, 4) == "aux:") {
    auto name = k.first.substr(4, k.first.size() - 4);
    aux_map[name] = k.second.Copy(ctx_gpu);
  }
  if (k.first.substr(0, 4) == "arg:") {
    auto name = k.first.substr(4, k.first.size() - 4);
    args_map[name] = k.second.Copy(ctx_gpu);
  }
}

//Variant 1
mxnet::cpp::Executor* executor = net.SimpleBind(ctx_gpu, args_map);
executor->Forward(false);

//Variant 2
std::vector<mxnet::cpp::NDArray> arg_arrays;
std::vector<mxnet::cpp::NDArray> grad_arrays; 
std::vector<mxnet::cpp::OpReqType> grad_reqs;
std::vector<mxnet::cpp::NDArray> aux_arrays;
std::map<std::string, mxnet::cpp::NDArray> arg_grad_store;
std::map<std::string, mxnet::cpp::OpReqType> grad_req_type;
net.InferExecutorArrays(ctx_gpu, &arg_arrays, &grad_arrays, &grad_reqs, &aux_arrays, args_map, arg_grad_store, grad_req_type, aux_map); 
auto executor = net.Bind(ctx_gpu, arg_arrays, grad_arrays, grad_reqs, aux_arrays);
executor->Forward(false);

Both variants crash on execution: on the GPU with a cuDNN error (Check failed: e == CUDNN_STATUS_SUCCESS (8 vs. 0) cuDNN: CUDNN_STATUS_EXECUTION_FAILED), and on the CPU with a segfault. If I set MXNET_ENGINE_TYPE to NaiveEngine, the example runs, but the output is garbage (containing NaNs, …).
What am I missing in the C++ API?

Edit: It seems that reading the params is already problematic, as the names carry neither an "arg:" nor an "aux:" prefix. However, loading all params into args_map regardless still fails with the same errors.
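For what it's worth, the prefix handling can be checked without MXNet at all. Below is a minimal sketch, not the library's API: `SplitParams` and `SplitByPrefix` are hypothetical names, and the template parameter `T` stands in for `mxnet::cpp::NDArray` so that the snippet compiles on its own. Keys that carry neither prefix are collected separately, which makes the situation from the edit visible instead of silently leaving both maps empty.

```cpp
#include <map>
#include <string>
#include <vector>

// Result of sorting a loaded parameter map by its key prefix.
template <typename T>
struct SplitParams {
  std::map<std::string, T> args;        // keys that started with "arg:"
  std::map<std::string, T> auxs;        // keys that started with "aux:"
  std::vector<std::string> unprefixed;  // keys with neither prefix
};

// Hypothetical helper: T is a stand-in for mxnet::cpp::NDArray, so the
// logic can be compiled and checked without linking against MXNet.
template <typename T>
SplitParams<T> SplitByPrefix(const std::map<std::string, T>& params) {
  const std::string arg_prefix = "arg:";
  const std::string aux_prefix = "aux:";
  SplitParams<T> out;
  for (const auto& kv : params) {
    const std::string& key = kv.first;
    if (key.compare(0, arg_prefix.size(), arg_prefix) == 0) {
      // Strip the prefix so the name matches the symbol's argument name.
      out.args[key.substr(arg_prefix.size())] = kv.second;
    } else if (key.compare(0, aux_prefix.size(), aux_prefix) == 0) {
      out.auxs[key.substr(aux_prefix.size())] = kv.second;
    } else {
      // Neither prefix: report it rather than dropping it silently.
      out.unprefixed.push_back(key);
    }
  }
  return out;
}
```

If `unprefixed` is non-empty after loading, the params file was most likely not written in the prefixed format that the loop in the question assumes.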

You need to use HybridBlock.export to make the saved model readable from C++.
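To my knowledge, `net.export(path, epoch)` writes two files, `path-symbol.json` and `path-NNNN.params` (epoch zero-padded to four digits), and stores the parameters under `arg:`/`aux:` prefixed names, which is what the C++ loading loop in the question expects. A small sketch of deriving those file names on the C++ side; `ExportedFiles` and `ExportedFileNames` are hypothetical helpers, not part of mxnet-cpp, and the naming scheme should be checked against your MXNet version.

```cpp
#include <cstdio>
#include <string>

// The two files assumed to be produced by HybridBlock.export(prefix, epoch).
struct ExportedFiles {
  std::string symbol_path;  // network structure (JSON)
  std::string params_path;  // parameters with "arg:"/"aux:" prefixed names
};

// Hypothetical helper: builds the file names from the export prefix.
// Assumes the "<prefix>-symbol.json" / "<prefix>-%04d.params" convention.
ExportedFiles ExportedFileNames(const std::string& prefix, int epoch = 0) {
  char suffix[32];
  std::snprintf(suffix, sizeof(suffix), "-%04d.params", epoch);
  return ExportedFiles{prefix + "-symbol.json", prefix + suffix};
}
```

The resulting `symbol_path` and `params_path` would then be passed to `mxnet::cpp::Symbol::Load` and `mxnet::cpp::NDArray::Load` respectively, in place of the files written by `out_sym.save` and `collect_params().save`.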

That seems to help: I can now copy the parameters into the args and aux maps. However, the program still crashes. I switched the context to CPU and ran it under valgrind. Interestingly, I either get "Conditional jump or move depends on uninitialised value(s)" when I try to access a value of the output array via

executor->outputs[0].Copy(ctx_cpu).GetData()[0]

or it reports (non-deterministically) an invalid read in the function

auto params = mxnet::cpp::NDArray::LoadToMap(param_path);

Here is the trace:

==17893== Invalid read of size 8
==17893== at 0x5D064E9: mxnet::op::Scalar2Array<mshadow::cpu, float>::~Scalar2Array() (in /home/local/ANT/gernot/Documents/mxnet/lib/libmxnet.so)
==17893== by 0x5D1AFBB: mxnet::op::SampleMaster<mshadow::cpu, mxnet::op::NormalSampler<mshadow::cpu> >::op(nnvm::NodeAttrs const&, mxnet::OpContext const&, mxnet::OpReqType const&, mxnet::TBlob*) (in /home/local/ANT/gernot/Documents/mxnet/lib/libmxnet.so)
==17893== by 0x5D1C1BA: void mxnet::op::Sample_<mshadow::cpu, mxnet::op::NormalSampler<mshadow::cpu> >(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&) (in /home/local/ANT/gernot/Documents/mxnet/lib/libmxnet.so)
==17893== by 0x7CA2C5A: mxnet::imperative::PushFCompute(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext, mxnet::engine::CallbackOnComplete)#1}::operator()(mxnet::RunContext, mxnet::engine::CallbackOnComplete) const (in /home/local/ANT/gernot/Documents/mxnet/lib/libmxnet.so)
==17893== by 0x7CA2E20: std::_Function_handler<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete), mxnet::imperative::PushFCompute(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext, mxnet::engine::CallbackOnComplete)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&, mxnet::engine::CallbackOnComplete&&) (in /home/local/ANT/gernot/Documents/mxnet/lib/libmxnet.so)
==17893== by 0x808CFF2: mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*) (in /home/local/ANT/gernot/Documents/mxnet/lib/libmxnet.so)
==17893== by 0x8095280: std::_Function_handler<void (), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#1}::operator()() const::{lambda()#1}>::_M_invoke(std::_Any_data const&) (in /home/local/ANT/gernot/Documents/mxnet/lib/libmxnet.so)
==17893== by 0xEBDEC7F: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==17893== by 0xF7986B9: start_thread (pthread_create.c:333)
==17893== by 0xF4CE3DC: clone (clone.S:109)
==17893== Address 0x57a9b900 is 0 bytes inside a block of size 1,648 free'd
==17893== at 0x4C2F24B: operator delete(void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==17893== by 0x80A4D21: std::shared_ptr<mxnet::Storage>::~shared_ptr() (in /home/local/ANT/gernot/Documents/mxnet/lib/libmxnet.so)
==17893== by 0xF400FF7: __run_exit_handlers (exit.c:82)
==17893== by 0xF401044: exit (exit.c:104)
==17893== by 0xF3E7836: (below main) (libc-start.c:325)
==17893== Block was alloc'd at
==17893== at 0x4C2E0EF: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==17893== by 0x809AF69: mxnet::Storage::_GetSharedRef() (in /home/local/ANT/gernot/Documents/mxnet/lib/libmxnet.so)
==17893== by 0x809B2B9: mxnet::Storage::Get() (in /home/local/ANT/gernot/Documents/mxnet/lib/libmxnet.so)
==17893== by 0x7BC6FB4: mxnet::NDArray::NDArray(nnvm::TShape const&, mxnet::Context, bool, int) (in /home/local/ANT/gernot/Documents/mxnet/lib/libmxnet.so)
==17893== by 0x7D0C078: mxnet::NDArray::Load(dmlc::Stream*) (in /home/local/ANT/gernot/Documents/mxnet/lib/libmxnet.so)
==17893== by 0x7D0D415: mxnet::NDArray::Load(dmlc::Stream*, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> >*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*) (in /home/local/ANT/gernot/Documents/mxnet/lib/libmxnet.so)
==17893== by 0x80F1EE4: MXNDArrayLoad (in /home/local/ANT/gernot/Documents/mxnet/lib/libmxnet.so)
==17893== by 0x44E176: mxnet::cpp::NDArray::LoadToMap(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (ndarray.hpp:275)
==17893== by 0x447DE8: main (simple_predict.cpp:33)