MXNet to ONNX, mae error

I am trying to convert an MXNet model to ONNX. The final layer of the model is a MAERegressionOutput, and the model works as expected when I run inference using MXNet.

When I perform inference using MXNet, I load the model like so:

        m_argsMap["data"] = NDArray(Shape(1, NUM_CHANNELS, HEIGHT, WIDTH), m_globalCtx, false);
        m_argsMap["mae_label"] = NDArray(Shape(1), m_globalCtx, false);

        m_executor = m_net.SimpleBind(m_globalCtx, m_argsMap, std::map<std::string, NDArray>(),
                                      std::map<std::string, OpReqType>(), m_auxMap);

I am trying to convert my model to ONNX using the following script:

import mxnet as mx
import numpy as np
from mxnet.contrib import onnx as onnx_mxnet
import logging

logging.basicConfig(level=logging.INFO)

sym = "/home/cyrus/model.json"
params = "/home/cyrus/model.params"
onnx_file = "./model.onnx"
input_shape = (1, 3, 112, 112)

converted_model_path = onnx_mxnet.export_model(sym, params, [input_shape], np.float32, onnx_file)

But when I run the script I get the following error:

infer_shape error. Arguments:
  data: (1, 3, 112, 112)
  stage2_unit6_bn3_gamma: (128,)
  stage3_unit27_bn3_beta: (256,)
  stage1_unit1_bn2_beta: (64,)
  ...
  stage1_unit3_bn2_moving_mean: (64,)
  stage2_unit2_bn3_moving_mean: (128,)
  stage2_unit10_bn3_moving_var: (128,)
  stage3_unit30_bn2_moving_var: (256,)
  stage3_unit27_bn3_moving_mean: (256,)
  stage2_unit9_bn1_moving_var: (128,)
Traceback (most recent call last):
  File "mxnet2onnx.py", line 14, in <module>
    converted_model_path = onnx_mxnet.export_model(sym , params, [input_shape], np.float32, onnx_file)
  File "/home/cyrus/.local/lib/python3.6/site-packages/mxnet/contrib/onnx/mx2onnx/export_model.py", line 83, in export_model
    verbose=verbose)
  File "/home/cyrus/.local/lib/python3.6/site-packages/mxnet/contrib/onnx/mx2onnx/export_onnx.py", line 211, in create_onnx_graph_proto
    graph_outputs = MXNetGraph.get_outputs(sym, params, in_shape, output_label)
  File "/home/cyrus/.local/lib/python3.6/site-packages/mxnet/contrib/onnx/mx2onnx/export_onnx.py", line 142, in get_outputs
    _, out_shapes, _ = sym.infer_shape(**inputs)
  File "/home/cyrus/.local/lib/python3.6/site-packages/mxnet/symbol/symbol.py", line 1076, in infer_shape
    res = self._infer_shape_impl(False, *args, **kwargs)
  File "/home/cyrus/.local/lib/python3.6/site-packages/mxnet/symbol/symbol.py", line 1210, in _infer_shape_impl
    ctypes.byref(complete)))
  File "/home/cyrus/.local/lib/python3.6/site-packages/mxnet/base.py", line 253, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Error in operator mae: [13:56:49] include/mxnet/./tuple.h:202: Check failed: i >= 0 && i < ndim(): index = 0 must be in range [0, -1)
Stack trace:
  [bt] (0) /home/cyrus/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2795cb) [0x7f0cb78905cb]
  [bt] (1) /home/cyrus/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x27bd58) [0x7f0cb7892d58]
  [bt] (2) /home/cyrus/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x28e0aa2) [0x7f0cb9ef7aa2]
  [bt] (3) /home/cyrus/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x23c8272) [0x7f0cb99df272]
  [bt] (4) /home/cyrus/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x23cab5b) [0x7f0cb99e1b5b]
  [bt] (5) /home/cyrus/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(MXSymbolInferShapeEx+0x103e) [0x7f0cb995168e]
  [bt] (6) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f0cc98bddae]
  [bt] (7) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x22f) [0x7f0cc98bd71f]
  [bt] (8) /usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x2b4) [0x7f0cc9ad15d4]

For reference, I am using MXNet version 1.5.0 and ONNX version 1.3.0.

Hello @cyrusbehr.
Are you exporting your model to ONNX to be able to use TensorRT, or rather to load it into a different DL framework?
In the case of TensorRT, you can also rely on MXNet’s native TensorRT support instead, which uses ONNX under the hood.
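
For reference, the native route looks roughly like this (a sketch following the MXNet-TensorRT tutorial for 1.5; it assumes an MXNet build with TensorRT enabled, and the checkpoint prefix "model" is a placeholder — double-check the exact calls against your version’s docs):

import mxnet as mx

# Placeholder checkpoint prefix/epoch; adjust to your own files.
sym, arg_params, aux_params = mx.model.load_checkpoint("model", 0)

# Partition the graph so supported subgraphs run through TensorRT.
trt_sym = sym.get_backend_symbol("TensorRT")
arg_params, aux_params = mx.contrib.tensorrt.init_tensorrt_params(
    trt_sym, arg_params, aux_params)

executor = trt_sym.simple_bind(ctx=mx.gpu(0), data=(1, 3, 112, 112),
                               grad_req="null", force_rebind=True)
executor.copy_params_from(arg_params, aux_params)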

Regarding the crash: are you able to share the network structure or a simplified version of your model?

Best,
~QueensGambit

I was trying to convert to ONNX in order to then convert to another DL framework.

The solution was to remove the MAERegressionOutput layer.
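
In case it is useful to others, here is a minimal sketch of what stripping the loss head can look like before export (the internal output name "fc1_output" is a placeholder; print the internals to find the layer that actually feeds the MAE output in your model):

import mxnet as mx
import numpy as np
from mxnet.contrib import onnx as onnx_mxnet

sym = mx.sym.load("/home/cyrus/model.json")

# List the graph internals to find the layer feeding MAERegressionOutput.
print(sym.get_internals().list_outputs())

# "fc1_output" is a placeholder; substitute the name printed above.
sym_no_loss = sym.get_internals()["fc1_output"]
sym_no_loss.save("/home/cyrus/model_no_loss.json")

# Export the truncated symbol with the original weights.
onnx_mxnet.export_model("/home/cyrus/model_no_loss.json", "/home/cyrus/model.params",
                        [(1, 3, 112, 112)], np.float32, "./model.onnx")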

That makes sense. I assume that you used MXNet’s Symbol API to construct your model.
There, the loss function is part of the model definition, whereas in Gluon the loss is defined in the training loop. One implication of this is, for instance, that the softmax activation is applied directly when using SoftmaxOutput(), but must be computed outside of the model if you use Gluon’s SoftmaxCrossEntropyLoss().
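
For illustration, a minimal Gluon sketch of that pattern (a single Dense layer stands in for a real network):

import mxnet as mx
from mxnet import autograd, gluon

# The model outputs raw logits; no softmax or loss layer is baked in.
net = gluon.nn.Dense(2)
net.initialize()

# In Gluon the loss lives in the training loop, not in the model definition.
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), "sgd", {"learning_rate": 0.1})

data = mx.nd.random.uniform(shape=(4, 8))
label = mx.nd.array([0, 1, 0, 1])

with autograd.record():
    loss = loss_fn(net(data), label)  # softmax is folded into the loss
loss.backward()
trainer.step(batch_size=4)

# At inference time the softmax has to be applied explicitly.
probs = mx.nd.softmax(net(data))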