Loading model from .params and .json fails

Hi,

I have a trained detector model (SSD), for which I have a .params file and a .json file.
I’d like to instantiate the model, but this command:

net = gluon.nn.SymbolBlock.imports("model_algo_1-symbol.json", ['data'], "model_algo_1-0000.params", ctx=ctx)

fails with:

AssertionError: Parameter 'label' is missing in file 'model_algo_1-0000.params', which contains parameters: 'multi_feat_3_conv_1x1_conv_weight', 'stage2_unit3_bn3_gamma', 'stage1_unit1_sc_weight', ..., 'stage4_unit1_bn3_moving_mean', 'stage1_unit2_conv3_weight', 'stage3_unit4_bn1_beta', 'multi_feat_5_conv_1x1_conv_weight'. Please make sure source and target networks have the same prefix.

what am I missing?

Are you training the model with SageMaker? If so, the error could be related to the problem reported here: "Example SSD cannot load model if trained with resnet50".
Which base_network did you use?

I’m using ResNet. I followed up in that thread, but I don’t think the issue is model-related?

Hey @olivcruche,
I had a similar problem when I tried loading a model trained with MXNet into the Gluon API.
The problem was resolved when the model was constructed manually:

model_arch_path = 'model-1.19246-0.603-symbol.json'
model_params_path = 'model-1.19246-0.603-0223.params'
ctx = mx.cpu()
symbol = mx.sym.load(model_arch_path)
inputs = mx.sym.var('data', dtype='float32')
value_out = symbol.get_internals()['value_tanh0_output']
policy_out = symbol.get_internals()['flatten0_output']
sym = mx.symbol.Group([value_out, policy_out])
net = mx.gluon.SymbolBlock(sym, inputs)
net.collect_params().load(model_params_path, ctx)

Best,
~QueensGambit

thanks! what is this doing?

value_out = symbol.get_internals()['value_tanh0_output']
policy_out = symbol.get_internals()['flatten0_output']
sym = mx.symbol.Group([value_out, policy_out])

why isn’t it enough to load the params in a json graph?..

I need this code segment for my particular model because it has multiple output heads.
'value_tanh0_output' is the output layer name of the first head and 'flatten0_output' is the output layer name of the second head.

For your model this code might be sufficient:

model_arch_path = 'model_algo_1-symbol.json'
model_params_path = 'model_algo_1-0000.params'

ctx = mx.cpu()  # or mx.gpu()
symbol = mx.sym.load(model_arch_path)
inputs = mx.sym.var('data', dtype='float32')
sym = symbol.get_internals()['<name_of_the_last_output_layer>']
net = mx.gluon.SymbolBlock(sym, inputs)
net.collect_params().load(model_params_path, ctx)

Replace '<name_of_the_last_output_layer>' with the name of the last output layer of your network.

The reason you can’t load your model via SymbolBlock.imports() is that the label parameter wasn’t saved in the .params file. This seems to happen when you train the model with MXNet’s symbol API and later load it in Gluon.
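
To see which names the loader will complain about, you can compare the variables in the symbol against the keys stored in the .params file. The snippet below is only a plain-Python sketch of that comparison: find_missing_params is a hypothetical helper, the names are toy values rather than the real model's, and it assumes the checkpoint keys carry the usual 'arg:' / 'aux:' prefixes. Any variable that is not declared as an input is expected to be a saved parameter, which is why 'label' is reported as missing.

```python
def find_missing_params(variable_names, input_names, saved_keys):
    """Return variables that are neither declared inputs nor saved parameters."""
    # Checkpoint keys usually look like 'arg:conv0_weight' or 'aux:bn0_moving_mean'
    # (assumption); strip the prefix before comparing.
    saved = {key.split(":", 1)[-1] for key in saved_keys}
    inputs = set(input_names)
    return [name for name in variable_names
            if name not in inputs and name not in saved]


# Toy names for illustration only (not taken from the real model).
variables = ["data", "conv0_weight", "bn0_moving_mean", "label"]
saved_keys = ["arg:conv0_weight", "aux:bn0_moving_mean"]

print(find_missing_params(variables, ["data"], saved_keys))           # ['label']
print(find_missing_params(variables, ["data", "label"], saved_keys))  # []
```

With MXNet, the variable names would come from symbol.list_arguments() plus symbol.list_auxiliary_states(), and the saved keys from mx.nd.load(model_params_path).keys().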

thanks! and how do you know that the model has multiple output heads? and their names? is this visible somehow in the .json file?

Yes, you can print out a summary of your model like this:
https://beta.mxnet.io/api/symbol-related/_autogen/mxnet.visualization.print_summary.html

    mx.viz.print_summary(
        symbol,
        shape={'data':(1, input_shape[0], input_shape[1], input_shape[2])},
    )

Every layer that is a SoftmaxOutput or, for example, a LinearRegressionOutput is an output head of your model.
In most cases ResNet models have only a single SoftmaxOutput head and are used as classification models.
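
The output heads are also recorded in the -symbol.json file itself. Here is a minimal sketch, assuming the usual layout of that file (a "nodes" list plus a "heads" list whose entries start with a node index). list_heads and list_variables are hypothetical helper names, and the toy graph stands in for the real file:

```python
import json


def list_heads(graph):
    """Names of the nodes that the graph exposes as outputs."""
    return [graph["nodes"][head[0]]["name"] for head in graph["heads"]]


def list_variables(graph):
    """Names of all variables (inputs and parameters); these have op == 'null'."""
    return [node["name"] for node in graph["nodes"] if node["op"] == "null"]


# Toy graph for illustration; for a real model, load the file instead:
# graph = json.load(open('model_algo_1-symbol.json'))
graph = json.loads("""
{
  "nodes": [
    {"op": "null", "name": "data", "inputs": []},
    {"op": "null", "name": "fc_weight", "inputs": []},
    {"op": "FullyConnected", "name": "fc", "inputs": [[0, 0, 0], [1, 0, 0]]},
    {"op": "null", "name": "label", "inputs": []},
    {"op": "SoftmaxOutput", "name": "softmax", "inputs": [[2, 0, 0], [3, 0, 0]]}
  ],
  "arg_nodes": [0, 1, 3],
  "heads": [[4, 0, 0]]
}
""")

print(list_heads(graph))      # ['softmax']
print(list_variables(graph))  # ['data', 'fc_weight', 'label']
```

For a single-output layer, the key to use with get_internals() is typically the head name followed by _output, e.g. 'softmax_output' in the toy graph above.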

when I use mx.viz.print_summary(symbol, shape={'data':(1, 3, 500, 500)}) (I have a 500x500 SSD model), I get:

MXNetError: Error in operator multibox_target: [19:28:12] src/operator/contrib/./multibox_target-inl.h:225: Check failed: lshape.ndim() == 3 (0 vs. 3) Label should be [batch, num_labels, label_width] tensor

Hmm, are you using the SSD model from here?


If this is the case then the last layer is called 'detection' and the corresponding output 'detection_output'.

Does it also fail if you try loading the model via the MXNet symbol API:

mxnet.model.load_checkpoint('model_algo_1', 0)

https://beta.mxnet.io/api/symbol-related/_autogen/mxnet.model.load_checkpoint.html

I’m getting the model from the SageMaker service; all it tells me is that it is a resnet50-SSD, and it returns the .params and .json files. The net = mx.model.load_checkpoint('model_algo_1', 0) call is successful, but I have no idea how to go from there to a model that can predict on images…

Good that load_checkpoint() is working.
After loading the model,

sym, arg_params, aux_params = mxnet.model.load_checkpoint('model_algo_1', 0)

you can bind the executor and run an executor object for inference:

executor = sym.simple_bind(ctx=ctx, data=batch_shape, grad_req='null', force_rebind=True)
executor.copy_params_from(arg_params, aux_params)
y_gen = executor.forward(is_train=False, data=input)
y_gen[0].wait_to_read()

Here’s another example of creating executors in MXNet:

or if you are using only a single image for inference, you can refer to this tutorial:

thanks!

sym, arg_params, aux_params = mx.model.load_checkpoint('model_algo_1', 0)

executor = sym.simple_bind(
    ctx=mx.cpu(),
    data=(1, 3, 500, 500),
    grad_req='null',
    force_rebind=True)

executor.copy_params_from(arg_params, aux_params)

y_gen = executor.forward(
    is_train=False,
    data=mx.image.resize_short(mx.image.imread('dtes.jpg'), 500).expand_dims(axis=0))

y_gen[0].wait_to_read()

returns a RuntimeError: simple_bind error. Arguments:
data: (1, 3, 500, 500)
force_rebind: True
Error in operator multibox_target: [21:05:47] src/operator/contrib/./multibox_target-inl.h:225: Check failed: lshape.ndim() == 3 (0 vs. 3) Label should be [batch, num_labels, label_width] tensor

This is harder than expected. You can try calling the model directly.

sym, arg_params, aux_params = mx.model.load_checkpoint('model_algo_1', 0)
mod = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
mod.bind(for_training=False, data_shapes=[('data', (1,3,500,500))], label_shapes=mod._label_shapes)
mod.set_params(arg_params, aux_params, allow_missing=True)

# define a simple data batch
from collections import namedtuple
Batch = namedtuple('Batch', ['data'])

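# imread returns an HWC uint8 image; the module was bound to a (1, 3, 500, 500) batch
img = mx.image.imresize(mx.image.imread('dtes.jpg'), 500, 500)
img = img.transpose(axes=(2, 0, 1)).expand_dims(axis=0).astype('float32')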
mod.forward(Batch([img]))
prob = mod.get_outputs()[0].asnumpy()

You might have to specify label_names correctly instead of using None here.

It should be easier to use GluonCV for SSD model training, loading, and prediction. It also has methods you can use to look up the class names, etc.