Loading model from .params and .json fails

olivcruche · July 19, 2019, 1:46pm

Hi,

I have a trained detector model (SSD), for which I have a .params file and a .json file.
I’d like to instantiate the model, but this command: net = gluon.nn.SymbolBlock.imports("model_algo_1-symbol.json", ['data'], "model_algo_1-0000.params", ctx=ctx) fails with: AssertionError: Parameter 'label' is missing in file 'model_algo_1-0000.params', which contains parameters: 'multi_feat_3_conv_1x1_conv_weight', 'stage2_unit3_bn3_gamma', 'stage1_unit1_sc_weight', ..., 'stage4_unit1_bn3_moving_mean', 'stage1_unit2_conv3_weight', 'stage3_unit4_bn1_beta', 'multi_feat_5_conv_1x1_conv_weight'. Please make sure source and target networks have the same prefix.

what am I missing?

NRauschmayr · July 19, 2019, 8:39pm

Are you training the model with SageMaker? If so, the error could be related to the problem reported in the thread “Example SSD cannot load model if trained with resnet50”.
Which base_network did you use?

olivcruche · July 19, 2019, 10:47pm

I’m using the resnet, and I followed up in that thread, but I don’t think the issue is model related?

QueensGambit · July 21, 2019, 11:26pm

Hey @olivcruche,
I had a similar problem when I tried loading a model trained with MXNet into the Gluon API.
The problem was resolved when the model was constructed manually:

model_arch_path = 'model-1.19246-0.603-symbol.json'
model_params_path = 'model-1.19246-0.603-0223.params'
ctx = mx.cpu()
symbol = mx.sym.load(model_arch_path)
inputs = mx.sym.var('data', dtype='float32')
value_out = symbol.get_internals()['value_tanh0_output']
policy_out = symbol.get_internals()['flatten0_output']
sym = mx.symbol.Group([value_out, policy_out])
net = mx.gluon.SymbolBlock(sym, inputs)
net.collect_params().load(model_params_path, ctx)

Best,
~QueensGambit

olivcruche · July 22, 2019, 3:05pm

thanks! what is this doing?

value_out = symbol.get_internals()['value_tanh0_output']
policy_out = symbol.get_internals()['flatten0_output']
sym = mx.symbol.Group([value_out, policy_out])

why isn’t it enough to load the params in a json graph?..

QueensGambit · July 22, 2019, 3:21pm

I need this code segment for my particular model because it has multiple output heads.
'value_tanh0_output' is the last output layer of the first head and 'flatten0_output' is the output layer name of the second head.

For your model this code might be sufficient:

model_arch_path = 'model_algo_1-symbol.json'
model_params_path = 'model_algo_1-0000.params'

ctx = mx.cpu()  # or mx.gpu()
symbol = mx.sym.load(model_arch_path)
inputs = mx.sym.var('data', dtype='float32')
sym = symbol.get_internals()['<name_of_the_last_output_layer>']
net = mx.gluon.SymbolBlock(sym, inputs)
net.collect_params().load(model_params_path, ctx)

Replace '<name_of_the_last_output_layer>' with the name of the last output layer of your network.

The reason you can’t load your model via SymbolBlock.imports() is that the label parameter wasn’t saved in the .params file. This seems to happen when you train the model with MXNet’s symbol API and later load it in Gluon.
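
One way to see which names trigger this error is to compare the variables in the graph with the keys in the .params file. The sketch below uses only the standard library and assumes the usual layout of an MXNet symbol .json, where variables are nodes with "op": "null". The helper name unmatched_variables and the tiny example graph are made up for illustration; for a real model you would pass the contents of the .json file and the keys returned by mx.nd.load() on the .params file.

```python
import json

def unmatched_variables(symbol_json, param_keys, input_names=("data",)):
    """Return graph variables that are neither declared inputs nor saved parameters."""
    graph = json.loads(symbol_json)
    # Assumption: variables (inputs, weights, labels) are nodes with op == "null".
    variables = [n["name"] for n in graph["nodes"] if n["op"] == "null"]
    # Checkpoint files prefix keys with "arg:" or "aux:"; strip the prefix if present.
    saved = {key.split(":", 1)[-1] for key in param_keys}
    return [v for v in variables if v not in input_names and v not in saved]

# Hand-written stand-in for a symbol file (hypothetical, not the real SSD graph).
example = json.dumps({"nodes": [
    {"op": "null", "name": "data", "inputs": []},
    {"op": "null", "name": "conv0_weight", "inputs": []},
    {"op": "Convolution", "name": "conv0", "inputs": [[0, 0, 0], [1, 0, 0]]},
    {"op": "null", "name": "label", "inputs": []},
]})

print(unmatched_variables(example, ["arg:conv0_weight"]))  # prints ['label']
```

Any name this returns is something the loader would probably look for in the .params file, because it was not listed as an input.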

olivcruche · July 22, 2019, 4:03pm

thanks! and how do you know that the model has multiple output heads? and their names? is this visible somehow in the .json file?

QueensGambit · July 22, 2019, 5:45pm

Yes, you can print out a summary of your model like this:
https://beta.mxnet.io/api/symbol-related/_autogen/mxnet.visualization.print_summary.html

    mx.viz.print_summary(
        symbol,
        shape={'data':(1, input_shape[0], input_shape[1], input_shape[2])},
    )

Every layer with a SoftmaxOutput or, for example, a LinearRegressionOutput is an output head of your model.
In most cases, ResNet models have only a single SoftmaxOutput head and are used as classification models.
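
The output heads should also be visible in the .json file itself. As a rough sketch, assuming the usual symbol file layout with a "nodes" list and a "heads" list of [node_index, output_index, version] entries, the following reads them without binding the graph. The function name output_heads and the two-headed example graph are invented for illustration.

```python
import json

def output_heads(symbol_json):
    """Return (node name, operator) for every output head listed in the graph."""
    graph = json.loads(symbol_json)
    nodes = graph["nodes"]
    # Assumption: the first element of each "heads" entry indexes into "nodes".
    return [(nodes[head[0]]["name"], nodes[head[0]]["op"]) for head in graph["heads"]]

# Hand-written stand-in for a two-headed model (hypothetical).
example = json.dumps({
    "nodes": [
        {"op": "null", "name": "data", "inputs": []},
        {"op": "Activation", "name": "value_tanh0", "inputs": [[0, 0, 0]]},
        {"op": "Flatten", "name": "flatten0", "inputs": [[0, 0, 0]]},
    ],
    "heads": [[1, 0, 0], [2, 0, 0]],
})

print(output_heads(example))
# prints [('value_tanh0', 'Activation'), ('flatten0', 'Flatten')]
```

For a real model you would pass the text of the symbol file, for example open('model_algo_1-symbol.json').read(). For operators with a single output, the name to use with get_internals() is typically the node name followed by '_output'.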

olivcruche · July 22, 2019, 7:29pm

when I use mx.viz.print_summary(symbol, shape={'data':(1, 3, 500, 500)}) (I have a 500x500 SSD model), I get a MXNetError: Error in operator multibox_target: [19:28:12] src/operator/contrib/./multibox_target-inl.h:225: Check failed: lshape.ndim() == 3 (0 vs. 3) Label should be [batch, num_labels, label_width] tensor

QueensGambit · July 22, 2019, 7:58pm

Hmm, are you using the SSD model from here?

https://github.com/zhreshold/mxnet-ssd/blob/master/symbol/symbol_builder.py#L165

body = import_module(network).get_symbol(num_classes=num_classes, **kwargs)
layers = multi_layer_feature(body, from_layers, num_filters, strides, pads,
    min_filter=min_filter)


loc_preds, cls_preds, anchor_boxes = multibox_layer(layers, \
    num_classes, sizes=sizes, ratios=ratios, normalization=normalizations, \
    num_channels=num_filters, clip=False, interm_layer=0, steps=steps)


cls_prob = mx.symbol.SoftmaxActivation(data=cls_preds, mode='channel', \
    name='cls_prob')
out = mx.contrib.symbol.MultiBoxDetection(*[cls_prob, loc_preds, anchor_boxes], \
    name="detection", nms_threshold=nms_thresh, force_suppress=force_suppress,
    variances=(0.1, 0.1, 0.2, 0.2), nms_topk=nms_topk)
return out

If this is the case then the last layer is called 'detection' and the corresponding output 'detection_output'.

Does it also fail if you try loading the model via the MXNet symbol API:

mxnet.model.load_checkpoint('model_algo_1', 0)

https://beta.mxnet.io/api/symbol-related/_autogen/mxnet.model.load_checkpoint.html

olivcruche · July 22, 2019, 8:30pm

I’m getting the model from the SageMaker service; all it tells me is that it is a resnet50-SSD, and it returns the .params and .json files. The net = mx.model.load_checkpoint('model_algo_1', 0) call is successful, but I have no idea how to go from there to a model that can predict on images…

QueensGambit · July 22, 2019, 8:56pm

Good that load_checkpoint() is working.
After loading the model,

sym, arg_params, aux_params = mxnet.model.load_checkpoint('model_algo_1', 0)

you can bind the executor and run an executor object for inference:

executor = sym.simple_bind(ctx=ctx, data=batch_shape, grad_req='null', force_rebind=True)
executor.copy_params_from(arg_params, aux_params)
y_gen = executor.forward(is_train=False, data=input)
y_gen[0].wait_to_read()

Here’s another example of creating executors in MXNet:

or if you are using only a single image for inference, you can refer to this tutorial:

olivcruche · July 22, 2019, 9:07pm

thanks!

sym, arg_params, aux_params = mx.model.load_checkpoint('model_algo_1', 0)

executor = sym.simple_bind(
    ctx=mx.cpu(),
    data=(1, 3, 500, 500),
    grad_req='null',
    force_rebind=True)

executor.copy_params_from(arg_params, aux_params)

y_gen = executor.forward(
    is_train=False,
    data=mx.image.resize_short(mx.image.imread('dtes.jpg'), 500).expand_dims(axis=0))

y_gen[0].wait_to_read()

returns a RuntimeError: simple_bind error. Arguments:
data: (1, 3, 500, 500)
force_rebind: True
Error in operator multibox_target: [21:05:47] src/operator/contrib/./multibox_target-inl.h:225: Check failed: lshape.ndim() == 3 (0 vs. 3) Label should be [batch, num_labels, label_width] tensor
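
The multibox_target error suggests that the saved graph is the training graph, whose outputs still depend on a label input. A way to check this without binding anything is to walk the .json graph backwards from its heads and collect the variables that are reached. This is only a sketch: it assumes the usual symbol file layout (each entry in "inputs" and "heads" starts with a node index), and the function name and example graph are made up.

```python
import json

def variables_feeding_heads(symbol_json):
    """Return the sorted names of all variables the output heads depend on."""
    graph = json.loads(symbol_json)
    nodes = graph["nodes"]
    stack = [head[0] for head in graph["heads"]]
    seen = set()
    # Depth-first walk from the heads back towards the inputs.
    while stack:
        index = stack.pop()
        if index in seen:
            continue
        seen.add(index)
        stack.extend(entry[0] for entry in nodes[index]["inputs"])
    # Assumption: variables are the nodes with op == "null".
    return sorted(nodes[i]["name"] for i in seen if nodes[i]["op"] == "null")

# Hand-written stand-in for a training graph (hypothetical operator names).
example = json.dumps({
    "nodes": [
        {"op": "null", "name": "data", "inputs": []},
        {"op": "null", "name": "conv0_weight", "inputs": []},
        {"op": "Convolution", "name": "conv0", "inputs": [[0, 0, 0], [1, 0, 0]]},
        {"op": "null", "name": "label", "inputs": []},
        {"op": "TargetOp", "name": "multibox_target", "inputs": [[2, 0, 0], [3, 0, 0]]},
    ],
    "heads": [[4, 0, 0]],
})

print(variables_feeding_heads(example))  # prints ['conv0_weight', 'data', 'label']
```

If 'label' shows up in the result, the heads cannot be evaluated from 'data' alone, and an internal output that does not depend on the label would have to be picked instead.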

QueensGambit · July 22, 2019, 9:29pm

This is harder than expected. You can try calling the model directly.

sym, arg_params, aux_params = mx.model.load_checkpoint('model_algo_1', 0)
mod = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
mod.bind(for_training=False, data_shapes=[('data', (1,3,500,500))], label_shapes=mod._label_shapes)
mod.set_params(arg_params, aux_params, allow_missing=True)

# define a simple data batch
from collections import namedtuple
Batch = namedtuple('Batch', ['data'])

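img = mx.image.imread('dtes.jpg')       # HWC layout, uint8
img = mx.image.imresize(img, 500, 500)  # match the bound data shape
img = img.transpose(axes=(2, 0, 1)).expand_dims(axis=0).astype('float32')  # -> (1, 3, 500, 500)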
mod.forward(Batch([img]))
prob = mod.get_outputs()[0].asnumpy()

You might have to specify label_names correctly instead of using None here.

kuonangzhe · July 24, 2019, 7:01am

It should be easier to use GluonCV for SSD model training, loading, and prediction. It also provides a way to look up the class names, etc.
