MXNet Forum

Example SSD cannot load model if trained with resnet50


#1

I trained an SSD model via SageMaker, which AFAIK uses the https://github.com/apache/incubator-mxnet/tree/master/example/ssd code.

After training, the model is expected to be converted to a “deployable” state by running the deploy.py script, which removes the loss symbols. Afterwards, I load the model with the following code:

import mxnet as mx

ctx = mx.cpu()  # or mx.gpu(0)
# Load epoch 0 of the checkpoint written by deploy.py
sym, arg_params, aux_params = mx.model.load_checkpoint('deploy_ssd_vgg16_reduced_512', 0)
mod = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
# Inference only: one 3x512x512 image, no labels
mod.bind(for_training=False, data_shapes=[('data', (1, 3, 512, 512))], label_shapes=None)
mod.set_params(arg_params, aux_params, allow_extra=True)

This works fine as long as the model is trained with a VGG feature extractor. However, SageMaker (and hence the example code) also allows training with resnet50. The resulting model can be converted with deploy.py, but the converted model can no longer be loaded with the code above. The error I am getting is:

RuntimeError: _plus12_cls_pred_conv_bias is not presented

And indeed, the BN params and a few others are missing from the params file. Maybe the deploy script has a bug with resnet50?
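To confirm exactly which parameters the deployed symbol expects but the params file lacks, you could compare the two name lists directly. This is a minimal sketch; `find_missing_params` is a hypothetical helper (not part of MXNet), and it assumes the only non-parameter input of the deploy symbol is `data`.

```python
def find_missing_params(arg_names, aux_names, arg_params, aux_params,
                        input_names=("data",)):
    """Return (missing_args, missing_aux): names the symbol needs but the checkpoint lacks.

    Hypothetical helper. arg_names/aux_names are the symbol's argument and
    auxiliary-state names; arg_params/aux_params are the loaded dicts.
    Names in input_names (assumed here to be just 'data') are network inputs,
    not learned parameters, so they are skipped.
    """
    missing_args = [n for n in arg_names
                    if n not in input_names and n not in arg_params]
    # Auxiliary states hold things like BatchNorm moving_mean / moving_var
    missing_aux = [n for n in aux_names if n not in aux_params]
    return missing_args, missing_aux
```

With MXNet installed, you would pass `sym.list_arguments()` and `sym.list_auxiliary_states()` from the symbol returned by `mx.model.load_checkpoint`, together with its `arg_params` and `aux_params`. A non-empty result points at layers whose weights were never saved, which helps tell a conversion problem apart from a loading problem.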


#2

Hey,

So, looking at the deploy script, it seems to get the network symbols from https://github.com/apache/incubator-mxnet/blob/master/example/ssd/symbol/symbol_factory.py, so there might be a bug in the config definitions for resnet. I haven’t been able to pinpoint what exactly, though.


#3

Thanks for the reply. It turns out it was a SageMaker bug that produced incorrect model files.