Stuck trying to load a model. What am I missing?


#1

I am stuck trying to do something very simple.

I trained a ResNet-50 model with MXNet on AWS. It ran to completion (100 epochs) and saved model.tar.gz into S3.

I have downloaded the file and now want to load it locally and run predictions, but I'm stuck.
I must be doing something stupid, but I can't figure it out.
I downloaded and untarred the file into my working directory:
./model.tar.gz
./model_algo_1-0000.params
./model_algo_1-symbol.json
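These file names follow MXNet's checkpoint convention, which is why `mx.model.load_checkpoint('model_algo_1', 0)` can find them: the prefix plus `-symbol.json` holds the network graph, and the prefix plus a zero-padded epoch number holds the weights. A minimal sketch of that mapping (`checkpoint_files` is a hypothetical helper for illustration, not an MXNet function):

```python
# Sketch of how an MXNet checkpoint prefix and epoch map to file names.
# Assumption: this mirrors the naming used by mx.model.save_checkpoint and
# mx.model.load_checkpoint. checkpoint_files is a hypothetical helper.

def checkpoint_files(prefix, epoch):
    """Return the (symbol file, params file) names for a checkpoint."""
    symbol_file = "%s-symbol.json" % prefix           # network graph
    params_file = "%s-%04d.params" % (prefix, epoch)  # weights, epoch padded to 4 digits
    return symbol_file, params_file


if __name__ == "__main__":
    print(checkpoint_files("model_algo_1", 0))
    # ('model_algo_1-symbol.json', 'model_algo_1-0000.params')
```

So passing epoch `0` matches the `-0000.params` file that came out of the archive.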

I’m now trying to load it with the following code:

import mxnet as mx
import numpy as np
import cv2, sys, time
from collections import namedtuple

def loadModel(modelname):
    print("Loading model", modelname)
    t1 = time.time()
    sym, arg_params, aux_params = mx.model.load_checkpoint(modelname, 0)
    t2 = time.time()
    t = 1000 * (t2 - t1)
    print("Loaded in %2.2f milliseconds" % t)
    arg_params = mx.nd.array([0])
    mod = mx.mod.Module(symbol=sym)
    mod.bind(for_training=False, data_shapes=)
    mod.set_params(arg_params, aux_params)
    return mod

loadModel('model_algo_1')

and it fails with:

Loading model model_algo_1
Loaded in 125.40 milliseconds
C:\Users\johnl\AppData\Local\Programs\Python\Python36\lib\site-packages\mxnet\module\base_module.py:55: UserWarning: You created Module with Module(…, label_names=) but input with name 'softmax_label' is not found in symbol.list_arguments(). Did you mean one of:
    data
    label
  warnings.warn(msg)
C:\Users\johnl\AppData\Local\Programs\Python\Python36\lib\site-packages\mxnet\module\base_module.py:67: UserWarning: Data provided by label_shapes don't match names specified by label_names ( vs. )
  warnings.warn(msg)
Traceback (most recent call last):
  File "C:\Users\johnl\AppData\Local\Programs\Python\Python36\lib\site-packages\mxnet\symbol\symbol.py", line 1522, in simple_bind
    ctypes.byref(exe_handle)))
  File "C:\Users\johnl\AppData\Local\Programs\Python\Python36\lib\site-packages\mxnet\base.py", line 252, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Error in operator multibox_target: [...] c:\jenkins\workspace\mxnet-tag\mxnet\src\operator\contrib\multibox_target-inl.h:225: Check failed: lshape.ndim() == 3 (0 vs. 3) Label should be [...] tensor

What am I missing? Do I have to cut out a layer or something?
Thanks


#2

MXNet is failing because it cannot find softmax_label. An MXNet Module is a wrapper around a Symbol, and it expects label_names: https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/module/module.py#L31 If label_names is not set, it defaults to ('softmax_label',).
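To see which input names a saved training symbol actually expects (the warning in the question suggests `data` and `label`, not `softmax_label`), `sym.list_arguments()` answers this directly when MXNet is installed. Without MXNet, the symbol JSON can be inspected. A rough sketch, assuming the usual symbol-JSON layout in which variables are stored as nodes with `"op": "null"`; the suffix filter is only a naming heuristic, and `guess_input_names` is a hypothetical helper:

```python
import json

# Suffixes that usually mark learned parameters or auxiliary states rather than
# inputs fed at bind time. This list is an assumption (a naming heuristic),
# not something MXNet guarantees.
PARAM_SUFFIXES = ("_weight", "_bias", "_gamma", "_beta",
                  "_moving_mean", "_moving_var")


def guess_input_names(symbol_json_text):
    """Return variable names in an MXNet symbol JSON that look like data/label inputs."""
    graph = json.loads(symbol_json_text)
    names = []
    for node in graph["nodes"]:
        # Variables (arguments and auxiliary states) are stored with op "null".
        if node["op"] == "null" and not node["name"].endswith(PARAM_SUFFIXES):
            names.append(node["name"])
    return names


if __name__ == "__main__":
    # Tiny hand-written graph in the same layout, for illustration only.
    example = json.dumps({
        "nodes": [
            {"op": "null", "name": "data", "inputs": []},
            {"op": "null", "name": "conv0_weight", "inputs": []},
            {"op": "Convolution", "name": "conv0", "inputs": [[0, 0, 0], [1, 0, 0]]},
            {"op": "null", "name": "label", "inputs": []},
        ],
        "arg_nodes": [0, 1, 3],
        "heads": [[2, 0, 0]],
    })
    print(guess_input_names(example))  # ['data', 'label']
```

Run against model_algo_1-symbol.json (read the file and pass its text), a result containing `label` would indicate the training graph still expects a label input, which is what the multibox_target check is complaining about.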


#3

Thanks. Apparently I had to run deploy.py, which converts a training model into a model for deployment; that solved it.

It simply removes all loss layers and attaches a layer for merging results and non-maximum suppression. This is useful when loading the Python symbol is not available.
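To illustrate what the non-maximum suppression step does, here is a minimal greedy NMS sketch in NumPy. It is a generic version of the technique, not the operator the deployed MXNet graph uses (in MXNet's SSD example that role is played by the contrib `MultiBoxDetection` operator), and it assumes boxes in `[xmin, ymin, xmax, ymax]` corner format with non-zero area:

```python
import numpy as np


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression.

    boxes: (N, 4) array of [xmin, ymin, xmax, ymax]; scores: (N,) array.
    Returns the indices of the kept boxes, highest score first.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]  # candidate indices, best score first
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Intersection rectangle of the best box with each remaining candidate.
        xx1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter)
        # Keep only candidates that do not overlap the chosen box too much.
        order = rest[iou <= iou_threshold]
    return keep


if __name__ == "__main__":
    boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed at the default 0.5 threshold, while the distant third box survives.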