MXNet 1.2.1: module get_outputs()


#1

The following code runs fine in MXNet 1.1.0 but raises an error in 1.2.1:

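import mxnet as mx
from collections import namedtuple
Batch = namedtuple('Batch', ['data'])
mod2 = mx.mod.Module(symbol=net2)  # net2, new_args, aux_params and img are defined earlier
mod2.bind(for_training=True, data_shapes=[('data', (1, 3, 224, 224))])
mod2.set_params(new_args, aux_params, allow_missing=True)
mod2.forward(Batch([mx.nd.array(img)]))
mod2.get_outputs()  # displaying the outputs in the notebook raises the error below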

Errors:


MXNetError Traceback (most recent call last)
/home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/IPython/core/formatters.pyc in __call__(self, obj)
697 type_pprinters=self.type_printers,
698 deferred_pprinters=self.deferred_printers)
--> 699 printer.pretty(obj)
700 printer.flush()
701 return stream.getvalue()

/home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/IPython/lib/pretty.pyc in pretty(self, obj)
384 if cls in self.type_pprinters:
385 # printer registered in self.type_pprinters
--> 386 return self.type_pprinters[cls](obj, self, cycle)
387 else:
388 # deferred printer

/home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/IPython/lib/pretty.pyc in inner(obj, p, cycle)
567 p.text(',')
568 p.breakable()
--> 569 p.pretty(x)
570 if len(obj) == 1 and type(obj) is tuple:
571 # Special case for 1-item tuples.

/home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/IPython/lib/pretty.pyc in pretty(self, obj)
401 if cls is not object
402 and callable(cls.__dict__.get('__repr__')):
--> 403 return _repr_pprint(obj, self, cycle)
404
405 return _default_pprint(obj, self, cycle)

/home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/IPython/lib/pretty.pyc in _repr_pprint(obj, p, cycle)
701 """A pprint that just redirects to the normal repr function."""
702 # Find newlines and replace them with p.break_()
--> 703 output = repr(obj)
704 for idx,output_line in enumerate(output.splitlines()):
705 if idx:

/home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/ndarray/ndarray.pyc in __repr__(self)
187 """Returns a string representation of the array."""
188 shape_info = 'x'.join(['%d' % x for x in self.shape])
--> 189 return '\n%s\n<%s %s @%s>' % (str(self.asnumpy()),
190 self.__class__.__name__,
191 shape_info, self.context)

/home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/ndarray/ndarray.pyc in asnumpy(self)
1874 self.handle,
1875 data.ctypes.data_as(ctypes.c_void_p),
-> 1876 ctypes.c_size_t(data.size)))
1877 return data
1878

/home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/base.pyc in check_call(ret)
147 """
148 if ret != 0:
--> 149 raise MXNetError(py_str(_LIB.MXGetLastError()))
150
151

MXNetError: [22:48:30] src/ndarray/ndarray.cc:767: Check failed: !IsMKLDNNData() We can't generate TBlob for MKLDNN data. Please use Reorder2Default() to generate a new NDArray first

Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2ee4c2) [0x7fa89300e4c2]
[bt] (1) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2eea88) [0x7fa89300ea88]
[bt] (2) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2ac7b35) [0x7fa8957e7b35]
[bt] (3) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x4ba792) [0x7fa8931da792]
[bt] (4) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x4be6e5) [0x7fa8931de6e5]
[bt] (5) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x29414e4) [0x7fa8956614e4]
[bt] (6) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2923374) [0x7fa895643374]
[bt] (7) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2926ec1) [0x7fa895646ec1]
[bt] (8) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2923a3b) [0x7fa895643a3b]
[bt] (9) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/zmq/backend/cython/…/…/…/…/…/./libstdc++.so.6(+0xb8678) [0x7fa8ea64d678]

Can anyone help me figure out what might be wrong?


#2

Looks like there is a bug in one of the operators. Are you using mxnet-mkl or mxnet-cuxxmkl in both the MXNet 1.1 and 1.2 cases? If so, does the problem go away if you use the mxnet or mxnet-cuxx package instead?


#3

What’s mxnet-mkl or mxnet-cuxxmkl?


#4

There are twelve different mxnet packages on PyPI that you can install using pip install:

pip install mxnet
pip install mxnet-mkl
pip install mxnet-cu75
pip install mxnet-cu80
pip install mxnet-cu90
pip install mxnet-cu91
pip install mxnet-cu92
pip install mxnet-cu75mkl
pip install mxnet-cu80mkl
pip install mxnet-cu90mkl
pip install mxnet-cu91mkl
pip install mxnet-cu92mkl
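The package names above follow a simple scheme: an optional cuXX part names the CUDA version, and a trailing mkl marks a build with MKLDNN enabled. Here is a small sketch that decodes that scheme, assuming the names follow exactly the pattern in this list; parse_mxnet_package is a hypothetical helper written for illustration, not part of MXNet or pip.

```python
import re

# Hypothetical helper (not part of MXNet or pip): decodes the mxnet package names listed above.
_MXNET_PKG = re.compile(
    r"^mxnet(?:-(?:cu(?P<cuda>\d{2,3})(?P<cuda_mkl>mkl)?|(?P<mkl>mkl)))?$")

def parse_mxnet_package(name):
    """Return (cuda_version_or_None, uses_mkldnn) for an mxnet pip package name."""
    match = _MXNET_PKG.match(name.lower())
    if match is None:
        raise ValueError("not an mxnet package name: %r" % name)
    cuda = match.group("cuda")
    if cuda is not None:
        # The last digit is the CUDA minor version: "90" -> "9.0", "75" -> "7.5"
        cuda = "%s.%s" % (cuda[:-1], cuda[-1])
    uses_mkldnn = match.group("mkl") is not None or match.group("cuda_mkl") is not None
    return cuda, uses_mkldnn

for pkg in ("mxnet", "mxnet-mkl", "mxnet-cu90", "mxnet-cu90mkl"):
    print(pkg, parse_mxnet_package(pkg))
```

So mxnet-cu90mkl, the DLAMI default mentioned below, is both a CUDA 9.0 build and an MKLDNN build.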

If you have any of the packages with mkl installed (the default on the AWS DLAMI is mxnet-cu90mkl), then MKLDNN is used to accelerate the operators it implements. MKLDNN requires tensors to be converted from MXNet's default memory layout into its own layout. However, not all operators are implemented with MKLDNN, so if your network mixes operators that are implemented with MKLDNN and ones that are not, a layout conversion must happen every time data crosses that boundary. It looks like one operator in MXNet does not expect its input tensor to be in MKLDNN layout, and it does not call Reorder2Default() to convert that tensor from MKLDNN layout back to the default MXNet layout.
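To see why a conversion is needed at all, here is a NumPy sketch that reorders an NCHW tensor into a channel-blocked layout in the style of MKLDNN's nChw8c format (channels grouped in blocks of 8, block innermost) and back again. It is an illustration only, assuming NumPy is available; it is not MXNet's Reorder2Default() implementation, and the helper names are hypothetical. The same values end up in a different memory order, so an operator that reads a blocked buffer as if it were plain NCHW would see the wrong numbers, which is the situation the !IsMKLDNNData() check refuses to allow.

```python
import numpy as np

BLOCK = 8  # channel block size of the illustrative nChw8c layout

def to_blocked(x):
    """Reorder NCHW into (N, C/8, H, W, 8): blocks of 8 channels stored innermost."""
    n, c, h, w = x.shape
    # Real libraries pad the channel axis; this sketch simply requires C % 8 == 0
    assert c % BLOCK == 0, "C must be a multiple of %d" % BLOCK
    blocked = x.reshape(n, c // BLOCK, BLOCK, h, w).transpose(0, 1, 3, 4, 2)
    return np.ascontiguousarray(blocked)  # materialize the new memory order

def to_default(y):
    """Inverse reorder back to plain NCHW (analogous to what Reorder2Default() is for)."""
    n, cb, h, w, b = y.shape
    return np.ascontiguousarray(y.transpose(0, 1, 4, 2, 3).reshape(n, cb * b, h, w))

x = np.arange(2 * 16 * 3 * 3).reshape(2, 16, 3, 3)
y = to_blocked(x)
print(x.ravel()[:8])  # first 8 values in NCHW memory order: one row of channel 0
print(y.ravel()[:8])  # first 8 values in blocked order: channels 0-7 at pixel (0, 0)
```

Round-tripping through to_default recovers the original array exactly; the conversion itself is cheap, but skipping it silently scrambles the data.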


#5

I am trying to upgrade to 1.2.1. I tried one of the newest versions of the DLAMI on an AWS p3.2xlarge instance. Is this a bug that you guys are already aware of? Is it going to be fixed soon? If not, which DLAMI is the next safe choice?

I also tried running the tutorial at https://mxnet.incubator.apache.org/tutorials/vision/cnn_visualization.html and it gives exactly the same error.


#6

Hey @xinwang-issaquah, I could reproduce an issue with this tutorial; I’ll have a look over the weekend at how best to fix it. Thanks for bringing it up. It’s also a good incentive for us to reactivate, ASAP, the automated tutorial tests that were disabled during the CI migration.


#7

In the tutorial, please replace the get_vgg function with the following:

def get_vgg(num_layers, ctx=mx.cpu(), root=os.path.join('~', '.mxnet', 'models'), **kwargs):
    # Assumes os, mx (mxnet), VGG and vgg_spec are already defined or imported in the tutorial

    # Get the number of convolution layers and filters
    layers, filters = vgg_spec[num_layers]

    # Build the modified VGG network
    net = VGG(layers, filters, **kwargs)
    net.initialize(ctx=ctx)

    # Get the pretrained model
    vgg = mx.gluon.model_zoo.vision.get_vgg(num_layers, pretrained=True, ctx=ctx)

    # Copy each pretrained parameter into the new network: the names match
    # once the pretrained model's prefix is swapped for the new network's prefix
    new_params = net.collect_params()
    for key, param in vgg.collect_params().items():
        new_params[net.prefix + key.replace(vgg.prefix, '')].set_data(param.data())

    return net
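The parameter copy in get_vgg works because Gluon parameter names start with their block's prefix, so swapping one prefix for another yields the matching name in the other network. Below is a framework-free sketch of that name mapping; the prefixes and parameter names (vgg0_, vgg1_, conv0_weight, ...) are made-up examples, and remap_names is a hypothetical helper. It is a slightly stricter variant of the str.replace call: it strips only a leading prefix and reports source parameters that have no counterpart instead of failing on the first missing key.

```python
def remap_names(src_names, src_prefix, dst_prefix, dst_names):
    """Map each source parameter name to the destination name with the prefix swapped.

    Returns (mapping, missing), where missing lists source names with no counterpart.
    """
    dst = set(dst_names)
    mapping, missing = {}, []
    for name in src_names:
        # Strip only a leading prefix; str.replace would also touch later occurrences
        suffix = name[len(src_prefix):] if name.startswith(src_prefix) else name
        target = dst_prefix + suffix
        if target in dst:
            mapping[name] = target
        else:
            missing.append(name)
    return mapping, missing

src = ["vgg0_conv0_weight", "vgg0_conv0_bias", "vgg0_dense0_weight"]
dst = ["vgg1_conv0_weight", "vgg1_conv0_bias"]
mapping, missing = remap_names(src, "vgg0_", "vgg1_", dst)
print(mapping)
print(missing)
```

Checking that missing is empty before copying is a cheap way to confirm the two networks really line up.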

#8

Thomas, thanks a lot for helping out with the tutorial. Can you take a look at the original post as well? What do I need to change to make that code work too?


#9

Hi @xinwang-issaquah, I have issued a PR to update the tutorial, thanks for reporting that it wasn’t working.
Have you tried upgrading to 1.3.0 to see if your original problem is solved?

mod2 = mx.mod.Module(symbol=net2)
mod2.bind(for_training=True, data_shapes=[('data', (1, 3, 224, 224))])
mod2.set_params(new_args, aux_params, allow_missing=True)
mod.forward(Batch([mx.nd.array(img)]))

Can you provide more information about how you obtained the new_args, aux_params and net2 variables?


#10

I tried the fix, but it is still producing the same error:


MXNetError Traceback (most recent call last)
in ()
----> 1 show_images(*visualize(network, "hummingbird.jpg", last_conv_layer_name))

in visualize(net, img_path, conv_layer_name)
3 preprocessed_img = preprocess(orig_img)
4 preprocessed_img = preprocessed_img.expand_dims(axis=0)
----> 5 pred_str = get_class_name(run_inference(net, preprocessed_img))
6
7 orig_img = mx.image.imresize(orig_img, image_sz[0], image_sz[1]).asnumpy()

in run_inference(net, data)
11 def run_inference(net, data):
12 out = net(data)
---> 13 return out.argmax(axis=1).asnumpy()[0].astype(int)

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py in asnumpy(self)
1874 self.handle,
1875 data.ctypes.data_as(ctypes.c_void_p),
-> 1876 ctypes.c_size_t(data.size)))
1877 return data
1878

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/base.py in check_call(ret)
147 """
148 if ret != 0:
--> 149 raise MXNetError(py_str(_LIB.MXGetLastError()))
150
151

MXNetError: [18:00:49] src/ndarray/ndarray.cc:767: Check failed: !IsMKLDNNData() We can't generate TBlob for MKLDNN data. Please use Reorder2Default() to generate a new NDArray first

Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2ee4c2) [0x7f1b7421b4c2]
[bt] (1) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2eea88) [0x7f1b7421ba88]
[bt] (2) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2ac7b35) [0x7f1b769f4b35]
[bt] (3) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x4ba792) [0x7f1b743e7792]
[bt] (4) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x4be6e5) [0x7f1b743eb6e5]
[bt] (5) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x29923e1) [0x7f1b768bf3e1]
[bt] (6) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2923374) [0x7f1b76850374]
[bt] (7) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2926ec1) [0x7f1b76853ec1]
[bt] (8) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2923a3b) [0x7f1b76850a3b]
[bt] (9) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/zmq/backend/cython/…/…/…/…/…/./libstdc++.so.6(+0xb8678) [0x7f1bcb9d4678]


#11

The solution was to use mxnet-cu90 instead: mxnet-cu90mkl has a bug with custom operators.
It looks like https://github.com/apache/incubator-mxnet/pull/11005 was never merged.
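When switching from mxnet-cu90mkl to mxnet-cu90 (uninstall one, then pip install the other), it helps to confirm which builds the environment actually contains, since having an mkl build left over brings MKLDNN back into play. A minimal sketch, assuming importlib.metadata on Python 3.8+ and setuptools' pkg_resources on the older 2.7/3.6 environments seen above; the helper names are hypothetical, and the "ends in mkl" rule simply follows the package list earlier in the thread.

```python
import sys

def installed_distribution_names():
    """Names of all installed Python distributions (packages installed with pip/conda)."""
    if sys.version_info >= (3, 8):
        from importlib.metadata import distributions
        names = []
        for dist in distributions():
            try:
                names.append(dist.metadata["Name"])
            except KeyError:  # metadata without a Name field
                continue
        return [n for n in names if n]
    import pkg_resources  # fallback for Python 2.7 / 3.6
    return [d.project_name for d in pkg_resources.working_set]

def mxnet_packages(names):
    """Sorted distribution names that start with 'mxnet'."""
    return sorted(n for n in names if n.lower().startswith("mxnet"))

def mkl_builds(names):
    """MKLDNN-enabled builds: mxnet-mkl and the mxnet-cuXXmkl variants end in 'mkl'."""
    return [n for n in mxnet_packages(names) if n.lower().endswith("mkl")]

if __name__ == "__main__":
    installed = installed_distribution_names()
    print("mxnet packages:", mxnet_packages(installed))
    print("MKLDNN builds:", mkl_builds(installed))
```

After the switch, the second line should print an empty list.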


#12

Thanks a lot for the tip! It solved my problem. 🙂