Reposting from a GitHub issue upon request:
Description
I’m getting an error in the split0 operator when training an image captioning network in MXNet.
Package used (Python/R/Scala/Julia): I’m using Python
Error Message:
---------------INFO-----------------------
vocab_size:663
sentence_length:46
-----------------------------------------
Creating Iterators...
Initiating Training...
INFO:root:Epoch[0] Train-perplexity=655.513238
INFO:root:Epoch[0] Time cost=1.261
infer_shape error. Arguments:
image_feature: (50, 1024)
word_data: (50, 77)
softmax_label: (50,)
Traceback (most recent call last):
  File "2_train_val.py", line 102, in <module>
    epoch_end_callback=mx.callback.do_checkpoint(checkpoints_prefix, period=10)
  File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/module/base_module.py", line 528, in fit
    batch_end_callback=eval_batch_end_callback, epoch=epoch)
  File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/module/base_module.py", line 244, in score
    self.forward(eval_batch, is_train=False)
  File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/module/module.py", line 608, in forward
    self.reshape(new_dshape, new_lshape)
  File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/module/module.py", line 470, in reshape
    self._exec_group.reshape(self._data_shapes, self._label_shapes)
  File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/module/executor_group.py", line 381, in reshape
    self.bind_exec(data_shapes, label_shapes, reshape=True)
  File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/module/executor_group.py", line 357, in bind_exec
    allow_up_sizing=True, **dict(data_shapes_i + label_shapes_i))
  File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/executor.py", line 402, in reshape
    arg_shapes, _, aux_shapes = self._symbol.infer_shape(**kwargs)
  File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/symbol/symbol.py", line 989, in infer_shape
    res = self._infer_shape_impl(False, *args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/symbol/symbol.py", line 1119, in _infer_shape_impl
    ctypes.byref(complete)))
  File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/base.py", line 146, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Error in operator split0: [18:11:40] src/operator/./slice_channel-inl.h:208: Check failed: dshape[real_axis] % param_.num_outputs == 0U (31 vs. 0) You are trying to split the 1-th axis of input tensor with shape [50,78,256] into num_outputs=47 evenly sized chunks, but this is not possible because 47 does not evenly divide 78
Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x276938) [0x7f446310f938]
[bt] (1) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x276d48) [0x7f446310fd48]
[bt] (2) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x286ccb7) [0x7f4465705cb7]
[bt] (3) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x25eab07) [0x7f4465483b07]
[bt] (4) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x244274f) [0x7f44652db74f]
[bt] (5) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2445268) [0x7f44652de268]
[bt] (6) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(MXSymbolInferShape+0x1539) [0x7f4465260659]
[bt] (7) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7f4487cd1ec0]
[bt] (8) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0x22d) [0x7f4487cd187d]
[bt] (9) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce) [0x7f4487ee6dee]
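For reference, the failed check is plain integer arithmetic: the length of the split axis must be divisible by num_outputs. The pure-Python sketch below mirrors that condition with two hypothetical helpers (split_remainder and can_split are illustrative names, not MXNet functions) and reproduces the 31 in "(31 vs. 0)", since 78 = 47 + 31.

```python
# Illustrative helpers only (not MXNet code): they mirror the condition
# dshape[real_axis] % param_.num_outputs == 0 from slice_channel-inl.h.

def split_remainder(shape, axis, num_outputs):
    """Return the remainder left over when splitting `axis` into num_outputs chunks."""
    return shape[axis] % num_outputs


def can_split(shape, axis, num_outputs):
    """True if the axis can be split into num_outputs evenly sized chunks."""
    return split_remainder(shape, axis, num_outputs) == 0


# Shape, axis and num_outputs taken from the error message above.
print(split_remainder((50, 78, 256), 1, 47))  # 31, the "31" in "(31 vs. 0)"
print(can_split((50, 78, 256), 1, 47))        # False: 47 does not divide 78
print(can_split((50, 47, 256), 1, 47))        # True: one step per output
```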
Minimum reproducible example
I’m using the code from the following repository: https://github.com/saicoco/mxnet_image_caption
Everything is identical except that I use my own dataset. I’ve preprocessed it into the same format this implementation expects (it originally used the Flickr8k dataset).
What have you tried to solve it?
I basically need some help understanding where this error is coming from, in particular why param_.num_outputs is set to 47.

- The error message is thrown here: https://github.com/apache/incubator-mxnet/blob/master/src/operator/slice_channel-inl.h#L208
- num_outputs seems to be set here: https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/module/base_module.py#L385 … although that happens after self.forward is called, and the error seems to be thrown before num_outputs is set there.
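One possible reading of the numbers, offered as an unconfirmed hypothesis rather than a diagnosis: in the Symbol API, num_outputs of the split (SliceChannel) operator is an attribute fixed when the graph is built, and the traceback shows score() calling forward(), which calls reshape() and re-runs infer_shape on that same graph, so only the input shapes change. If the network prepends one image-feature step to the word sequence (an assumption, not checked against the repository), then 47 = sentence_length (46) + 1 and 78 = 77 + 1 for the validation word_data. The sketch below only does that arithmetic; IMAGE_STEPS and unrolled_length are hypothetical names.

```python
# Hypothetical sketch: ASSUMES one image-feature time step is prepended to the
# word sequence before the split. Not taken from the mxnet_image_caption repo.

IMAGE_STEPS = 1  # assumed number of image-feature steps in front of the words


def unrolled_length(num_words, image_steps=IMAGE_STEPS):
    """Length of the time axis that would reach the split operator."""
    return num_words + image_steps


# num_outputs is fixed when the symbol is built (here: from the training config).
train_sentence_length = 46            # "sentence_length:46" in the log
num_outputs = unrolled_length(train_sentence_length)

# During scoring only the input shapes change; num_outputs stays the same.
eval_word_len = 77                    # word_data: (50, 77) in the log
eval_axis_len = unrolled_length(eval_word_len)

print(num_outputs, eval_axis_len, eval_axis_len % num_outputs)  # 47 78 31
```

If this reading is right, the open question becomes why the evaluation batches carry 77 words while the symbol was built for 46.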