MXNet CudaMalloc Error on Testing


#1

I get a very strange error while testing a trained model:

    self.mod.bind( data_shapes=self.data_shapes, label_shapes=self.label_shapes )
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.10.0-py2.7.egg/mxnet/module/module.py", line 388, in bind
    state_names=self._state_names)
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.10.0-py2.7.egg/mxnet/module/executor_group.py", line 216, in __init__
    self.bind_exec(data_shapes, label_shapes, shared_group)
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.10.0-py2.7.egg/mxnet/module/executor_group.py", line 312, in bind_exec
    shared_group))
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.10.0-py2.7.egg/mxnet/module/executor_group.py", line 653, in _bind_ith_exec
    grad_req=self.grad_req, shared_exec=shared_exec)
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.10.0-py2.7.egg/mxnet/symbol.py", line 1407, in bind
    ctypes.byref(handle)))
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.10.0-py2.7.egg/mxnet/base.py", line 84, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [23:43:05] src/storage/./pooled_storage_manager.h:84: cudaMalloc failed: out of memory

The strange thing is that this happens during testing, even though training completed successfully. Is this expected behavior? Can a model need that much more memory at test time than during training?


#2

Could you please update to a current version of MXNet? This is probably best done with:

    pip install -U pip setuptools
    pip install --upgrade --pre mxnet

Please let us know whether the problem persists.


#3

I upgraded MXNet to 0.11.0 on my python3 install, but it still gives me the same error:

 Loading a model from  stacked/exp5/exp5-0016
[00:27:48] src/nnvm/legacy_json_util.cc:190: Loading symbol saved by previous version v0.10.0. Attempting to upgrade...
[00:27:48] src/nnvm/legacy_json_util.cc:198: Symbol successfully upgraded!
[00:27:51] src/storage/./pooled_storage_manager.h:102: cudaMalloc failed: out of memory

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x1d57cc) [0x7fda45bb57cc]
[bt] (1) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x1242238) [0x7fda46c22238]
[bt] (2) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x1244c0a) [0x7fda46c24c0a]
[bt] (3) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0xe4d4db) [0x7fda4682d4db]
[bt] (4) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0xe549cd) [0x7fda468349cd]
[bt] (5) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0xe59f95) [0x7fda46839f95]
[bt] (6) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0xe5d6ee) [0x7fda4683d6ee]
[bt] (7) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0xe5dcd4) [0x7fda4683dcd4]
[bt] (8) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(MXExecutorSimpleBind+0x2261) [0x7fda467bc291]
[bt] (9) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call_unix64+0x4c) [0x7fd9c6f7fe20]

If it helps, the model looks like this:

 {'causal_0': <NDArray 7x1 @cpu(0)>,
  'causal_1': <NDArray 16x1 @cpu(0)>,
  'causal_10': <NDArray 16x1 @cpu(0)>,
  'causal_11': <NDArray 16x1 @cpu(0)>,
  'causal_12': <NDArray 16x1 @cpu(0)>,
  'causal_13': <NDArray 16x1 @cpu(0)>,
  'causal_14': <NDArray 16x1 @cpu(0)>,
  'causal_15': <NDArray 16x1 @cpu(0)>,
  'causal_16': <NDArray 16x1 @cpu(0)>,
  'causal_17': <NDArray 16x1 @cpu(0)>,
  'causal_18': <NDArray 16x1 @cpu(0)>,
  'causal_19': <NDArray 16x1 @cpu(0)>,
  'causal_2': <NDArray 16x1 @cpu(0)>,
  'causal_20': <NDArray 16x1 @cpu(0)>,
  'causal_21': <NDArray 16x1 @cpu(0)>,
  'causal_22': <NDArray 16x1 @cpu(0)>,
  'causal_23': <NDArray 16x1 @cpu(0)>,
  'causal_24': <NDArray 16x1 @cpu(0)>,
  'causal_25': <NDArray 16x1 @cpu(0)>,
  'causal_26': <NDArray 16x1 @cpu(0)>,
  'causal_27': <NDArray 16x1 @cpu(0)>,
  'causal_28': <NDArray 16x1 @cpu(0)>,
  'causal_29': <NDArray 16x1 @cpu(0)>,
  'causal_3': <NDArray 16x1 @cpu(0)>,
  'causal_30': <NDArray 16x1 @cpu(0)>,
  'causal_31': <NDArray 16x1 @cpu(0)>,
  'causal_32': <NDArray 16x1 @cpu(0)>,
  'causal_33': <NDArray 16x1 @cpu(0)>,
  'causal_34': <NDArray 16x1 @cpu(0)>,
  'causal_35': <NDArray 16x1 @cpu(0)>,
  'causal_36': <NDArray 16x1 @cpu(0)>,
  'causal_37': <NDArray 16x1 @cpu(0)>,
  'causal_38': <NDArray 16x1 @cpu(0)>,
  'causal_39': <NDArray 16x1 @cpu(0)>,
  'causal_4': <NDArray 16x1 @cpu(0)>,
  'causal_40': <NDArray 16x1 @cpu(0)>,
  'causal_41': <NDArray 16x1 @cpu(0)>,
  'causal_42': <NDArray 16x1 @cpu(0)>,
  'causal_43': <NDArray 16x1 @cpu(0)>,
  'causal_44': <NDArray 16x1 @cpu(0)>,
  'causal_45': <NDArray 16x1 @cpu(0)>,
  'causal_46': <NDArray 16x1 @cpu(0)>,
  'causal_47': <NDArray 16x1 @cpu(0)>,
  'causal_5': <NDArray 16x1 @cpu(0)>,
  'causal_6': <NDArray 16x1 @cpu(0)>,
  'causal_7': <NDArray 16x1 @cpu(0)>,
  'causal_8': <NDArray 16x1 @cpu(0)>,
  'causal_9': <NDArray 16x1 @cpu(0)>,
  'fullyconnected0_bias': <NDArray 512 @cpu(0)>,
  'fullyconnected0_weight': <NDArray 512x215 @cpu(0)>,
  'fullyconnected100_bias': <NDArray 374 @cpu(0)>,
  'fullyconnected100_weight': <NDArray 374x64 @cpu(0)>,
  'fullyconnected10_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected10_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected11_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected11_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected12_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected12_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected13_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected13_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected14_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected14_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected15_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected15_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected16_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected16_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected17_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected17_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected18_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected18_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected19_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected19_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected1_bias': <NDArray 512 @cpu(0)>,
  'fullyconnected1_weight': <NDArray 512x512 @cpu(0)>,
  'fullyconnected20_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected20_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected21_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected21_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected22_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected22_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected23_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected23_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected24_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected24_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected25_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected25_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected26_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected26_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected27_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected27_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected28_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected28_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected29_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected29_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected2_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected2_weight': <NDArray 10x519 @cpu(0)>,
  'fullyconnected30_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected30_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected31_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected31_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected32_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected32_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected33_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected33_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected34_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected34_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected35_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected35_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected36_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected36_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected37_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected37_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected38_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected38_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected39_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected39_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected3_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected3_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected40_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected40_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected41_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected41_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected42_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected42_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected43_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected43_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected44_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected44_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected45_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected45_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected46_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected46_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected47_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected47_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected48_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected48_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected49_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected49_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected4_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected4_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected50_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected50_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected51_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected51_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected52_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected52_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected53_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected53_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected54_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected54_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected55_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected55_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected56_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected56_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected57_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected57_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected58_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected58_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected59_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected59_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected5_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected5_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected60_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected60_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected61_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected61_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected62_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected62_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected63_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected63_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected64_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected64_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected65_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected65_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected66_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected66_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected67_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected67_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected68_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected68_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected69_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected69_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected6_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected6_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected70_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected70_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected71_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected71_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected72_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected72_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected73_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected73_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected74_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected74_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected75_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected75_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected76_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected76_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected77_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected77_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected78_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected78_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected79_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected79_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected7_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected7_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected80_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected80_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected81_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected81_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected82_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected82_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected83_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected83_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected84_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected84_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected85_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected85_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected86_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected86_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected87_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected87_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected88_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected88_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected89_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected89_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected8_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected8_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected90_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected90_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected91_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected91_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected92_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected92_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected93_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected93_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected94_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected94_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected95_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected95_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected96_bias': <NDArray 10 @cpu(0)>,
  'fullyconnected96_weight': <NDArray 10x528 @cpu(0)>,
  'fullyconnected97_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected97_weight': <NDArray 2x10 @cpu(0)>,
  'fullyconnected98_bias': <NDArray 64 @cpu(0)>,
  'fullyconnected98_weight': <NDArray 64x608 @cpu(0)>,
  'fullyconnected99_bias': <NDArray 64 @cpu(0)>,
  'fullyconnected99_weight': <NDArray 64x64 @cpu(0)>,
  'fullyconnected9_bias': <NDArray 2 @cpu(0)>,
  'fullyconnected9_weight': <NDArray 2x10 @cpu(0)>,
  'lossWeights': <NDArray 1x235 @cpu(0)>,
  'reorderIndex': <NDArray 235x1 @cpu(0)>},
 {})

(I appreciate that it’s a bit large, which is why I was wondering why this doesn’t occur during training)


#4

One possible reason it only occurs during testing: the batch size during testing is larger than during training. Have you checked that?
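
To illustrate why batch size matters here, below is a rough back-of-the-envelope sketch in plain Python (no MXNet calls). It assumes float32 values and uses four layer shapes taken from the parameter dump above purely as an illustration; the helper names `param_bytes` and `activation_bytes` are hypothetical. The actual MXNet memory planner reuses buffers and, during training, also allocates gradients, so real numbers will differ. The point is only that parameter memory is fixed while activation memory grows linearly with batch size.

```python
# Rough float32 memory estimate for a few fully connected layers.
# This is NOT how MXNet plans memory; it only shows the scaling behavior.
BYTES_PER_FLOAT32 = 4


def param_bytes(layer_dims):
    """Bytes for weights and biases; independent of batch size."""
    total = 0
    for in_dim, out_dim in layer_dims:
        total += (in_dim * out_dim + out_dim) * BYTES_PER_FLOAT32
    return total


def activation_bytes(layer_dims, batch_size):
    """Bytes for layer outputs; grows linearly with batch size."""
    outputs_per_sample = sum(out_dim for _, out_dim in layer_dims)
    return outputs_per_sample * batch_size * BYTES_PER_FLOAT32


# (in_dim, out_dim) pairs for fullyconnected0..3 from the dump above,
# e.g. fullyconnected0_weight is 512x215, so in_dim=215 and out_dim=512.
layer_dims = [(215, 512), (512, 512), (519, 10), (10, 2)]

train_batch = 64   # hypothetical training batch size
test_batch = 128   # hypothetical test batch size, twice as large

print("parameters:        ", param_bytes(layer_dims), "bytes")
print("activations (train):", activation_bytes(layer_dims, train_batch), "bytes")
print("activations (test): ", activation_bytes(layer_dims, test_batch), "bytes")
```

With many such layers, and the intermediate buffers a large graph needs, doubling the batch size can be enough to push a run that barely fit on the GPU over the limit.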


#5

It’s not much different. Twice the size, but maybe that’s the issue. I’ll fix that.


#6

That did it! Thanks @astonzhang @smolix!