NDArray fails silently on large array sizes


#1

It’s known that NDArrays do not scale to very large sizes; for example, the following commands raise an error:

>>> import mxnet as mx
>>> import numpy as np
>>> a = mx.nd.ones((1000, 5065309))
>>> b = a.asnumpy()

However, when the number of elements is only slightly larger than the maximum int32 value, no error is raised, but later operations can give incorrect results:

>>> 214748364 * 11 > np.iinfo('int32').max
True
>>> a = mx.nd.ones((214748364, 11))
>>> b = a.asnumpy()
>>> b.sum()
0.0

The correct result should equal the number of elements in the array. With 10 columns instead of 11, the element count stays below the int32 limit and the sum comes out as expected:

>>> 214748364 * 10 > np.iinfo('int32').max
False
>>> a = mx.nd.ones((214748364, 10))
>>> b = a.asnumpy()
>>> b.sum()
2147483600.0
>>> 214748364 * 10
2147483640

In the second case the answer is approximately correct (slightly off due to float32 precision), but in the first case the result is completely wrong.
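To see why the threshold sits exactly at the int32 limit, here is a minimal sketch that computes the element count with Python's unbounded integers and then shows what that count becomes if it is forced into a signed 32-bit integer. It assumes, without confirmation from this thread, that MXNet 1.4.0 kept the total size in a 32-bit signed integer; `element_count` and `wrap_int32` are hypothetical helpers, not MXNet functions.

```python
# Sketch only: assumes (unconfirmed) that MXNet 1.4.0 stored the element count
# in a signed 32-bit integer. element_count and wrap_int32 are hypothetical helpers.

INT32_MAX = 2**31 - 1  # same value as np.iinfo('int32').max: 2147483647


def element_count(shape):
    """Exact element count, using Python's arbitrary-precision integers."""
    n = 1
    for dim in shape:
        n *= dim
    return n


def wrap_int32(n):
    """Value n takes when forced into a signed 32-bit (two's-complement) integer."""
    return (n + 2**31) % 2**32 - 2**31


if __name__ == "__main__":
    for shape in [(214748364, 10), (214748364, 11)]:
        n = element_count(shape)
        # Shape, exact size, whether it exceeds int32, and the wrapped 32-bit value
        print("%s %d %s %d" % (shape, n, n > INT32_MAX, wrap_int32(n)))
```

For (214748364, 10) the wrapped value equals the true count (2147483640), while for (214748364, 11) it becomes -1932735292. A negative size would be one plausible way to end up with an array that is never filled with ones, but the thread itself does not confirm this mechanism.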


#2

I tried your examples with different versions of MXNet on my laptop and could reproduce the issue with my current version, MXNet 1.4.0.

If I upgrade MXNet to the master version with pip install mxnet --pre (which gives mxnet-1.5.0b20181227), the issue is no longer reproducible. I assume the bug was fixed between these two versions.

>>> a = mx.nd.ones((214748364, 11))
>>> a  # in 1.4.0 this prints a matrix full of zeros instead of ones, which is why sum() later returned 0

[[1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 ...
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]]
<NDArray 214748364x11 @cpu(0)>
>>> b = a.asnumpy()
>>> b.sum()
2362232000.0
>>>
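Because the array is float32, a sum over roughly 2.36 billion ones will not print exactly 214748364 * 11 = 2362232004; float32 carries only about 7 significant decimal digits. A minimal sketch of a tolerance-based check that separates "close enough" (2362232000.0 above) from the 0.0 seen in 1.4.0; the function name and the 1e-6 relative tolerance are illustrative choices, not part of MXNet.

```python
# Hypothetical helper: compares a float32 sum of an array of ones against the
# exact element count, allowing for float32 rounding via a relative tolerance.


def sum_matches_count(shape, observed_sum, rel_tol=1e-6):
    """True if observed_sum is within rel_tol (relative) of the exact element count."""
    expected = 1
    for dim in shape:
        expected *= dim  # exact integer product, no overflow in Python
    return abs(float(observed_sum) - expected) <= rel_tol * expected


if __name__ == "__main__":
    print(sum_matches_count((214748364, 11), 2362232000.0))  # fixed build
    print(sum_matches_count((214748364, 11), 0.0))           # 1.4.0 behaviour
```

A relative rather than absolute tolerance is used because the float32 spacing between representable values grows with magnitude (it is 256 near 2.36e9).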

Try installing the --pre version of MXNet and let me know if you still see the problem.


#3

It seems that I’m unable to install pre-release versions of mxnet using the --pre flag. Is there anything else I need to do first?

I’m running Python 2.7.15 in a conda virtual environment:

$ pip install --pre mxnet
Requirement already satisfied: mxnet in /home...
<some lines omitted>

$ python -c "import mxnet; print(mxnet.__version__)"
1.3.1

#4

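You need to uninstall your previous version of mxnet first with pip uninstall mxnet. Because mxnet is already installed, pip reports "Requirement already satisfied" and keeps 1.3.1 instead of fetching the pre-release.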
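Assuming the package is then reinstalled with pip install --pre mxnet, the version check from #3 should no longer print 1.3.1. Here is a small sketch that flags whether a version string carries a pre-release marker. The helper name is hypothetical, and the regex covers only the common PEP 440 forms (a/b/rc suffixes and .dev), not every valid spelling. It avoids f-strings so it also runs on Python 2.7.

```python
import re

# Hypothetical helper: matches PEP 440 pre-release/dev markers such as
# "b20181227", "rc1" or ".dev0" at the end of a version string.
_PRERELEASE = re.compile(r"(a|b|rc)\d+$|\.dev\d*$")


def is_prerelease(version):
    """True if the version string ends with a pre-release or dev marker."""
    return _PRERELEASE.search(version) is not None


if __name__ == "__main__":
    try:
        import mxnet
    except ImportError:
        print("mxnet is not installed")
    else:
        # e.g. 1.3.1 means the old release is still the one being imported
        print("%s pre-release: %s" % (mxnet.__version__, is_prerelease(mxnet.__version__)))
```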