Integer labels in NDArrayIter


#1

Is there a way to make NDArrayIter spit out integer labels? It always converts labels to float, and in the example below I can’t convert back as some significant digits get lost. Thanks!

from mxnet.io import NDArrayIter
import numpy as np

X_train = np.array([1,2,3,4,5])
Y_train = np.array([1,2,3,4,5111111122])

train_data = NDArrayIter(X_train, Y_train, 2, shuffle=False)

train_data.label[0]

Output:
('softmax_label',
[ 1.00000000e+00 2.00000000e+00 3.00000000e+00 4.00000000e+00
5.11111117e+09]
<NDArray 5 @cpu(0)>)


#2

You can use an MXNet NDArray instead of a NumPy ndarray for Y_train to get integer labels. Here is an example:

import mxnet as mx
import numpy as np


X_train = mx.nd.array([1,2,3,4,5])
Y_train = mx.nd.array([1,2,3,4,5111111122], dtype='int64')

train_data = mx.io.NDArrayIter(X_train, Y_train, 2, shuffle=False)
print(train_data.label[0])

AFAIK, this is not possible with NumPy ndarrays, since the dtype of the NumPy ndarray is not passed along when constructing the MXNet NDArray from it.
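
As for why the digits get lost: the output in the first post is consistent with the labels being stored as 32-bit floats, which carry only 24 bits of precision. The sketch below is not MXNet code. It uses Python's standard struct module to round-trip a value through a 4-byte float, and to_float32 is a hypothetical helper made up for this illustration.

```python
import struct


def to_float32(value):
    """Round-trip a number through a 4-byte IEEE 754 float (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", value))[0]


label = 5111111122
as_float32 = to_float32(label)

# Small labels survive the conversion unchanged.
print(to_float32(4) == 4)        # True

# The large label is rounded to the nearest representable float32 value,
# so converting back to int does not recover the original.
print(int(as_float32))           # 5111111168
print(int(as_float32) == label)  # False
```

An int64 array avoids the rounding because it never goes through a float in the first place.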


#3

Thanks, it worked! However, for some strange reason, it switched to floats again when I set shuffle=True.


#4

This is because when shuffle is True, it doesn’t pass the dtype of the ndarray to array util. Thanks for letting us know; I have opened an MXNet issue for this (https://github.com/apache/incubator-mxnet/issues/8430).
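
Until that is fixed, one possible workaround is to shuffle the data yourself and keep shuffle=False. A rough sketch, assuming plain Python lists as input; shuffle_together is a hypothetical helper, not part of MXNet:

```python
import random


def shuffle_together(data, labels, seed=None):
    """Apply one random permutation to both lists so pairs stay aligned.

    Hypothetical helper for illustration; not an MXNet function.
    """
    order = list(range(len(labels)))
    random.Random(seed).shuffle(order)
    return [data[i] for i in order], [labels[i] for i in order]


X = [1, 2, 3, 4, 5]
Y = [1, 2, 3, 4, 5111111122]

X_shuffled, Y_shuffled = shuffle_together(X, Y, seed=0)

# The labels are still Python ints, so no precision has been lost yet.
print(list(zip(X_shuffled, Y_shuffled)))
```

The shuffled lists can then be passed to mx.nd.array(..., dtype='int64') and NDArrayIter with shuffle=False, as in the example above. Note that this shuffles only once, up front.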


#5

Thanks @anirudh2290! What do you think of this workaround:

import mxnet as mx
from mxnet import nd
from mxnet.gluon.data.dataset import ArrayDataset

X_train = nd.array([[1],[2],[3],[4],[5]], dtype='int64')
Y_train = nd.array([[1],[2],[3],[4],[5111111122]], dtype='int64')

dl = mx.gluon.data.DataLoader(ArrayDataset(X_train, Y_train), 2, shuffle=True)

for (d1, d2) in dl:
    print(d2)

Output:
[[5111111122]
[ 3]]
<NDArray 2x1 @cpu(0)>

[[1]
[4]]
<NDArray 2x1 @cpu(0)>

[[2]]
<NDArray 1x1 @cpu(0)>


#6

@avolozin Yep, that works too! You can also use a 1D array like the one in your initial example.