What happened? One of the arrays from the same iterator is on the CPU and the other is on the GPU. Is this a bug?


#1

One of the arrays from the same iterator is on the CPU and the other is on the GPU. Is this a bug?
I don’t know where the problem is. Can someone please help me?


#2

If you look at the code of ArrayDataset, at line 151, you can see:

            if isinstance(data, ndarray.NDArray) and len(data.shape) == 1:
                data = data.asnumpy()

This means that if one of the NDArrays is one-dimensional, it gets converted to a NumPy array. The DataLoader then loads it back as an NDArray, which by default lives on the CPU.

Usually that’s what you want, since metrics are typically computed on the CPU. If you want to keep y on the GPU, I recommend adding a dummy extra dimension using .expand_dims():

import mxnet as mx
from mxnet import nd

x = nd.ones((5, 6), mx.gpu())
y = nd.ones((5,), mx.gpu())
y = y.expand_dims(axis=1)  # (5,) -> (5, 1): no longer 1-D, so it stays on the GPU
print(y.shape)
dataset = mx.gluon.data.ArrayDataset(x, y)
dataloader = mx.gluon.data.DataLoader(dataset, batch_size=1)
for data in dataloader:
    print(data)
    break

Output:

(5, 1)
[
[[ 1.  1.  1.  1.  1.  1.]]
<NDArray 1x6 @gpu(0)>, 
[[ 1.]]
<NDArray 1x1 @gpu(0)>]

#4

Thank you very much for helping me solve this problem. The explanation was very detailed. Thank you! :+1: