TypeError: 'DataBatch' object is not iterable


#1

Hi all,
I want to run a simple ConvNet on a dataset, but I am not sure what is happening with the data iterator.
Here is the tutorial I am following.
The linked code runs on MNIST, but I don’t have a clean set of images to process, so I had to reinvent the data ingestion part. I managed to turn my inputs into numpy.ndarrays of the following shapes (basically pictures with only 2 channels):

>>> label.shape
(1604,)
>>> data.shape
(1604, 2, 75, 75)

then I run the following

train_data = mx.io.NDArrayIter(data=data, label=label, batch_size=64)

and after that, basically, copy-paste from the tutorial

num_fc = 512
net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Conv2D(channels=20, kernel_size=5, activation='relu'))
    net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
    net.add(gluon.nn.Conv2D(channels=50, kernel_size=5, activation='relu'))
    net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
    # The Flatten layer collapses all axes, except the first one, into one axis.
    net.add(gluon.nn.Flatten())
    net.add(gluon.nn.Dense(num_fc, activation="relu"))
    net.add(gluon.nn.Dense(num_outputs))

net.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)
softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': .1})

epochs = 1
smoothing_constant = .01

for e in range(epochs):
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(ctx)
        label = label.as_in_context(ctx)
        with autograd.record():
            output = net(data)
            loss = softmax_cross_entropy(output, label)
        loss.backward()
        trainer.step(data.shape[0])

        ##########################
        #  Keep a moving average of the losses
        ##########################
        curr_loss = nd.mean(loss).asscalar()
        moving_loss = (curr_loss if ((i == 0) and (e == 0))
                       else (1 - smoothing_constant) * moving_loss + smoothing_constant * curr_loss)

which raises TypeError: 'DataBatch' object is not iterable
I have looked around but cannot figure out what is going wrong.
According to the docs, NDArrayIter is indeed an iterator, and the following works:

for batch in train_data:
    print batch.data[0].asnumpy()
    batch.data[0].shape

I am sure I am doing something very silly here.
Any ideas please?


#2

train_data = mx.io.NDArrayIter(data=data, label=label, batch_size=64)

Is your data a numpy array? You should use an MXNet NDArray, which is different from a numpy array.


#3

The thing is that when you loop over an NDArrayIter, each item is not a (data, label) tuple but a DataBatch object, and a DataBatch cannot be unpacked like a tuple. It holds both the data and the label, as the lists batch.data and batch.label, so you can fix your code by replacing the start of the loop with:

for i, batch in enumerate(train_data):
    data = batch.data[0].as_in_context(ctx)
    label = batch.label[0].as_in_context(ctx)
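
If you would rather keep the tutorial's loop header as it is, one option is a small generator that unpacks each batch for you. This is only a sketch: iter_pairs, FakeBatch and FakeIter are names made up for illustration, and the two Fake classes are plain-Python stand-ins (not MXNet classes) so that the snippet runs without MXNet installed. With the real library you would pass your NDArrayIter to iter_pairs instead.

```python
class FakeBatch:
    """Hypothetical stand-in for mx.io.DataBatch: .data and .label are lists."""
    def __init__(self, data, label):
        self.data = [data]
        self.label = [label]


class FakeIter:
    """Hypothetical stand-in for mx.io.NDArrayIter (no padding of the last batch)."""
    def __init__(self, data, label, batch_size):
        self._data, self._label, self._bs = data, label, batch_size
        self._pos = 0

    def reset(self):
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._data):
            raise StopIteration
        lo, hi = self._pos, self._pos + self._bs
        self._pos = hi
        return FakeBatch(self._data[lo:hi], self._label[lo:hi])


def iter_pairs(data_iter):
    """Yield (data, label) tuples from an iterator that yields batch objects."""
    data_iter.reset()  # rewind, so the helper can be called once per epoch
    for batch in data_iter:
        yield batch.data[0], batch.label[0]


train_data = FakeIter(list(range(10)), list("abcdefghij"), batch_size=4)

# Unpacking a batch object directly fails, as in the original loop.
try:
    for i, (data, label) in enumerate(train_data):
        pass
    unpack_failed = False
except TypeError:
    unpack_failed = True

# With the adapter, the tutorial-style loop header works unchanged.
pairs = [(i, data, label) for i, (data, label) in enumerate(iter_pairs(train_data))]
```

The reset() call matters once epochs is larger than 1: the iterator is exhausted after one full pass, so without rewinding it the second epoch would see no batches.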

#4

Of course! I should have figured that out from the last piece of code I posted:

for batch in train_data:
    print batch.data[0].asnumpy()
    batch.data[0].shape

Thanks for the help!