Understanding NDArrayIter


I am trying to understand the implementation of NDArrayIter when using multiple inputs. In the documentation, the following example is given:

import numpy as np
import mxnet as mx

data = {'data1': np.zeros(shape=(10,2,2)), 'data2': np.zeros(shape=(20,2,2))}
label = {'label1': np.zeros(shape=(10,1)), 'label2': np.zeros(shape=(20,1))}
dataiter = mx.io.NDArrayIter(data, label, 3, True, last_batch_handle='discard')

It seems that the batch dimension is always assumed to be the first one. Is this correct? What if I wanted to batch some of the inputs but not others? If I wanted to do inference using the Scala interface, would I use NDArrayIter to build the DataBatch objects and then use those to construct a custom DataBatch that includes the non-batched input? I guess this could be done in a custom DataIter?

  • Yes, the first dimension is assumed to be batch.
  • Gluon lets you change batch size without any effort. For example, the following would work just fine:
# a given batch size (1)
data1 = mx.nd.ones((1, C, W, H))
output1 = net(data1)

# a different batch size (5)
data2 = mx.nd.ones((5, C, W, H))
output2 = net(data2)

You can also implement a custom DataLoader with your own batch_sampler if you want to use different batch sizes.
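To make the first point concrete, here is a minimal NumPy-only sketch of what batching along the first axis amounts to when incomplete batches are discarded. It is an illustration, not MXNet's actual implementation; `iter_batches` is a hypothetical helper, and both inputs are assumed to have the same number of samples.

```python
import numpy as np

def iter_batches(arrays, batch_size):
    """Yield dicts of slices taken along axis 0; drop the incomplete tail."""
    # Assumption: every array shares the same first (batch) dimension.
    num_samples = min(a.shape[0] for a in arrays.values())
    for start in range(0, num_samples - batch_size + 1, batch_size):
        yield {name: a[start:start + batch_size] for name, a in arrays.items()}

data = {"data1": np.zeros((10, 2, 2)), "data2": np.zeros((10, 2, 2))}
batches = list(iter_batches(data, batch_size=3))
print(len(batches), batches[0]["data1"].shape)
```

With 10 samples and a batch size of 3, the tenth sample is dropped, which is the behaviour I would expect from last_batch_handle='discard'.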


I don’t believe this is true in the Scala API, or am I incorrect about this?


@nswamy: can you comment on the Scala API?


In Scala, the batch size is fixed if you use NDArrayIter. Also, since models in Scala are symbolic graphs, I don’t think you can arbitrarily change the batch size unless you create a custom iterator that pads batches smaller than the batch size, or you rebind on every execution, which will take a performance hit.


In a static graph (which applies to Gluon too if you hybridize), the batch size is fixed and memory for the entire forward and backward operation is pre-allocated (or allocated while running the first batch) and doesn’t change during the rest of the execution.

You could either fix a batch size and pad the input with zeros to fill the batch, or set the batch size to 1 and run inference one sample at a time. Which of those will work better for you depends on factors like the batch size of the graph, the average expected batch size while running inference, and your model size. It is best to try both and pick the one that performs better.
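The padding option can be sketched roughly as follows, using NumPy only. `pad_to_batch` is a hypothetical helper, not part of MXNet, and the shapes are made up for illustration.

```python
import numpy as np

def pad_to_batch(x, batch_size):
    """Zero-pad x along axis 0 up to batch_size; return (padded, num_padded)."""
    n = x.shape[0]
    if n > batch_size:
        raise ValueError("input has more samples than the fixed batch size")
    num_padded = batch_size - n
    # Pad only the first (batch) axis; leave all other axes untouched.
    pad_width = [(0, num_padded)] + [(0, 0)] * (x.ndim - 1)
    return np.pad(x, pad_width, mode="constant"), num_padded

x = np.ones((3, 2, 2))
padded, num_padded = pad_to_batch(x, batch_size=5)
# After running the model on `padded`, keep only the rows that are real samples.
valid_rows = padded.shape[0] - num_padded
```

The padded rows still cost compute, so this only pays off when the typical input is close to the fixed batch size.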


Thanks for your input. This was my understanding, so we break our input into smaller batches so that we can process highly variable input sizes via NDArrayIter; this is all working fine. Can you shed light on why NDArrayIter doesn’t make use of the data layout to determine the batch dimension? It seems to me that this is the exact purpose of the data layout. In this case, I could have two inputs, one with a layout of, for example, “NTC” and one of “TC”, such that only the first is subdivided into batches and the second is passed whole to every prediction call. I suppose the right way is to implement my own DataIter to do exactly this, is that right?
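For what it's worth, the iterator described here could look something like the sketch below. It is NumPy-only and purely illustrative: `MixedBatchIter` and the input names `seq` and `context` are hypothetical, and a real version would presumably subclass mx.io.DataIter and return mx.io.DataBatch objects instead of plain dicts.

```python
import numpy as np

class MixedBatchIter:
    """Hypothetical iterator: slices `batched` inputs on axis 0, passes `shared` inputs whole."""

    def __init__(self, batched, shared, batch_size):
        self.batched = batched      # e.g. "NTC" inputs, split along N
        self.shared = shared        # e.g. "TC" inputs, given to every batch
        self.batch_size = batch_size
        # Assumption: all batched inputs have the same first dimension.
        self.num_samples = next(iter(batched.values())).shape[0]

    def __iter__(self):
        for start in range(0, self.num_samples, self.batch_size):
            stop = start + self.batch_size
            batch = {k: v[start:stop] for k, v in self.batched.items()}
            batch.update(self.shared)  # same object reused in each batch
            yield batch

seq = np.zeros((7, 4, 3))      # layout "NTC"
context = np.ones((4, 3))      # layout "TC", no batch axis
batches = list(MixedBatchIter({"seq": seq}, {"context": context}, batch_size=3))
```

Note that the last batch here holds only one sample; with a fixed batch size it would still need to be padded or discarded.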