Where is the mxnet.io.ImageRecordIter implementation/documentation?


#1

Hi,

I’m using mxnet.io.ImageRecordIter, which is fast yet not very flexible…

During training I shuffle the batches, but for evaluation I want to fetch a specific batch (or specific images) for visualization.

At https://mxnet.incubator.apache.org/api/python/io/io.html#mxnet.io.ImageRecordIter I can only find the instantiation parameters, and there is no reference to the source code.

Pointers anyone :slight_smile:?
-Oron


#2

ImageRecordIter is implemented in C++; you can find the source code here

I would suggest using the Dataset and DataLoader APIs, as they are much more flexible. Instead of multi-threading, the DataLoader uses a multi-processing paradigm. Check this tutorial out to learn more, or the API docs

Overall, I would suggest splitting your training and evaluation data by creating two Datasets and two DataLoaders, or in your case two ImageRecordIter instances, one for training and one for evaluation. That way you can specify shuffle=True on the training one but not on the evaluation one, and you control what goes into the evaluation dataset in the first place.
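To make the two-iterator suggestion concrete, here is a minimal sketch. The file names `train.rec` and `val.rec`, the batch size, and the 3x224x224 data shape are placeholders I made up for illustration; the helper names `make_iter_kwargs` and `build_iter` are hypothetical too. Only the keyword arguments (`path_imgrec`, `data_shape`, `batch_size`, `shuffle`) are actual ImageRecordIter parameters.

```python
def make_iter_kwargs(rec_path, batch_size, training):
    """Collect ImageRecordIter arguments for one data split.

    Shuffling is enabled only for the training split, so the evaluation
    iterator yields records in the order they were written to the file.
    """
    return dict(
        path_imgrec=rec_path,      # placeholder path to a RecordIO file
        data_shape=(3, 224, 224),  # assumed channels x height x width
        batch_size=batch_size,
        shuffle=training,
    )


def build_iter(kwargs):
    """Create the iterator; mxnet is imported lazily, only when needed."""
    import mxnet as mx
    return mx.io.ImageRecordIter(**kwargs)


# One configuration per split; the file names are hypothetical.
train_kwargs = make_iter_kwargs("train.rec", batch_size=32, training=True)
eval_kwargs = make_iter_kwargs("val.rec", batch_size=32, training=False)
```

Calling `build_iter(train_kwargs)` and `build_iter(eval_kwargs)` would then give you the two iterators. Because the evaluation one is not shuffled, you can pack just the images you want to visualize into `val.rec` and read them back in a known order.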

Does that answer your question?


#3

Thanks!

P.S.
I compared it against the Gluon DataLoader, which turned out to be about 4x slower…


#4

Have you set the DataLoader's num_workers parameter, e.g. num_workers=multiprocessing.cpu_count()?

Being 4 times slower seems about right: by default ImageRecordIter uses 4 preprocessing threads, whilst the DataLoader uses no workers at all, so all the processing is done in your current process.
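Here is a rough sketch of what I mean by setting num_workers. The helper names `make_loader_kwargs` and `build_loader`, the batch size, and the `val.rec` path are made up for illustration; `ImageRecordDataset` and `DataLoader` are the Gluon classes, and as far as I know the dataset expects an index file (`val.idx`) next to the record file.

```python
import multiprocessing


def make_loader_kwargs(batch_size, training, num_workers=None):
    """Collect DataLoader arguments, defaulting to one worker per CPU core.

    num_workers=0 is the DataLoader default and means that all loading and
    decoding happens in the calling process.
    """
    if num_workers is None:
        num_workers = multiprocessing.cpu_count()
    return dict(
        batch_size=batch_size,
        shuffle=training,        # shuffle only the training split
        num_workers=num_workers,
    )


def build_loader(rec_path, **loader_kwargs):
    """Create a Gluon DataLoader over a RecordIO file (lazy mxnet import)."""
    from mxnet import gluon
    dataset = gluon.data.vision.ImageRecordDataset(rec_path)
    return gluon.data.DataLoader(dataset, **loader_kwargs)


# Evaluation settings: no shuffling, one worker process per core.
eval_loader_kwargs = make_loader_kwargs(batch_size=32, training=False)
```

With this, `build_loader("val.rec", **eval_loader_kwargs)` should give a multi-process loader. A Dataset also supports indexing, so `dataset[i]` returns the i-th image and its label directly, which may be all you need for visualizing specific images.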