Efficiently accessing arbitrary NDArray batches in mxnet

pbruneau · October 22, 2018, 9:04am

Fitting convnets such as ResNet and VGG benefits from the ImageRecordIter Python class, which efficiently loads batches from large collections of RGB images stored in RecordIO .rec files.

Does anybody know of equivalent facilities for large, arbitrary 2D or 3D input matrices (in 2D, rows = items and columns = features; in 3D, a channel dimension is added)?

NDArrayIter requires loading the whole dataset into memory, which I need to avoid (my data file is over 40 GB). CSVIter does not allow straightforward shuffling, and it only works for 2D matrices.
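One alternative that does not come up in this thread is to skip a record format entirely. If every item has the same fixed shape, the matrix can be stored as a flat binary file and opened with numpy.memmap, so only the rows of the current mini-batch are read from disk. The sketch below assumes a float32 file written in C order (for example with ndarray.tofile). The function name iterate_memmap_batches is hypothetical, not an MXNet API, and it yields plain NumPy arrays.

```python
import numpy as np


def iterate_memmap_batches(path, shape, batch_size, dtype=np.float32,
                           shuffle=True, seed=None):
    """Yield (indices, batch) pairs read from a raw binary array on disk.

    Hypothetical helper: the file at `path` is assumed to hold prod(shape)
    values of `dtype` in C order, with items along the first axis, i.e.
    (items, features) in 2D or (items, channels, features) in 3D.
    Only the rows of the current batch are copied into memory.
    """
    data = np.memmap(path, dtype=dtype, mode="r", shape=tuple(shape))
    order = np.arange(shape[0])
    if shuffle:
        # Shuffle the item order once per call; pass a different seed
        # (or None) on each epoch to get a new order every epoch.
        np.random.default_rng(seed).shuffle(order)
    for start in range(0, shape[0], batch_size):
        # Sorting the indices inside a batch keeps batch membership random
        # while making the disk reads more sequential.
        idx = np.sort(order[start:start + batch_size])
        # Fancy indexing copies the selected rows into a regular ndarray.
        yield idx, np.asarray(data[idx])
```

Each batch could then be converted with mx.nd.array(batch) before being fed to MXNet, or the generator could be wrapped in a custom iterator. Two caveats: the last batch may be smaller than batch_size, and fully random row access can be slow on spinning disks compared with the sequential reads RecordIO is designed for.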

x110 · October 22, 2018, 9:58am

You can develop your own data iterator. Check out this tutorial.

pbruneau · October 22, 2018, 11:23am

Without going into too much detail, let's say my input data is generated by expanding a simpler (non-image) data table.

I already created my own iterator by adapting SimpleIter (from https://mxnet.incubator.apache.org/tutorials/basic/data.html) to my needs. Its next method generates the expected data batch on the fly from the simpler table.

The problem is that this is highly inefficient: a learning algorithm using this iterator spends almost all of its time in the next method.

Building on mxnet.recordio.MXRecordIO (saving the expanded data to a .rec file once, then loading it during training) does look like the way to go, as your link confirms. However, adapting it to a general NDArray context seems to require a good deal of implementation work, whereas multi-threaded, ready-to-use facilities already exist for image collections.
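If the expanded items all share one fixed shape, the same precompute-once idea can also be sketched without RecordIO: run the expansion a single time, chunk by chunk, and append raw float32 values to a flat file. In the sketch below, precompute_expansion is a hypothetical helper, expand_fn is a placeholder for whatever expansion currently runs inside next, and table is assumed to be a NumPy array with one source item per row.

```python
import numpy as np


def precompute_expansion(table, expand_fn, out_path, chunk_size=1024):
    """Expand `table` once and append the result to a raw float32 file.

    Hypothetical helper: `expand_fn` maps a chunk of table rows to an
    array of shape (len(chunk), ...) with the same trailing shape for
    every chunk. Returns the full on-disk shape of the stored array.
    """
    item_shape = None
    num_items = 0
    with open(out_path, "wb") as f:
        for start in range(0, len(table), chunk_size):
            chunk = table[start:start + chunk_size]
            # Convert to contiguous float32 so tofile writes a fixed layout.
            expanded = np.ascontiguousarray(expand_fn(chunk), dtype=np.float32)
            if item_shape is None:
                item_shape = expanded.shape[1:]
            elif expanded.shape[1:] != item_shape:
                raise ValueError("all expanded items must share one shape")
            expanded.tofile(f)  # raw values in C order, appended
            num_items += expanded.shape[0]
    if item_shape is None:
        raise ValueError("table is empty; nothing was written")
    return (num_items,) + tuple(item_shape)
```

The returned shape can then be used to reopen the file lazily, e.g. np.memmap(out_path, dtype=np.float32, mode="r", shape=shape). Items of varying size do not fit this layout; in that case a RecordIO file with one record per item is the better match.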

So rather than reinventing the wheel right away, my question was whether I had missed an equivalent of im2rec.py + ImageRecordIter for general NDArrays, or whether DIY was indeed the only way.

x110 · October 22, 2018, 11:46am

I am not sure this is the only way. When I had a similar problem, my approach was based on mxnet.recordio.MXRecordIO.
