Append records to recordio dataset?

how-to
gluon-cv
#1

Hi dear MXNet community,
Is it possible to append records to a recordIO dataset?
My use case is an image dataset that keeps growing. I’d like to append records to the existing RecordIO dataset instead of re-running im2rec on the whole collection every day.
Cheers

#2

Hi,

I don’t think it can be done out of the box, but the im2rec script is short and relatively simple Python code, so adding an --append option (covering the .lst, .idx and .rec files) seems quite easy.

The drawback of this approach is that the new images will not be shuffled (or only locally), so if your daily delta is not a random sample of the overall data distribution, training accuracy may suffer. To avoid this, you can reshuffle the .idx file completely after adding the delta, at a (probably) minor performance cost.
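To illustrate the reshuffling step, here is a minimal sketch that rewrites an .idx file with its lines in random order. It assumes the usual text layout of one "key<TAB>offset" entry per line, and that whatever reads the dataset follows the order of the .idx file; the function name shuffle_idx and the seed parameter are made up for this example.

```python
import random


def shuffle_idx(idx_path, seed=None):
    """Rewrite a text .idx file with its entries in random order.

    Assumes one "key<TAB>offset" entry per line. Returns the number
    of entries written.
    """
    with open(idx_path) as f:
        # Drop blank lines and normalise line endings.
        lines = [line.rstrip("\n") for line in f if line.strip()]

    # A private Random instance keeps the global random state untouched.
    random.Random(seed).shuffle(lines)

    with open(idx_path, "w") as f:
        f.writelines(line + "\n" for line in lines)
    return len(lines)
```

Only the small index file is rewritten; the .rec file itself stays untouched, which is why this should be much cheaper than rebuilding the dataset. It is probably wise to keep a backup of the .idx file before overwriting it.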

Lieven

#3

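Alternatively, you could try opening the file as MXRecordIO and writing to it. As far as I know, the 'w' flag creates the file anew rather than appending to it, so try this on a copy first:

import mxnet as mx

# Caution: 'w' is expected to overwrite an existing file.rec, not append to it.
record = mx.recordio.MXRecordIO('file.rec', 'w')
record.write(new_data)  # new_data: one packed record (bytes)
record.close()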
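If writing with the 'w' flag does replace the existing file, one possible workaround is to copy the old records into a new file and then add the new ones. This still reads the whole dataset once, but it avoids re-encoding every image. The sketch below is written against any reader/writer pair with read() and write() methods; the helper name copy_and_append and the file names in the usage comment are hypothetical, and I have not run it against MXNet itself.

```python
def copy_and_append(reader, writer, new_records):
    """Copy all records from reader into writer, then add new_records.

    reader needs read() returning bytes, or None at end of file;
    writer needs write(bytes). mx.recordio.MXRecordIO opened with 'r'
    and 'w' respectively should fit (assumption, not tested here).
    Returns the total number of records written.
    """
    total = 0
    while True:
        item = reader.read()
        if item is None:  # end of the existing file
            break
        writer.write(item)
        total += 1
    for item in new_records:
        writer.write(item)
        total += 1
    return total


# Hypothetical usage with MXNet (file names are placeholders):
#   import mxnet as mx
#   old = mx.recordio.MXRecordIO('file.rec', 'r')
#   new = mx.recordio.MXRecordIO('file_new.rec', 'w')
#   copy_and_append(old, new, [new_data])
#   old.close()
#   new.close()
```

Since the output is a new .rec file, any .idx file that points into the old one would need to be regenerated as well.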