Append records to recordio dataset?

Hi dear MXNet community,
Is it possible to append records to a recordIO dataset?
The use case is an image dataset that grows continually. I’d like to append records to the existing RecordIO dataset instead of running im2rec on the whole collection every day.
Cheers

Hi,

I don’t think it can be done out of the box, but the im2rec script is short and relatively simple Python code, so adding an --append option (covering the .lst, .idx and .rec files) seems quite easy.
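For the .lst part, a minimal sketch of what such an append could look like is below. It is not part of im2rec: the helper name append_to_lst is hypothetical, and it assumes each .lst line has the form index<TAB>label<TAB>relative path with a single float label.

```python
import os
import tempfile


def append_to_lst(lst_path, new_items):
    """Append (label, relative_path) pairs to a .lst file, continuing the index.

    Hypothetical helper; assumes tab-separated lines: index, label, path.
    Returns the next free index after the append.
    """
    next_index = 0
    if os.path.exists(lst_path):
        with open(lst_path) as f:
            for line in f:
                if line.strip():
                    # The first tab-separated field is the integer index.
                    next_index = max(next_index, int(line.split("\t")[0]) + 1)
    # "a" mode only ever writes at the end, so existing lines stay untouched.
    with open(lst_path, "a") as f:
        for label, rel_path in new_items:
            f.write("%d\t%f\t%s\n" % (next_index, label, rel_path))
            next_index += 1
    return next_index


if __name__ == "__main__":
    demo = os.path.join(tempfile.mkdtemp(), "demo.lst")
    with open(demo, "w") as f:
        f.write("0\t1.000000\tcats/a.jpg\n1\t0.000000\tdogs/b.jpg\n")
    append_to_lst(demo, [(1.0, "cats/c.jpg")])
    with open(demo) as f:
        print(f.read())
```

The .rec and .idx files would need a matching append for the same new entries; this only covers the list file.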

The drawback of this approach is that the new images will not be shuffled (or only locally), so if your daily delta does not come from a randomly distributed source, training accuracy is likely to suffer. To avoid that, you can reshuffle the .idx file completely after adding the delta, at a (probably) minor performance cost.
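A rough sketch of that reshuffle, assuming the .idx file is plain text with one key<TAB>offset entry per line (the helper name shuffle_idx is made up for illustration):

```python
import os
import random
import tempfile


def shuffle_idx(idx_path, seed=None):
    """Rewrite a text .idx file with its lines in random order.

    Hypothetical helper; assumes one "key<TAB>offset" entry per line.
    Returns the number of entries written.
    """
    with open(idx_path) as f:
        lines = [line for line in f if line.strip()]
    # Make sure a last line without a trailing newline does not get glued
    # to its neighbour after shuffling.
    lines = [l if l.endswith("\n") else l + "\n" for l in lines]
    random.Random(seed).shuffle(lines)
    # Write to a temporary file first, then swap it in, so an interrupted
    # run does not leave a half-written index behind.
    tmp_path = idx_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.writelines(lines)
    os.replace(tmp_path, idx_path)
    return len(lines)


if __name__ == "__main__":
    demo = os.path.join(tempfile.mkdtemp(), "demo.idx")
    with open(demo, "w") as f:
        f.write("0\t0\n1\t1024\n2\t2048\n3\t4096\n")
    shuffle_idx(demo, seed=0)
    with open(demo) as f:
        print(f.read())
```

Only the order of the index entries changes; the offsets still point at the same records in the .rec file.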

Lieven

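Alternatively, you could try opening the file as MXRecordIO and writing to the end of the file. Be careful with the flag, though: as far as I know, MXRecordIO only accepts 'r' and 'w', and 'w' starts a fresh file instead of appending, so run this on a copy first:

import mxnet as mx

# Caution: 'w' creates file.rec from scratch; existing records are not kept.
record = mx.recordio.MXRecordIO('file.rec', 'w')
record.write(new_data)  # new_data: one packed record, as bytes
record.close()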

When you open a file with Python’s built-in open() in “a” mode, the write position is always at the end of the file (an append). The mode argument also accepts “+” for updating, “w” for truncating and “b” for binary, but starting with plain “a” is your best bet (“ab” for binary data). If you want to seek through the file to find the place where you want to write, use ‘r+’; note that writing at that position overwrites the existing bytes rather than inserting before them.

The following code appends text to an existing file:

with open("index.txt", "a") as myfile:
    myfile.write("text appended")
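
To make the difference between the modes concrete, here is a small self-contained sketch using only the built-in open() on a throwaway binary file (the file name and byte values are arbitrary, chosen just for the demonstration):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.bin")

# "w" creates the file, or truncates it if it already exists.
with open(path, "wb") as f:
    f.write(b"AAAA")

# "a" always writes at the end, whatever the current contents are.
with open(path, "ab") as f:
    f.write(b"BBBB")
with open(path, "rb") as f:
    after_append = f.read()      # b"AAAABBBB"

# "r+" keeps the contents and lets you seek, but a write overwrites
# the bytes at that position; it does not insert.
with open(path, "r+b") as f:
    f.seek(2)
    f.write(b"xx")
with open(path, "rb") as f:
    after_update = f.read()      # b"AAxxBBBB"

# Opening with "w" again throws the old contents away.
with open(path, "wb") as f:
    f.write(b"C")
with open(path, "rb") as f:
    after_truncate = f.read()    # b"C"

print(after_append, after_update, after_truncate)
```

This only illustrates plain file modes; it says nothing about the record framing inside a .rec file, which a raw append would also have to respect.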