How to split recordIO file?

How can I split an existing .rec file into train and val files?

I don't think there is an easy way to split an existing .rec file. Have a look at the section "Access Arbitrary Parts Of Data" here: https://mxnet.incubator.apache.org/architecture/note_data_loading.html You could write a custom data loader that uses InputSplit from dmlc-core. Or you could use mxnet.recordio.MXIndexedRecordIO and its read_idx() method to access only the records with certain ids (this needs the .idx file that belongs to the .rec file): https://mxnet.incubator.apache.org/api/python/io/io.html#mxnet.recordio.MXIndexedRecordIO
But the easiest option would be to recreate the .rec files from the original images, for instance with im2rec --train-ratio 0.7 --test-ratio 0.2.
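If you do want to try the MXIndexedRecordIO route, here is a rough sketch of how it might look. It assumes an .idx file exists next to the .rec file. The helper names split_keys and split_rec, the out_prefix argument, the 0.8 ratio and the seed are all made up for illustration and are not part of MXNet. I have not run this against a real dataset, so treat it as a starting point only.

```python
import random


def split_keys(keys, train_ratio=0.8, seed=0):
    """Shuffle record keys deterministically and split them into (train, val)."""
    keys = list(keys)
    random.Random(seed).shuffle(keys)
    n_train = int(len(keys) * train_ratio)
    return keys[:n_train], keys[n_train:]


def split_rec(idx_path, rec_path, out_prefix, train_ratio=0.8, seed=0):
    """Copy the records of one indexed .rec file into train and val files.

    Hypothetical helper: writes <out_prefix>_train.rec/.idx and
    <out_prefix>_val.rec/.idx.
    """
    # Imported here so that split_keys can be used without MXNet installed.
    import mxnet as mx

    reader = mx.recordio.MXIndexedRecordIO(idx_path, rec_path, 'r')
    train_keys, val_keys = split_keys(reader.keys, train_ratio, seed)

    for name, keys in (('train', train_keys), ('val', val_keys)):
        writer = mx.recordio.MXIndexedRecordIO(
            '%s_%s.idx' % (out_prefix, name),
            '%s_%s.rec' % (out_prefix, name),
            'w')
        # Records are copied as raw bytes and re-numbered from 0.
        for new_id, key in enumerate(keys):
            writer.write_idx(new_id, reader.read_idx(key))
        writer.close()

    reader.close()
```

You would call it like split_rec('data.idx', 'data.rec', 'data') with your own file names. The shuffle is seeded so that running it twice gives the same split.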
