.rec ImageRecordIter returning different images than the original JPGs

FraPochetti · November 17, 2017, 2:19pm

I am having trouble understanding how .rec files are generated and what they return.
Here is what I mean.
I have followed this wiki page, i.e. downloaded the caltech data, generated a .lst and a .rec file, and started iterating over it.

So, here is what happens:

import mxnet as mx

train_iter = mx.io.ImageRecordIter(path_imgrec="./caltech.rec",
                                   data_shape=(3, 227, 227),
                                   batch_size=4,
                                   resize=256)

for batch in train_iter:
    print(batch.data[0].shape)
    print(batch.label[0])
    break

which correctly returns

(4L, 3L, 227L, 227L)

[ 42.  91.  62.   4.]
<NDArray 4 @cpu_pinned(0)>

but then when I visualize one image (the fourth in the batch)

plt.imshow(batch.data[0][3].asnumpy().reshape(227, 227, 3))

I get this thing

Screenshot_24

which is very different from the original 274 x 184 pixel JPG

Screenshot_25

Now, except for the obviously different shape, I have no clue why ImageRecordIter returns the first image.
I tried digging into im2rec.py (which is the script translating JPGs into .lst and then .rec) but I cannot figure out what is going on there.

Can anybody help, please?

eric-haibin-lin · November 17, 2017, 10:45pm

Hi, did you look at the note for data loading? https://mxnet.incubator.apache.org/architecture/note_data_loading.html

FraPochetti · November 18, 2017, 2:53pm

Thanks for the link. The doc is really interesting and helpful. Still, I cannot find an explanation of why an image gets processed into a record containing 9 cropped copies of the original (with completely different colors too). I apologize for the silly questions here, but I am rather new to CV.
@eric-haibin-lin, am I missing anything?
Thanks again!

safrooze · November 21, 2017, 8:33pm

I think this is a common reshape vs. moveaxis confusion (moveaxis and its variants swapaxes, rollaxis, etc.). batch.data[0][3].asnumpy() returns a numpy ndarray with shape (3, 227, 227). numpy.reshape maintains the order of the data in memory and only interprets it differently. In this case, your data is stored with column varying fastest, row second, and color slowest (if that makes sense!). By reshaping it to (227, 227, 3), you’re telling numpy to simply reinterpret the same data with color varying fastest, column second, and row slowest. So, for example, the Red-channel values at locations (0,0), (0,1), and (0,2) in the original image will be interpreted as the Red, Green and Blue values of the single pixel at location (0,0), and you’d get a nice 3x3 grid of small copies of the image in your imshow!
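
The difference between reinterpreting and permuting axes is easy to see on a tiny array. This is a minimal sketch using only NumPy; the 2x2 "image" below is a made-up stand-in for one item of the batch, not data from the .rec file.

```python
import numpy as np

# Hypothetical 2x2 RGB image in channel-first (C, H, W) layout.
# Channel 0 holds 0..3, channel 1 holds 4..7, channel 2 holds 8..11.
chw = np.arange(12).reshape(3, 2, 2)

# reshape keeps the flat memory order and only changes how it is indexed.
reshaped = chw.reshape(2, 2, 3)

# moveaxis actually permutes the axes, giving a true (H, W, C) view.
moved = np.moveaxis(chw, 0, -1)

# The pixel at (0, 0) should be (R, G, B) = (0, 4, 8).
print(reshaped[0, 0])  # [0 1 2]: three neighbouring values of channel 0
print(moved[0, 0])     # [0 4 8]: one value from each channel
```

Both results have shape (2, 2, 3), which is why the reshape version runs without error even though the pixels are scrambled.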

To fix this, you’d need to call:
plt.imshow(np.moveaxis(batch.data[0][3].asnumpy(), 0, -1))

P.S. I haven’t actually tried this, but I’m fairly confident it should work. Let me know if it doesn’t.
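
One more thing that may be worth checking, as an assumption rather than something confirmed in this thread: if the iterator hands back float32 pixel values in the 0..255 range (i.e. no mean subtraction or scaling was configured), plt.imshow will treat RGB floats as lying in 0..1 and the picture can look washed out. A hedged sketch of a helper that also casts to uint8 follows; chw_to_displayable is a hypothetical name, and the random array stands in for batch.data[0][3].asnumpy().

```python
import numpy as np

def chw_to_displayable(chw):
    """Convert a (C, H, W) array with values in 0..255 to an (H, W, C) uint8 array."""
    hwc = np.moveaxis(chw, 0, -1)                  # permute axes, do not reshape
    return np.clip(hwc, 0, 255).astype(np.uint8)   # imshow reads uint8 RGB as 0..255

# Stand-in for batch.data[0][3].asnumpy(); assumed to be float32 in 0..255.
fake = np.random.uniform(0, 255, size=(3, 227, 227)).astype(np.float32)

img = chw_to_displayable(fake)
print(img.shape, img.dtype)  # (227, 227, 3) uint8

# With matplotlib available, this would then be displayed with:
# plt.imshow(img)
```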
