.rec ImageRecordIter returning different images than the original JPGs


#1

I am having trouble understanding how .rec files are generated and what they return.
Here is what I mean.
I followed this wiki page, i.e. downloaded the caltech data, generated a .lst and a .rec file, and started iterating over it.

So, here is what happens:

import mxnet as mx

train_iter = mx.io.ImageRecordIter(path_imgrec="./caltech.rec",
                                    data_shape=(3, 227, 227),
                                    batch_size=4,
                                    resize=256)

for batch in train_iter:
    print(batch.data[0].shape)
    print(batch.label[0])
    break

which correctly returns

(4L, 3L, 227L, 227L)

[ 42.  91.  62.   4.]
<NDArray 4 @cpu_pinned(0)>

but then when I visualize one image (the fourth in the batch)

plt.imshow(batch.data[0][3].asnumpy().reshape(227, 227, 3))

I get this thing

Screenshot_24

which is very different than the original 274 x 184 pixels JPG

Screenshot_25

Now, apart from the obviously different shape, I have no clue why ImageRecordIter returns the first image.
I tried digging into im2rec.py (which is the script translating JPGs into .lst and then .rec) but I cannot figure out what is going on there.

Can anybody help, please?


#2

Hi, did you look at the note for data loading? https://mxnet.incubator.apache.org/architecture/note_data_loading.html


#3

Thanks for the link. The doc is really interesting and helpful. Still, I cannot find an explanation of why an image gets processed into a record containing 9 cropped copies of the original (with completely different colors too). I apologize for the silly questions here, but I am rather new to CV.
@eric-haibin-lin, am I missing anything?
Thanks again!


#4

I think this is a common reshape vs. moveaxis confusion (the same goes for its variants swapaxes, rollaxis, transpose etc.). batch.data[0][3].asnumpy() returns a numpy ndarray with shape (3, 227, 227), i.e. (channel, row, column). numpy.reshape keeps the values in the same order in memory and only interprets them differently. In your array the column index varies fastest, then the row, then the color channel. By reshaping it to (227, 227, 3), you’re telling numpy to reinterpret the same sequence of values with the color channel varying fastest, then the column, then the row. So, for example, the Red channel values at locations (0,0), (0,1) and (0,2) of the original image get interpreted as the Red, Green and Blue values of the single pixel at location (0,0), which is why imshow shows a 3x3 grid of small, oddly colored copies instead of the picture.
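To make the difference concrete, here is a tiny sketch with a made-up 3-channel, 2x2 "image" (the array and its values are purely illustrative, not taken from the .rec file). It shows that reshape regroups the flat values, while moveaxis keeps each pixel's three channel values together:

```python
import numpy as np

# Toy stand-in for one image from the batch: (channel, row, column) = (3, 2, 2).
# Channel 0 holds 0..3, channel 1 holds 4..7, channel 2 holds 8..11.
chw = np.arange(12).reshape(3, 2, 2)

# reshape keeps the flat order 0..11 and just regroups it in threes,
# so values from a single channel end up posing as R, G and B.
reshaped = chw.reshape(2, 2, 3)

# moveaxis moves the channel axis to the end: a real (row, column, channel) layout.
moved = np.moveaxis(chw, 0, -1)

print(reshaped[0, 0])  # [0 1 2]  -> three neighbouring values of channel 0
print(moved[0, 0])     # [0 4 8]  -> channel 0, 1 and 2 of pixel (0, 0)
```

For a 3-D array, np.moveaxis(x, 0, -1) gives the same result as x.transpose(1, 2, 0), so either spelling should work.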

To fix this, you’d need to call:
plt.imshow(np.moveaxis(batch.data[0][3].asnumpy(), 0, -1))

P.S. I haven’t actually tried this, but I’m fairly confident it should work. Let me know if it doesn’t.
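One more thing that may matter once the axes are right: as far as I know the iterator hands back float32 data, and matplotlib expects float images to be in the 0..1 range, so the picture can still look washed out. Below is a rough sketch of a small helper for that. The name to_displayable is made up for illustration, and it assumes the pixel values are still in the 0..255 range (i.e. no mean subtraction or scaling was configured on the iterator). A random array stands in for batch.data[0][3].asnumpy() so the snippet runs on its own:

```python
import numpy as np

def to_displayable(chw_image):
    """Hypothetical helper: (C, H, W) float array -> (H, W, C) uint8 array.

    Assumes pixel values are in the 0..255 range.
    """
    hwc = np.moveaxis(np.asarray(chw_image), 0, -1)  # channels last
    return np.clip(hwc, 0, 255).astype(np.uint8)     # what imshow expects for 0..255 data

# Stand-in for batch.data[0][3].asnumpy(); random values, same shape and dtype.
fake_image = np.random.uniform(0, 255, size=(3, 227, 227)).astype(np.float32)

img = to_displayable(fake_image)
print(img.shape, img.dtype)
```

With the real batch you would then call plt.imshow(to_displayable(batch.data[0][3].asnumpy())).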