Clarifications about Image Data Augmentation


#1

As far as I understand, image data augmentation techniques (e.g. random cropping, mirroring, shearing, etc.) are commonly used in deep learning to artificially increase the size of the training set.
This concept is pretty clear to me.
Now, putting that into practice, say I have a 100-image dataset. If I apply mirroring to every one of them, I would expect the dataset to double in size, to 200 images in total.
This is apparently not happening when I use ImageRecordIter.
So, iterating through the training set in this way

import mxnet as mx

train_iter = mx.io.ImageRecordIter(path_imgrec='./train.rec',
                                   data_shape=(3, 75, 75),
                                   shuffle=True,
                                   batch_size=1)

or in this way

train_iter = mx.io.ImageRecordIter(path_imgrec='./train.rec',
                                   data_shape=(3, 75, 75),
                                   rand_crop=True,
                                   shuffle=True,
                                   batch_size=1,
                                   max_random_scale=1.5,
                                   min_random_scale=0.75,
                                   rand_mirror=True)

returns exactly the same number of images.
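For reference, here is a minimal sketch of how I count the images one epoch actually yields (reusing the same train.rec; the loop just sums the batch sizes):

import mxnet as mx

train_iter = mx.io.ImageRecordIter(path_imgrec='./train.rec',
                                   data_shape=(3, 75, 75),
                                   rand_mirror=True,
                                   shuffle=True,
                                   batch_size=1)

n_samples = 0
for batch in train_iter:                 # one full epoch
    n_samples += batch.data[0].shape[0]
print(n_samples)                         # matches the number of records in train.rec,
                                         # no matter which augmentations are enabled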
My confusion clearly comes from not understanding what happens under the hood when a batch is drawn and augmentation is applied to it.
Can someone help me with this one, please?


#2

According to the ImageRecordIter API documentation:
rand_mirror (boolean, optional, default=0) – Whether to randomly mirror images or not. If true, 50% of the images will be randomly mirrored (flipped along the horizontal axis)

Setting this argument to true simply means that for every image a random boolean is drawn and the image is mirrored based on the outcome. It does not add images to your dataset. However, statistically speaking, iterating over your dataset twice is equivalent to doubling your data with mirroring. Since neural networks are all about statistics, there is no need to physically double your data; just train for more epochs.
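
To see this behaviour directly, here is a rough sketch (reusing the train.rec from the question) that compares two consecutive epochs. Shuffling is turned off so the record order is identical across passes and only the augmentation can differ:

import mxnet as mx

train_iter = mx.io.ImageRecordIter(path_imgrec='./train.rec',
                                   data_shape=(3, 75, 75),
                                   rand_mirror=True,
                                   shuffle=False,    # fixed record order, so epochs are comparable
                                   batch_size=1)

first_epoch = [batch.data[0].copy() for batch in train_iter]
train_iter.reset()                       # start a second pass over the same records
second_epoch = [batch.data[0].copy() for batch in train_iter]

# The mirror decision is drawn independently for every image on every pass,
# so roughly half of these comparisons should report a difference.
for a, b in zip(first_epoch, second_epoch):
    print((a != b).sum().asscalar() > 0)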