Training with one batch gives different training/validation accuracies when shuffled


#1

In a research problem we’re fine-tuning a ResNet on very small sets of images with model.fit, and some things happen that I don’t understand. In particular, when the number of images matches the batch size (i.e. 256 images and batch size 256), I observe the following:

  1. If I shuffle the dataset differently (shuffle the .lst file, create the .rec file, and do not shuffle during training), I get a discrepancy of up to 10% in training and validation error (lowest reached during the epoch). Example points: 0.88 train / 0.81 validation accuracy vs. 0.73 train / 0.71 validation accuracy. With a single batch, it logically shouldn’t make any difference whether or how I shuffle, should it? What could be causing the randomness here? This is trained on one GPU.
  2. If I train on the same .rec file (without shuffling) with a different number of GPUs, I also get a discrepancy of about 5–10%.
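On the single-batch point above: with one batch per epoch, the averaged gradient is mathematically invariant to the order of the samples, and splitting the batch across devices only changes the floating-point summation order, which produces tiny rounding differences. A minimal NumPy sketch with hypothetical toy "gradients" (not the actual model) illustrating both cases:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy per-sample "gradients" for a batch of 256 samples and 10 parameters.
g = rng.normal(size=(256, 10)).astype(np.float32)

# 1) Shuffling within a single batch: the mean gradient is the same,
#    up to floating-point summation order.
perm = rng.permutation(256)
mean_orig = g.mean(axis=0)
mean_perm = g[perm].mean(axis=0)
print(np.allclose(mean_orig, mean_perm))   # True (within float32 tolerance)

# 2) Multi-GPU-style averaging: split the batch across two "devices",
#    average per device, then average the results. Identical in exact
#    arithmetic; only the rounding differs.
mean_split = 0.5 * (g[:128].mean(axis=0) + g[128:].mean(axis=0))
print(np.allclose(mean_orig, mean_split))  # True
print(float(np.abs(mean_orig - mean_split).max()))  # tiny rounding difference
```

Rounding at this scale cannot by itself explain a 5–10% accuracy gap, so something else in the pipeline must be order- or device-dependent.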

Another observation: if I run training on the same file with a fixed number of GPUs multiple times, I get no variation at all.

Does anyone have an idea what could be behind this behavior and how to mitigate it? I was going to work around it by always having a sufficient number of batches (i.e. in this case reducing the batch size), but it would be nice to be able to train with a single batch when required.


#2

That is indeed odd. What is also odd is that your training accuracy is worse than your validation accuracy. Could you share a (simplified) code sample that reproduces this problem?


#3

@fanny can you provide the code that is producing these results?