Training with one batch gives different training/validation accuracies when shuffled


#1

In a research problem we’re fine-tuning a ResNet on very small sets of images with model.fit, and some things happen that I don’t understand. In particular, when the number of images matches the batch size (i.e. 256 images and batch size 256), I observe the following:

  1. If I shuffle the dataset differently (shuffle the .lst file, create the .rec file, and do not shuffle during training), I get a discrepancy of up to 10% in training and validation error (lowest reached during the epoch). Example points: 0.88 train / 0.81 validation accuracy vs. 0.73 train / 0.71 validation accuracy. With a single batch, it logically shouldn’t make any difference whether or how I shuffle, should it? What could be causing the randomness here? This is trained on one GPU.
  2. If I train on the same .rec file (without shuffling) with a different number of GPUs, I also get a discrepancy of about 5–10%.
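On the single-batch point above: with one batch per epoch, the averaged gradient is mathematically invariant to the order of the samples, and splitting the batch across devices only changes the floating-point summation order, which produces tiny rounding differences. A minimal NumPy sketch with hypothetical toy "gradients" (not the actual model) illustrating both cases:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy per-sample "gradients" for a batch of 256 samples and 10 parameters.
g = rng.normal(size=(256, 10)).astype(np.float32)

# 1) Shuffling within a single batch: the mean gradient is the same,
#    up to floating-point summation order.
perm = rng.permutation(256)
mean_orig = g.mean(axis=0)
mean_perm = g[perm].mean(axis=0)
print(np.allclose(mean_orig, mean_perm))   # True (within float32 tolerance)

# 2) Multi-GPU-style averaging: split the batch across two "devices",
#    average per device, then average the results. Identical in exact
#    arithmetic; only the rounding differs.
mean_split = 0.5 * (g[:128].mean(axis=0) + g[128:].mean(axis=0))
print(np.allclose(mean_orig, mean_split))  # True
print(float(np.abs(mean_orig - mean_split).max()))  # tiny rounding difference
```

Rounding at this scale cannot by itself explain a 5–10% accuracy gap, so something else in the pipeline must be order- or device-dependent.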

Another observation: if I run training on the same file with a fixed number of GPUs multiple times, I get no variation at all.

Does anyone have an idea what could be behind this behavior and how to mitigate it? I was going to work around it by always having a sufficient number of batches (i.e. in this case reducing the batch size), but it would be nice to be able to train with a single batch when required.


#2

That is indeed odd. What is also odd is that your training accuracy is worse than your validation accuracy. Could you share a (simplified) code sample that reproduces this problem?


#3

@fanny can you provide the code that is producing these results?