I was training a deep matrix factorization model on a large dataset and kept running into the same troubling pattern. I am using the module.fit API and feeding data through CSVIter.
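For context, the setup looks roughly like the sketch below. This is a placeholder reconstruction, not my actual code: the file names, shapes, network, and hyper-parameters are all illustrative, and it assumes the labels are wired up under CSVIter's default name.

```python
import mxnet as mx

# Illustrative reconstruction of the setup described above.
# File names, shapes, and hyper-parameters are placeholders.
train_iter = mx.io.CSVIter(data_csv='train.csv', data_shape=(50,),
                           label_csv='train_labels.csv', label_shape=(1,),
                           batch_size=128)
val_iter = mx.io.CSVIter(data_csv='val.csv', data_shape=(50,),
                         label_csv='val_labels.csv', label_shape=(1,),
                         batch_size=128)

data = mx.sym.Variable('data')
label = mx.sym.Variable('label')  # CSVIter provides labels under the name 'label'
net = mx.sym.FullyConnected(data, num_hidden=64)
net = mx.sym.Activation(net, act_type='relu')
net = mx.sym.FullyConnected(net, num_hidden=10)
net = mx.sym.SoftmaxOutput(net, label=label, name='softmax')

mod = mx.mod.Module(symbol=net, data_names=['data'], label_names=['label'])
mod.fit(train_iter,
        eval_data=val_iter,          # validation metrics come from here
        eval_metric='ce',            # cross entropy, as reported below
        optimizer='adam',
        num_epoch=5)
```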
The validation metrics were always substantially different from the training metrics. I noticed that throughout the first epoch the training batches steadily improved. At that point every batch is still "unseen" data, so I was perplexed that I could not at least approximate the training results by stopping learning early, i.e. training on only the first 3000 batches. In general, I found that training cross entropy decreased from 2.2 to 0.34 over that span, but validation cross entropy was always around 0.74.
More importantly, I found that validation cross entropy remained around 0.74 even when I set the validation data equal to the training data.
I'll note that I have only been able to recover validation results comparable to the training results when I limit the validation data to just a single batch. This holds both when I use true hold-out data and when I disguise training data as validation data.
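To convince myself that batching itself shouldn't change the number, I ran a framework-agnostic sanity check (synthetic data, illustrative names, nothing from my actual pipeline): cross entropy averaged over equal-sized mini-batches matches cross entropy computed over the full set, so a validation score that depends on how many batches you evaluate points at the data/metric plumbing rather than the math.

```python
import numpy as np

# Synthetic logits and labels; purely illustrative.
rng = np.random.default_rng(0)
n, k = 1024, 10
logits = rng.normal(size=(n, k))
labels = rng.integers(0, k, size=n)

def cross_entropy(logits, labels):
    # Numerically stable mean cross entropy from raw logits.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

full = cross_entropy(logits, labels)

batch = 128  # divides n exactly, so there is no padded final batch
per_batch = [cross_entropy(logits[i:i + batch], labels[i:i + batch])
             for i in range(0, n, batch)]

print(abs(full - np.mean(per_batch)) < 1e-9)  # prints True
```

With equal-sized batches the mean of per-batch means equals the full-set mean exactly; a ragged or padded final batch is the one case where naive averaging can skew the result.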
I find this incredibly disturbing - it seems to suggest that the module.fit and CSVIter APIs are not behaving as expected.