Training metrics are not equal to validation metrics, even when using the same data



I was training a deep matrix factorization model on a large dataset and kept running into the same troubling pattern. I am using the API and feeding data through CSVIter.

The validation metrics were always substantially different from the training metrics. I noticed that throughout the first epoch, the training batches steadily improved. At that point every batch is still “unseen data”, so I was perplexed that I could not at least approximate these results by stopping learning early and training on only the first 3000 batches. In general, I found that training cross entropy decreased from 2.2 to 0.34 over this period, while validation cross entropy always stayed around ~0.74.

More importantly, I found that validation cross entropy remained around 0.74 even when I set the validation data equal to the training data.
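One generic reason such numbers can disagree, independent of any framework: the “training” metric reported during an epoch is usually accumulated batch by batch *as the weights change*, while the validation metric is computed over the whole set with one fixed set of weights. A toy NumPy sketch of the distinction (the data, model, and hyperparameters here are hypothetical stand-ins, not your setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary-classification data (a stand-in for the real dataset).
X = rng.normal(size=(2000, 10))
true_w = rng.normal(size=10)
y = (X @ true_w + 0.1 * rng.normal(size=2000) > 0).astype(float)

def loss_and_grad(w, xb, yb):
    """Logistic cross-entropy and its gradient for one batch."""
    p = 1.0 / (1.0 + np.exp(-(xb @ w)))
    eps = 1e-12
    loss = -np.mean(yb * np.log(p + eps) + (1 - yb) * np.log(1 - p + eps))
    grad = xb.T @ (p - yb) / len(yb)
    return loss, grad

w = np.zeros(10)
batch, lr = 50, 0.5
running = []  # per-batch training losses, recorded as the weights evolve
for i in range(0, len(X), batch):
    xb, yb = X[i:i + batch], y[i:i + batch]
    loss, grad = loss_and_grad(w, xb, yb)  # measured BEFORE the update
    running.append(loss)
    w -= lr * grad

train_metric = np.mean(running)        # what a running "training loss" reports
val_metric, _ = loss_and_grad(w, X, y) # same data, but fixed final weights

print(f"running training loss: {train_metric:.3f}")
print(f"eval on same data:     {val_metric:.3f}")
```

The two numbers differ even though the data is identical, because the running average mixes in early high-loss batches. This wouldn't by itself explain validation being *worse* than the last training batches, but it shows why training-time and evaluation-time metrics are not directly comparable even on the same data.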

I’ll note that I have only been able to recover validation results comparable to the training metrics when I limit the validation set to a single batch. This holds when I use true hold-out data, as well as when I disguise training data as validation data.
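The fact that a single batch behaves correctly hints at a cross-batch aggregation problem: with one batch there is nothing to accumulate or average across. One framework-agnostic pitfall of this kind is averaging per-batch mean losses without weighting by batch size, which skews the aggregate whenever the last batch is partial (or padded). A small sketch with made-up per-sample losses:

```python
import numpy as np

# Hypothetical per-sample losses for 2050 samples evaluated in batches of 500;
# the final batch holds only 50 samples.
rng = np.random.default_rng(1)
losses = rng.exponential(scale=0.7, size=2050)
batch = 500

batch_means = [losses[i:i + batch].mean() for i in range(0, len(losses), batch)]
naive = np.mean(batch_means)  # treats the 50-sample tail like a full batch
weighted = losses.mean()      # correct per-sample average

print(f"naive mean of batch means: {naive:.4f}")
print(f"sample-weighted mean:      {weighted:.4f}")
```

If padded samples are also scored (some iterators pad the final batch to full size), the distortion gets worse. Checking how the metric accumulator handles the last batch, and whether it is reset between evaluations, would be worth ruling out.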

I find this incredibly disturbing: it seems to suggest that the API and CSVIter are not behaving appropriately.



Would you be able to post a snippet of your training code and the code you’re using to calculate the metrics? There might be a different reason why you’re seeing this issue, and another pair of eyes might help spot it.