Accuracy issue on video classification

Why doesn't the accuracy of the video classification models listed on GluonCV's website match the accuracy in the training logs?

For example, the top-1 accuracy of slowfast_4x16_resnet50_kinetics400 on the Kinetics-400 dataset is listed as 75.3, but the training log shows 67.1.

Here's the URL: https://gluon-cv.mxnet.io/model_zoo/action_recognition.html

This is because, during training, accuracy is evaluated on only a single clip taken from each video.

During testing, we evenly sample 10 clips across the entire video and apply three-crop augmentation to each clip, which gives 10 x 3 = 30 views per video. This is the standard evaluation protocol in the field. Because the model sees more clips (more temporal information) and more spatial crops (more spatial information), it achieves much higher accuracy than the single-clip number in the training log. Hope this helps.
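To make the difference concrete, here is a minimal NumPy sketch of 10-clip, three-crop evaluation. It is not GluonCV's actual evaluation code: the helper names (clip_starts, three_crop_offsets, multi_view_predict), the score_fn callable, placing the three crops along the longer side, and averaging softmax scores over the views are all assumptions made for illustration.

```python
import numpy as np

def clip_starts(num_frames, clip_len, num_clips=10):
    """Evenly spaced start indices for num_clips clips of clip_len frames."""
    last_start = max(num_frames - clip_len, 0)
    return np.linspace(0, last_start, num_clips).astype(int)

def three_crop_offsets(height, width, crop_size):
    """Top-left (y, x) corners of three crops spread along the longer side.
    Assumption: the shorter side is centered."""
    if width >= height:
        xs = np.linspace(0, width - crop_size, 3).astype(int)
        y = (height - crop_size) // 2
        return [(y, int(x)) for x in xs]
    ys = np.linspace(0, height - crop_size, 3).astype(int)
    x = (width - crop_size) // 2
    return [(int(y), x) for y in ys]

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def multi_view_predict(video, score_fn, clip_len, crop_size, num_clips=10):
    """video: array of shape (T, H, W, C).
    score_fn: hypothetical callable mapping one view to class logits."""
    t, h, w, _ = video.shape
    probs = []
    for s in clip_starts(t, clip_len, num_clips):
        clip = video[s:s + clip_len]
        for y, x in three_crop_offsets(h, w, crop_size):
            view = clip[:, y:y + crop_size, x:x + crop_size]
            probs.append(softmax(score_fn(view)))
    # Assumption: per-view softmax scores are averaged into one prediction.
    return np.mean(probs, axis=0)
```

Calling multi_view_predict with num_clips=1 and keeping only the center crop would roughly correspond to the single-clip evaluation reported during training, which is why the two numbers are not directly comparable.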