Training speed is weird!


#1

When I train a face recognition model with ResNet-100 as the base network on an NVIDIA P40, training reaches 400 samples per second. When I train with MobileFaceNet instead, it only reaches 100 samples per second. That is strange, because MobileFaceNet's network structure is much lighter than ResNet-100's. The code and training environment are the same in both cases. Does anyone have any ideas?


#2

Bear in mind that a network's theoretical FLOP count does not translate one-to-one into real throughput.
ResNet is probably one of the most heavily optimized networks, but I can't comment on MobileFaceNet.
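
To make the FLOP point concrete, here is a rough back-of-the-envelope sketch. It compares a standard 3x3 convolution (ResNet style) with a depthwise 3x3 convolution, the kind of layer that MobileNet-family networks such as MobileFaceNet rely on. The layer shape, float32 tensors, stride 1, and the "each tensor is read or written once" memory model are all my assumptions for illustration, not measurements from the setup in #1.

```python
# Rough FLOP and memory-traffic estimate for one convolution layer.
# Assumptions (not from the thread): float32 tensors, stride 1, "same" padding,
# and each tensor is read or written exactly once.

def conv_cost(h, w, c_in, c_out, k, groups=1, bytes_per_elem=4):
    """Return (flops, bytes_moved) for a k x k convolution on an h x w feature map."""
    # Each output element needs k*k*(c_in/groups) multiply-adds (2 FLOPs each).
    flops = 2 * h * w * c_out * k * k * (c_in // groups)
    weights = c_out * (c_in // groups) * k * k
    activations = h * w * c_in + h * w * c_out  # input read + output written
    return flops, (weights + activations) * bytes_per_elem

def intensity(flops, bytes_moved):
    """FLOPs per byte of memory traffic; low values tend to be bandwidth-bound."""
    return flops / bytes_moved

# Hypothetical layer shape, chosen only for illustration.
H, W, C = 56, 56, 64
standard = conv_cost(H, W, C, C, k=3)              # standard 3x3 conv
depthwise = conv_cost(H, W, C, C, k=3, groups=C)   # depthwise 3x3 conv

print("standard  FLOPs/byte:", round(intensity(*standard), 1))
print("depthwise FLOPs/byte:", round(intensity(*depthwise), 1))
```

Under these assumptions the depthwise layer needs far fewer FLOPs but moves almost the same amount of data, so it does much less arithmetic per byte. A layer like that is more likely to be limited by memory bandwidth and kernel efficiency than by raw compute, which is one plausible reason a "lighter" network can train slower. It is only an estimate, though; profiling the actual training run is the way to confirm where the time goes.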