Asking for the training hyper-parameters for ImageNet-1k


Hello all,

I am using the training script under example/image-classifications to train ResNet50 on the ImageNet-1k dataset.
I noticed that the README in example/image-classifications has a table listing the validation accuracy for a series of networks, including ResNet-50, but it lacks the hyper-parameters (batch size, initial LR, etc.) and the training details, such as the epochs at which the LR is adjusted and the data pre-processing technique.

Is it possible to provide detailed information?




It looks like @zhreshold is the author of the ImageNet training script, so he should be able to give an exact answer. Have you tried the hyper-parameters from the original ResNet paper? “We use SGD with a mini-batch size of 256. The learning rate starts from 0.1 and is divided by 10 when the error plateaus, and the models are trained for up to 60 × 10^4 iterations. We use a weight decay of 0.0001 and a momentum of 0.9.” Judging by the loss graph in the paper, the learning rate is dropped at around iterations 150,000 and 300,000.
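To make that schedule concrete, here is a minimal sketch in plain Python. It is not taken from the MXNet training script. The drop points are the approximate values read off the paper's loss graph, the function names are made up for illustration, and the iteration-to-epoch conversion assumes the standard ImageNet-1k training split of 1,281,167 images.

```python
# Values quoted from the ResNet paper.
BASE_LR = 0.1      # initial learning rate
LR_FACTOR = 0.1    # "divided by 10" at each drop
BATCH_SIZE = 256   # mini-batch size

# Approximate drop points read off the paper's loss graph (not exact values).
LR_DROP_ITERATIONS = (150_000, 300_000)

# Assumption: the standard ImageNet-1k training split.
NUM_TRAIN_IMAGES = 1_281_167


def lr_at(iteration):
    """Learning rate in effect at a given iteration (hypothetical helper)."""
    drops = sum(1 for step in LR_DROP_ITERATIONS if iteration >= step)
    return BASE_LR * LR_FACTOR ** drops


def iterations_to_epochs(iteration):
    """Convert an iteration count to (fractional) epochs at BATCH_SIZE."""
    return iteration * BATCH_SIZE / NUM_TRAIN_IMAGES


if __name__ == "__main__":
    for it in (0, 150_000, 300_000, 600_000):
        print(f"iteration {it:>7}: lr={lr_at(it):.4f}, "
              f"epoch~{iterations_to_epochs(it):.1f}")
```

Under these assumptions the two drops land near epochs 30 and 60, and 60 × 10^4 iterations corresponds to roughly 120 epochs, which may help if the training script expects its LR steps in epochs rather than iterations.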


You can find them in the logs.


We recently transferred our focus to the Gluon interface for training ImageNet.

If you are interested, you can always export Gluon models to symbols.