Asking for the training hyper-parameters for ImageNet-1k


#1

Hello all,

I am using the training script under example/image-classification to train ResNet-50 on the ImageNet-1k dataset.
I noticed that the README of example/image-classification has a table listing validation accuracy for a series of networks, including ResNet-50, but it lacks the hyper-parameters such as batch size and initial LR, as well as training details such as the epochs at which the LR is adjusted and the data pre-processing techniques used.

Is it possible to provide detailed information?

Thanks.

Shufan


#2

It looks like @zhreshold is the author of the ImageNet training script; he should be able to give an exact answer. Have you tried using the hyper-parameters found in the original ResNet paper? "We use SGD with a mini-batch size of 256. The learning rate starts from 0.1 and is divided by 10 when the error plateaus, and the models are trained for up to 60 × 10^4 iterations. We use a weight decay of 0.0001 and a momentum of 0.9." Judging from the loss curves in the paper, the learning rate is dropped at roughly iterations 150,000 and 300,000.
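
For reference, here is a minimal sketch of how those paper settings could be wired up with MXNet's optimizer and LR scheduler. The two step boundaries are read off the paper's loss curves, not taken from the example script, so treat them as an assumption:

```python
import mxnet as mx

# Settings quoted from the original ResNet paper (He et al.),
# which uses a mini-batch size of 256. The step boundaries
# (~150k and ~300k iterations) are estimated from the paper's
# loss curves, not from the MXNet example script.
lr_schedule = mx.lr_scheduler.MultiFactorScheduler(
    step=[150000, 300000],  # iterations at which LR is divided by 10
    factor=0.1,
)
optimizer = mx.optimizer.SGD(
    learning_rate=0.1,  # initial LR
    momentum=0.9,
    wd=0.0001,          # weight decay
    lr_scheduler=lr_schedule,
)
```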


#3

You can find them in the training logs.


#4

We recently shifted our focus to the Gluon interface for ImageNet training: https://mxnet.incubator.apache.org/api/python/gluon/model_zoo.html
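
For example, a pretrained ResNet-50 can be pulled straight from the Gluon model zoo (a minimal sketch, assuming a default MXNet installation):

```python
from mxnet.gluon.model_zoo import vision

# Download the pretrained ResNet-50 (v1) weights instead of
# retraining from scratch on ImageNet-1k.
net = vision.resnet50_v1(pretrained=True)
```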

If you are interested, you can always export Gluon models to symbols: https://mxnet.incubator.apache.org/api/python/gluon/gluon.html?highlight=export#mxnet.gluon.HybridBlock.export
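
A rough sketch of that export path (the 1×3×224×224 input shape is the standard ImageNet size, assumed here):

```python
import mxnet as mx
from mxnet.gluon.model_zoo import vision

net = vision.resnet50_v1(pretrained=True)
net.hybridize()
# Run one forward pass so the symbolic graph is built before export.
net(mx.nd.zeros((1, 3, 224, 224)))
# Writes resnet50_v1-symbol.json and resnet50_v1-0000.params.
net.export('resnet50_v1')
```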