Optimal hyperparameters for training resnet34_v1 on ImageNet?

Hi,

I’ve been training resnet34_v1 on ImageNet with MXNet ~1.1.0, and the best I can get is around 14% training accuracy after 1 epoch. However, with a different framework (BIDMach), I can get 29% training accuracy with the same network and dataset. The command I am using to train is:

python train_imagenet.py --network resnet-v1 --num-layers 34 --data-train s3://bidmach/mxnet/train/ --data-val s3://bidmach/mxnet/val/ --batch-size 64 --model /data/mxnet/resnetv1-34 --num-epochs 1 --disp-batches 1 --gpus 0 --kv-store device --lr 0.02

Has anyone achieved better results with a different set of hyperparameters? If so I’d like to know. Thanks.

For ResNet 34,

  • Start with lr=0.1, mom=0.9, wd=0.0001, batch_size=512
  • Multiply the learning rate by 0.1 (i.e. reduce it tenfold) at epochs 60, 75 and 90
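
To make the schedule above concrete, here is a minimal, framework-independent sketch in plain Python. The function names `lr_at_epoch` and `milestones_in_updates` are made up for illustration and are not part of MXNet. It assumes epochs are counted from 0 and that each drop takes effect at the start of the listed epoch; the ImageNet-1k training-set size of 1,281,167 images is used only to show how epoch milestones translate into update counts, which is what step-based schedulers usually expect.

```python
import math


def lr_at_epoch(epoch, base_lr=0.1, factor=0.1, milestones=(60, 75, 90)):
    """Learning rate in effect during a given epoch (assumed 0-based).

    The rate is multiplied by `factor` once for every milestone reached.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * factor ** drops


def milestones_in_updates(milestones, num_examples, batch_size):
    """Convert epoch milestones into numbers of parameter updates.

    Assumes the last, partial batch of each epoch still counts as one update.
    """
    updates_per_epoch = math.ceil(num_examples / batch_size)
    return [m * updates_per_epoch for m in milestones]


if __name__ == "__main__":
    for epoch in (0, 59, 60, 75, 90):
        print(epoch, lr_at_epoch(epoch))
    # 1,281,167 is the ImageNet-1k training-set size; 512 is the batch size above.
    print(milestones_in_updates((60, 75, 90), 1281167, 512))
```

Note that the command in the question uses batch size 64 rather than 512, so the number of updates per epoch (and therefore any step-based milestone) would be different in that setup.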

After 1 epoch, you should get around 0.16 Top-1 accuracy and 0.36 Top-5 accuracy on training data.

After 128 epochs, you should get around 0.727 Top-1 accuracy and 0.911 Top-5 accuracy on validation data.
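
For anyone comparing numbers across frameworks, it helps to be sure both report the same metric. The sketch below shows one common way Top-1 and Top-5 accuracy are computed from per-class scores; it is plain Python written for illustration (the helper name `top_k_accuracy` is hypothetical), not the metric code that MXNet or BIDMach actually use.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: list of per-sample lists, one score per class
    labels: list of integer class ids, same length as scores
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")

    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; keep the first k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


if __name__ == "__main__":
    # Tiny made-up example with 3 samples and 3 classes.
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    labels = [1, 1, 0]
    print(top_k_accuracy(scores, labels, k=1))
    print(top_k_accuracy(scores, labels, k=2))
```

Top-1 is the special case k=1, where only the single highest-scoring class counts as a hit.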

Training log for ResNet 34 on ImageNet is available here.

More info here.

Hi,

Thanks for the information. Do you know of a way to get results better than that (say, >20% Top-1 accuracy after epoch 1)?

The easiest way is to run two epochs; you should get over 20% easily. Is there a reason why you are particularly interested in the accuracy after the first epoch?

You can also try an adaptive optimizer such as Adam to see whether it converges faster in the early stages of training.

Hi,

I’m benchmarking MXNet against a different machine learning framework that obtains 29% top-1 training accuracy after the first epoch. Clearly, achieving that kind of accuracy is possible, but I have not found a set of hyperparameters for MXNet that achieves the same accuracy. Is there something else I could be doing wrong?

Thanks