Learning rate doesn't decrease after resuming training in MXNet's Gluon example

Master · January 14, 2019, 4:43am

Hi everyone,
My training stopped for some reason, and now that I want to resume it, the learning rate won't change!
I am following the Classification example here.
I believe I set all parameters correctly. They are as follows:

DTYPE=float16 
BATCHSIZE=384 
WORKER=20
EPOCH=187
CHECKPOINT=params_model_mixup/0.3399-imagenet-186-best.states
PARAMS=params_model_mixup/0.3399-imagenet-186-best.params

python train_imagenet.py \
  --rec-train /media/void/SSD/ImageNet_DataSet/train/rec_train/train.rec --rec-train-idx /media/void/SSD/ImageNet_DataSet/train/rec_train/train.idx \
  --rec-val /media/void/SSD/ImageNet_DataSet/train/rec_val/val.rec --rec-val-idx /media/void/SSD/ImageNet_DataSet/train/rec_val/val.idx \
  --model model --mode hybrid \
  --lr 0.4 --lr-mode cosine --num-epochs 200 --batch-size $BATCHSIZE --num-gpus 1 -j $WORKER \
  --use-rec --dtype $DTYPE --warmup-epochs 0 --no-wd --label-smoothing --mixup \
  --save-dir params_model_mixup \
  --logging-file model_mixup.log --resume-states $CHECKPOINT --resume-params $PARAMS --resume-epoch $EPOCH

As you can see below, the learning rate won't change:

|Epoch[187] Batch [49]|Speed: 492.394147 samples/sec|rmse=0.019614|lr=0.004371|
|Epoch[187] Batch [99]|Speed: 603.372949 samples/sec|rmse=0.019578|lr=0.004371|
|Epoch[187] Batch [149]|Speed: 604.314057 samples/sec|rmse=0.019593|lr=0.004371|

What am I missing here?
Any help is greatly appreciated.

thomelane · January 15, 2019, 12:26am

You don’t seem to be doing anything obviously wrong here. Set a breakpoint (using import pdb; pdb.set_trace() or otherwise) on line 359 of the train_imagenet.py script to confirm that the learning rate schedule is being updated:

lr_scheduler.update(i, epoch)
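
To know what to expect at that breakpoint, it may help to compute the learning rate a cosine schedule should produce at each logged batch. The following is a minimal sketch, not the scheduler from the script: it assumes plain cosine annealing from the base rate down to 0 with no warmup, and it assumes 1,281,167 training images (the standard ImageNet-1k size) to derive the number of batches per epoch. The function name cosine_lr and its parameters are hypothetical.

```python
import math

def cosine_lr(base_lr, epoch, batch, batches_per_epoch, num_epochs, final_lr=0.0):
    """Cosine-annealed learning rate at a given (epoch, batch) position.

    Assumes a per-iteration schedule with no warmup, decaying from
    base_lr at iteration 0 to final_lr at the last iteration.
    """
    total_iters = num_epochs * batches_per_epoch
    current_iter = epoch * batches_per_epoch + batch
    progress = min(current_iter / total_iters, 1.0)
    return final_lr + (base_lr - final_lr) * (1 + math.cos(math.pi * progress)) / 2

# --lr 0.4, --num-epochs 200 and batch size 384 come from the command above.
# The dataset size is an assumption (standard ImageNet-1k training set).
batches_per_epoch = 1281167 // 384

for batch in (49, 99, 149):
    lr = cosine_lr(0.4, 187, batch, batches_per_epoch, 200)
    print(f"Epoch[187] Batch [{batch}] expected lr={lr:.6f}")
```

If the values printed here decrease from one line to the next while the training log stays fixed, that would suggest the scheduler is not being stepped after the resume, rather than the change being too small to show at six decimal places.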
