Periodic Loss Value when training with “step” learning rate policy


#1

When training a deep CNN, a common approach is to use SGD with momentum and a “step” learning rate policy (e.g. the learning rate is set to 0.1, 0.01, 0.001, … at different stages of training). But I encountered an unexpected phenomenon when training with this strategy under MXNet.

That is, the training loss value is periodic: https://user-images.githubusercontent.com/26757001/31327825-356401b6-ad04-11e7-9aeb-3f690bc50df2.png

The above is the training loss at a fixed learning rate of 0.01, where the loss decreases normally. https://user-images.githubusercontent.com/26757001/31327872-8093c3c4-ad04-11e7-8fbd-327b3916b278.png

However, at the second stage of training (with lr 0.001), the loss goes up and down periodically, and the period is exactly one epoch.

So I thought it might be a problem with data shuffling, but that cannot explain why it does not happen in the first stage. I used ImageRecordIter as the DataIter and reset it after every epoch. Is there anything I missed or set incorrectly?

import mxnet as mx

train_iter = mx.io.ImageRecordIter(
    path_imgrec=recPath,            # path to the .rec record file
    data_shape=dataShape,           # e.g. (3, height, width)
    batch_size=batchSize,
    last_batch_handle='discard',    # drop the final incomplete batch of each epoch
    shuffle=True,                   # shuffle the records
    rand_crop=True,                 # random cropping augmentation
    rand_mirror=True)               # random horizontal flipping
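
As a sanity check on the shuffling hypothesis, something along these lines could be run (just a debugging sketch, not part of my training script): it compares the labels of the first batch across two consecutive epochs, which should almost always differ if the iterator really reshuffles on reset.

import numpy as np

train_iter.reset()
labels_epoch1 = next(train_iter).label[0].asnumpy()   # first batch of one epoch
train_iter.reset()
labels_epoch2 = next(train_iter).label[0].asnumpy()   # first batch of the next epoch
print("first batch identical across epochs:",
      np.array_equal(labels_epoch1, labels_epoch2))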

The code for training and loss evaluation:

while True:
    train_iter.reset()                            # start a new epoch
    for i, databatch in enumerate(train_iter):
        globalIter += 1
        mod.forward(databatch, is_train=True)
        mod.update_metric(metric, databatch.label)
        if globalIter % 100 == 0:                 # log the averaged loss every 100 iterations
            loss = metric.get()[1]
            metric.reset()
        mod.backward()
        mod.update()

Actually the loss can converge, but it takes too long. I have suffered from this problem for a long time, on different networks and different datasets. I didn't have this problem when using Caffe. Is this due to a difference in implementation?
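
For reference, the “step” policy can also be wired into the optimizer with MXNet's built-in MultiFactorScheduler instead of changing the learning rate by hand between stages; here is a minimal sketch (the step counts, momentum, and weight decay values are placeholders, not my actual settings):

import mxnet as mx

steps_per_epoch = 1000                      # placeholder
lr_sched = mx.lr_scheduler.MultiFactorScheduler(
    step=[30 * steps_per_epoch, 60 * steps_per_epoch],  # iterations at which lr is scaled
    factor=0.1)                                         # 0.1 -> 0.01 -> 0.001

mod.init_optimizer(optimizer='sgd',
                   optimizer_params={'learning_rate': 0.1,
                                     'momentum': 0.9,          # placeholder
                                     'wd': 0.0005,             # placeholder
                                     'lr_scheduler': lr_sched})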


Originally asked here: stackoverflow/questions/46704238/mxnetperiodic-loss-value-when-training-with-step-learning-rate-policy



#2

Can you reproduce this with a simple MNIST example? Can you post the entire script? (The indentation seems to be a bit off in what you posted above.)
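
Something like the following skeleton would do, just to show the kind of self-contained script that is easy to rerun (the network, hyper-parameters, and schedule are placeholders):

import mxnet as mx

mnist = mx.test_utils.get_mnist()                      # downloads MNIST on first use
train_iter = mx.io.NDArrayIter(mnist['train_data'], mnist['train_label'],
                               batch_size=100, shuffle=True)

data = mx.sym.Variable('data')
net = mx.sym.FullyConnected(mx.sym.flatten(data), num_hidden=128)
net = mx.sym.Activation(net, act_type='relu')
net = mx.sym.FullyConnected(net, num_hidden=10)
net = mx.sym.SoftmaxOutput(net, name='softmax')

mod = mx.mod.Module(net, context=mx.cpu())
mod.fit(train_iter,
        optimizer='sgd',
        optimizer_params={'learning_rate': 0.001,      # placeholder, e.g. the second-stage lr
                          'momentum': 0.9,
                          'wd': 0.0005},
        eval_metric='ce',                              # cross-entropy loss
        batch_end_callback=mx.callback.Speedometer(100, 100),
        num_epoch=10)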


#3

What is your momentum parameter value? Is it the default of 0.9? And what is your batch size?


#4

What is your weight decay setting? I wonder whether the weight decay is too large: once the lr is reduced, the weight decay takes over the learning and causes the loss to shoot up. Once the loss is high enough, the normal loss gradient is large enough to overcome the weight decay again; combined with momentum, this may oscillate.
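
A rough illustration of that argument (the numbers below are made up, and the exact update formula may differ from MXNet's SGD implementation): near a minimum the data gradient shrinks, while the weight-decay term wd * w stays proportional to the weight magnitude, so it can start to dominate the update.

import numpy as np

wd = 5e-4                               # an assumed weight-decay setting
w = np.full(1000, 0.05)                 # made-up per-weight magnitude for some layer
decay_term = wd * np.abs(w)             # contribution of weight decay to the update

for grad_scale in (1e-2, 1e-5):         # "early in a stage" vs. "near a minimum"
    grad = np.full_like(w, grad_scale)  # stand-in for the data gradient
    print("gradient ~ %g -> weight-decay term / gradient term = %.3g"
          % (grad_scale, decay_term.mean() / np.abs(grad).mean()))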