When training a deep CNN, a common practice is to use SGD with momentum together with a "step" learning rate policy (e.g. the learning rate is set to 0.1, 0.01, 0.001, ... at successive stages of training). But I encountered an unexpected phenomenon when training with this strategy under MXNet.
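For concreteness, the "step" policy I mean can be sketched in plain Python (the boundary iterations below are made-up example values, not my exact schedule):

```python
def step_lr(base_lr, factor, boundaries, cur_iter):
    """'Step' policy: multiply the learning rate by `factor`
    each time training passes one of the iteration boundaries."""
    lr = base_lr
    for b in boundaries:
        if cur_iter >= b:
            lr *= factor
    return lr

# e.g. base lr 0.1, divided by 10 at iterations 60000 and 90000
lr_stage1 = step_lr(0.1, 0.1, [60000, 90000], 10000)  # first stage: 0.1
lr_stage2 = step_lr(0.1, 0.1, [60000, 90000], 70000)  # second stage: 0.1 * 0.1
```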
That is, the training loss value is periodic: https://user-images.githubusercontent.com/26757001/31327825-356401b6-ad04-11e7-9aeb-3f690bc50df2.png
The above shows the training loss at a fixed learning rate of 0.01, where the loss decreases normally: https://user-images.githubusercontent.com/26757001/31327872-8093c3c4-ad04-11e7-8fbd-327b3916b278.png
However, at the second stage of training (with lr 0.001), the loss goes up and down periodically, and the period is exactly one epoch.
So I suspected a problem with data shuffling, but that cannot explain why it doesn't happen in the first stage. I use ImageRecordIter as the DataIter and reset it after every epoch; is there anything I missed or set mistakenly?
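To illustrate why I suspected shuffling: here is a toy pure-Python simulation (all names and numbers are made up, not my actual setup). If the iterator replays the same sample order every epoch, the per-batch loss trace repeats with a period of exactly one epoch, while reshuffling between epochs breaks the pattern:

```python
import random

# Hypothetical per-sample "difficulty" stands in for per-sample loss.
random.seed(0)
difficulty = [random.random() for _ in range(1000)]
batch_size = 100

def epoch_trace(order):
    """Mean per-batch difficulty for one pass over the data in `order`."""
    return [sum(difficulty[i] for i in order[b:b + batch_size]) / batch_size
            for b in range(0, len(order), batch_size)]

fixed_order = list(range(len(difficulty)))
epoch1 = epoch_trace(fixed_order)
epoch2 = epoch_trace(fixed_order)   # same order -> identical trace (periodic)
assert epoch1 == epoch2

reshuffled = fixed_order[:]
random.shuffle(reshuffled)
epoch3 = epoch_trace(reshuffled)    # reshuffled -> different trace
assert epoch1 != epoch3
```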
```python
train_iter = mx.io.ImageRecordIter(
    path_imgrec=recPath,
    data_shape=dataShape,
    batch_size=batchSize,
    last_batch_handle='discard',
    shuffle=True,
    rand_crop=True,
    rand_mirror=True)
```
The code for training and loss evaluation:
```python
while True:
    train_iter.reset()
    for i, databatch in enumerate(train_iter):
        globalIter += 1
        mod.forward(databatch, is_train=True)
        mod.update_metric(metric, databatch.label)
        if globalIter % 100 == 0:
            loss = metric.get()
            metric.reset()
        mod.backward()
        mod.update()
```
Actually the loss does converge, but it takes too long. I have suffered from this problem for a long time, on different networks and different datasets, and I didn't have it when using Caffe. Is this due to an implementation difference?
Originally asked here: https://stackoverflow.com/questions/46704238/mxnetperiodic-loss-value-when-training-with-step-learning-rate-policy