I have a pretrained DenseNet from which I extract the penultimate layer's features and pass them through fully connected layers of varying sizes, on some image datasets I'm working with.
I construct the MXNet symbol for this network and pass it into my loss function. For some reason, when I'm training, the loss decreases in the first few epochs but then completely stagnates.
I've tried experimenting with varying learning rates, but that doesn't help, even on small subsets of the data when I do a sanity check and try to overfit. The dataset in particular has 12 classes. I also figured the gradient might be going to zero in the backward pass, and I thought changing ReLU to SoftReLU might help with that (to prevent the zeroing-out), but that didn't solve anything.
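For context, this is the zero-gradient behaviour I was worried about, as a quick numpy illustration (my own sketch, not my training code): ReLU's gradient is exactly 0 for negative pre-activations, while SoftReLU (softplus, log(1 + e^x)) has gradient sigmoid(x), which is strictly positive everywhere.

```python
import numpy as np

x = np.array([-3.0, -1.0, 0.5, 2.0])  # sample pre-activations

# ReLU gradient: 1 where x > 0, exactly 0 elsewhere ("dead" units pass nothing back)
relu_grad = (x > 0).astype(float)

# SoftReLU (softplus) gradient: sigmoid(x), never exactly zero
softrelu_grad = 1.0 / (1.0 + np.exp(-x))

print(relu_grad)      # [0. 0. 1. 1.]
print(softrelu_grad)  # small but positive for negative x
```

So the swap does guarantee nonzero gradients through the activation, which is why I was surprised it made no difference.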
Are there any other ideas that I could try? It feels like I've experimented with a bunch of different solutions; I'm not sure if this is just the nature of training models on my dataset?