Loss not decreasing (Tried a lot of ideas)


Hey Everyone,

I have a pretrained DenseNet that I extract the penultimate layer from, and I pass its output through fully connected layers of varying sizes on some image datasets I’m working with.

I construct the mxnet sym for this network, and then pass that into my loss function. For some reason, when I’m training, the loss decreases in the first few epochs but then completely stagnates.

I’ve tried experimenting with varying learning rates but that doesn’t help (even on small subsets of the data, where I’m trying to overfit as a sanity check). The data has 12 classes. I also suspected that the gradient might be 0 somewhere in the backward pass, and I thought changing ReLU to SoftReLU might help with that (to prevent the zeroing out), but that didn’t solve anything.

Are there any other ideas that I could try? Feels like I’ve experimented with a bunch of different solutions – not sure if this is just the nature of training models on my dataset?



Hi @pn-train,

As a starting point, try training just the fully connected layer(s) at the end, without fine-tuning the DenseNet. It could be that you’re experiencing “catastrophic forgetting”, where your small dataset causes the learnt features to be overwritten. To do this, call only the fully connected Block(s) under the autograd.record scope, then call backward, and add only the fully connected parameters to the Trainer object.

You might also want to try different optimizers, but try with SGD for starters.


Oh, and at what point in the network is the gradient first at 0?