I have a pretrained DenseNet from which I extract the penultimate layer's features and pass them through fully connected layers of varying sizes, on some image datasets I'm working with.
I construct the MXNet symbol for this network and pass it into my loss function. For some reason, when I'm training, the loss decreases in the first few epochs but then completely stagnates.
I've tried experimenting with varying learning rates, but that doesn't help, even on small subsets of the data when I do a sanity check and try to overfit. The dataset in particular has 12 classes. I also figured the gradient might be going to zero in the backward pass, and I thought changing ReLU to SoftReLU might help with that (to prevent the zeroing-out), but that didn't solve anything.
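For context, this is the zero-gradient behaviour I was worried about, as a quick numpy illustration (my own sketch, not my training code): ReLU's gradient is exactly 0 for negative pre-activations, while SoftReLU (softplus, log(1 + e^x)) has gradient sigmoid(x), which is strictly positive everywhere.

```python
import numpy as np

x = np.array([-3.0, -1.0, 0.5, 2.0])  # sample pre-activations

# ReLU gradient: 1 where x > 0, exactly 0 elsewhere ("dead" units pass nothing back)
relu_grad = (x > 0).astype(float)

# SoftReLU (softplus) gradient: sigmoid(x), never exactly zero
softrelu_grad = 1.0 / (1.0 + np.exp(-x))

print(relu_grad)      # [0. 0. 1. 1.]
print(softrelu_grad)  # small but positive for negative x
```

So the swap does guarantee nonzero gradients through the activation, which is why I was surprised it made no difference.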
Are there any other ideas that I could try? It feels like I've experimented with a bunch of different solutions; I'm not sure if this is just the nature of training models on my dataset?