I want to take a pre-trained model, like ResNet or VGG, freeze all of its convolutional layers, and replace the fully connected layers on top of them with new ones. I'm fairly sure the features the convolutional layers provide are good for my task, so I put mx.sym.BlockGrad between these two parts. I also don't want my model to overfit, so I set some weight decay on the optimizer. Am I correct that this will eventually drive all the weights of the convolutional layers to zero? If so, how can I apply weight decay only to some layers?
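To illustrate my worry with a minimal numeric sketch (assuming plain SGD where weight decay is added to the gradient as an L2 term): if BlockGrad zeroes the gradient reaching the frozen layers, the only update those weights ever receive is the decay term, so each step multiplies them by `(1 - lr * wd)` and they shrink geometrically toward zero:

```python
# Sketch of one frozen conv weight under SGD with weight decay.
# Assumed update rule: w <- w - lr * (grad + wd * w).
# BlockGrad means grad == 0, so the update reduces to w *= (1 - lr * wd).
lr, wd = 0.1, 0.01
w = 1.0  # initial value of a frozen weight
for step in range(10_000):
    grad = 0.0  # gradient is blocked by mx.sym.BlockGrad
    w -= lr * (grad + wd * w)
print(w)  # after 10k steps the weight has all but vanished
```

Running this, `w` ends up around `(1 - 0.001)**10000 ≈ 4.5e-5`, which is what makes me think the frozen features would slowly be destroyed.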