Can BlockGrad and weight decay be used together?


I want to use a pre-trained model, like ResNet or VGG, freeze all convolutional layers and replace the fully connected layers on top of them. I’m pretty sure that the features the convolutional layers provide are good for my task, so I put mx.sym.BlockGrad between these two parts. I also don’t want my model to overfit, so I set some weight decay in the optimizer. Am I correct that this will eventually drive all weights of the convolutional layers to zero? If so, how can I apply weight decay only to some layers?
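Whether the frozen weights shrink depends on whether the optimizer still updates them. BlockGrad only zeroes the gradient; if the optimizer keeps updating the weight, the decay term alone multiplies it by (1 - lr * wd) every step. Below is a minimal pure-Python sketch of that effect, assuming plain SGD without momentum (real MXNet optimizers add momentum, gradient rescaling and so on, which are left out here). The helper names sgd_step, l2_norm and simulate are made up for illustration.

```python
import math
import random


def sgd_step(weights, grads, lr, wd):
    """Plain SGD (no momentum) with L2 weight decay: w <- w - lr * (g + wd * w)."""
    return [w - lr * (g + wd * w) for w, g in zip(weights, grads)]


def l2_norm(values):
    return math.sqrt(sum(v * v for v in values))


def simulate(steps, lr=0.1, wd=0.01, skip_frozen=False):
    """Follow a 'frozen' weight vector whose gradient is always zero.

    skip_frozen=False: the optimizer still updates the weight every step,
    which is the situation behind BlockGrad.
    skip_frozen=True: the optimizer never touches the weight, i.e. the
    parameter is excluded from updates altogether.
    """
    rng = random.Random(0)
    weights = [rng.gauss(0.0, 1.0) for _ in range(9)]  # e.g. one 3x3 kernel
    initial = l2_norm(weights)
    zero_grads = [0.0] * len(weights)  # no gradient flows back through BlockGrad
    for _ in range(steps):
        if not skip_frozen:
            weights = sgd_step(weights, zero_grads, lr, wd)
    return initial, l2_norm(weights)


if __name__ == "__main__":
    init, final = simulate(steps=10_000)
    # Each step scales the weights by (1 - 0.1 * 0.01) = 0.999, so after
    # 10,000 steps they are scaled by roughly 0.999**10000, about 4.5e-5.
    print(f"BlockGrad + weight decay: norm {init:.3f} -> {final:.6f}")
    init, final = simulate(steps=10_000, skip_frozen=True)
    print(f"skipped by the optimizer: norm {init:.3f} -> {final:.3f}")
```

The point of the second run is that weight decay only bites on parameters the optimizer actually updates, so excluding the frozen parameters from updates avoids the shrinkage.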


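You can use the fixed_param_names parameter when creating the Module. Parameters listed there get no gradient and are skipped by the optimizer, so weight decay is not applied to them either. The parameter takes a list of parameter names; you can build that list with a regex matching the names of the parameters you want to freeze. Check this example.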
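A minimal sketch of building that list with a regex, assuming the names come from symbol.list_arguments() and that fixed_param_names wants exact names. The argument names below are hypothetical (a real pre-trained ResNet or VGG uses its own naming), and names_to_freeze is a made-up helper. The Module call is shown only as a comment so the snippet runs without MXNet installed.

```python
import re


def names_to_freeze(arg_names, pattern, data_names=("data",), label_names=("softmax_label",)):
    """Return the argument names matching `pattern`, skipping data and label inputs.

    re.match anchors at the start of each name, so r"conv\\d+_" selects
    conv1_weight, conv2_bias, etc., but not fc_new_weight.
    """
    regex = re.compile(pattern)
    inputs = set(data_names) | set(label_names)
    return [n for n in arg_names if n not in inputs and regex.match(n)]


# Hypothetical argument names for a small pre-trained net with a new head.
ARG_NAMES = [
    "data",
    "conv1_weight", "conv1_bias",
    "conv2_weight", "conv2_bias",
    "fc_new_weight", "fc_new_bias",
    "softmax_label",
]

if __name__ == "__main__":
    fixed = names_to_freeze(ARG_NAMES, r"conv\d+_")
    print(fixed)  # ['conv1_weight', 'conv1_bias', 'conv2_weight', 'conv2_bias']
    # With MXNet installed, the list would then go to the Module, e.g.:
    #   mod = mx.mod.Module(symbol=net, context=mx.cpu(),
    #                       fixed_param_names=fixed)
```

Matching the pattern against the full argument list, rather than typing names by hand, keeps the frozen set correct even for deep networks with dozens of convolutional layers.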