How to eliminate the weight decay on the bias and batch normalization?

Hi, guys.

I recently read some papers, and one of their ideas is to eliminate the weight decay on the bias and batch normalization parameters. I use the mxnet.symbol API to train my model, but I can't find any documentation or information on how to implement that in MXNet. So I want to ask you guys on this forum, thank you so much.

Hi @Gary-Deeplearning

I am not sure I see what you want to achieve exactly.

Can you give us more insight on "eliminating the weight decay on the bias and batch normalization" and a link to some of these papers?

Yep, the paper is "Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes", section 4.2.

This might not be the most efficient method, but the following should do what is described in section 4.2 of https://arxiv.org/pdf/1807.11205.pdf:

# Collect all parameters of the network and zero out the weight-decay
# multiplier for biases and BatchNorm gamma/beta parameters.
params = net.collect_params()
for p_name, p in params.items():
    if p_name.endswith(('_bias', '_gamma', '_beta')):
        p.wd_mult = 0

Thanks for sharing, but that seems to be the Gluon way, not mxnet.symbol or mx.mod?