How to eliminate the weight decay on the bias and batch normalization?

Hi, guys.

I recently read some papers, and one of their ideas is to eliminate the weight decay on the bias and batch normalization parameters. I use the mxnet.symbol API to train my model, but I can't find any documentation or information on how to implement that in MXNet. So I want to ask you guys on this forum, thank you so much.

Hi @Gary-Deeplearning

I am not sure I see what you want to achieve exactly.

Can you give us more insight on "eliminating the weight decay on the bias and batch normalization" and a link to some of these papers?

Yep, the paper is "Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes", section 4.2.

This might not be the most efficient method, but the following should do what is described in section 4.2 of https://arxiv.org/pdf/1807.11205.pdf:

# Collect all parameters of the network and zero out the weight-decay
# multiplier for biases and BatchNorm gamma/beta parameters.
params = net.collect_params()
for p_name, p in params.items():
    if p_name.endswith(('_bias', '_gamma', '_beta')):
        p.wd_mult = 0

Thanks for sharing, but that seems to be the Gluon way, not mxnet.symbol or mx.mod?