Train model with no bias in convolution layer

titikid · December 3, 2018, 7:25am

For a specific purpose, I want to remove the bias in some convolution layers of the “mobilenet_ssd_300” model and train the network from scratch. MXNet and GluonCV use BatchNorm layers for faster convergence during training. I tried setting the “beta” and “running_mean” terms of BatchNorm to 0, with lr_mult=0, to make sure they cannot learn anything. However, in the output model I could see that the “running_mean” term was not completely removed, so the model still had a small shift factor. That is not what I expected.
So, how can I completely remove the shift factor in the BatchNorm layer when training?

sad · December 3, 2018, 11:36pm

Hi, you can try setting the center parameter in the BatchNorm layer to False, with the use_global_stats param also set to False. See https://mxnet.incubator.apache.org/api/python/gluon/nn.html#mxnet.gluon.nn.BatchNorm for more.

Feel free to post a follow-up with more details about your use case and/or a code sample if this doesn’t fully address your question.

titikid · December 12, 2018, 4:48am

The use_global_stats param is False by default, and “running_mean” is not affected by the center parameter, so the result didn’t change.
By the way, is it possible to remove the bias completely when training with BatchNorm?

safrooze · December 12, 2018, 7:56pm

This is, in brief, how BN works in default mode. In training, it calculates the mean and std of each batch and normalizes the batch using these two values. It also updates the running mean and std, but these are not used in training. In inference, it only uses the running mean and std.

This default behavior can be changed by setting use_global_stats to true, in which case BN simply uses the values of running mean and std to normalize the data without modifying anything.
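The two modes described above can be sketched in plain Python for a single channel. This is only an illustration, not MXNet's implementation: the function name `batch_norm_1d` and the `running` dict are hypothetical, and the momentum-style update of the running statistics is an assumption about how they are tracked.

```python
import math

def batch_norm_1d(x, gamma, beta, running, training, momentum=0.9, eps=1e-5):
    """Normalize one channel's values `x` (a list of floats).

    `running` is a dict with keys "mean" and "var" (hypothetical container).
    It is updated in place in training mode and only read in inference mode.
    """
    if training:
        # Training mode: normalize with the statistics of the current batch.
        mean = sum(x) / len(x)
        var = sum((v - mean) ** 2 for v in x) / len(x)
        # The running statistics are tracked for later use, but they do not
        # influence the output of this training step.
        running["mean"] = momentum * running["mean"] + (1 - momentum) * mean
        running["var"] = momentum * running["var"] + (1 - momentum) * var
    else:
        # Inference mode: normalize with the stored running statistics.
        mean, var = running["mean"], running["var"]
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in x]

running = {"mean": 0.0, "var": 1.0}
batch = [1.0, 2.0, 3.0, 6.0]
train_out = batch_norm_1d(batch, 1.0, 0.0, running, training=True)
infer_out = batch_norm_1d(batch, 1.0, 0.0, running, training=False)
```

In training mode the output is centered on the batch mean regardless of what `running["mean"]` holds, which is why forcing the running mean to 0 has no visible effect until inference.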

I’m not exactly clear on what you’re trying to do. Are you intending to have BN calculate std, but not subtract mean and only scale by std?

titikid · December 13, 2018, 10:42am

@safrooze
The final target is that the model, after merging BN into the convolution, should have only a weight term and no bias term. Following the guide here, the merged layer ends up with a bias term. To make it zero, a possible solution is to set the two terms it depends on to 0. They correspond to the “beta” and “running_mean” terms in MXNet’s BN, is that right? I could set the “beta” term to 0, but the “running_mean” term depends on the images in the input batch, so it is always different from 0.
I set use_global_stats to True, but the “running_mean” of BN is still updated every epoch.
If I have made any mistake, please correct me.
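For readers following along, the merge being discussed can be sketched as below. This uses the commonly used folding algebra for a convolution without its own bias, which is an assumption here, since the guide's exact formula is not reproduced in this thread. The function name `fold_bn_into_conv` and the list-based layout are hypothetical.

```python
import math

def fold_bn_into_conv(weights, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BN parameters into a bias-free convolution.

    `weights[c]` is the flattened kernel of output channel c.
    Returns (folded_weights, folded_bias) such that, in inference mode,
    BN(conv(x)) equals conv_folded(x) + folded_bias.
    """
    folded_w, folded_b = [], []
    for w_c, g, b, m, v in zip(weights, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        folded_w.append([scale * w for w in w_c])
        # The bias vanishes only when both beta and the mean are zero
        # (or happen to cancel exactly).
        folded_b.append(b - scale * m)
    return folded_w, folded_b

weights = [[0.5, -1.0], [2.0, 0.25]]
folded_w, bias = fold_bn_into_conv(
    weights,
    gamma=[1.0, 2.0],
    beta=[0.0, 0.0],
    mean=[0.3, 0.0],   # channel 0 keeps a nonzero running mean
    var=[1.0, 4.0],
)
```

With beta already at 0, channel 0 still gets a nonzero folded bias because of its running mean, while channel 1 (mean 0) gets none. Under these assumptions, this matches the "small shift factor" described in the original post.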

safrooze · December 13, 2018, 7:10pm

Quote from the page you mentioned: “During runtime (test time, i.e., after training), the functinality of batch normalization is turned off and the approximated per-channel mean and variance are used instead.”

Looks like what you’re trying to do is modify the behavior during inference, not training. During inference, running_mean and running variance are used. Nothing is updated. So you can set beta to 0 and running_mean to 0 and there will be no bias term.

In your original post you mentioned that “In output model, I could see that “running_mean” term was not completely removed”. How did you verify that?

For your reference, here is the implementation of BN in CPP code. Specifically this is the part of code when model is in inference mode or use_global_stats is set to True.

titikid · December 24, 2018, 7:37am

Quoting safrooze: “In your original post you mentioned that ‘In output model, I could see that “running_mean” term was not completely removed’. How did you verify that?”

I use the function mx.nd.load(params_file) to load the .params file, and it looks like this:

Does that mean beta is already zero but running_mean isn’t?
I don’t really understand what you mean. I have to merge this model (the .json file and the .params file) before inference to make an inference model that no longer has BN. If running_mean is not zero, my inference model still has a bias, is that right?
Anyway, thank you for your detailed answers. I would really appreciate it if you could help me solve this issue.
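A minimal sketch of the check and the zeroing step discussed in this thread, written against a plain dict so it runs without MXNet. It assumes the loaded parameters have been converted to flat lists of floats keyed by name. The helper names, the example parameter names, and the suffixes used to recognize shift terms are all hypothetical and may need adjusting to the names in the actual .params file.

```python
# Suffixes assumed to mark BN shift terms; adjust to the real parameter names.
SHIFT_SUFFIXES = ("beta", "running_mean", "moving_mean")

def nonzero_shift_terms(params, suffixes=SHIFT_SUFFIXES):
    """Return the names of shift-related entries holding any nonzero value."""
    return sorted(
        name for name, values in params.items()
        if name.endswith(suffixes) and any(v != 0.0 for v in values)
    )

def zero_shift_terms(params, suffixes=SHIFT_SUFFIXES):
    """Return a copy of `params` with shift-related entries set to zero."""
    return {
        name: [0.0] * len(values) if name.endswith(suffixes) else list(values)
        for name, values in params.items()
    }

# Hypothetical parameter names and values, for illustration only.
params = {
    "conv0_weight": [0.5, -1.0],
    "conv0_batchnorm_beta": [0.0, 0.0],
    "conv0_batchnorm_running_mean": [0.12, -0.03],
    "conv0_batchnorm_running_var": [1.1, 0.9],
}
before = nonzero_shift_terms(params)
cleaned = zero_shift_terms(params)
after = nonzero_shift_terms(cleaned)
```

Note that zeroing running_mean after training changes the normalization the network was trained with, so the outputs of the merged model may shift. It is worth validating accuracy after such a change.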
