Train model with no bias in convolution layer


#1

For a specific purpose, I want to remove the bias in some convolution layers of the “mobilenet_ssd_300” model and train the network from scratch. MXNet and GluonCV use batchnorm layers for faster convergence when training. I tried setting the “beta” and “running_mean” terms of batchnorm to 0, with lr_mult=0 to make sure that they cannot learn anything. However, in the output model I could see that the “running_mean” term was not completely removed, so the model still had a small shift factor. That is not what I expected.
So, how can I completely remove the shift factor in the batchnorm layer when training?


#2

Hi, you can try setting the center parameter in the BatchNorm layer to False, with the use_global_stats param also set to False. See https://mxnet.incubator.apache.org/api/python/gluon/nn.html#mxnet.gluon.nn.BatchNorm for more.

Feel free to post a follow-up with more details about your use case and/or a code sample if this doesn’t fully address your question.


#3

The param use_global_stats is False by default, and “running_mean” is not affected by the center parameter, so the result didn’t change.
Btw, is it possible to remove the bias completely when training with batchnorm?


#4

Briefly, this is how BN works in default mode. In training, it calculates the mean and std of each batch and normalizes the batch using these two values. It also updates the running mean and std, but these aren’t used in training. In inference, it only uses the running mean and std.

This default behavior can be changed by setting use_global_stats to true, in which case BN simply uses the values of running mean and std to normalize the data without modifying anything.
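
The two modes described above can be sketched for a single channel in plain Python. This is only an illustration of the behavior described in this post, not MXNet's actual implementation; the function name `batch_norm_1d`, the `state` dictionary, and the default values for `momentum` and `eps` are assumptions made for the example.

```python
import math

def batch_norm_1d(batch, state, gamma=1.0, beta=0.0, training=True,
                  use_global_stats=False, momentum=0.9, eps=1e-5):
    """Normalize the values of one channel.

    `state` is a dict holding "running_mean" and "running_var"
    (hypothetical container, used only for this sketch).
    """
    if training and not use_global_stats:
        # Default training mode: normalize with the statistics of this batch.
        mean = sum(batch) / len(batch)
        var = sum((x - mean) ** 2 for x in batch) / len(batch)
        # Running statistics are updated as a side effect, but not used here.
        state["running_mean"] = momentum * state["running_mean"] + (1 - momentum) * mean
        state["running_var"] = momentum * state["running_var"] + (1 - momentum) * var
    else:
        # Inference, or use_global_stats=True: use running statistics, update nothing.
        mean, var = state["running_mean"], state["running_var"]
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

state = {"running_mean": 0.0, "running_var": 1.0}
train_out = batch_norm_1d([1.0, 2.0, 3.0], state)                  # updates state
frozen = dict(state)
eval_out = batch_norm_1d([1.0, 2.0, 3.0], state, training=False)   # state untouched
```

In this sketch the running mean moves away from 0 as soon as one training batch with a nonzero mean has been seen, even though the training output itself never uses it.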

I’m not exactly clear on what you’re trying to do. Are you intending to have BN calculate std, but not subtract mean and only scale by std?


#5

@safrooze
The final target is that the model, after merging BN into the convolution, should have only a weight term and no bias term. Following the guide here, the merged layer still has a final bias term. To make it zero, a possible solution is to set beta and the running mean to 0. They correspond to the “beta” and “running_mean” terms in MXNet’s BN, is that right? I could set the “beta” term to 0, but the “running_mean” term depends on the images in the input batch, so it is always different from 0.
I set use_global_stats to True, but the “running_mean” of BN is still updated every epoch.
If I have made any mistake, please correct me.


#6

Quote from the page you mentioned: “During runtime (test time, i.e., after training), the functionality of batch normalization is turned off and the approximated per-channel mean and variance are used instead.”

Looks like what you’re trying to do is modify the behavior during inference, not training. During inference, running_mean and running variance are used. Nothing is updated. So you can set beta to 0 and running_mean to 0 and there will be no bias term.
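
To make the point about the bias term concrete, here is a minimal sketch of folding an inference-mode BN into a bias-free convolution, for one output channel. It uses the standard folding identity (scale = gamma / sqrt(var + eps), bias = beta - running_mean * scale), which is assumed here to match the guide mentioned above; the function name `fold_bn` and the sample numbers are made up for illustration.

```python
import math

def fold_bn(conv_weight, gamma, beta, running_mean, running_var, eps=1e-5):
    """Fold BN into a bias-free convolution, for a single output channel.

    conv_weight is the flattened kernel of that channel.
    Returns (folded_weight, folded_bias).
    """
    scale = gamma / math.sqrt(running_var + eps)
    folded_weight = [w * scale for w in conv_weight]
    folded_bias = beta - running_mean * scale
    return folded_weight, folded_bias

# beta is 0 but running_mean is not: a bias term survives the merge.
w, b = fold_bn([0.5, -1.0], gamma=2.0, beta=0.0, running_mean=0.3, running_var=1.0)

# beta and running_mean are both 0: the merged layer has weights only.
w0, b0 = fold_bn([0.5, -1.0], gamma=2.0, beta=0.0, running_mean=0.0, running_var=1.0)
```

One caveat: if training normalized with batch means, forcing running_mean to 0 afterwards changes what the layer computes at inference, so the accuracy should be re-checked.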

In your original post you mentioned that “In output model, I could see that “running_mean” term was not completely removed”. How did you verify that?

For your reference, here is the implementation of BN in CPP code. Specifically this is the part of code when model is in inference mode or use_global_stats is set to True.


#7

In your original post you mentioned that “In output model, I could see that “running_mean” term was not completely removed”. How did you verify that?

I use the function mx.nd.load(params_file) to load the .params file, and it looks like this:

Does that mean beta is already zero but running_mean isn’t?
I don’t really understand what you mean. I have to merge this model (the .json and .params files) before inference to make an inference model that no longer has BN. If the running_mean is not zero, my inference model still has a bias, is that right?
Anyway, thank you for your detailed answers. I would very much appreciate it if you could help me solve this issue.
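
The manual inspection described in this post could be automated along these lines. This is a rough sketch that works on a plain dict of name to list of floats; it assumes the loaded arrays have already been converted to Python lists, and that the shift-related parameter names end in "beta" or "running_mean". The actual key names depend on how the model was exported, so the suffixes and the example dict below are hypothetical.

```python
def nonzero_shift_params(params, suffixes=("running_mean", "beta"), tol=0.0):
    """Return the names of shift-related parameters holding any nonzero value.

    params: dict mapping parameter name -> flat list of floats.
    suffixes: name endings treated as shift terms (assumed naming convention).
    """
    offenders = []
    for name, values in params.items():
        if name.endswith(suffixes) and any(abs(v) > tol for v in values):
            offenders.append(name)
    return sorted(offenders)

# Made-up parameters for one BN layer: beta is zero, running_mean is not.
example = {
    "bn0_beta": [0.0, 0.0],
    "bn0_running_mean": [0.01, -0.2],
    "bn0_gamma": [1.0, 0.9],
}
offenders = nonzero_shift_params(example)
print(offenders)
```

Any name reported here would contribute a nonzero bias once BN is merged into the preceding convolution.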