MXNet BatchNorm with the symbol API


Hi folks,

I am using the C++ API of the MXNet library, and I noticed that for some reason the moving_mean and moving_var arrays are not updated, no matter which momentum I choose.
The process I use is a full forward pass in training mode (the forward parameter set to true), then a backward pass, then an optimizer update of all the parameters. That update doesn’t seem to apply to moving_mean or moving_var. So I don’t know — am I missing a step to update them?
I set the fix_gamma parameter to false, so I was expecting a full update of the BatchNorm parameters during training.
The only differences in the values that I get are basically numerical noise on the order of 1.0e-9.

Also, could I have a hint on how to proceed if I set output_mean_var to true? The extra outputs don’t seem to be accepted by the following activation layer.



Hi @dmidge,

At least in Python, these global running statistics are updated during the backward pass, even before the optimizer step, so they should be changing. Are you setting any other parameters, like use_global_stats?

Also, this is independent of fix_gamma: that’s a post-normalisation scale factor, but you still need the batch statistics (or global statistics) to normalise in the first place.


Hi @thomelane,

Thank you for your reply.
I have use_global_stats set to false. In fact, I noticed poor inference behaviour when use_global_stats was set to true.


use_global_stats actually doesn’t affect inference, but it can lead to an incorrectly trained model (if not used appropriately). Can you share some code showing how you’re checking the values of the global stats and how you’re instantiating BatchNorm?


Oww, the code that I am using right now can’t be shared as such. Let me make an equivalent mockup, if I can reproduce the same problem. But basically, I first fetch the NDArray content (ndarrValue being the moving_var array) like this:

std::vector<mx_float> dataVarLayerArray;
// Synchronously copy the NDArray’s contents to the host for inspection
ndarrValue.SyncCopyToCPU(&dataVarLayerArray, ndarrValue.Size());

Then I do the forward pass (with the parameter set to true), then the backward pass. I then loop over all the arg_arrays to update the values with the optimizer; those contain neither moving_mean nor moving_var. Finally, I fetch the content of ndarrValue again as before and compare the results.

I use the NDArrays provided by the InferArgsMap and SimpleBind steps, so I trust that I am fetching the real values.

Also, do you have an idea of how to work with output_mean_var set to true? I would be interested in fetching the mean of the batch.

Thank you @thomelane


A full reproducible example would be great, thanks!

Sorry, I’m not totally familiar with the C++ API. Are you sure you’re in “training mode” with this?

You don’t need to update these with gradients, so this isn’t an issue.


Well, unfortunately, with the C++ API I am sure of nothing. I have to say that I really like the Gluon documentation, which is both complete and comprehensive. But the symbol API is clearly less well documented, not to mention the C++ API, which requires near-exploratory experimentation to understand how it works…

However, when I look at the header of the Forward function, it is void Forward(bool is_train), so passing true seems to put the forward pass in training mode. Besides, the C++ examples tend to do that whenever they are training. So I am “quite” confident that we are in training mode, but I could be wrong.

Indeed. I was surprised not to see them at first, but there is indeed no gradient to compute for these parameters. Because of that, I don’t call any specific function to update their values; I only mentioned it in case there were another function to call to update moving_mean that I was not aware of, which would explain why these parameters never change.


It should all be handled for you when you do a backward pass, so no extra method is needed to apply the update. Check out these lines in the BatchNorm source code, in the backward function.


Ahhh, now I get what was going on…
When binding the arrays, I was using (as in the examples) auto *exec = net.SimpleBind(ctx, args);, where args, as usual, maps each parameter name to its NDArray. What I should have done is call auto *exec = net.SimpleBind(ctx, args, arg_grad_store, grad_req_type, aux_map);, with aux_map mapping the name of every moving mean and variance parameter of the BatchNorm to its NDArray. So the NDArray I had initialised was never associated with the executor: it was not updated because another NDArray was created internally to actually hold the moving mean and variance, to which I had no access whatsoever.
Now it works! I had to create a separate mockup to see that!

Thank you @thomelane for your time and your kind help! :smiley: