Yes, I had already seen that other post, but it didn’t help. My problem (I think) is that I don’t really understand what the usage pattern is supposed to be (and I can’t find any fully fleshed-out examples of training and then using a model with BatchNorm). Let me be clear about my questions (and I apologize if these seem silly or obvious; I’m still learning MXNet).
Question 1: How do I set a parameter such as “use_global_stats” differently when training vs. inferring? In my (limited) understanding, the model is defined, then trained, then used for inference. The model, including the “use_global_stats” param, is defined in the first stage. How can I change the “use_global_stats” param AFTER the model has been defined and trained? Do I have to create a new model and then transfer the params from the trained model into the new variant? It would really, really help to see a COMPLETE example of using BatchNorm, from definition to training to saving to reloading to inferring. If that example exists, I certainly haven’t been able to find it.
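To make my mental model concrete, here is a sketch in plain NumPy of what I *assume* the pattern is supposed to be: one set of parameters, with the train/infer behavior selected by a flag at call time rather than baked into the model definition. This is not MXNet code; the class and names are hypothetical, and I may well have the semantics wrong — that is exactly what I am asking about.

```python
import numpy as np

class ToyBatchNorm:
    """Hypothetical BatchNorm sketch (NOT MXNet): one set of parameters,
    with the mode flipped per call instead of fixed at definition time."""

    def __init__(self, num_features, momentum=0.9, eps=1e-5):
        self.gamma = np.ones(num_features)   # learned scale
        self.beta = np.zeros(num_features)   # learned shift
        # Running statistics accumulated during training.
        self.moving_mean = np.zeros(num_features)
        self.moving_var = np.ones(num_features)
        self.momentum = momentum
        self.eps = eps

    def forward(self, x, training):
        if training:
            # Training mode: normalize with the current batch's stats
            # and update the running averages as a side effect.
            mean = x.mean(axis=0)
            var = x.var(axis=0)
            m = self.momentum
            self.moving_mean = m * self.moving_mean + (1 - m) * mean
            self.moving_var = m * self.moving_var + (1 - m) * var
        else:
            # Inference mode (what I understand "use_global_stats" to
            # mean): normalize with the running averages from training.
            mean, var = self.moving_mean, self.moving_var
        return self.gamma * (x - mean) / np.sqrt(var + self.eps) + self.beta
```

If this is roughly right, then my question reduces to: where in the MXNet API does that `training` switch live, given that “use_global_stats” appears to be fixed when the symbol is defined?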
Question 2: In my example, I did nothing at all special. I defined the model (without specifying “use_global_stats”). I trained the model. I then did scoring with the trained model using the same data that was used for training, just to try to verify/debug the results. I did not change anything between defining, training, and scoring with the model. My question is: what is the expected behavior of the score function (in terms of BatchNorm) in this situation? What mean and variance were being used? Should the mean and variance learned by the training process have survived and been used automatically in this case? If not, how do I capture that data and pass it on to the model during inference and/or scoring?
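For what it’s worth, part of why I couldn’t debug this myself is that in my setup the two possible answers may be nearly indistinguishable. A quick NumPy sketch (again hypothetical, not MXNet; the momentum value and update rule are my assumptions) of why scoring the *training* data should look almost the same under batch statistics and under running statistics, once the running averages have seen the data many times:

```python
import numpy as np

def bn_forward(x, mean, var, eps=1e-5):
    """Normalize x with the given statistics (scale/shift omitted)."""
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(256, 3))

# "Training": accumulate running stats over many passes on the same data.
momentum = 0.9
moving_mean = np.zeros(3)
moving_var = np.ones(3)
for _ in range(200):
    moving_mean = momentum * moving_mean + (1 - momentum) * x.mean(axis=0)
    moving_var = momentum * moving_var + (1 - momentum) * x.var(axis=0)

# Score the same data both ways:
y_batch = bn_forward(x, x.mean(axis=0), x.var(axis=0))  # batch stats
y_global = bn_forward(x, moving_mean, moving_var)       # running stats

# After enough updates on the same data, the running averages converge
# to the batch statistics, so the two outputs agree closely.
print(np.abs(y_batch - y_global).max())
```

So even a “correct” score on my training data wouldn’t tell me which statistics were actually used, which is why I’d like the expected behavior spelled out.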
Again, I think all of these questions would be answered easily by seeing a fully worked example that goes through all the stages of defining, training, saving, loading and then using a model with BatchNorm.