mxnet version: 1.4.0
operating system: Linux
I have set `grad_req` to `'add'` for all Parameters, hoping to accumulate gradients over N batches to work around memory limitations. This is my code:
However, I encounter this Warning when I run the training code:
UserWarning: Gradient of Parameter `bn0_moving_mean` on context gpu(0) has not been updated by backward since last `step`. This could mean a bug in your model that made it only use a subset of the Parameters (Blocks) for this iteration. If you are intentionally only using a subset, call step with ignore_stale_grad=True to suppress this warning and skip updating of Parameters with stale gradient
My question is: is this warning caused by my code? If so, how should I modify the code? If not, what else could lead to this warning?