About stale gradient

#1

mxnet version: 1.4.0
operating system: linux

Dear all:
I have set ‘grad_req’ to ‘add’ for every Parameter, hoping to accumulate gradients over N batches to work around memory limitations. This is my code:
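For context, the accumulation pattern can be sketched framework-agnostically; in the NumPy toy below (the quadratic loss, N, and learning rate are all illustrative, not the poster's actual code), gradients sum into a buffer across batches, mirroring what grad_req=‘add’ does in Gluon, and one averaged update is applied every N batches:

```python
import numpy as np

# Toy objective: f(w) = 0.5 * ||w||^2, whose gradient is simply w.
w = np.array([4.0, -2.0])
lr, N = 0.1, 4                     # step once every N micro-batches
grad_acc = np.zeros_like(w)        # plays the role of grad_req='add' buffers

for batch in range(8):
    grad = w.copy()                # "backward pass" for this micro-batch
    grad_acc += grad               # accumulate instead of overwrite
    if (batch + 1) % N == 0:
        w -= lr * (grad_acc / N)   # average over N batches, then update
        grad_acc[:] = 0.0          # zero the buffers, like zeroing grads
```

Note that with grad_req=‘add’ you must zero the gradient buffers yourself after each step, otherwise stale sums leak into the next accumulation window.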

However, I encounter this Warning when I run the training code:

UserWarning: Gradient of Parameter bn0_moving_mean on context gpu(0) has not been updated by backward since last step. This could mean a bug in your model that made it only use a subset of the Parameters (Blocks) for this iteration. If you are intentionally only using a subset, call step with ignore_stale_grad=True to suppress this warning and skip updating of Parameters with stale gradient

My question is: is it my code that leads to this warning? If so, how can I modify my code? If not, what else could cause this warning?

Thanks

#2

And now I am very frustrated because my training loss just doesn’t go down

Any help or advice will be appreciated, thank you guys

#3

This warning means that some parameters in your computation have no effect on the final output, i.e. if those parameters were changed, the final output would not change.
Sometimes this is intentional, but sometimes it indicates a bug (and I think that’s the case with your code), so mxnet shows this message to alert you. You can suppress it by passing ignore_stale_grad=True, i.e. trainer.step(batch_size, ignore_stale_grad=True)

I don’t know how you are defining backbone_net, so I can’t exactly say where the bug is.

And your loss is not going down because your learning_rate is way too high.

#4

Hi, my friend, the loss finally goes down after about 1000 iterations,

but I still cannot figure out the warning problem. Here is how I define my Backbone_net:
I use the gluon.SymbolBlock interface in mxnet, i.e. I load the pretrained feature-extraction network (symbol.json & .params) directly

Thank you for replying, I really appreciate it!

#5

The above code looks okay to me. How do you define gn.Attach_arcloss?

Check out this notebook that explains how to debug gluon code; I think it might be helpful.

BTW welcome to the community.

#6

Try loading model like
gluon.nn.SymbolBlock.imports("./models/r50eir/model-symbol.json", ['data'], "./models/r50eir/model-0001.params", ctx=ctx_)

#7

Thank you bro!!! Here is how I define gn.Attach_arcloss: in this HybridBlock, I just attach an arcloss layer to the backbone feature-extractor net
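For readers unfamiliar with the layer, an arcloss (ArcFace-style) head adds an angular margin m to the true-class angle before scaling the logits. A NumPy sketch of that computation (the function name and the s and m values are illustrative; this is not the poster's actual block):

```python
import numpy as np

def arc_margin_logits(embeddings, weights, labels, s=64.0, m=0.5):
    """ArcFace-style margin: add angular margin m to the true-class angle,
    then scale all logits by s.
    embeddings: (B, D) features, weights: (C, D) class centers, labels: (B,)."""
    # L2-normalize so dot products become cosines of angles
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(e @ w.T, -1.0, 1.0)       # (B, C) cosine similarities
    theta = np.arccos(cos)
    target = np.zeros(cos.shape, dtype=bool)
    target[np.arange(len(labels)), labels] = True
    # only the true-class logit is penalized with the extra margin
    return s * np.where(target, np.cos(theta + m), cos)
```

The margin makes the true-class logit harder to satisfy, which is what pushes embeddings of the same identity closer together on the hypersphere.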

#9

I made the change to gluon.SymbolBlock.imports, but the warning still exists

#10

Anyone got any ideas??? T_T

#11

Sorry brother. I will try to find a solution.

#12

Thank you soooooo much bro, you are my hero!!

#13

Hey, bro!!! GOOD NEWS!!! I think I figured out the problem!!! The ‘grad_req’ of ‘bn0_moving_mean’ is ‘null’, which means it does not need a gradient:


However, I wrote this line in my training code:

which forcibly changes all ‘grad_req’ to ‘add’ and hence leads to the warning. After I modified my training code like this:

The warning is gone :grin::grin::grin:
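The fix described above can be sketched as follows; the Param class is just a stand-in for Gluon parameter objects, and the parameter names are illustrative. The point is to skip parameters whose grad_req is already ‘null’ instead of blanket-setting everything to ‘add’:

```python
class Param:
    """Minimal stand-in for a gluon Parameter with a grad_req attribute."""
    def __init__(self, name, grad_req):
        self.name = name
        self.grad_req = grad_req

# toy parameter dict, mimicking what net.collect_params() returns
params = {
    'conv0_weight':    Param('conv0_weight', 'write'),
    'bn0_gamma':       Param('bn0_gamma', 'write'),
    'bn0_moving_mean': Param('bn0_moving_mean', 'null'),  # running stat, no gradient
}

# The buggy version set grad_req = 'add' unconditionally; the fix leaves
# 'null' parameters alone, so they are never reported as stale.
for p in params.values():
    if p.grad_req != 'null':
        p.grad_req = 'add'
```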

#14

Finally… Well done. I think your loss will go down a bit faster now.

#15

Yeah… maybe, but I don’t think it will make a difference, because ‘moving_mean’ and ‘moving_var’ in a batchnorm layer don’t need gradients, so ‘null’ or ‘add’ won’t make a difference… Emmm… am I right?

#16

Yep, you’re absolutely right. The moving statistics are not learned through the loss at all; they are running averages updated during the forward pass and used at inference, so they never receive gradients.
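For concreteness, the running statistics the posts refer to are maintained by an exponential moving average during the forward pass, which is why they carry no gradient. A NumPy sketch of the update (the momentum value is illustrative):

```python
import numpy as np

def update_running_stats(moving_mean, moving_var, batch, momentum=0.9):
    """Exponential-moving-average update used for BatchNorm running stats.
    No gradient is involved: the stats are blended in during forward."""
    batch_mean = batch.mean(axis=0)
    batch_var = batch.var(axis=0)
    new_mean = momentum * moving_mean + (1.0 - momentum) * batch_mean
    new_var = momentum * moving_var + (1.0 - momentum) * batch_var
    return new_mean, new_var
```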

#17

Yeah bro, nice to have you on this issue, thank you for the company!!!
Looking forward to sharing opinions with you again!!!

#18

Thanks, I appreciate it.