Computing gradients of intermediate values

#1

Does MXNet allow us to compute gradients of intermediate values?
For example, let's say I have attached a gradient to w1, and then under mx.autograd.record() I compute z1 = 1 + w1 and z2 = z1 + w1. I know how to calculate the gradient of z2 with respect to w1 (by calling z2.backward() and then reading w1.grad), but how do I calculate the gradient of z2 with respect to z1?
The problem is that z1 has no gradient attached to it.

#2

Can I ask what your use case is for retrieving the intermediate gradients? Usually we're only interested in getting the gradients of the 'leaf nodes' (e.g. the parameters), and intermediate gradients aren't stored, to save memory.

If you're only interested in the gradient of the intermediate variable, you can call attach_grad on z1 after it is defined, but this will affect the gradient calculation for w1: z1 becomes a new leaf, so the path from w1 through z1 is no longer tracked. Alternatively, you can implement a custom backward method and extract the intermediate gradient from there.

#3

OK, I'll try it out. I'm actually doing research, so I need to see what is happening to the gradients of the hidden layers; that's why.

#4

@thomelane How can we attach gradients to the hidden layers of a loaded pre-trained network? My intention is to call forward and backward and then look at the gradients of the hidden layers. I know how to find the gradients of the hidden layers with respect to the input, but I am interested in the gradient of one hidden layer with respect to another.