 # Automatic Differentiation

#1

#2

I can’t understand the meaning of the head gradient.
Why pass it `nd.array([10, 1., .1, .01])`?
What do I get in `x.grad` if I don’t pass a head gradient to the `backward` function? Isn’t it dz/dx?

#3

Hi @vermicelli,

I think `backward` should be applied to `y` here, not `z`. That would make more sense to me.

The example should then show a case where you calculate dz/dy manually (possibly without MXNet at all) and still use autograd for dy/dx, so that dz/dx ends up in `x.grad`, as you pointed out.

Something like this example:

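```python
import mxnet as mx
from mxnet import autograd

x = mx.nd.array([0.,1.,2.,3.])
x.attach_grad()  # allocate storage for x.grad

with autograd.record():  # record y so autograd can compute dy/dx
    y = x * 2

# dy/dz calculated outside of autograd
dydz = mx.nd.array([10, 1., .1, .01])
y.backward(dydz)
# thus calculating dz/dx, even though z was outside of autograd
print(x.grad)
```
```
[20.    2.    0.2   0.02]
<NDArray 4 @cpu(0)>
```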

@mli @smolix, could you please confirm? This is quite a complex example for an intro. Have you seen many use cases of this in the wild?

#4

Thank you for your reply. This makes sense to me. But I think the ‘dy/dz’ in the comment `# dy/dz calculated outside of autograd` should be ‘dz/dy’. My understanding of your example is that you let MXNet compute dy/dx with autograd, which should be 2, and told autograd that you already have the dz/dy part, computed manually, which is `[10, 1., .1, .01]`. Autograd then stores dz/dy * dy/dx in `x.grad` as the final result. Am I right?

So the “head gradient” here just means the gradient of the part of the calculation chain that is not recorded by autograd.
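
To make that chain-rule reading concrete, here is a minimal sketch in plain Python (no MXNet involved), assuming the element-wise `y = x * 2` from the example earlier in this thread. The helper `backward_with_head` is hypothetical and only mimics the product dz/dy * dy/dx described above. Treating a missing head gradient as all ones is an assumption about what `backward()` does when called without arguments.

```python
# Plain-Python sketch of the chain rule behind a head gradient.
# `backward_with_head` is a hypothetical helper, not an MXNet function.

def backward_with_head(local_grad, head_grad=None):
    """Return head_grad * local_grad, element by element.

    Assumption: with no head gradient supplied, a head gradient of
    ones is used, so the result is just the local gradient.
    """
    if head_grad is None:
        head_grad = [1.0] * len(local_grad)
    return [h * g for h, g in zip(head_grad, local_grad)]

x = [0.0, 1.0, 2.0, 3.0]
dy_dx = [2.0 for _ in x]          # y = x * 2, so dy/dx is 2 for every element
dz_dy = [10.0, 1.0, 0.1, 0.01]    # the part computed outside of autograd

dz_dx = backward_with_head(dy_dx, dz_dy)   # what would land in x.grad
dy_dx_only = backward_with_head(dy_dx)     # no head gradient: plain dy/dx

print(dz_dx)
print(dy_dx_only)
```

Under these assumptions the first result matches the `[20, 2, 0.2, 0.02]` printed by the MXNet example, and the second is simply dy/dx.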

#5

@vermicelli, I think you are correct. The last example implies that `head_gradient` is calculated outside of autograd. I think it is actually the gradient of some other function w(z) that is missing from the example, that is, `head_gradient` is dw/dz. I would put that into a comment in the code to clarify this piece a bit more. Other than that, I think your understanding is correct.
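
As a rough illustration of the dw/dz reading, the sketch below uses plain Python and a finite-difference check instead of MXNet. Both functions are assumptions made up for the illustration: `z_of_x` (z = 2 * x^2, element-wise) stands in for the recorded part, and `w_of_z` (a weighted sum of z) stands in for the missing function w, chosen so that dw/dz equals the head gradient values used in this thread.

```python
# Hypothetical stand-ins: neither function comes from the original tutorial.
WEIGHTS = [10.0, 1.0, 0.1, 0.01]

def z_of_x(x):
    """Recorded part: z = 2 * x^2, element-wise."""
    return [2.0 * xi * xi for xi in x]

def w_of_z(z):
    """Missing outer function: a weighted sum, so dw/dz == WEIGHTS."""
    return sum(c * zi for c, zi in zip(WEIGHTS, z))

x = [0.0, 1.0, 2.0, 3.0]

# Chain rule: dw/dx = dw/dz * dz/dx, with dz/dx = 4 * x for z = 2 * x^2.
head_gradient = WEIGHTS                      # dw/dz, known without autograd
dz_dx = [4.0 * xi for xi in x]
analytic = [h * g for h, g in zip(head_gradient, dz_dx)]

def numeric_grad(x, eps=1e-6):
    """Central finite differences of w(z(x)) with respect to each x[i]."""
    grads = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grads.append((w_of_z(z_of_x(up)) - w_of_z(z_of_x(down))) / (2 * eps))
    return grads

numeric = numeric_grad(x)
print(analytic)
print(numeric)
```

The two printed lists should agree up to rounding, which is the sense in which passing dw/dz as the head gradient yields dw/dx.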