URL: http://en.diveintodeeplearning.org/chapter_crashcourse/autograd.html

# Automatic Differentiation

**mli**#1

**vermicelli**#2

Can’t understand the meaning of *head gradient*.

Why give it an `nd.array([10, 1., .1, .01])`?

What do I get in `x.grad` if I don’t pass the *head gradient* to the `backward` function? Isn’t it dz/dx?
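
In other words: is calling `backward()` with no argument the same as passing a head gradient of all ones, leaving just dy/dx in `x.grad`? A plain-Python sketch of that assumption (not the MXNet API, just the arithmetic it would imply):

```
# Assumption: backward() with no head gradient behaves like passing a
# head gradient of all ones, so x.grad holds dy/dx directly.
head = [1.0, 1.0, 1.0, 1.0]   # implicit head gradient of ones
dydx = [2.0, 2.0, 2.0, 2.0]   # dy/dx for y = x * 2
x_grad = [h * g for h, g in zip(head, dydx)]
print(x_grad)  # [2.0, 2.0, 2.0, 2.0]
```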

**thomelane**#3

Hi @vermicelli,

I think `backward` should be applied to `y` here, not `z`; that would make more sense to me.

And then the example should show a case where you could calculate dz/dy manually (possibly even without using MXNet), and still be able to use autograd for dy/dx to calculate dz/dx, which is stored in `x.grad`, as you pointed out.

Something like this example:

```
import mxnet as mx
x = mx.nd.array([0.,1.,2.,3.])
x.attach_grad()
with mx.autograd.record():
    y = x * 2
# dy/dz calculated outside of autograd
dydz = mx.nd.array([10, 1., .1, .01])
y.backward(dydz)
# thus calculating dz/dx, even though dz/dx was outside of autograd
x.grad
```

```
[20. 2. 0.2 0.02]
<NDArray 4 @cpu(0)>
```
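
A quick way to sanity-check that output: `x.grad` is the elementwise product of the head gradient and dy/dx (which is 2 everywhere for `y = x * 2`). The chain-rule step in plain Python, independent of MXNet:

```
# Chain rule check: x.grad[i] = head_grad[i] * dydx[i], with dy/dx = 2
head_grad = [10.0, 1.0, 0.1, 0.01]   # the array passed to backward()
dydx = [2.0, 2.0, 2.0, 2.0]          # derivative of y = x * 2
x_grad = [h * g for h, g in zip(head_grad, dydx)]
print(x_grad)  # [20.0, 2.0, 0.2, 0.02]
```

This matches the `[20. 2. 0.2 0.02]` that MXNet reports above.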

@mli @smolix please confirm? Quite a complex example for an intro. Are there many use cases of this you’ve seen in the wild?

**vermicelli**#4

Thank you for your reply. This makes sense to me. But I think the ‘dy/dz’ in the comment `# dy/dz calculated outside of autograd` should be ‘dz/dy’. My understanding of your example is that you let MXNet do the autograd on dy/dx, which should be 2, and told autograd you already have the dz/dy part computed manually, which is `[10, 1., .1, .01]`. Then autograd stores dz/dy * dy/dx in `x.grad` as the final result. Am I right?

So the “head gradient” here just means the gradient of some upstream part of the calculation chain that doesn’t get recorded by autograd.
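
To make that concrete, here is a hypothetical split pipeline in plain Python (the function z and all names are illustrative, not from the thread's example): the stage y = 2x is "recorded", while a downstream stage z = sum(y_i^2) is not, so dz/dy = 2y is worked out by hand and plays the role of the head gradient.

```
# Hypothetical split pipeline: y = 2x is "recorded" by autograd,
# z = sum(y_i^2) is computed somewhere autograd never saw.
x = [0.0, 1.0, 2.0, 3.0]
y = [2.0 * xi for xi in x]           # recorded stage: dy/dx = 2

# Unrecorded stage: z = sum(y_i^2), so dz/dy_i = 2 * y_i, done by hand.
head_grad = [2.0 * yi for yi in y]   # this is what backward() would receive

# Chain rule: dz/dx_i = (dz/dy_i) * (dy/dx_i)
dzdx = [h * 2.0 for h in head_grad]
print(dzdx)  # [0.0, 8.0, 16.0, 24.0]
```

Checking directly: z = sum(4 x_i^2), so dz/dx_i = 8 x_i, which agrees with the printed result.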