I have a loss function that depends on the gradient of a neural network's output w.r.t. the network inputs (not the network parameters). However, I'm having trouble backpropagating that loss to the network parameters, because MXNet doesn't seem to treat the input-gradient as part of the computation graph. Can someone help me debug? Here's an MWE:

```
import mxnet as mx
from mxnet import nd, gluon
from mxnet.gluon import nn

# Data: single input point and the target input-gradient
x = nd.array([0])
dydx = nd.array([1])

# Network: one linear unit, weight and bias initialized to 1
net = nn.Sequential()
with net.name_scope():
    net.add(nn.Dense(1))
net.collect_params().initialize(mx.init.Constant(1))

# Loss
l2_loss = gluon.loss.L2Loss()

x.attach_grad()
with mx.autograd.record():
    y = net(x)
    # gradient of the network output w.r.t. the input
    dydx_ = mx.autograd.grad(y, [x], retain_graph=True)[0]
    loss = l2_loss(dydx, dydx_)
loss.backward()
```

I get this error:

```
---------------------------------------------------------------------------
MXNetError Traceback (most recent call last)
<ipython-input-11-de03bb6615f7> in <module>()
17 dydx_ = mx.autograd.grad(y, [x], retain_graph=True)[0]
18 loss = l2_loss(dydx, dydx_)
---> 19 loss.backward()
MXNetError: [09:21:50] src/imperative/imperative.cc:373:
Check failed: !AGInfo::IsNone(*i)
Cannot differentiate node because it is not in a computational graph.
You need to set is_recording to true or use autograd.record()
to save computational graphs for backward.
If you want to differentiate the same graph twice,
you need to pass retain_graph=True to backward.
```

My question is: why is `dydx_` not considered part of the computational graph? It's the derivative of `net(x)` w.r.t. `x`, and hence depends on the network's weights and biases. Shouldn't it extend the graph, or am I misunderstanding something?