Obtaining second-order derivatives of a function with respect to arbitrary parameters in the computation graph



We have an implementation of a recurrent network in MXNet and are trying to obtain the second-order derivatives of a loss function with respect to arbitrary (ideally all) parameters in the computational graph.

Is there a way to do this? Any help would be appreciated.


It’s unclear to me how many operators support higher-order gradients at this time, so it might not work on your network. There is, however, an interface (mx.autograd.grad with create_graph=True) that should let you do it, provided every operator in your graph supports it.


You can find documentation for it on this page, but you have to scroll down because, for some reason, there isn’t an anchor link for it at the top.

The gist should be something like this (I haven’t tested it):

import mxnet as mx

with mx.autograd.record():
    output = net(x)
    loss = loss_func(output)
    # First-order gradients; create_graph=True records them so they can be differentiated again.
    dz = mx.autograd.grad(loss, [z], create_graph=True)  # where [z] is the parameter(s) you want

dz[0].backward()  # now the actual parameters should have second-order gradients
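
If one of your operators turns out not to support second-order gradients, a fallback that needs only first-order gradients is a finite-difference Hessian-vector product: evaluate the gradient at the parameters nudged by +eps*v and -eps*v and take the difference. Below is a minimal sketch in plain Python, not MXNet code. The toy loss, `grad_f`, and `hvp_fd` are names I made up for illustration; in practice `grad_f` would be replaced by whatever returns your network's first-order gradients, and `eps` would need tuning.

```python
def grad_f(w):
    """Analytic first-order gradient of the toy loss f(w) = w0**2 * w1 + w1**3.

    Hypothetical stand-in for a framework's first-order backward pass.
    """
    w0, w1 = w
    return [2.0 * w0 * w1, w0 ** 2 + 3.0 * w1 ** 2]


def hvp_fd(grad_fn, w, v, eps=1e-5):
    """Approximate the Hessian-vector product H(w) v by central differences.

    Uses H v ~= (grad(w + eps*v) - grad(w - eps*v)) / (2*eps),
    so only first-order gradients are ever evaluated.
    """
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus = grad_fn(w_plus)
    g_minus = grad_fn(w_minus)
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]


w = [1.0, 2.0]
v = [1.0, 0.0]  # a unit vector picks out one column of the Hessian
hv = hvp_fd(grad_f, w, v)
print(hv)
```

For this toy loss the exact Hessian at w = (1, 2) is [[4, 2], [2, 12]], so the result should be close to its first column, (4, 2). This is an approximation and costs two gradient evaluations per vector, so it is a workaround rather than a replacement for real second-order autograd support.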


Thanks! This (https://github.com/apache/incubator-mxnet/issues/10002) seems to imply that not all operators support this yet.

I’ll try the interface you’ve pointed to and report operators it fails on to the contributors.


Dear all,

Is there a timeline for when higher-order derivatives will be released for MXNet/Gluon? A lot of GAN-like systems require them for stabilised training.


Looks like it’s starting to get some active development efforts. Refer to here (particularly the end of the thread):

And here:

I’m also eagerly watching this one. It’s the last critical feature that I think MXNet lacks.