Computing per-class gradients

I am running into an issue when trying to compute multiple gradients. The neural network is a classifier. I want to compute the gradient of each of the (class) output nodes w.r.t. the input of the network. Here is the full code example:

from mxnet import autograd, init, nd
from mxnet.gluon import nn
import numpy as np

nb_classes = 10


def class_gradient(model, x, nb_classes, label=None):
    x_ = nd.array(x)

    if label is not None:
        x_.attach_grad()
        with autograd.record(train_mode=False):
            preds = model(x_)
            preds[:, label].backward(retain_graph=True, train_mode=False)
        grads = x_.grad.asnumpy()
    else:
        grads = []
        for i in range(nb_classes):
            x_.attach_grad()
            with autograd.record(train_mode=False):
                preds = model(x_)

            preds[:, 0].backward(retain_graph=True, train_mode=False)
            grads.append(x_.grad.asnumpy())
    return grads

# Create a simple CNN
net = nn.Sequential()
with net.name_scope():
    net.add(
        nn.Conv2D(channels=6, kernel_size=5, activation='relu'),
        nn.MaxPool2D(pool_size=2, strides=2),
        nn.Conv2D(channels=16, kernel_size=3, activation='relu'),
        nn.MaxPool2D(pool_size=2, strides=2),
        nn.Flatten(),
        nn.Dense(120, activation="relu"),
        nn.Dense(84, activation="relu"),
        nn.Dense(nb_classes)
    )
net.initialize(init=init.Xavier())

# Random data in the shape of a small MNIST sample
data = np.random.rand(10, 1, 28, 28)
grads = class_gradient(net, data, nb_classes=nb_classes)

I am getting the following error:

File ".../mxnet/base.py", line 149, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [18:04:27] src/imperative/imperative.cc:373: Check failed: !AGInfo::IsNone(*i) Cannot differentiate node because it is not in a computational graph. You need to set is_recording to true or use autograd.record() to save computational graphs for backward. If you want to differentiate the same graph twice, you need to pass retain_graph=True to backward.

I’m using Python 3.6, MXNet 1.2.0, and NumPy 1.14.5. I have tried a few workarounds and code variants, but the error persists. I would much appreciate help solving this problem.

Hi @irina_nicolae,

You’re getting this error because you’re performing a slice (i.e. preds[:, 0]) outside of the autograd.record scope. Given your objective to get the gradient for each class with respect to the input data, this slice is an important part of the computational graph which is used for backpropagation. You’ve done this correctly for the case when label is not None, but this code branch isn’t running in your example.
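To make the rule concrete, here is a stripped-down sketch (a toy Dense layer, purely for illustration): autograd only tracks operations that execute while recording, so the slice has to happen inside the scope.

from mxnet import autograd, nd
from mxnet.gluon import nn

toy = nn.Dense(3)
toy.initialize()
x = nd.random_uniform(shape=(2, 4))
x.attach_grad()

with autograd.record():
    out = toy(x)
    col = out[:, 0]   # the slice is recorded, so it is part of the graph
col.backward()        # works: the gradient of out[:, 0] w.r.t. x ends up in x.grad

# Moving `col = out[:, 0]` below the with-block reproduces your
# "Cannot differentiate node ..." error, because the slice is then untracked.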

You can improve the code in a few other areas too. You don’t want to be calling .backward under an autograd record scope, since you don’t want to include the backward operations as part of the computational graph. Just do this after the scope has closed.

Also, I don’t see a reason for you to be setting train_mode=False. Just remove it unless you have a specific reason in mind separate from the example provided.

And lastly, I think you have a typo in your slicing: preds[:, 0] should be preds[:, i].

Something like this should work:

from mxnet import autograd, init, nd
from mxnet.gluon import nn
import numpy as np

nb_classes = 10

net = nn.Sequential()
with net.name_scope():
    net.add(
        nn.Conv2D(channels=6, kernel_size=5, activation='relu'),
        nn.MaxPool2D(pool_size=2, strides=2),
        nn.Conv2D(channels=16, kernel_size=3, activation='relu'),
        nn.MaxPool2D(pool_size=2, strides=2),
        nn.Flatten(),
        nn.Dense(120, activation="relu"),
        nn.Dense(84, activation="relu"),
        nn.Dense(nb_classes)
    )
    
net.initialize(init=init.Xavier())
data = nd.random_uniform(shape=(5, 1, 28, 28))
data.attach_grad()
with autograd.record():
    preds = net(data)
    class_slices = [preds[:, i] for i in range(nb_classes)]
    
data_grads = []
for class_slice in class_slices:
    class_slice.backward(retain_graph=True)
    data_grad = data.grad.asnumpy()
    data_grads.append(data_grad)
# snippet of gradient for class 0
data_grads[0][0,0,:3,:3]
array([[ -6.55046140e-04,   5.01569957e-05,   2.20104214e-03],
       [ -9.35407006e-05,  -5.22406131e-04,  -3.39948572e-03],
       [ -8.62725952e-04,  -2.01242976e-03,  -1.27477385e-03]], dtype=float32)
# snippet of gradient for class 1
data_grads[1][0,0,:3,:3]
array([[  7.21680466e-04,  -5.52591991e-05,  -1.59816700e-04],
       [  2.29219068e-03,   8.36208521e-04,   4.56074625e-03],
       [  4.10425942e-03,  -5.38252061e-04,  -7.36064801e-04]], dtype=float32)

Hi, @thomelane! Thanks a lot for your solution, it worked like a charm! I was using train_mode=False because we’re not training: when computing gradients w.r.t. the inputs, the weights aren’t being updated. Does the option make sense in this context? Or is this not the intended use? Thanks again!

Ah, so train mode isn’t going to ‘train’ your model and update the weights for you automatically. Instead, it just selects whether the forward pass of the network runs as it would during training (when true) or inference (when false). Your example network looks like it would behave the same in both cases, so this setting wouldn’t matter either way. But if your model had dropout or batch norm there would be differences, e.g. no dropout is applied when running in inference (train mode set to false).
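You can see the difference directly with a toy dropout layer (a quick sketch, nothing specific to your model):

from mxnet import autograd, nd
from mxnet.gluon import nn

drop = nn.Dropout(0.5)
x = nd.ones(shape=(1, 6))

with autograd.record(train_mode=True):
    print(drop(x))   # training behaviour: roughly half the entries zeroed, the rest scaled by 1/(1-p) = 2
with autograd.record(train_mode=False):
    print(drop(x))   # inference behaviour: dropout is a no-op, the output equals the input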

With regards to actually training the parameters of a network, that’s handled by the Gluon Trainer object (and Optimizer in turn).
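For reference, a minimal training step with the network above would look something like this (the loss and optimizer choices here are just placeholders):

from mxnet import autograd, gluon, nd

loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.01})

batch = nd.random_uniform(shape=(5, 1, 28, 28))
labels = nd.array([0, 1, 2, 3, 4])

with autograd.record():                  # train_mode=True is the default here
    loss = loss_fn(net(batch), labels)
loss.backward()
trainer.step(batch_size=batch.shape[0])  # the Trainer asks the Optimizer to update every parameter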

Ok, this makes total sense! Very good point about layers that don’t have the same behaviour at training and test time. In my (almost functional) MWE, I used no such layer, but the class gradients I’m trying to compute should work for any model. The intended behaviour of the model would be the one at test time, and train_mode=False seems to achieve that. Thanks for the clarifications!
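For completeness, the per-class loop with inference behaviour then looks something like this (same pieces as above, with train_mode=False passed to both the record scope and backward, as in my original code):

from mxnet import autograd, nd

data = nd.random_uniform(shape=(5, 1, 28, 28))
data.attach_grad()
with autograd.record(train_mode=False):
    preds = net(data)
    class_slices = [preds[:, i] for i in range(nb_classes)]

data_grads = []
for class_slice in class_slices:
    class_slice.backward(retain_graph=True, train_mode=False)  # keep forward and backward in inference mode
    data_grads.append(data.grad.asnumpy())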