Retrieve gradient with respect to attention map in Gluon


I am trying to implement Guided Attention Networks using the Gluon API. This technique extends Grad-CAM and requires access to the gradients w.r.t. the feature map output of the last convolutional layer. Is there a way to retrieve these gradients with Gluon?

I’ve previously implemented this in PyTorch here, using PyTorch’s backward hooks to retrieve this gradient. I’ve found a couple of different Grad-CAM implementations in MXNet, but they all seem to get gradients w.r.t. the final convolution weight, not the feature map. I also found this post on GitHub with exactly what I’m looking for, but in stock MXNet.

I was limited to only two links in the post, but I have links to the papers and examples mentioned above that I can provide on request.


Hi @austin.doolittle,

If x is your feature map, try calling x.attach_grad() before autograd.record(). You should then be able to access the gradient with x.grad afterwards.


Thanks for your reply! I don’t have access to the feature map before I enter the autograd.record() scope, because it is the output of a HybridSequential() containing my feature extractor. I tried calling x.attach_grad() from within the autograd.record() scope. It does appear to yield a gradient, but it also seems to destroy the graph up to that point. I call output.backward(retain_graph=True) to retrieve this gradient (where output is the ground truth labels), but I do not update any parameters as a result, because this gradient is factored into the training process. I later call total_loss.backward(), where total_loss is the original classification loss plus the attention mining loss described in the GAIN paper, and I then get the error:

Gradient of Parameter conv0_weight on context gpu(0) has not been updated by backward since last step. This could mean a bug in your model that made it only use a subset of the Parameters (Blocks) for this iteration. If you are intentionally only using a subset, call step with ignore_stale_grad=True to suppress this warning and skip updating of Parameters with stale gradient

EDIT: I reread my explanation above and realized it was super abstract and confusing, so I pushed my source to GitHub. Lines of interest and


I attempted to leave the mx.autograd.record() block, call attach_grad() on the feature map, and re-enter mx.autograd.record(). This doesn’t raise an error, but the gradient is always 0. @thomelane do you have any further suggestions?


Check if this helps:
That PR implements Grad-CAM using Gluon.


That helps me understand exactly what I’ll need to do to get this implemented. Thanks for your help @indu and @thomelane!