Retrieve gradient with respect to attention map in Gluon

austin.doolittle · May 3, 2018, 2:04pm

I am trying to implement Guided Attention Networks using the Gluon API. This technique extends GradCAM and requires that you access gradients w.r.t. the feature map output of the last convolutional layer. Is there a way to retrieve these gradients with Gluon?
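For context, here is a minimal NumPy sketch of how Grad-CAM uses that gradient once you have it. The function name `grad_cam` and the toy arrays are illustrative assumptions, not code from the GAIN paper or from MXNet. It assumes a single image whose feature map and gradient both have shape (channels, height, width).

```python
import numpy as np

def grad_cam(feature_map, grads):
    """Grad-CAM map from a feature map and the class-score gradient w.r.t. it.

    Both inputs are assumed to have shape (channels, height, width).
    """
    # One weight per channel: global average pool of the gradient.
    weights = grads.mean(axis=(1, 2))
    # Weighted sum of the channels, giving a (height, width) map.
    cam = np.tensordot(weights, feature_map, axes=1)
    # ReLU keeps only the regions that raise the class score.
    return np.maximum(cam, 0.0)

# Toy inputs: two channels on a 2x2 grid (made-up values).
feature_map = np.stack([np.ones((2, 2)), np.full((2, 2), 2.0)])
grads = np.stack([np.full((2, 2), 0.5), np.full((2, 2), 0.25)])
cam = grad_cam(feature_map, grads)
```

The gradient with respect to the convolution weight has a different shape and cannot be pooled this way, which is why the feature-map gradient is the one that matters here.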

I’ve previously implemented this in PyTorch here, using PyTorch’s backward hooks to retrieve this gradient. I’ve found a couple of different GradCAM implementations in MXNet, but they all seem to take gradients w.r.t. the final convolution weight, not the feature map. I also found a post on GitHub that does exactly what I’m looking for, but in stock MXNet rather than Gluon.

I was limited to only two links in the post, but I have links to the papers and examples mentioned above that I can provide on request.

thomelane · May 3, 2018, 5:27pm

Hi @austin.doolittle,

If x is your feature map, try calling x.attach_grad() before autograd.record(). You should then be able to access the gradient with x.grad afterwards.

austin.doolittle · May 3, 2018, 6:32pm

Thanks for your reply! I don’t have access to the feature map before I enter the autograd.record() scope, because it is the output of a HybridSequential() containing my feature extractor. I tried calling x.attach_grad() from within the autograd.record() scope. It does appear to yield a gradient, but it also seems to destroy the graph up to that point. I call output.backward(retain_graph=True) to retrieve this gradient (where output is the ground truth labels), but I do not update any parameters as a result, because this gradient is factored into the training process. I later call total_loss.backward(), where total_loss is the original classification loss plus the attention mining loss described in the GAIN paper, and I then get the error:

Gradient of Parameter conv0_weight on context gpu(0) has not been updated by backward since last step. This could mean a bug in your model that made it only use a subset of the Parameters (Blocks) for this iteration. If you are intentionally only using a subset, call step with ignore_stale_grad=True to suppress this warning and skip updating of Parameters with stale gradient

EDIT: I reread my explanation above and realized it was abstract and confusing, so I pushed my source to GitHub. The lines of interest are gain.py:67 and model.py:81.

austin.doolittle · May 18, 2018, 8:51pm

I attempted to leave the mx.autograd.record() block, call attach_grad() on the feature map, and re-enter mx.autograd.record(). This doesn’t raise an error, but the gradient is always 0. @thomelane do you have any further suggestions?

indu · May 21, 2018, 7:40am

Check if this helps: https://github.com/apache/incubator-mxnet/pull/10900.
That PR implements Grad-CAM using Gluon.

austin.doolittle · May 30, 2018, 2:38pm

That helps me understand exactly what I’ll need to do to get this implemented. Thanks for your help @indu and @thomelane!

adelshafiei · April 22, 2019, 6:47pm

The implementation at https://github.com/apache/incubator-mxnet/pull/10900 only works for specific standard models. How can we retrieve the gradients for any pre-trained model? The current implementation uses the hybrid_forward function to find the gradient of the last convolution layer, but if we only load a checkpoint and run inference, we need a more general way of getting the gradients of hidden layers with respect to other layers (like tf.gradients). @indu any ideas?

hdjsjyl · August 2, 2019, 8:26pm

Hi, I have a problem with Grad-CAM. My network is defined with the Symbol API, and I don’t know how to adapt it to the Gluon Grad-CAM implementation. Any advice would be appreciated, thanks.
