How to perform one backward pass after several forward passes

My model training pipeline is as follows:

  1. MyNet has two branches: an object detection branch (Faster R-CNN) and an image retrieval branch.
  2. An image goes into MyNet, passes through the backbone network, and reaches the detection branch first, which detects the objects and outputs several bounding boxes.
  3. These bounding boxes are used to crop regions from the last feature map of the backbone network, giving the ROI features.
  4. The ROI features go into the retrieval branch, which outputs a 128-dim embedding vector (a sketch of steps 3-4 follows this list).
  5. Optimize the triplet loss.
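
(For concreteness, here is a minimal sketch of steps 3-4: cropping ROI features from a feature map and mapping them to 128-dim embeddings. The layers, shapes and box coordinates below are made-up placeholders, not the actual MyNet architecture.)

from mxnet import nd, gluon

# hypothetical retrieval head: ROI features -> 128-dim embedding (made-up layers)
retrieval_head = gluon.nn.HybridSequential()
retrieval_head.add(gluon.nn.Conv2D(256, kernel_size=3, padding=1, activation='relu'),
                   gluon.nn.GlobalAvgPool2D(),
                   gluon.nn.Dense(128))
retrieval_head.initialize()

feat = nd.random.uniform(shape=(1, 512, 32, 32))          # last feature map of the backbone
# boxes from the detection branch, format [batch_idx, x1, y1, x2, y2] in feature-map coordinates
rois = nd.array([[0, 4, 4, 20, 20], [0, 10, 8, 28, 30]])
roi_feats = nd.ROIPooling(feat, rois, pooled_size=(7, 7), spatial_scale=1.0)
emb = retrieval_head(roi_feats)                            # shape (num_rois, 128)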

But the problem is:

  1. The Faster R-CNN batch size can only be 1, so I have to feed the images in one by one.
  2. But the triplet loss needs a batch of images (at least 3: anchor, positive, negative); a small example follows this list.
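
(As an aside, Gluon ships a gluon.loss.TripletLoss that takes anchor, positive and negative embeddings as three separate inputs; the toy tensors below only illustrate that three-input requirement, not the real pipeline.)

from mxnet import nd, gluon

triplet_loss = gluon.loss.TripletLoss(margin=1)
# each row is a 128-dim embedding; in the real pipeline these would come from three forward passes
anchor   = nd.random.uniform(shape=(1, 128))
positive = nd.random.uniform(shape=(1, 128))
negative = nd.random.uniform(shape=(1, 128))
loss = triplet_loss(anchor, positive, negative)   # shape (1,)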

So my solution is:

  1. Run the forward pass (through the detection branch) multiple times to collect a batch embedding tensor.
  2. Compute the triplet loss on that embedding tensor from the retrieval branch.
  3. Do one backward pass.

How can I achieve this pipeline? As far as I know, we can do the parameter update (trainer.step()) after several forward + backward passes by accumulating gradients, like this:

# accumulate gradients instead of overwriting them
for p in net.collect_params().values():
    p.grad_req = 'add'

for i in range(100):
    net.collect_params().zero_grad()       # clear the accumulated gradients
    for j in range(iter_size):
        with autograd.record():
            y = net(data)
        y.backward()                       # gradients add up across the inner iterations
    trainer.step(iter_size)                # one parameter update per iter_size backwards

But what I want to do is something like:

for i in range(num_epochs):
    emb_batch = []
    with autograd.record():
        for j in range(batch_size):
            emb = net(data)                # several forward passes
            emb_batch.append(emb)

        loss = triplet_loss(emb_batch)     # one loss over the collected embeddings
    loss.backward()
    trainer.step(batch_size)
    net.collect_params().zero_grad()

The error raised is:

UserWarning: Gradient of Parameter `tripletrcnn0_tnet_weight` on context gpu(0) has not been 
updated by backward since last `step`. This could mean a bug in your model that made it only use a 
subset of the Parameters (Blocks) for this iteration. If you are intentionally only using a subset, call step 
with ignore_stale_grad=True to suppress this warning and skip updating of Parameters with stale 
gradient.

Any suggestions?

Does this help?

Thanks for your reply. Your solution is the same as the first one, which is one forward per backward, but I want many forwards to one backward.


I understand - apologies, I read your post rather quickly. I don’t see any reason why this shouldn’t work; it's probably a bug in your code? Which version of MXNet are you using? The following test example, which has multiple forward passes (of random input), works:

from mxnet import autograd
from mxnet import nd, gluon
import mxnet as mx

net = gluon.nn.Conv2D(1, kernel_size=3, padding=1)
net.initialize()

xxlabel = nd.random.uniform(shape=[1, 64*64])
mylosses = [gluon.loss.L2Loss(), gluon.loss.L1Loss()] * 2  # "simulating" triplet/multiplet loss philosophy
trainer = gluon.Trainer(net.collect_params(), 'adam')

loss = 0.0
for idx in range(4):
    xxin = nd.random.uniform(shape=[1, 4, 64, 64])
    with autograd.record():
        out = net(xxin)
        loss = loss + mylosses[idx](out.flatten(), xxlabel)  # accumulate the loss across forward passes
loss.backward()
trainer.step(4)

I’ve seen the warning you get in my code in the past; I can’t recall what it was, but it was definitely not due to multiple forward passes, it was a bug in my case. I hope this helps.

Thanks @feevos, that’s pretty close. One small difference is that your loss is computed immediately after each forward pass, whereas in my case I have to run many forward passes to get an embedding tensor first and only then compute the loss. I think that’s the problem; I don’t know if it can work.
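
(For what it's worth, as long as every forward pass and the final loss computation all happen inside the same autograd.record() scope, the graph spans all of them, so computing the loss only after collecting the embeddings should also allow a single backward. A minimal sketch with a stand-in network, not the actual detection/retrieval model:)

from mxnet import autograd, nd, gluon

net = gluon.nn.Dense(128)                 # stand-in for the detection + retrieval pipeline
net.initialize()
trainer = gluon.Trainer(net.collect_params(), 'adam')
triplet_loss = gluon.loss.TripletLoss(margin=1)

# anchor, positive, negative "images" (random placeholders)
batch = [nd.random.uniform(shape=(1, 64)) for _ in range(3)]

with autograd.record():
    embs = [net(x) for x in batch]                    # several forward passes, no loss yet
    loss = triplet_loss(embs[0], embs[1], embs[2])    # loss computed only after all forwards
loss.backward()                                       # one backward through all forward passes
trainer.step(1)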
