When we have 2 output layers and 2 loss functions, we can add the losses together and backpropagate. Is it possible to run backpropagation twice with the 2 different losses, instead of summing them up and calling loss.backward() once?
Yes, you can. You need to pass retain_graph=True to the backward() call of the first loss; otherwise the computational graph is cleared and you get an error when you call backward() on the second loss. I'm imagining that you'd also want the gradients to be summed together. In that case, you need to set the grad_req parameter to 'add' and manually call zero_grad() on the parameters after the optimization step:
```python
# Set grad_req to 'add' to accumulate gradients with each backward() call
for p in net.collect_params().values():
    p.grad_req = 'add'

# Training loop happens here

# Optimize weights after forward/backward
trainer.step(batch_size)

# Reset gradients
for p in net.collect_params().values():
    p.zero_grad()
```
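As a side note, the two approaches agree because differentiation is linear: the gradient of a sum of losses equals the sum of the per-loss gradients. A quick NumPy sketch on a toy linear model (illustrative only, not the Gluon API) checks that accumulating two per-loss gradients matches a numerical gradient of the summed loss:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(3,))                    # shared weights
x1, x2 = rng.normal(size=(3,)), rng.normal(size=(3,))
y1, y2 = 1.0, -1.0

# Squared-error losses: L_i = (w . x_i - y_i)^2, gradient 2*(w . x_i - y_i)*x_i
grad1 = 2 * (w @ x1 - y1) * x1               # "backward" for loss 1
grad2 = 2 * (w @ x2 - y2) * x2               # "backward" for loss 2

# Emulate grad_req='add': accumulate gradients across the two backward passes
grad_accumulated = np.zeros_like(w)
grad_accumulated += grad1
grad_accumulated += grad2

# Compare against a finite-difference gradient of the summed loss L1 + L2
def summed_loss(w_):
    return (w_ @ x1 - y1) ** 2 + (w_ @ x2 - y2) ** 2

eps = 1e-6
num_grad = np.array([
    (summed_loss(w + eps * np.eye(3)[i]) - summed_loss(w)) / eps
    for i in range(3)
])
assert np.allclose(num_grad, grad_accumulated, atol=1e-4)
```

So summing the losses before one backward() call and running backward() per loss with grad_req='add' produce the same accumulated gradient.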
Thank you so much for your response! I’ll try this and update whenever I have results.
Is it possible to backpropagate different losses to different layers? For example, my model has 2 output layers, and I want to use loss_1 to optimize output_1 and loss_2 to optimize output_2. How can I do this, or is it simply impossible? Thank you very much!
You can have as many losses as you want, attached to as many branches of your network. You would simply calculate each loss under the autograd.record() scope and call loss1.backward(retain_graph=True) followed by loss2.backward(). Once all the backward() calls are done, you can call trainer.step() to update the weights.
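To make the branch picture concrete, here is an illustrative NumPy sketch (not the Gluon API) of a shared base with two heads: each loss produces gradients only for its own head, while the shared base receives a contribution from both, which is why grad_req='add' matters for the shared parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(4,))
B = rng.normal(size=(3, 4))    # shared base layer
w1 = rng.normal(size=(3,))     # head 1 weights
w2 = rng.normal(size=(3,))     # head 2 weights
y1, y2 = 0.5, -0.5

f = B @ x                      # shared features
e1 = w1 @ f - y1               # head-1 error
e2 = w2 @ f - y2               # head-2 error

# loss1 = e1**2 reaches only w1 and the shared base B (w2 gets no gradient)
g_w1 = 2 * e1 * f
g_B_from_loss1 = 2 * e1 * np.outer(w1, x)
# loss2 = e2**2 reaches only w2 and the shared base B (w1 gets no gradient)
g_w2 = 2 * e2 * f
g_B_from_loss2 = 2 * e2 * np.outer(w2, x)

# With grad_req='add', B accumulates both contributions
g_B = g_B_from_loss1 + g_B_from_loss2

# Sanity check against a finite difference of the total loss w.r.t. B[0, 0]
def total_loss(B_, w1_, w2_):
    f_ = B_ @ x
    return (w1_ @ f_ - y1) ** 2 + (w2_ @ f_ - y2) ** 2

eps = 1e-6
Bp = B.copy()
Bp[0, 0] += eps
num = (total_loss(Bp, w1, w2) - total_loss(B, w1, w2)) / eps
assert np.isclose(num, g_B[0, 0], atol=1e-4)
```

Each head parameter sees only its own loss, so no special handling is needed there; only the shared base needs gradient accumulation.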
Thank you for your response! How would I attach one loss to one specific layer of the network?
I’m a little confused by your question. Is this what you’re looking for?
```python
from mxnet import autograd, gluon, nd

net_base = gluon.nn.HybridSequential()
with net_base.name_scope():
    net_base.add(gluon.nn.Conv2D(channels=256, kernel_size=3, layout='NCHW',
                                 use_bias=False, activation='relu'))
    net_base.add(gluon.nn.Conv2D(channels=256, kernel_size=3, layout='NCHW',
                                 use_bias=False, activation='relu'))

net1 = gluon.nn.Dense(10)

net2 = gluon.nn.HybridSequential()
with net2.name_scope():
    net2.add(gluon.nn.Dense(2048, activation='relu'))
    net2.add(gluon.nn.Dense(100, activation='relu'))

net_base.initialize()
net1.initialize()
net2.initialize()

net_params = net_base.collect_params()
net_params.update(net1.collect_params())
net_params.update(net2.collect_params())

# Accumulate gradients across the two backward() calls
for p in net_params.values():
    p.grad_req = 'add'

trainer = gluon.Trainer(net_params, optimizer='sgd')

ce_loss1 = gluon.loss.SoftmaxCELoss()
ce_loss2 = gluon.loss.SoftmaxCELoss()

data = nd.random.uniform(shape=(16, 3, 100, 100))
label1 = nd.cast(nd.random.uniform(shape=(16,)) * 10, dtype='int32')
label2 = nd.cast(nd.random.uniform(shape=(16,)) * 100, dtype='int32')

with autograd.record():
    out1 = net1(net_base(data))
    out2 = net2(net_base(data))
    loss1 = ce_loss1(out1, label1)
    loss2 = ce_loss2(out2, label2)

loss1.backward(retain_graph=True)
loss2.backward()

trainer.step(batch_size=16)

# Manually zero the gradients for processing the next batch
for p in net_params.values():
    p.zero_grad()
```
yeah sorry I was being silly. Thank you for being patient and helpful.
Glad I could help! Also, I fixed the example to set grad_req to 'add'.