Custom Loss + L2 Regularization


#1

Hello,

We are implementing a custom loss layer using mx.symbol.Custom. This works fine. We now want to add L2 regularization to this loss, i.e.

loss = mx.symbol.Custom(.....)

l2_loss = mx.sym.sum(mx.sym.square(var))
reg_loss = mx.sym.MakeLoss(l2_loss * (reg_weight))

Ideally, I want both losses to contribute to training.

Would the following code add both losses during training, or do I need to sum them explicitly?

final_loss = mx.sym.Group([loss, reg_loss])
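As a framework-independent sanity check of the idea, here is a small NumPy sketch (the weights, data, and reg_weight values are made up for illustration) showing numerically that the gradient of a summed loss equals the sum of the individual gradients, which is what grouping the loss heads relies on:

```python
import numpy as np

# Hypothetical weights and data, just for illustration.
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, 3.0])
y = 4.0
reg_weight = 0.01

def data_loss(w):
    # Squared error, standing in for the custom loss.
    return (np.dot(w, x) - y) ** 2

def l2_loss(w):
    # L2 penalty, as in the reg_loss symbol above.
    return reg_weight * np.sum(np.square(w))

def num_grad(f, w, eps=1e-6):
    # Central finite differences.
    g = np.zeros_like(w)
    for i in range(len(w)):
        wp, wm = w.copy(), w.copy()
        wp[i] += eps
        wm[i] -= eps
        g[i] = (f(wp) - f(wm)) / (2 * eps)
    return g

g_total = num_grad(lambda w: data_loss(w) + l2_loss(w), w)
g_sum = num_grad(data_loss, w) + num_grad(l2_loss, w)
print(np.allclose(g_total, g_sum, atol=1e-5))  # True
```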

Thanks!


#2

What you have done here should work, from what I can see. Have you tried it?


#3

Yes, I have tried it. There are no errors as such, and it trains fine.

However, I am not sure whether it adds up the losses before taking the gradients, because when I compare the results between MXNet and TensorFlow, they are different.


#4

They do get added. Here is a toy example where I have grouped two symbols, add and mult, passed data through them, and gotten the gradients back:

import mxnet as mx

# input
a = mx.sym.Variable('a')
b = mx.sym.Variable('b')

# ops
add = a + b
mult = a * b

# output
out = mx.sym.Group([add, mult])

# bind shapes, get executor
executor = out.simple_bind(mx.cpu(), a=(1,3), b=(1,3))

# data
a_data = mx.nd.array([[1,2,3]])
b_data = mx.nd.array([[3,4,5]])

# Forward pass
output = executor.forward(a=a_data, b=b_data, is_train=True)

# Backward pass: one head gradient per grouped output
head_grad = mx.nd.ones((1,3))
executor.backward([head_grad, head_grad])

print(executor.arg_dict)
print(executor.grad_arrays)
{'a': 
[[1. 2. 3.]]
<NDArray 1x3 @cpu(0)>, 'b': 
[[3. 4. 5.]]
<NDArray 1x3 @cpu(0)>}
[
[[4. 5. 6.]]
<NDArray 1x3 @cpu(0)>, 
[[2. 3. 4.]]
<NDArray 1x3 @cpu(0)>]

We have:
d(add)/da = 1
d(mult)/da = b

d(add)/db = 1
d(mult)/db = a

We see that the grad array for a is effectively the sum of both contributions:
[1+3, 1+4, 1+5] = [4, 5, 6]
and likewise for b:
[1+1, 1+2, 1+3] = [2, 3, 4]
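The hand calculation above can be checked in plain NumPy: with a head gradient of ones on each output, the gradient flowing to each input is the sum over the two heads of (head gradient × partial derivative):

```python
import numpy as np

a = np.array([1., 2., 3.])
b = np.array([3., 4., 5.])
head = np.ones(3)  # head gradient for each grouped output

# d(add)/da = 1 and d(mult)/da = b, so contributions sum:
grad_a = head * 1 + head * b
# d(add)/db = 1 and d(mult)/db = a:
grad_b = head * 1 + head * a

print(grad_a)  # [4. 5. 6.]
print(grad_b)  # [2. 3. 4.]
```

This matches the grad_arrays printed by the MXNet executor above.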