Questions about loss functions


I am implementing a custom loss function, and I have a question about it.
In papers, most loss functions are defined as a sum.
For example, SoftmaxOutput in MXNet just calculates p_i - y_i (where p_i is the softmax output and y_i is the label); it does not sum the values.

In my custom loss function, should I only compute the per-element gradient values and not sum them?

Thank you.


Can you explain more about what you mean, and point to where in the code you see that the softmax output is not summed?

Unless you’re referring to the gradients, in which case they should not be summed, correct?

If you’re looking at this: then you’ll see that for SoftmaxOutput the loss itself is never computed. You only need the gradient of the loss with respect to the softmax input, and that can be computed without evaluating the loss.
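
To illustrate the point, here is a minimal sketch in plain Python (not MXNet code; all function names are hypothetical and chosen only for this example). It assumes a cross-entropy loss with a one-hot label, which is the usual pairing with softmax. The loss is a summed scalar, but its gradient with respect to each input is the per-element value p_i - y_i, so the backward pass has nothing to sum:

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    # Scalar loss: -sum_i y_i * log(p_i), with y one-hot at index `label`.
    return -math.log(softmax(logits)[label])

def analytic_grad(logits, label):
    # Per-element gradient dL/dz_i = p_i - y_i; the loss is never evaluated.
    p = softmax(logits)
    return [p_i - (1.0 if i == label else 0.0) for i, p_i in enumerate(p)]

def numeric_grad(logits, label, eps=1e-6):
    # Central finite differences on the summed scalar loss, for comparison.
    grad = []
    for i in range(len(logits)):
        plus = list(logits)
        minus = list(logits)
        plus[i] += eps
        minus[i] -= eps
        diff = cross_entropy(plus, label) - cross_entropy(minus, label)
        grad.append(diff / (2 * eps))
    return grad

# Illustrative values only.
logits = [2.0, -1.0, 0.5]
label = 0
print(analytic_grad(logits, label))
print(numeric_grad(logits, label))
```

The two printed vectors should agree to within finite-difference error. So for a custom loss, the backward pass would return one gradient value per input element; the sum only appears in the definition of the scalar loss itself.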