Greetings,

I am implementing a custom loss function, and I have a question about it.

In papers, most loss functions are written as a sum over the individual terms.

For example, if you look at SoftmaxOutput in MXNet, it just calculates p_i - y_i (where p_i is the softmax output and y_i is the label); it does not sum up the values.
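
To make this concrete, here is a minimal sketch in plain Python (not MXNet code; the function names and the example values are made up for illustration). It assumes the loss in the papers is the summed cross-entropy with a one-hot label, and checks numerically that p_i - y_i matches the gradient of that summed loss with respect to each softmax input:

```python
import math

def softmax(z):
    # Subtract the max before exponentiating for numerical stability.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(z, y):
    # Summed loss as usually written in papers: L = -sum_i y_i * log(p_i)
    p = softmax(z)
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p))

def analytic_grad(z, y):
    # Per-element value p_i - y_i, with no sum over i.
    # Assumes y sums to 1 (e.g. a one-hot label).
    p = softmax(z)
    return [pi - yi for pi, yi in zip(p, y)]

def numeric_grad(z, y, eps=1e-6):
    # Central finite differences of the summed loss, one input at a time.
    grad = []
    for i in range(len(z)):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[i] += eps
        z_minus[i] -= eps
        diff = cross_entropy(z_plus, y) - cross_entropy(z_minus, y)
        grad.append(diff / (2 * eps))
    return grad

z = [2.0, 1.0, 0.1]  # hypothetical softmax inputs
y = [1.0, 0.0, 0.0]  # hypothetical one-hot label
print(analytic_grad(z, y))
print(numeric_grad(z, y))
```

If I understand correctly, the two printed lists agree up to finite-difference error, which would mean the sum lives in the loss value while p_i - y_i is what gets passed backward for each element.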

In my custom loss function, should I likewise implement only the calculation of the gradient values, without summing them up?

Thank you.