I am implementing a custom loss function, and I have a question about it.
In papers, most loss functions are written as a sum.
However, if you look at SoftmaxOutput in MXNet, it just computes p_i - y_i (where p_i is the softmax output and y_i is the label); it does not sum the values.
In my custom loss function, should I likewise implement only the gradient computation, without summing the values?
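
To make the question concrete, here is a minimal pure-Python sketch of how the two forms might be related. It assumes the summed loss is the cross-entropy L = -sum_i y_i * log(p_i) over softmax outputs, and checks numerically that its gradient with respect to the i-th input is p_i - y_i. The function names are hypothetical and are not MXNet APIs; this is not MXNet's actual implementation.

```python
import math

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(z, y):
    # Summed loss as it is usually written in papers: L = -sum_i y_i * log(p_i).
    p = softmax(z)
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p))

def analytic_grad(z, y):
    # Per-element gradient dL/dz_i = p_i - y_i (assumes the labels y sum to 1).
    return [pi - yi for pi, yi in zip(softmax(z), y)]

def numeric_grad(z, y, eps=1e-6):
    # Central finite differences on the summed scalar loss.
    grad = []
    for i in range(len(z)):
        plus = list(z)
        minus = list(z)
        plus[i] += eps
        minus[i] -= eps
        grad.append((cross_entropy(plus, y) - cross_entropy(minus, y)) / (2 * eps))
    return grad

z = [2.0, 1.0, 0.1]   # example inputs to the softmax (made-up values)
y = [1.0, 0.0, 0.0]   # one-hot label

print(analytic_grad(z, y))
print(numeric_grad(z, y))
```

If the two printed vectors agree, then under these assumptions the sum lives in the scalar loss, while the backward pass only needs the unsummed per-element values p_i - y_i.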