Question about the example of symbol custom op

suyz · May 2, 2019, 10:12am

In this example:

It says: need_top_grad=False because softmax is a loss layer and you don’t need gradient input from preceding layers:

As far as I understand, this softmax op only normalizes the input data to the range between 0 and 1. The cross entropy loss is not calculated. So this op should not be a loss layer, and the parameter need_top_grad should be set to True?

NRauschmayr · May 2, 2019, 5:45pm

need_top_grad indicates whether the gradient from the top layer is required at the time of the backward pass. Since this layer is placed at the end of the network, the gradient from the top layer is not required. You are right that in this example the operator only computes the softmax but not the cross entropy.
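To make the role of the top gradient concrete, here is a minimal pure-Python sketch (not MXNet code; the function names are hypothetical) of what a softmax backward pass would look like if the layer sat in the middle of a network and had to consume the gradient coming from the layer above it. This is the case that would call for need_top_grad=True.

```python
import math

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_backward_with_top_grad(y, out_grad):
    # Hypothetical backward for a softmax used as an intermediate layer.
    # Chain rule through the softmax Jacobian:
    #   dx_i = y_i * (g_i - sum_j g_j * y_j)
    # where g is the gradient arriving from the layer on top.
    dot = sum(g * p for g, p in zip(out_grad, y))
    return [p * (g - dot) for p, g in zip(y, out_grad)]

y = softmax([1.0, 2.0, 3.0])
dx = softmax_backward_with_top_grad(y, [0.1, -0.2, 0.3])
print(dx)
```

Without out_grad this function cannot produce a result, which is the difference from a loss-style backward that derives its gradient from the output and the label alone.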

suyz · May 10, 2019, 1:07pm

Thanks for your reply.
I think there is a trick in this Softmax layer. In the forward pass, it doesn’t calculate the cross entropy. However, the gradient computed in the backward pass already takes the cross entropy into account.
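A quick way to check this is the pure-Python sketch below. It assumes the example's backward pass assigns "softmax output minus one-hot label" as the input gradient, and compares that with a finite-difference gradient of the cross entropy with respect to the inputs. The helper names are hypothetical and nothing here calls MXNet.

```python
import math

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(x, label):
    # Loss that the forward pass of the op never computes explicitly.
    return -math.log(softmax(x)[label])

def loss_style_backward(y, label):
    # Assumed form of the example's backward: y - one_hot(label).
    # Note that no gradient from a layer on top is used.
    return [p - (1.0 if i == label else 0.0) for i, p in enumerate(y)]

def numeric_grad(x, label, eps=1e-6):
    # Central finite differences of the cross entropy w.r.t. each input.
    grads = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += eps
        xm[i] -= eps
        grads.append((cross_entropy(xp, label) - cross_entropy(xm, label)) / (2 * eps))
    return grads

x = [1.0, 2.0, 3.0]
label = 0
analytic = loss_style_backward(softmax(x), label)
numeric = numeric_grad(x, label)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print(max_diff)  # should be tiny if the two gradients agree
```

If the two agree, the backward pass is effectively that of softmax followed by cross entropy, which is why the op behaves like a loss layer even though its forward output is only the normalized probabilities.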
