The mathematical expression for cross entropy loss is

-\sum_k y_k \log \hat{y}_k, but in the cross_entropy function it is given as `-np.log(y_hat[range(len(y_hat)), y])`.

You did not multiply by the true label y.


I’m stuck on the same thing, but I think the reasoning is the following:

All the entries in y are 0 except the true label index, which is 1. So in the sum \sum_k y_k \log \hat{y}_k, the term at the true index is multiplied by 1 and every other term is multiplied by 0.

That means instead of computing the full sum, we can just take from y_hat the probability corresponding to the true label; all the other terms would be zeroed out anyway.

Look at the image: the y array is a one-hot vector, and index 2 is the true label.
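The equivalence described above can be checked numerically. This is a minimal sketch with made-up example values (the `y_hat` and `y` arrays below are hypothetical, not from the original post): the full one-hot formula and the index-picking shortcut give the same per-sample losses.

```python
import numpy as np

# Hypothetical predictions: 2 samples, 3 classes (rows sum to 1)
y_hat = np.array([[0.1, 0.3, 0.6],
                  [0.3, 0.2, 0.5]])
y = np.array([2, 0])  # true class indices

# One-hot encoding of the true labels
y_onehot = np.eye(3)[y]

# Full formula: -sum_k y_k * log(y_hat_k) per sample
loss_full = -np.sum(y_onehot * np.log(y_hat), axis=1)

# Shortcut: pick the log-probability of the true class directly
loss_pick = -np.log(y_hat[range(len(y_hat)), y])

print(np.allclose(loss_full, loss_pick))  # True
```

Every term where y_k = 0 vanishes from the sum, so only the true-class log-probability survives, which is exactly what the fancy indexing selects.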

Yes. Multiplying by the one-hot vector is not efficient, so we just pick the corresponding probability directly. Correct me if I am wrong.

This should be the reasoning.

Maybe someone has a better explanation.