Just some suggested changes and (maybe) some corrections to the Gradients section:

  • I would say explicitly that these formulas follow the Denominator Layout.

  • In the second and third examples (x_transpose * A and x_transpose * A * x), I think A must have n rows rather than m, as the text says. Neither product is defined if A has m rows (and the third one requires a square matrix), so this is either confusing or simply a mistake.
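
For reference, the shape argument in the second bullet can be written out as below. This is only a sketch: it assumes x is a column vector of length n, and it uses the denominator-layout gradient identities as I understand them from the Calculus section, so please check them against the book.

```latex
\begin{aligned}
\mathbf{x} \in \mathbb{R}^{n},\ \mathbf{A} \in \mathbb{R}^{m \times n}:
  &\quad \nabla_{\mathbf{x}}\, \mathbf{A}\mathbf{x} = \mathbf{A}^{\top} \in \mathbb{R}^{n \times m} \\
\mathbf{x} \in \mathbb{R}^{n},\ \mathbf{A} \in \mathbb{R}^{n \times m}:
  &\quad \nabla_{\mathbf{x}}\, \mathbf{x}^{\top}\mathbf{A} = \mathbf{A} \in \mathbb{R}^{n \times m} \\
\mathbf{x} \in \mathbb{R}^{n},\ \mathbf{A} \in \mathbb{R}^{n \times n}:
  &\quad \nabla_{\mathbf{x}}\, \mathbf{x}^{\top}\mathbf{A}\mathbf{x} = (\mathbf{A} + \mathbf{A}^{\top})\,\mathbf{x} \in \mathbb{R}^{n}
\end{aligned}
```

In each row the product is only defined for the stated shape of A, which is why the second case needs n rows and the third needs a square matrix.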

Thank you for the effort!


Great catch! I guess you are right. It should be like the following:

I suggest using the Numerator Layout here for consistency, as the next chapter mentions the Jacobian (an m by n matrix), which confused me for quite a while.


You are right; for me the most important thing is to stay consistent. The explanation in the Automatic Differentiation section ("… the gradient of y (a vector of length m) with respect to x (a vector of length n) is the Jacobian (an m \times n matrix)") is not consistent with the earlier formulas: it uses the Numerator Layout (the Jacobian formulation), whereas the Calculus section uses the Denominator Layout.


@gpolo @minhduc0711

Thanks. The formula in the Calculus section follows Denominator layout. It’s quite common in deep learning: when you differentiate a loss function (scalar) with respect to a tensor, the shape of the differentiation result is the same as that of the tensor in denominator layout.
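
To make the two layouts concrete, here is a small sketch in plain Python (no libraries). It is not code from the book; the helper names `matvec`, `jacobian_numerator_layout`, and `transpose` are made up for illustration, and the derivatives are approximated with finite differences. For y = Ax with A of shape m by n, the numerator layout gives an m by n Jacobian (equal to A), and the denominator layout is its transpose, n by m.

```python
def matvec(A, x):
    """Multiply a matrix A (list of rows) by a vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def jacobian_numerator_layout(f, x, eps=1e-6):
    """Central-difference Jacobian: entry [i][j] approximates d f_i / d x_j."""
    m = len(f(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        x_plus = list(x)
        x_plus[j] += eps
        x_minus = list(x)
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J


def transpose(M):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]       # m = 2 rows, n = 3 columns
x = [0.5, -1.0, 2.0]        # vector of length n = 3

J_num = jacobian_numerator_layout(lambda v: matvec(A, v), x)  # approximates A
J_den = transpose(J_num)                                      # approximates A^T

print(len(J_num), len(J_num[0]))  # 2 3  (m x n, numerator layout)
print(len(J_den), len(J_den[0]))  # 3 2  (n x m, denominator layout)
```

The two results hold the same numbers; only the arrangement differs, which is why mixing the layouts between sections is confusing.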

I agree that consistency matters. Thus I just removed the Jacobian description (in Numerator layout) in the automatic differentiation section:

Just let us know if you feel more explanations are needed. Thanks.

It is consistent now. I do not think the explanation of the Jacobian there was needed, and with it removed there is no inconsistency.
Thank you very much for your effort!

Does anyone have a solution for the 3rd question?