I am reading the Gluon part of "Mixed precision training using float16".
In the example provided there, the dtype is not converted back to float32 before the softmax loss layer. In the symbolic part, however, it is, as the tutorial recommends:
"It is advisable to cast the output of the layers before softmax to float32, so that the softmax computation is done in float32. This is because softmax involves large reductions and it helps to keep that in float32 for a more precise answer."
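To illustrate why I am worried, here is a minimal NumPy sketch (not the MXNet example itself, just an assumption-level demo): computing softmax entirely in float16 gives a measurably different result from computing it in float32, because the exp and the sum reduction each round at float16 precision.

```python
import numpy as np

def softmax(x):
    # subtract the max for numerical stability, then normalize;
    # all intermediate results stay in x's dtype
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
logits32 = rng.standard_normal(10_000).astype(np.float32)

# same logits, one pass in float32 and one entirely in float16
p32 = softmax(logits32)
p16 = softmax(logits32.astype(np.float16)).astype(np.float32)

# the float16 reduction deviates from the float32 result
print("max abs difference:", np.abs(p32 - p16).max())
```

This is exactly the kind of error the quoted recommendation seems to be guarding against, which is why I expected the Gluon example to cast back to float32 before the loss as well.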
So, will the Gluon fp16 example lose precision? If so, could you please provide a corrected version?