I was participating in the MicroNet Challenge recently, and the host (Google) proposed a way to count FLOPs for float16 models. Their position is that, for a float16 model, the multiply operations in a matrix-matrix multiplication should be counted as float16 operations, whereas the add (accumulation) operations should be counted as float32.
This matches what the Nvidia and MXNet tutorials claim:
Nvidia: The Volta generation of GPUs introduces Tensor Cores, which provide 8x more throughput than single-precision math pipelines. Each Tensor Core performs D = A x B + C, where A, B, C, and D are matrices. A and B are half-precision 4x4 matrices, whereas D and C can be either half or single precision 4x4 matrices.
MXNet: Nvidia Tensor Cores essentially perform the computation D = A * B + C, where A and B are half-precision matrices, while C and D could be either half-precision or full precision.
Since C and D "could be either", I am wondering: in the actual implementation, does Gluon mixed precision do the accumulation for a float16 model in half or full precision? Thanks!
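To make the question concrete, here is a small NumPy sketch of the two accumulation strategies I am asking about. This is purely illustrative (NumPy on CPU, not Tensor Cores): the products are computed and rounded in float16 either way, and only the accumulator precision differs.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 256)).astype(np.float16)
B = rng.standard_normal((256, 4)).astype(np.float16)

# Element-wise products of the matmul: each is an fp16 x fp16 multiply,
# rounded to fp16 (shape: 4 x 256 x 4).
products = A[:, :, None] * B[None, :, :]
assert products.dtype == np.float16

# Strategy 1: accumulate the fp16 products with a half-precision accumulator.
C_half_acc = products.sum(axis=1, dtype=np.float16)

# Strategy 2: accumulate the same fp16 products in single precision
# (the case where C/D are fp32 in the quoted Tensor Core description).
C_full_acc = products.astype(np.float32).sum(axis=1)

# Compare both against a float64 reference.
ref = A.astype(np.float64) @ B.astype(np.float64)
err_half = np.abs(C_half_acc.astype(np.float64) - ref).max()
err_full = np.abs(C_full_acc - ref).max()
print(err_half, err_full)
```

On a reduction of length 256, the fp32 accumulator gives a noticeably smaller error, which is why the FLOP-counting rule treats the adds as float32 work.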