Data type error in tensor_blob.h when using float16 training

Hi all,

I encountered the following error when training a simple AlexNet with float16 precision (--dtype float16). There seem to be a couple of reports of this issue, but I have not found a solution so far. I am using the Symbol API and the latest version, 1.5.0 (CUDA 10.1).

I am wondering if there is anything I did wrong. Do I also need to convert the training data to float16 in advance? Any input is highly appreciated. Thanks.

– J

mxnet.base.MXNetError: [13:53:26] include/mxnet/././tensor_blob.h:236: Check failed:
mshadow::DataType::kFlag == type_flag_: TBlob.get_with_shape: data type do not match specified type.Expected: 2 v.s. given 0
Stack trace:
[bt] (0) /home/xinghua/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x4a3b8b) [0x7f39c945ab8b]
[bt] (1) /home/xinghua/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x304366d) [0x7f39cbffa66d]
[bt] (2) /home/xinghua/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x3553e98) [0x7f39cc50ae98]
[bt] (3) /home/xinghua/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2675018) [0x7f39cb62c018]
[bt] (4) /home/xinghua/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x267abf5) [0x7f39cb631bf5]
[bt] (5) /home/xinghua/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x265a8a1) [0x7f39cb6118a1]
[bt] (6) /home/xinghua/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x265ddb0) [0x7f39cb614db0]
[bt] (7) /home/xinghua/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x265e046) [0x7f39cb615046]
[bt] (8) /home/xinghua/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2659004) [0x7f39cb610004]

Depending on how you’ve set things up, the data likely needs to be float16 too. Otherwise, you can prepend a cast layer to your network so that the data is cast to float16 automatically.