Use float16 to train yolov3


How can I use float16 to train yolov3? The script keeps giving me this error message:

raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Error in operator yolov3loss0_broadcast_mul1: [14:27:23] c:\jenkins\workspace\mxnet-tag\mxnet\src\io…/operator/elemwise_op_common.h:133: Check failed: assign(&dattr, (*vec)[i]) Incompatible attr in node yolov3loss0_broadcast_mul1 at 1-th input: expected float16, got float32

What I tried

# convert net and async_net to float16
train(net, train_data, val_data, eval_metric, ctx, args)

Converted the input image and the other batch data to float16:

data = gluon.utils.split_and_load(batch[0].astype('float16'), ctx_list=ctx, batch_axis=0)
fixed_targets = [gluon.utils.split_and_load(batch[it].astype('float16'), ctx_list=ctx, batch_axis=0) for it in range(1, 6)]
gt_boxes = gluon.utils.split_and_load(batch[6].astype('float16'), ctx_list=ctx, batch_axis=0)

# Error is thrown at this line
obj_loss, center_loss, scale_loss, cls_loss = net(x.astype('float16'), gt_boxes[ix], *[ft[ix] for ft in fixed_targets])

I'm not sure which input data I haven’t converted to float16.


Do you also have test or validation data being used in the training loop? It has happened to me twice that I got an "expected float16, got float32" error because I had converted only the training data and not the validation data.
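A quick way to check for this kind of leftover is to print the dtype of everything that goes into the network. The snippet below is only a sketch: `find_dtype_mismatches` is a made-up helper name, and NumPy arrays stand in for MXNet NDArrays here (both expose a `.dtype` attribute), so adapt it to the real batch structure.

```python
import numpy as np

def find_dtype_mismatches(batch, expected="float16", path="batch"):
    """Return (location, dtype) pairs for arrays whose dtype is not `expected`."""
    mismatches = []
    if isinstance(batch, (list, tuple)):
        # Recurse into nested lists/tuples, tracking where we are.
        for i, item in enumerate(batch):
            mismatches += find_dtype_mismatches(item, expected, f"{path}[{i}]")
    elif hasattr(batch, "dtype"):
        if np.dtype(batch.dtype) != np.dtype(expected):
            mismatches.append((path, np.dtype(batch.dtype).name))
    return mismatches

# Stand-in batch (assumption): NumPy arrays in place of the real NDArrays.
batch = [
    np.zeros((2, 3, 8, 8), dtype="float16"),   # image, already cast
    [np.zeros((2, 4), dtype="float16"),        # target that was cast
     np.zeros((2, 4), dtype="float32")],       # target that was missed
]

report = find_dtype_mismatches(batch)
for location, dtype in report:
    print(location, dtype)
```

If the report comes back empty for all inputs, the float32 tensor is probably being created inside the network or the loss rather than passed in.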


I commented out the validation code, but I still get the same issue.


As you found, the provided script does not currently support float16 training.

I had a quick look, and float32 seems to be quite deeply embedded at several levels, from the target generation to the transforms and losses.
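To illustrate why casting the inputs alone may not be enough, here is a small sketch of the failure pattern. It is not taken from the YOLOv3 code: `strict_mul` and `weighted_loss` are hypothetical names, and NumPy stands in for MXNet. Since NumPy would silently promote mixed dtypes, `strict_mul` raises instead, to mimic the check in the error message above.

```python
import numpy as np

def strict_mul(lhs, rhs):
    """Multiply two arrays, refusing mixed dtypes instead of promoting them."""
    if lhs.dtype != rhs.dtype:
        raise TypeError(f"expected {lhs.dtype}, got {rhs.dtype}")
    return lhs * rhs

def weighted_loss(pred, weight_value=2.0, match_input_dtype=True):
    # Hypothetical loss step: a tensor is created inside the function
    # with a hard-coded float32 dtype, independent of the input dtype.
    weight = np.full(pred.shape, weight_value, dtype="float32")
    if match_input_dtype:
        # The fix: follow the dtype of the incoming data.
        weight = weight.astype(pred.dtype)
    return strict_mul(pred, weight)

pred = np.ones((2, 3), dtype="float16")

fixed = weighted_loss(pred)  # internal tensor cast to float16
try:
    weighted_loss(pred, match_input_dtype=False)
    error = None
except TypeError as exc:
    error = str(exc)

print(fixed.dtype, "|", error)
```

In this sketch the inputs are all float16, yet the multiply still fails until the internally created tensor is cast to the input dtype. If something similar happens in the loss or target generation, every such place would need the same treatment.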