Use float16 to train yolov3


#1

How can I use float16 to train yolov3? The script keeps giving me this error message:

raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Error in operator yolov3loss0_broadcast_mul1: [14:27:23] c:\jenkins\workspace\mxnet-tag\mxnet\src\io…/operator/elemwise_op_common.h:133: Check failed: assign(&dattr, (*vec)[i]) Incompatible attr in node yolov3loss0_broadcast_mul1 at 1-th input: expected float16, got float32

What I tried

# convert net and async_net to float16
net.cast('float16')
async_net.cast('float16')        
train(net, train_data, val_data, eval_metric, ctx, args)

Convert the input images, fixed targets, and ground-truth boxes to float16

data = gluon.utils.split_and_load(batch[0].astype('float16'), ctx_list=ctx, batch_axis=0)
fixed_targets = [gluon.utils.split_and_load(batch[it].astype('float16'), ctx_list=ctx, batch_axis=0) for it in range(1, 6)]
gt_boxes = gluon.utils.split_and_load(batch[6].astype('float16'), ctx_list=ctx, batch_axis=0)

# The error is thrown at this line
obj_loss, center_loss, scale_loss, cls_loss = net(x.astype('float16'), gt_boxes[ix], *[ft[ix] for ft in fixed_targets])

I'm not sure which input data I haven't converted to float16.


#2

Do you also have test or validation data being used in the training loop? It has happened to me twice that I got an "expected float16, got float32" error because I had converted only the training data and not the validation data.


#3

I commented out the validation code, but I still get the same issue.


#4

As you found here, the provided script currently has no float16 support.

I had a quick look, and it seems that float32 is quite deeply embedded at several levels, from the target generation to the transforms and the losses.
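
If it helps to narrow down where float32 is still coming in, here is a minimal sketch of a dtype audit. The helper name find_dtype_mismatches and the sample tensor names are hypothetical, not part of the training script. NumPy arrays stand in for the real tensors so the snippet runs on its own; it assumes the objects you pass expose a NumPy-compatible .dtype attribute, which I believe MXNet NDArrays do, but please verify that on your setup.

```python
import numpy as np


def find_dtype_mismatches(named_arrays, expected="float16"):
    """Return (name, dtype_name) pairs whose dtype differs from `expected`.

    Hypothetical helper: it only relies on each value having a `.dtype`
    attribute that numpy.dtype() can interpret.
    """
    expected_name = np.dtype(expected).name
    mismatches = []
    for name, arr in named_arrays.items():
        actual_name = np.dtype(arr.dtype).name
        if actual_name != expected_name:
            mismatches.append((name, actual_name))
    return mismatches


# Stand-in tensors (names are made up for illustration). The last entry
# mimics a constant that some internal code created as float32.
batch = {
    "data": np.zeros((2, 3, 4, 4), dtype="float16"),
    "gt_boxes": np.zeros((2, 5, 4), dtype="float16"),
    "loss_constant": np.ones((1,), dtype="float32"),
}

print(find_dtype_mismatches(batch))
# prints [('loss_constant', 'float32')]
```

Note that this only audits tensors you can reach from the training loop. Anything created as float32 inside the loss or target generator would not show up here, which matches the point above that the script itself would need changes.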