Batch norm crashes with float16

ifeherva · July 24, 2018, 12:17am

I am trying to run SSD from gluoncv with float16. While I fully understand the model is not fully supporting float16 it seem the resnet head already crashes due to the batchnorm layer. See my minimal example below

import mxnet as mx
import gluoncv as gcv
from gluoncv.data.transforms import presets
from matplotlib import pyplot as plt

ctx = [mx.gpu(0)]

gcv.utils.download("https://cloud.githubusercontent.com/assets/3307514/" +
        "20012568/cbc2d6f6-a27d-11e6-94c3-d35a9cb47609.jpg", 'street.jpg')
image_list = ['street.jpg']

net = gcv.model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained=True)
net.set_nms(0.45, 200)

ax = None
net.collect_params().reset_ctx(mx.gpu(0))
net.cast(dtype='float16')
for image in image_list:
    x, img = presets.ssd.load_test(image, short=512)
    x = x.as_in_context(mx.gpu(0))
    x = x.astype('float16', copy=False)
    ids, scores, bboxes = [xx[0].asnumpy() for xx in net(x)]
    ax = gcv.utils.viz.plot_bbox(img, bboxes, scores, ids,
                                class_names=net.classes, ax=ax)
    plt.show()

The error I get is the following:

mxnet.base.MXNetError: Error in operator ssd0_resnetv10_batchnorm0_fwd: [00:11:51] src/operator/nn/batch_norm.cc:369: Check failed: (*in_type)[i] == dtype_param (2 vs. 0) This layer requires uniform type. Expected 'float32' v.s. given 'float16' at 'gamma'

zhreshold · July 24, 2018, 6:04pm

Since you have solved it. I am posting the solution for reference.

SSD network uses SymbolBlock as inner backbone, which don’t recognize BatchNorm when casting the network.
One quick solution is instead of net.cast('float16'), you can iterate through net.collect_params().items(), if parameter not ends with gamma, beta, moving_mean, moving_var then cast it to float16.

Rainweic · December 25, 2019, 1:18pm

Can you give an example? I don’t know how to use net.collect_params().items()
Thank you

Topic		Replies	Views
How to float16 gluoncv SSD finetuning? Gluon	3	697	December 28, 2019
Import export fails for float16 net after cast Gluon	2	1391	November 7, 2018
Resnet does not want to float16 Gluon	3	1258	December 13, 2018
Cryptic failure of SSD training with gluoncv 0.5.0 Gluon	1	506	October 23, 2019
cudaMalloc(and CPU memory allocation also) failed on asnumpy() Gluon	2	3617	March 25, 2019

Batch norm crashes with float16

Related Topics