How to float16 gluoncv SSD finetuning?

olivcruche · July 23, 2019, 10:58am

Hi!
I’m trying to adapt this gluoncv tuto https://gluon-cv.mxnet.io/build/examples_detection/finetune_detection.html to train in float16 over a V100.

I’m following the doc https://beta.mxnet.io/guide/performance/float16.html and writing this code:

net = gcv.model_zoo.get_model('ssd_512_mobilenet1.0_custom',
                              classes=classes, pretrained_base=False, transfer='voc')

net.cast('float16')

net.reset_class(classes)

from gluoncv.data.batchify import Tuple, Stack, Pad
from gluoncv.data.transforms.presets.ssd import SSDDefaultTrainTransform
from mxnet.gluon.data.vision.transforms import Cast

batch = 16

def get_dataloader(net, train_dataset, data_shape, batch_size, num_workers):
    
    width, height = data_shape, data_shape
    # use fake data to generate fixed anchors for target generation
    with autograd.train_mode():
        _, _, anchors = net(mx.nd.zeros((1, 3, height, width)).astype('float16', copy=False))
    batchify_fn = Tuple(Stack(), Stack(), Stack())  # stack image, cls_targets, box_targets
    train_loader = gluon.data.DataLoader(
        dataset=(train_dataset
                 .transform(SSDDefaultTrainTransform(width, height, anchors))
                 .transform(Cast('float16'))),
        batch_size=batch_size,
        shuffle=True,
        batchify_fn=batchify_fn,
        last_batch='rollover',
        num_workers=num_workers)
    
    return train_loader

train_data = get_dataloader(net, dataset, 512, batch, 4)

this returns

MXNetError: Error in operator ssd13_mobilenet0_batchnorm0_fwd: [10:56:52] src/operator/nn/batch_norm.cc:370: Check failed: (*in_type)[i] == dtype_param (2 vs. 0) This layer requires uniform type. Expected 'float32' v.s. given 'float16' at 'gamma'

olivcruche · July 23, 2019, 11:07am

following this Batch norm crashes with float16 I’m doing a selective float16 cast:

for p in net.collect_params().items():
    if (not p[0].endswith('gamma')
    and not p[0].endswith('beta')
    and not p[0].endswith('_mean')
    and not p[0].endswith('_var')):
        p[1].cast('float16')

which also errors MXNetError: [11:06:56] src/operator/nn/convolution.cc:286: Check failed: (*in_type)[i] == dtype (0 vs. 2) This layer requires uniform type. Expected 'float16' v.s. given 'float32' at 'weight'

ifeherva · July 23, 2019, 8:58pm

I used this line

if not 'gamma' in param.name and not 'beta' in param.name and not 'mean' in param.name and not 'var' in param.name:
     param.cast('float16')

As far as I remember this worked.

Rainweic · December 28, 2019, 7:35am

Have you solve it? I also meet this error.

Topic		Replies	Views
Batch norm crashes with float16 Gluon	2	1310	December 25, 2019
Cryptic failure of SSD training with gluoncv 0.5.0 Gluon	1	503	October 23, 2019
Fine-tune object detectors on custom dataset gluon-cv , how-to , general-question	1	714	February 19, 2019
Help with SSD SmoothL1 metric reporting NaN during training Gluon	7	1380	December 27, 2023
Use float16 to train yolov3 Gluon	3	810	January 22, 2019

How to float16 gluoncv SSD finetuning?

Related Topics