How to use AMP with GluonCV SSD?

Hi,
I’m very happy to see that blog post on AMP: https://medium.com/apache-mxnet/simplify-mixed-precision-training-with-mxnet-amp-dc2564b1c7b0
I don’t understand the part about the type error at instantiation:
“This error occurs because the SSD script from GluonCV, before the actual training, launches the network once on the CPU context in order to obtain anchors for the data loader, and the CPU context does not support some of the FP16 operations, like Conv or Dense layers. We will fix this by changing the get_dataloader() function to use the GPU context for anchor generation:”

What should we do? Instantiate the net on GPU and do the anchor generation on GPU too?
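
For reference, here is my reading of that fix, sketched against the get_dataloader() from GluonCV's train_ssd.py (this is just my guess at the intended change; ctx is my GPU context, and the batchify helpers come from gluoncv.data.batchify):

import mxnet as mx
from mxnet import autograd, gluon
from gluoncv.data.batchify import Tuple, Stack
from gluoncv.data.transforms.presets.ssd import SSDDefaultTrainTransform

def get_dataloader(net, train_dataset, data_shape, batch_size, num_workers, ctx):
    width, height = data_shape, data_shape
    # dummy forward pass in train mode, now on the GPU context instead of CPU
    with autograd.train_mode():
        _, _, anchors = net(mx.nd.zeros((1, 3, height, width), ctx=ctx))
    batchify_fn = Tuple(Stack(), Stack(), Stack())
    return gluon.data.DataLoader(
        train_dataset.transform(SSDDefaultTrainTransform(width, height, anchors)),
        batch_size, shuffle=True, batchify_fn=batchify_fn,
        last_batch='rollover', num_workers=num_workers)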

EDIT: When I do the proposed solution (instantiate on GPU and generate anchors on GPU), I get this:
terminate called after throwing an instance of 'dmlc::Error'
what(): [14:06:06] /home/travis/build/dmlc/mxnet-distro/mxnet-build/3rdparty/mshadow/mshadow/./tensor_gpu-inl.h:35: Check failed: e == cudaSuccess: CUDA: initialization error

cheers

Instantiating the net on GPU, generating the anchors on GPU, and then copying them to CPU seems to work:

import mxnet as mx
import gluoncv as gcv
from mxnet import autograd

net = gcv.model_zoo.get_model(args.basemodel, pretrained=True, ctx=ctx[0])

# dummy forward pass in train mode (on GPU) to obtain the anchors
with autograd.train_mode():
    _, _, anchors = net(mx.nd.zeros((1, 3, image_size, image_size), ctx=ctx[0]))

# copy the anchors to CPU for use in the data loader
anchors = anchors.as_in_context(mx.cpu())
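
For completeness, the AMP calls around this are the ones from the blog post (a rough sketch only; data, cls_targets, box_targets come from the train loader batch, the learning rate is a placeholder, and mbox_loss is GluonCV's SSDMultiBoxLoss):

import gluoncv as gcv
from mxnet import autograd, gluon
from mxnet.contrib import amp

amp.init()  # call once at startup, before the network is created

trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.001})
amp.init_trainer(trainer)
mbox_loss = gcv.loss.SSDMultiBoxLoss()

# inside the training loop: scale the loss before calling backward
with autograd.record():
    cls_preds, box_preds, _ = net(data)
    sum_loss, cls_loss, box_loss = mbox_loss(cls_preds, box_preds, cls_targets, box_targets)
    with amp.scale_loss(sum_loss, trainer) as scaled_loss:
        autograd.backward(scaled_loss)
trainer.step(batch_size)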

Thanks @olivcruche for sharing your solution. Indeed, there is a little-documented feature of the GluonCV SSD model: when you call the model under autograd.train_mode(), you get back the anchors as well. The issue is that with mixed precision, if you do that forward pass on CPU, it crashes, because the CPU context does not support the FP16 operators involved.

Hence, to get the anchors, you need to compute them on GPU and then copy the result to CPU, just like you did in your code snippet. The copy back to CPU matters because the anchors are consumed by the data loader's transform, which runs in CPU worker processes; keeping them on the GPU is most likely what triggered the CUDA initialization error you saw.
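
To illustrate the feature (a minimal sketch; the model name is just an example from the model zoo):

import mxnet as mx
import gluoncv as gcv
from mxnet import autograd

ctx = mx.gpu(0)
net = gcv.model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained=True, ctx=ctx)
x = mx.nd.zeros((1, 3, 512, 512), ctx=ctx)

# inference mode: the SSD model returns (class ids, scores, bounding boxes)
ids, scores, bboxes = net(x)

# train mode: it instead returns (class predictions, box predictions, anchors)
with autograd.train_mode():
    cls_preds, box_preds, anchors = net(x)

anchors = anchors.as_in_context(mx.cpu())  # copy to CPU before handing to the data loader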
