Gluon Vision model prediction returns a vector of nan

ifeherva · April 3, 2018, 8:08pm

I am using a pretrained resnet50_v2 model from the model zoo. After a training epoch I would like to run some evaluations without a dataloader. A sample is below:

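import mxnet as mx

# cfg, mean, std and model are defined elsewhere in the training script
img_path = 'path to img'
with open(img_path, 'rb') as f:
    img = mx.img.imdecode(f.read()).astype('float32')
img = mx.image.resize_short(img, cfg.common.image.rows)
img = mx.image.center_crop(img, (cfg.common.image.rows, cfg.common.image.rows))[0]
img /= 255  # scale pixel values to [0, 1]
img = mx.img.color_normalize(img, mean, std)
img = img.transpose()  # reverses all axes: HWC -> CWH; use axes=(2, 0, 1) for CHW
img = mx.nd.reshape(img, shape=((1,) + img.shape))  # add the batch dimension

embedding = model(img)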

However, each time I call model(img) I receive a vector of nan. Training seems to be working with the same code inside a dataloader.

ifeherva · April 3, 2018, 9:10pm

It seems that backpropagation causes the issue. After I call

with autograd.record():
    losses = [loss(*(model(X1, X2) + [Y])) for X1, X2, Y in zip(img1, img2, label)]

    for l in losses:
        l.backward()

the model becomes unusable. losses is a list containing one NDArray of finite floating-point values (no nan or inf).
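
One way to narrow down a model that "becomes unusable" after backward() is to check which parameters or gradients hold a non-finite value. Below is a minimal sketch in plain Python. find_non_finite is a hypothetical helper, not part of MXNet, and it assumes every tensor has already been flattened into a list of floats.

```python
import math

def find_non_finite(named_values):
    """Return the names whose values contain nan or inf.

    `named_values` maps a name to an iterable of floats
    (hypothetical input format, chosen for this sketch).
    """
    bad = []
    for name, values in named_values.items():
        if not all(math.isfinite(v) for v in values):
            bad.append(name)
    return bad

# Toy data standing in for flattened weights; the names are made up.
params = {
    "conv0_weight": [0.1, -0.3, 0.7],
    "dense0_bias": [float("nan"), 0.0],
    "dense0_weight": [float("inf"), 1.0],
}
print(find_non_finite(params))
```

With Gluon, such a mapping could presumably be built from model.collect_params(), for example {name: p.data().asnumpy().ravel().tolist() for name, p in model.collect_params().items()}, and p.grad() in place of p.data() to inspect gradients.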

ifeherva · April 3, 2018, 10:30pm

The issue was in my loss function: one exp call was returning inf.
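
The loss function itself is not shown in this thread, so the following only illustrates the general failure mode and is not the code in question. Assuming a loss that contains a term such as log(1 + exp(x)), the naive form overflows for large x, while an algebraically equivalent form stays finite. The function names are made up for this sketch.

```python
import math

def softplus_naive(x):
    # log(1 + exp(x)); exp(x) overflows for large x.
    # Python floats raise OverflowError here, whereas float32 arrays return inf.
    return math.log(1.0 + math.exp(x))

def softplus_stable(x):
    # Same function, rewritten so exp only ever sees a non-positive argument.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# Both forms agree where the naive one is still safe.
for x in (-5.0, 0.0, 5.0):
    assert math.isclose(softplus_naive(x), softplus_stable(x))

# The naive form fails for a large input; the stable one does not.
try:
    softplus_naive(1000.0)
except OverflowError:
    print("naive form overflowed")

print(softplus_stable(1000.0))
```

The same rewrite applies to NDArray code: keeping the argument of exp non-positive (or clipping it) keeps the result finite in float32, where exp overflows to inf for inputs above roughly 88.7.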
