Gluon vision model prediction returns a vector of NaNs


#1

I am using a pretrained resnet50_v2 model from the model zoo. After a training epoch I would like to run some evaluations without a DataLoader. Here is a sample:

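```python
import mxnet as mx

# cfg, mean, std and model come from the training script (not shown)
img_path = 'path to img'
with open(img_path, 'rb') as f:
    img = mx.image.imdecode(f.read()).astype('float32')  # HWC, RGB
img = mx.image.resize_short(img, cfg.common.image.rows)
img = mx.image.center_crop(img, (cfg.common.image.rows, cfg.common.image.rows))[0]
img /= 255
img = mx.image.color_normalize(img, mean, std)
img = img.transpose((2, 0, 1))  # HWC -> CHW (a bare transpose() reverses all axes)
img = img.expand_dims(axis=0)   # add the batch dimension: (1, C, H, W)

embedding = model(img)
```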

However, every call to `model(img)` returns a vector of NaNs, even though training with the same code inside a DataLoader seems to work.
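When a forward pass returns nothing but NaNs, a quick way to tell a bad input from corrupted weights is to scan the parameters for non-finite values. Below is a minimal NumPy sketch; the helper `find_non_finite` and the parameter names in the usage example are hypothetical. With a Gluon model, the dictionary could be built from `model.collect_params()` as shown in the docstring, assuming each parameter is initialized on a single context.

```python
import numpy as np

def find_non_finite(arrays):
    """Return the names of the arrays that contain NaN or +/-inf.

    `arrays` maps a name to a NumPy array. For a Gluon model it could be built as
    {name: p.data().asnumpy() for name, p in model.collect_params().items()},
    assuming every parameter is initialized on a single context.
    """
    return [name for name, a in arrays.items() if not np.all(np.isfinite(a))]

# Hypothetical parameters: one healthy, one corrupted by a NaN.
params = {
    "conv0_weight": np.ones((2, 2), dtype=np.float32),
    "dense0_bias": np.array([0.5, np.nan], dtype=np.float32),
}
print(find_non_finite(params))  # ['dense0_bias']
```

If any name comes back, the weights were already broken before this forward pass, so the preprocessing is not the culprit.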


#2

It seems that backpropagation causes the issue. After I run

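```python
from mxnet import autograd

# img1, img2 and label are lists of batches; model(X1, X2) returns a list of outputs
with autograd.record():
    losses = [loss(*(model(X1, X2) + [Y])) for X1, X2, Y in zip(img1, img2, label)]

    for l in losses:
        l.backward()
```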

the model becomes unusable (every later prediction is a vector of NaNs). `losses` is a list containing a single NDArray whose values are all finite (no NaN or inf).
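A finite loss does not guarantee finite gradients. As a minimal sketch, take a hypothetical loss term 1 / (1 + exp(x)) (the actual loss was not posted) computed naively in float32, MXNet's default dtype: for large x the forward value collapses to a harmless 0, but the hand-derived gradient evaluates to -inf / inf = NaN, and a single plain SGD step then turns the weight into NaN as well.

```python
import numpy as np

def naive_loss_and_grad(x):
    """Hypothetical loss f(x) = 1 / (1 + exp(x)) and its analytic derivative,
    written the naive way in float32."""
    x = np.float32(x)
    e = np.exp(x)                             # overflows to inf once x > ~88.7
    loss = np.float32(1.0) / (np.float32(1.0) + e)
    grad = -e / (np.float32(1.0) + e) ** 2    # -inf / inf -> nan
    return loss, grad

with np.errstate(over="ignore", invalid="ignore"):
    loss, grad = naive_loss_and_grad(100.0)
    w = np.float32(0.3)                       # a single hypothetical weight
    w = w - np.float32(0.01) * grad           # one plain SGD step
print(loss, grad, w)  # 0.0 nan nan
```

The loss looks perfectly healthy, yet the update has already poisoned the weight, and every forward pass that touches it will produce NaN from then on.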


#3

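The issue was in my loss function: one `exp` call was returning inf. Even though the final loss values looked finite, an inf in an intermediate result can turn into NaN during backpropagation (for example through inf / inf or 0 * inf), and the optimizer step then writes those NaNs into the weights, which would explain why every later prediction was NaN.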
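Assuming, purely for illustration, that the offending term had the form 1 / (1 + exp(x)) (the actual loss was not posted), one fix is to rewrite it as the logistic sigmoid of -x and evaluate that in a form where `exp` never sees a positive argument. The NumPy sketch below uses hypothetical helper names and keeps both the value and the gradient finite for large |x|. Another option is to clip the argument before exponentiating (for example with `F.clip` inside a Gluon `hybrid_forward`), at the cost of a zero gradient outside the clipped range; where a built-in operator such as `sigmoid` or `log_softmax` covers the expression, it is usually the safer choice.

```python
import numpy as np

def stable_sigmoid(x):
    """Logistic function 1 / (1 + exp(-x)) that only ever calls exp on a
    non-positive argument, so exp cannot overflow."""
    x = np.asarray(x, dtype=np.float32)
    z = np.exp(-np.abs(x))  # always in [0, 1]
    return np.where(x >= 0, 1.0 / (1.0 + z), z / (1.0 + z)).astype(np.float32)

def stable_loss_and_grad(x):
    """Hypothetical loss f(x) = 1 / (1 + exp(x)), rewritten as sigmoid(-x).
    Its derivative is expressed through the forward value: f'(x) = -f * (1 - f)."""
    f = stable_sigmoid(-np.asarray(x, dtype=np.float32))
    return f, -f * (1.0 - f)

loss, grad = stable_loss_and_grad(100.0)
print(loss, grad)  # both finite
```

Expressing the gradient through the forward output is the same trick that keeps the sigmoid derivative bounded: f * (1 - f) can never exceed 0.25, whatever x is.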