Error as shown in the figure: autograd.record() is already set, but the error still occurs

I downloaded the network model with GluonCV and wanted to train it, and the above error occurred.

Could you give some more information, e.g. which model you used and whether you changed anything? Or could you provide the code so that I can reproduce the problem?

https://pan.baidu.com/s/1VbaVpZlR2iT1d0GNZWVa3g

Here’s the code and some more related information. Thank you.

I had a look at your code and simplified it quite a bit. There were some unnecessary array conversions which caused the error. There were also a few logical mistakes in the code and data. For instance:

  • It seems that your label images are RGB instead of having only 1 channel indicating the class label. I have recreated them by setting the positive class wherever the mean pixel value is > 30.
  • When you iterate through your data, you can iterate through the train and label images at the same time with for i, (data, label) in enumerate(zip(train_image, train_label_image)):. This simplifies your training loop a bit.
  • In the loop you were calling predict = mx.nd.squeeze(mx.nd.argmax(output, 1)).asnumpy(). asnumpy() copies your mx.nd.array output to the CPU, which is very bad for training performance, but most importantly your computational graph is no longer captured, so you can't call .backward() on your loss anymore. This is what caused your error; see the sketch after this list.
  • You were using fcn_net.demo(...) when you should be using fcn_net(...).
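
To illustrate the point about asnumpy(), here is a minimal, self-contained sketch (independent of your model; x and the doubling operation are just placeholders) showing how converting to NumPy disconnects the computational graph:

import mxnet as mx
from mxnet import autograd, nd

x = nd.random.uniform(shape=(2, 3))
x.attach_grad()

# Fine: everything stays an NDArray inside the recorded scope
with autograd.record():
    y = (x * 2).sum()
y.backward()        # works, the graph from x to y was recorded
print(x.grad)       # gradient is all 2s

# Broken: asnumpy() copies the data to the CPU and leaves the graph
with autograd.record():
    y_np = (x * 2).asnumpy().sum()  # now a plain NumPy scalar
    z = nd.array([y_np])            # re-wrapping does not reconnect the graph
# z.backward() would raise an error here, because z was never recorded

With that fixed, the simplified version of your script looks like this: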
import mxnet as mx
from mxnet import autograd, gluon, nd
import gluoncv

# Assumed setup; these values were not part of the original snippet, adjust as needed
ctx = mx.cpu()            # or mx.gpu(0)
batch_size = 4
input_image_shape = (3, 273, 273)
epoch = 10
smoothing_constant = 0.01

def get_fcn_resnet101(nclass=2, ctx=mx.cpu(), crop_size=273, pretrained_base=False):
    return gluoncv.model_zoo.FCN(nclass=nclass, backbone='resnet101', ctx=ctx,
                                 crop_size=crop_size, pretrained_base=pretrained_base)

fcn_net = get_fcn_resnet101(ctx=ctx)
fcn_net.initialize(ctx=ctx)  # assumption: needed since pretrained_base=False; already-initialized parameters are skipped
fcn_loss = gluon.loss.SoftmaxCrossEntropyLoss(axis=1)
trainer = gluon.Trainer(fcn_net.collect_params(), 'adam', {'learning_rate': 0.001})

train_image = mx.image.ImageIter(batch_size=batch_size, data_shape=input_image_shape, shuffle=False,
                                 path_imgrec='./test_train.rec', path_imgidx='./test_train.idx')
train_label_image = mx.image.ImageIter(batch_size=batch_size, data_shape=input_image_shape, shuffle=False,
                                       path_imgrec='./label_train.rec', path_imgidx='./label_train.idx')
test_image = mx.image.ImageIter(batch_size=batch_size, data_shape=input_image_shape, shuffle=False,
                                path_imgrec='./test_test.rec', path_imgidx='./test_test.idx')
test_label_image = mx.image.ImageIter(batch_size=batch_size, data_shape=input_image_shape, shuffle=False,
                                      path_imgrec='./label_test.rec', path_imgidx='./label_test.idx')

for e in range(epoch):
    train_image.reset()
    train_label_image.reset()
    test_image.reset()
    test_label_image.reset()
    moving_loss = 0.
    for i, (data, label) in enumerate(zip(train_image, train_label_image)):
        image = data.data[0].as_in_context(ctx) / 255.
        # Labels shouldn't be RGB; thresholding the channel mean brings them back to a
        # single channel with class ids in {0, 1}
        label = (label.data[0].as_in_context(ctx).mean(axis=1) > 30)
        with autograd.record():
            output = fcn_net(image)            # FCN returns (main output, auxiliary output)
            loss2 = fcn_loss(output[0], label)
            loss1 = fcn_loss(output[1], label)
            loss = loss1 + loss2
        loss.backward()
        trainer.step(image.shape[0])
        curr_loss = nd.mean(loss)
        # exponentially weighted moving average of the loss for smoother reporting
        moving_loss = (curr_loss if i == 0
                       else (1 - smoothing_constant) * moving_loss + smoothing_constant * curr_loss)
    print("Epoch %s. Batch %s. Loss: %s" % (e, i, moving_loss.asscalar()))



I can confirm the model is training. The images below show first the original image, then the label, then the prediction from the network.
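
For completeness, here is a minimal sketch of how such a prediction image can be produced after training (reusing fcn_net, test_image and ctx from the code above). argmax and asnumpy() are fine here because we are outside autograd.record() and don't need gradients:

test_image.reset()
batch = next(iter(test_image))
image = batch.data[0].as_in_context(ctx) / 255.
output = fcn_net(image)
# pick the most likely class per pixel from the main output
predict = mx.nd.argmax(output[0], axis=1).asnumpy()
# predict[0] is an (H, W) array of class ids, ready for e.g. matplotlib's imshow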


Thank you very much for your patience and help.