I get the error shown in the figure. I've already wrapped the forward pass in `autograd.record()`, but the error still occurs.
I downloaded the network model with GluonCV and wanted to train it, and the above error occurred.
Could you give some more information, e.g. which model you used and whether you changed anything? Or could you provide the code so that I can reproduce the problem?
https://pan.baidu.com/s/1VbaVpZlR2iT1d0GNZWVa3g
Here’s the code and some more relevant information. Thank you.
I had a look at your code and simplified it quite a bit. There were some unnecessary array transformations that caused the error, and there were also a few logical mistakes in the code and data. For instance:
- It seems that your label images are RGB instead of having only 1 channel indicating the class label. I have recreated them by setting the positive class where the mean pixel value is > 30.
- When you iterate through your data, you can iterate through the train and label images at the same time with `for i, (data, label) in enumerate(zip(train_image, train_label_image)):`. This simplifies your training loop a bit.
- In the loop you were calling `predict = mx.nd.squeeze(mx.nd.argmax(output, 1)).asnumpy()`. `.asnumpy()` copies your `mx.nd.array` output to the CPU, which is very bad for training performance, but most importantly your computational graph isn't captured anymore and you won't be able to call `.backward()` on your loss.
- You were using `fcn_net.demo(` when you should be using `fcn_net(`.
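The label conversion from the first point can be sketched with plain NumPy. The threshold of 30 and the channel-first `(N, 3, H, W)` layout are assumptions matching the training loop below:

```python
import numpy as np

def rgb_label_to_mask(label_batch, threshold=30):
    """Collapse an RGB label batch of shape (N, 3, H, W) into a
    single-channel binary class mask of shape (N, H, W).

    A pixel gets the positive class (1) when its mean value across
    the three colour channels exceeds `threshold`.
    """
    return (label_batch.mean(axis=1) > threshold).astype(np.int64)

# Tiny example: one 2x2 "image" with one bright and three dark pixels
batch = np.zeros((1, 3, 2, 2))
batch[0, :, 0, 0] = 255          # bright pixel -> class 1
mask = rgb_label_to_mask(batch)  # shape (1, 2, 2), a single 1 at (0, 0)
```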
```python
import mxnet as mx
import gluoncv
from mxnet import gluon, autograd, nd

# Hyperparameters (values not shown in the original post, adjust as needed)
ctx = mx.cpu()                     # or mx.gpu(0)
batch_size = 4
epoch = 10
smoothing_constant = 0.01
input_image_shape = (3, 273, 273)

def get_fcn_resnet101(nclass=2, ctx=mx.cpu(), crop_size=273, pretrained_base=False):
    return gluoncv.model_zoo.FCN(nclass=nclass, backbone='resnet101', ctx=ctx,
                                 crop_size=crop_size, pretrained_base=pretrained_base)

fcn_net = get_fcn_resnet101(ctx=ctx)
fcn_loss = gluon.loss.SoftmaxCrossEntropyLoss(axis=1)
trainer = gluon.Trainer(fcn_net.collect_params(), 'adam', {'learning_rate': 0.001})

train_image = mx.image.ImageIter(batch_size=batch_size, data_shape=input_image_shape,
                                 shuffle=False, path_imgrec='./test_train.rec',
                                 path_imgidx='./test_train.idx')
train_label_image = mx.image.ImageIter(batch_size=batch_size, data_shape=input_image_shape,
                                       shuffle=False, path_imgrec='./label_train.rec',
                                       path_imgidx='./label_train.idx')
test_image = mx.image.ImageIter(batch_size=batch_size, data_shape=input_image_shape,
                                shuffle=False, path_imgrec='./test_test.rec',
                                path_imgidx='./test_test.idx')
test_label_image = mx.image.ImageIter(batch_size=batch_size, data_shape=input_image_shape,
                                      shuffle=False, path_imgrec='./label_test.rec',
                                      path_imgidx='./label_test.idx')

for e in range(epoch):
    train_image.reset()
    train_label_image.reset()
    test_image.reset()
    test_label_image.reset()
    moving_loss = 0.
    for i, (data, label) in enumerate(zip(train_image, train_label_image)):
        image = data.data[0].as_in_context(ctx) / 255.
        # Labels shouldn't be RGB; filter back to a single channel with class values in [0, 1]
        label = (label.data[0].as_in_context(ctx).mean(axis=1) > 30)
        with autograd.record():
            output = fcn_net(image)
            loss2 = fcn_loss(output[0], label)
            loss1 = fcn_loss(output[1], label)
            loss = loss1 + loss2
        loss.backward()
        trainer.step(image.shape[0])
        curr_loss = nd.mean(loss)
        moving_loss = (curr_loss if i == 0
                       else (1 - smoothing_constant) * moving_loss
                            + smoothing_constant * curr_loss)
    print("Epoch %s. Batch %s. Loss: %s" % (e, i, moving_loss.asscalar()))
```
I can confirm the model is training:
First the original image, then the label, then the image predicted by the network.
Thank you very much for your patience and help.