I get the error shown in the figure. I've already wrapped the forward pass in `autograd.record()`, but the error still occurs.
I downloaded the network model with GluonCV and wanted to train it, and the above error occurred.
Could you give some more information, e.g. which model you used and whether you changed anything? Or could you provide the code so that I can reproduce the problem?
https://pan.baidu.com/s/1VbaVpZlR2iT1d0GNZWVa3g
Here’s the code and some more relevant information. Thank you.
I had a look at your code and simplified it quite a bit. There were some unnecessary array transformations that caused the error, and there were also a few logical mistakes in the code and data. For instance:
- It seems that your label images are RGB instead of having only 1 channel indicating the class label. I have recreated them by setting the positive class where the mean pixel value is > 30.
- When you iterate through your data, you can iterate through the train and label images at the same time with `for i, (data, label) in enumerate(zip(train_image, train_label_image)):`. This simplifies your training loop a bit.
- In the loop you were calling `predict = mx.nd.squeeze(mx.nd.argmax(output, 1)).asnumpy()`. `.asnumpy()` copies your `mx.nd.array` output to the CPU, which is very bad for training performance, but most importantly your computational graph isn't captured anymore and you won't be able to call `.backward()` on your loss.
- You were using `fcn_net.demo(` when you should be using `fcn_net(`.
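The label conversion from the first point can be sketched with plain NumPy. The threshold of 30 and the channel-first `(N, 3, H, W)` layout are assumptions matching the training loop below:

```python
import numpy as np

def rgb_label_to_mask(label_batch, threshold=30):
    """Collapse an RGB label batch of shape (N, 3, H, W) into a
    single-channel binary class mask of shape (N, H, W).

    A pixel gets the positive class (1) when its mean value across
    the three colour channels exceeds `threshold`.
    """
    return (label_batch.mean(axis=1) > threshold).astype(np.int64)

# Tiny example: one 2x2 "image" with one bright and three dark pixels
batch = np.zeros((1, 3, 2, 2))
batch[0, :, 0, 0] = 255          # bright pixel -> class 1
mask = rgb_label_to_mask(batch)  # shape (1, 2, 2), a single 1 at (0, 0)
```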
```python
import mxnet as mx
import gluoncv
from mxnet import gluon, autograd, nd

# Hyperparameters (values not shown in the original post, adjust as needed)
ctx = mx.cpu()                     # or mx.gpu(0)
batch_size = 4
epoch = 10
smoothing_constant = 0.01
input_image_shape = (3, 273, 273)

def get_fcn_resnet101(nclass=2, ctx=mx.cpu(), crop_size=273, pretrained_base=False):
    return gluoncv.model_zoo.FCN(nclass=nclass, backbone='resnet101', ctx=ctx,
                                 crop_size=crop_size, pretrained_base=pretrained_base)

fcn_net = get_fcn_resnet101(ctx=ctx)
fcn_loss = gluon.loss.SoftmaxCrossEntropyLoss(axis=1)
trainer = gluon.Trainer(fcn_net.collect_params(), 'adam', {'learning_rate': 0.001})

train_image = mx.image.ImageIter(batch_size=batch_size, data_shape=input_image_shape,
                                 shuffle=False, path_imgrec='./test_train.rec',
                                 path_imgidx='./test_train.idx')
train_label_image = mx.image.ImageIter(batch_size=batch_size, data_shape=input_image_shape,
                                       shuffle=False, path_imgrec='./label_train.rec',
                                       path_imgidx='./label_train.idx')
test_image = mx.image.ImageIter(batch_size=batch_size, data_shape=input_image_shape,
                                shuffle=False, path_imgrec='./test_test.rec',
                                path_imgidx='./test_test.idx')
test_label_image = mx.image.ImageIter(batch_size=batch_size, data_shape=input_image_shape,
                                      shuffle=False, path_imgrec='./label_test.rec',
                                      path_imgidx='./label_test.idx')

for e in range(epoch):
    train_image.reset()
    train_label_image.reset()
    test_image.reset()
    test_label_image.reset()
    moving_loss = 0.
    for i, (data, label) in enumerate(zip(train_image, train_label_image)):
        image = data.data[0].as_in_context(ctx) / 255.
        # Labels shouldn't be RGB; filter back to a single channel with class values in [0, 1]
        label = (label.data[0].as_in_context(ctx).mean(axis=1) > 30)
        with autograd.record():
            output = fcn_net(image)
            loss2 = fcn_loss(output[0], label)
            loss1 = fcn_loss(output[1], label)
            loss = loss1 + loss2
        loss.backward()
        trainer.step(image.shape[0])
        curr_loss = nd.mean(loss)
        moving_loss = (curr_loss if i == 0
                       else (1 - smoothing_constant) * moving_loss
                            + smoothing_constant * curr_loss)
    print("Epoch %s. Batch %s. Loss: %s" % (e, i, moving_loss.asscalar()))
```
I can confirm the model is training:
First the original image, then the label, then the image predicted by the network.
Thank you very much for your patience and help.