Fine-tuning error "gradient has not been updated by backward since last step"

I’m trying to classify image sequences by passing each frame through a pre-trained ResNet and then applying a couple of fully connected layers to the pooled embedding. Below is the class I created for the net. The training loop raises "UserWarning: Gradient of Parameter poolingclassifier1_resnetv20_dense0_weight on context gpu(0) has not been updated by backward since last step. This could mean a bug in your model that made it only use a subset of the Parameters (Blocks) for this iteration. If you are intentionally only using a subset, call step with ignore_stale_grad=True to suppress this warning and skip updating of Parameters with stale gradient"

What is wrong, and how can I get the ResNet layers to train as well?

class PoolingClassifier(gluon.HybridBlock):
    """this network runs a softmax on top of the pooled frame imagenet embeddings"""
    def __init__(self, num_classes, backbone, fc_width, ctx, dropout_p=0.3):
        super(PoolingClassifier, self).__init__()

        self.num_classes = num_classes
        self.backbone = backbone
        self.fc_width = fc_width
        self.dropout_p = dropout_p
        with self.name_scope():
            self.basenet = models.get_model(name=self.backbone, ctx=ctx, pretrained=True)
            self.emb = self.basenet.features
            self.dropout_1 = gluon.nn.Dropout(self.dropout_p)
            self.dropout_2 = gluon.nn.Dropout(self.dropout_p)
            self.fc1 = gluon.nn.Dense(self.fc_width, activation='relu')
            self.fc2 = gluon.nn.Dense(self.num_classes, activation='relu')

    def hybrid_forward(self, F, x):
        emb = F.concat(*[F.max(self.emb(ts), axis=0).expand_dims(axis=0) for ts in x], dim=0)
        e1 = self.fc1(emb)
        e1 = self.dropout_1(e1)
        e2 = self.fc2(e1)
        Y = self.dropout_2(e2)
        return Y

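You get this error when the parameters you pass to your trainer include some that are not in your compute graph. In this case self.basenet holds the full pre-trained model, including its output layer (the resnetv20_dense0 named in the warning). That layer has parameters, but hybrid_forward only calls self.emb (the features), so the output layer never receives a gradient. Because basenet is stored as an attribute of the block, .collect_params() still collects its parameters and you pass them to your trainer.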

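A simple solution would be to keep only the ‘features’ branch of your base model in the block, so the unused output layer is never registered and never reaches the trainer. If you really did want to carry unused parameters, the warning itself names the alternative: call step with ignore_stale_grad=True.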
