Fine-tune a model with Gluon


#1

My plan is to remove the output layer of a network like ResNet, reuse all the previous layers, and add a couple of new layers. I know how to do this with the MXNet Module API. How do I do the same thing with Gluon?


#2

You can learn more about fine-tuning in Gluon here: https://gluon.mxnet.io/chapter08_computer-vision/fine-tuning.html


#3

That was the one I was looking at. But it does not show how to completely remove an old layer, add a new layer, and initialize only the new layer… :smile:


#4

Also: they have the following code snippet:
deep_dog_net = models.squeezenet1_1(prefix='deep_dog_', classes=2)
deep_dog_net.collect_params().initialize()
deep_dog_net.features = net.features
print(deep_dog_net)
I am very confused about how the following line works:
deep_dog_net.collect_params().initialize()
Will it erase all the SqueezeNet weights that were loaded?


#5

You load two versions of the network: a pretrained net, and a deep_dog_net with 2 classes that is not pretrained.

With deep_dog_net.collect_params().initialize(), every layer in both the .features and .output blocks gets initialized.

Then you get rid of the untrained .features block and replace it with the pretrained net.features block.

In the end you have a pretrained .features block and a randomly initialized, untrained .output block, which is just what you wanted.


#6

Thanks for the explanation! It really helps. But how can I add one more fc layer with dropout?


#7
from mxnet import gluon

alexnet = gluon.model_zoo.vision.alexnet(pretrained=True)
new_net = gluon.nn.HybridSequential()
with new_net.name_scope():
    # Reuse the pretrained convolutional feature extractor.
    pretrained_features = alexnet.features

    # New head: Dense -> Dropout -> Dense, randomly initialized.
    new_tail = gluon.nn.HybridSequential()
    new_tail.add(
        gluon.nn.Dense(100),
        gluon.nn.Dropout(0.5),
        gluon.nn.Dense(12)
    )
    new_tail.initialize()

    new_net.add(
        pretrained_features,
        new_tail
    )

print(new_net)
HybridSequential(
  (0): HybridSequential(
    (0): Conv2D(3 -> 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(0, 0), ceil_mode=False)
    (2): Conv2D(64 -> 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (3): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(0, 0), ceil_mode=False)
    (4): Conv2D(192 -> 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): Conv2D(384 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(0, 0), ceil_mode=False)
    (8): Flatten
    (9): Dense(9216 -> 4096, Activation(relu))
    (10): Dropout(p = 0.5, axes=())
    (11): Dense(4096 -> 4096, Activation(relu))
    (12): Dropout(p = 0.5, axes=())
  )
  (1): HybridSequential(
    (0): Dense(None -> 100, linear)
    (1): Dropout(p = 0.5, axes=())
    (2): Dense(None -> 12, linear)
  )
)

#8

Thanks for the example!