Is this a correct way to copy features from a pretrained GluonCV model?


#1

I would like to do feature extraction with GluonCV. Could I do it as shown below (pseudocode)?

import mxnet as mx

from mxnet import gluon, nd
from mxnet.gluon import nn

from gluoncv.model_zoo import get_model

# Get the model ResNet50_v2 with pre-trained weights
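# Load onto the GPU so the pretrained weights share a device with the new block initialized below
gluon_net = get_model('ResNet50_v2', pretrained=True, ctx=mx.gpu())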

# Printing the net shows it is composed of two blocks: features (13 sub-blocks) and output
print(gluon_net)

#get the features part
features = gluon_net.features
new_features_net = nn.HybridSequential()

# copy the first 11 blocks
for i in range(11):
    print(features[i])
    new_features_net.add(features[i])

# freeze the weights of the first 11 blocks
for _, w in new_features_net.collect_params().items():
    w.grad_req = 'null'

def my_block():
    my_net = nn.HybridSequential()
    my_net.add(...)
    return my_net
	
net = nn.HybridSequential()
net.add(new_features_net)
net.add(my_block())
net[1].collect_params().initialize(init=mx.init.Xavier(), ctx=mx.gpu())
# load data, train, test, etc.

Did I miss anything? Thanks.

By the way, if I want to fine-tune, how can I set the learning rate of each layer?


#2

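That pseudocode looks fine to me. You cannot set the exact learning rate of each layer directly, but you can set a learning rate multiplier for each parameter; the effective rate is the Trainer's base learning rate times that multiplier. To do that for every parameter in a layer, you can do: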

block.collect_params().setattr('lr_mult', 0.5)

Alternatively, you can have multiple Trainer objects, each initialized with a subset of the network's parameters. That gives you full flexibility to use not only different learning rates for different layers, but also different optimizers, different optimization schedules, etc.