Freezing weight training for certain inputs to a hidden layer


#1

Hi,

I am new to mxnet and gluon. I have the following use case. Can you please advise how to achieve this?

firstNet = gluon.nn.HybridSequential(prefix = 'first')
firstNet.add(gluon.nn.Dense(units = 5))
first = firstNet(X1)

secondNet = gluon.nn.HybridSequential(prefix = 'second')
secondNet.add(gluon.nn.Dense(units = 5))
second = secondNet(X2)

concatData = F.concat(first, second, dim = 1)

thirdNet = gluon.nn.HybridSequential(prefix = 'third')
thirdNet.add(gluon.nn.Dense(units = 2))

return thirdNet(concatData)

I need to freeze the weights on all connections from secondNet to thirdNet after an initial phase of training.

Thanks


#2

Hi @nasudhan,

You can use param.grad_req = 'null' to achieve this.

Gluon layers have parameters (for weights and biases) which are updated after calling trainer.step() by default. You can prevent a parameter from being updated (i.e. freeze it) by skipping its gradient calculation, which has the added benefit of speeding up the backward pass slightly by avoiding unnecessary computation. You can freeze a whole layer by looping through all of its parameters and setting param.grad_req = 'null':

for param in net.second_net.collect_params().values():
    param.grad_req = 'null'
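
If you later want to resume training those parameters (for example in a second phase), you can unfreeze them the same way by setting grad_req back to 'write', which is the default for trainable parameters. A minimal sketch:

for param in net.second_net.collect_params().values():
    param.grad_req = 'write'  # re-enable gradient computation and updates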

#3

So for your example:

import mxnet as mx

class MyNetwork(mx.gluon.nn.HybridBlock):
    def __init__(self, **kwargs):
        super(MyNetwork, self).__init__(**kwargs)
        with self.name_scope():
            # branch for the first input
            self.first_net = mx.gluon.nn.HybridSequential()
            self.first_net.add(mx.gluon.nn.Dense(units = 3))
            # branch for the second input (the one we will freeze)
            self.second_net = mx.gluon.nn.HybridSequential()
            self.second_net.add(mx.gluon.nn.Dense(units = 2))
            # head that takes the concatenation of both branches
            self.third_net = mx.gluon.nn.HybridSequential()
            self.third_net.add(mx.gluon.nn.Dense(units = 1))

    def hybrid_forward(self, F, x1, x2):
        first = self.first_net(x1)
        second = self.second_net(x2)
        concat = F.concat(first, second, dim = 1)
        third = self.third_net(concat)
        return third

net = MyNetwork()
net.initialize()
trainer = mx.gluon.Trainer(net.collect_params(), optimizer='sgd', optimizer_params={'learning_rate': 1})
x1 = mx.nd.random.uniform(shape=(10, 4))
x2 = mx.nd.random.uniform(shape=(10, 4))
with mx.autograd.record():
    output = net(x1, x2)

net.second_net[0].weight.data()
[[-0.04054644 -0.0198587  -0.05195032  0.03509606]
 [-0.02584003  0.01509629 -0.01908049 -0.02449339]]
<NDArray 2x4 @cpu(0)>

Weights from initialisation.

Update: default

output.backward()
trainer.step(batch_size=x1.shape[0])
net.second_net[0].weight.data()
[[-0.04904126 -0.0277727  -0.06480464  0.02541063]
 [-0.05673689 -0.01368807 -0.06583347 -0.0597207 ]]
<NDArray 2x4 @cpu(0)>

Weights after a single update step: they have changed from initialisation.

Update: after freezing

for param in net.second_net.collect_params().values():
    param.grad_req = 'null'
with mx.autograd.record():
    output = net(x1, x2)
output.backward()
trainer.step(batch_size=x1.shape[0])
net.second_net[0].weight.data()
[[-0.04904126 -0.0277727  -0.06480464  0.02541063]
 [-0.05673689 -0.01368807 -0.06583347 -0.0597207 ]]
<NDArray 2x4 @cpu(0)>

Weights after another update step: same as before.
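
As a sanity check (a small sketch reusing the objects above), you can confirm that parameters outside second_net are still being trained, for example by comparing first_net's weights before and after another step:

before = net.first_net[0].weight.data().copy()
with mx.autograd.record():
    output = net(x1, x2)
output.backward()
trainer.step(batch_size=x1.shape[0])
print((before - net.first_net[0].weight.data()).abs().sum())  # non-zero: first_net is still learning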


#4

Thank you. I understand this freezes the weights for all inputs to second_net. But what I want is to freeze the weights for all inputs from second_net to third_net. For example, if first_net, second_net and third_net have 3, 2 and 2 nodes respectively, the input weight matrix of third_net would have shape (2, 5). I want to set grad_req to 'null' for all weights at [:, -2:] and allow [:, :3] to be updated.


#5

Oh sorry, I didn’t quite catch what you were trying to do the first time!

You can split third_net into separate layers for this. Rather than concatenating beforehand, pass first through one layer and second through another (which you can freeze using the method above), and then sum the results before the activation. Your network structure would look something like:

class MyNetwork(mx.gluon.nn.HybridBlock):
    def __init__(self, **kwargs):
        super(MyNetwork, self).__init__(**kwargs)
        self.fc1 = mx.gluon.nn.Dense(units = 5)   # processes input1
        self.fc2 = mx.gluon.nn.Dense(units = 3)   # processes input2
        self.fc3a = mx.gluon.nn.Dense(units = 2)  # connections from first to the output
        self.fc3b = mx.gluon.nn.Dense(units = 2)  # connections from second to the output (the ones to freeze)
        self.act = mx.gluon.nn.Activation('sigmoid')

    def hybrid_forward(self, F, input1, input2):
        first = self.fc1(input1)
        second = self.fc2(input2)
        third_a = self.fc3a(first)
        third_b = self.fc3b(second)
        third = third_a + third_b  # summing replaces the explicit concat
        return self.act(third)
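
With this structure, the connections coming from second can then be frozen with the same grad_req trick as above. A minimal sketch (after constructing and initialising the network):

net = MyNetwork()
net.initialize()
for param in net.fc3b.collect_params().values():
    param.grad_req = 'null'  # freeze the second -> output weights and bias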

#6

What’s your use case for this by the way? I’ve not seen it done before.


#7

Thanks a lot. This worked; however, I am seeing weird results when I have a 'relu' activation in my third net after combining the two paths. Code below. Unfreezing really does not work as expected when third_net has this activation.

class MyNetwork(mx.gluon.nn.HybridBlock):
    def __init__(self, **kwargs):
        super(MyNetwork, self).__init__(**kwargs)
        with self.name_scope():
            self.first_net = mx.gluon.nn.HybridSequential()
            self.first_net.add(mx.gluon.nn.Dense(units = 3))
            self.second_net = mx.gluon.nn.HybridSequential()
            self.second_net.add(mx.gluon.nn.Dense(units = 3))
            self.fc3a = mx.gluon.nn.Dense(units = 2) # this layer's weights are frozen in the first phase and unfrozen in the second phase.
            self.fc3b = mx.gluon.nn.Dense(units = 2)
            self.act = mx.gluon.nn.Activation('sigmoid')
            self.third_net = mx.gluon.nn.HybridSequential()
            self.third_net.add(mx.gluon.nn.Dense(units = 2))
            self.third_net.add(mx.gluon.nn.Activation(activation = 'relu')) # everything works fine without this activation, but with it, fc3a's weights are not updated after I unfreeze them in the second phase.

    def hybrid_forward(self, F, x1, x2):
        first = self.first_net(x1)
        second = self.second_net(x2)
        third_a = self.fc3a(first)
        third_b = self.fc3b(second)
        third = third_a + third_b
        third_act = self.act(third)                                      
        third_net = self.third_net(third_act)
        return third_net
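
One way to check whether any gradient reaches fc3a after unfreezing (a sketch reusing x1, x2 and the setup from the earlier posts) is to inspect the parameter's gradient after backward(); if it is all zeros, for example because the relu output is zero for the whole batch, trainer.step() cannot change the weights:

for param in net.fc3a.collect_params().values():
    param.grad_req = 'write'  # unfreeze for the second phase
with mx.autograd.record():
    output = net(x1, x2)
output.backward()
print(net.fc3a.weight.grad())  # all zeros here would explain the missing updates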

PS: I am exploring this to get insights into a specific regression problem.