Best practice for shared weights between different layers?

how-to
#1

Dear all,

I want to share the same weight (and only that weight) across different layers. For example, assume I have two layers (say, a convolution and a Dense layer) that depend on the same weight variable (assuming, again, that the dimensions are chosen appropriately):

from mxnet.gluon import HybridBlock

class Layer1(HybridBlock):
    def __init__(self, some_weight, some_params, **kwargs):
        super().__init__(**kwargs)
        with self.name_scope():
            self.weight = some_weight  # this is supposed to be a Parameter

    # a registered Parameter is passed to hybrid_forward as a keyword argument
    def hybrid_forward(self, F, x, weight):
        out = ...  # do some things to x with weight
        return out

class Layer2(HybridBlock):
    def __init__(self, some_weight, some_other_params, **kwargs):
        super().__init__(**kwargs)
        with self.name_scope():
            self.weight = some_weight  # this is supposed to be a Parameter

    def hybrid_forward(self, F, x, weight):
        out = ...  # do some OTHER things to x with the SAME weight
        return out

I am aware that one can, e.g., create a single convolution layer and call it in several places inside hybrid_forward, but that reuses the whole layer rather than just its weight, so it does not cover this case.

Thank you for your time.

#2

You can always pass a ParameterDict when constructing your block to share weights. But I am not sure how you would make the dimensions of a Conv2D (or even a Conv1D) weight match those of a Dense layer.

If the layers were of the same type, they could easily share weights like this:

from mxnet.gluon.nn import Conv2D

# Create two Conv2D layers with the same arguments, so the shapes of their parameters match.
conv_layer_1 = Conv2D(kernel_size=(3, 3), padding=(1, 1), channels=32, activation="relu", bias_initializer='ones')
# Share only the weight, not the bias.
conv_layer_2 = Conv2D(kernel_size=(3, 3), padding=(1, 1), channels=32, activation="relu",
                      params=conv_layer_1.collect_params('conv0_weight'))

conv_layer_1.initialize()
conv_layer_2.initialize()

# Unfortunately, the bias names are also identical, even though the biases are
# distinct parameters with different values. But in newer MXNet versions,
# loading works based on indices, not parameter names.
print(conv_layer_1.collect_params())
print(conv_layer_2.collect_params())

# Just showing that the biases are different: layer 1's bias was initialized
# with ones, layer 2's with the default zeros.
print(conv_layer_1.bias.data())
print(conv_layer_2.bias.data())