Is it possible to update only certain weights of embedding layers?


#1

Hi Guys,

I have a matrix factorization network defined by the graph below, and I can get the weights of its embedding layers. Say I have saved those weights. If I reload them into a new instance of the same network graph, can I fix certain weights (i.e., have only some of the weights updated)?

For example, when training the new network instance, I actually just want to update the last weight of the 8 latent_dim weights, and leave the first 7 fixed.

Thanks!


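import mxnet as mx

latent_dim = 8

# The wrapper name is illustrative; the original post showed only the body.
def build_network(n_users, n_items):
    y_true = mx.symbol.Variable("label")

    # One latent vector of size latent_dim per user
    user = mx.symbol.Variable("member")
    user = mx.symbol.Embedding(name='member_embedding', data=user, input_dim=n_users, output_dim=latent_dim)

    # One latent vector of size latent_dim per book
    book = mx.symbol.Variable("book")
    book = mx.symbol.Embedding(name='book_embedding', data=book, input_dim=n_items, output_dim=latent_dim)

    # Dot product of the user and book vectors
    dot = user * book
    dot = mx.symbol.sum_axis(dot, axis=1)
    dot = mx.symbol.Flatten(dot)
    dot = 1 - dot

    return mx.symbol.LinearRegressionOutput(data=dot, label=y_true)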

#2

If you’re using the Module API for optimization, the easy way to freeze weights is to pass fixed_param_names to the Module constructor. However, that freezes whole parameters, and what you are looking for is freezing part of a parameter. An easy way to achieve that is to split your latent space into two subsets. In your specific example, you’d write something like this:

user = mx.symbol.Variable("member")
user1 = mx.symbol.Embedding(name='member_embedding1', data=user, input_dim=n_users, output_dim=latent_dim-1)
user2 = mx.symbol.Embedding(name='member_embedding2', data=user, input_dim=n_users, output_dim=1)
user = mx.symbol.concat(user1, user2, dim=1)

With this trick you get two weight sets, member_embedding1_weight (the first 7 latent dimensions) and member_embedding2_weight (the last one), so you can freeze the first set and optimize the second.
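
One thing the split leaves open is how to carry the already trained weights over to the two new layers. Here is a minimal NumPy sketch of that step, assuming the saved member_embedding_weight has shape (n_users, latent_dim), which is how Embedding stores it. The array values, the toy n_users, and the variable names are made up for illustration.

```python
import numpy as np

latent_dim = 8
n_users = 4  # toy size, for illustration only

# Stand-in for the saved member_embedding_weight, shape (n_users, latent_dim).
old_weight = np.arange(n_users * latent_dim, dtype=np.float32).reshape(n_users, latent_dim)

# The first 7 columns go to member_embedding1 (to be frozen);
# the last column goes to member_embedding2 (to be trained).
weight1 = old_weight[:, :latent_dim - 1]
weight2 = old_weight[:, latent_dim - 1:]

# Looking up rows in both pieces and concatenating along dim=1
# reproduces the original embedding lookup.
user_ids = np.array([2, 0, 3])
split_lookup = np.concatenate([weight1[user_ids], weight2[user_ids]], axis=1)
full_lookup = old_weight[user_ids]
```

The two pieces would then be loaded as member_embedding1_weight and member_embedding2_weight, with 'member_embedding1_weight' listed in fixed_param_names.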


#3

That’s a really cool idea/trick! I learned some new stuff from it. Thanks Safrooze!

After reading your reply, I realized that my question was misleading; sorry about that. What I really want is this: after training the embedding layers, I want to add some new ‘words’ to the model while keeping the weights of the existing ‘vocabulary’ unchanged.

In my case, say I use 2 latent dims and have three books (0, 1, 2). The book_embedding_weight matrix might look like this:

book_embedding_weight =
[[0.3, 0.4],
[0.8, 0.5],
[0.2, 0.3]]

Now I want to introduce one new ‘book’.
I could create a new instance of the network, reload the old weights, and initialize the new weights as zeros:

[[0.3, 0.4], # book 0’s weights, keep these unchanged
[0.8, 0.5], # book 1’s weights, keep these unchanged
[0.2, 0.3], # book 2’s weights, keep these unchanged
[0, 0]] # book 3, only learn this weight vector

When I train this network, I really just want to learn the weights for the new book. Ideally, the new book’s weights would be comparable to those of the ‘old’ books.
Is this even possible in MXNet? I would want to add new ‘users’ in the same way.

Thanks!


#4

To achieve what you want, you’d need to compose the embedding layer manually by stacking one_hot, two slice_axis operators, two FullyConnected layers, and a sum. Example:

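# n_items counts the existing books plus the one new book
book = mx.sym.one_hot(indices=book, depth=n_items, name='one_hot')
# All columns except the last: the existing books, whose weights get frozen
book1 = mx.sym.slice_axis(data=book, axis=-1, begin=0, end=-1)
book1 = mx.sym.FullyConnected(data=book1, num_hidden=latent_dim, no_bias=True)
# Last column only: the new book, whose weights get trained
book2 = mx.sym.slice_axis(data=book, axis=-1, begin=-1, end=n_items)
book2 = mx.sym.FullyConnected(data=book2, num_hidden=latent_dim, no_bias=True)
# Exactly one of the two terms is non-zero for any given book
book = book1 + book2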

Now you have fullyconnected0_weight and fullyconnected1_weight, and you can freeze the first and let the second train.
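
To see why this composition behaves like an Embedding with one frozen block, here is a small NumPy sketch using the three-book matrix from post #3. It assumes the new book's vector starts at zero. It also relies on FullyConnected storing its weight as (num_hidden, input_dim), which is transposed relative to the Embedding weight, so the old matrix would need a transpose when loaded. The variable names are illustrative only.

```python
import numpy as np

# Trained weights for books 0..2 from post #3, shape (3, latent_dim=2).
old_book_weight = np.array([[0.3, 0.4],
                            [0.8, 0.5],
                            [0.2, 0.3]])
n_items = 4  # three old books plus one new book

# FullyConnected computes x @ weight.T, with weight shaped (num_hidden, input_dim).
fc0_weight = old_book_weight.T   # frozen block, shape (2, 3)
fc1_weight = np.zeros((2, 1))    # trainable block for the new book, shape (2, 1)

book_ids = np.array([1, 3])          # one old book, one new book
one_hot = np.eye(n_items)[book_ids]  # shape (2, 4)

book1 = one_hot[:, :-1] @ fc0_weight.T  # non-zero only for old books
book2 = one_hot[:, -1:] @ fc1_weight.T  # non-zero only for the new book
book = book1 + book2                    # same result as an embedding lookup
```

Book 1 comes out as its stored vector and book 3 as the zero vector it was initialized with. Since the old books never touch fc1_weight, training it changes only the new book's vector.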


#5

This is an elegant solution Safrooze!
Really appreciate your help!