Set and Freeze weights of Embedding layer

Used gensim to create my own word2vec model based on my own text need to create embedding with this but don’t want weights to change since its already trained. This is part of a next character rnn model.

@hskramer,

You can use it like this:

net.embedding_layer.collect_params().set_attr('grad_req', null)

That way the parameters won’t have any gradient and won’t change.

I was pretty sure I had tried that one and I receive the error: AttributeError: ‘Sequential’ object has no attribute ‘embedding_layer’, I have also used the freeze weights from a post on stackoverflow by Sergei on freezing weights this looked like it might work until I initialize the net then I receive a similar error. The one that comes closest is freezing the the weights with the method by Sergei then I use net[1].embedding0_weights = my_word2vecweigts,
followed by net[1:].initialize(mx.init.Xavier(), ctx=ctx). This one errors out in training with parameter 'embedding0_weight has not been initialized even though it has weights that are nd.arrays

The code:

vocab_size = word2vecweights[0] my vocab size
embed_size = word2vecweights[1] the embed size is  100 I found this worked best 
it will produce this when asking for similar words:

network -> networks (0.40), connected (0.28), fully (0.27), subclass (0.26)

net = nn.Sequential() 
net.add(nn.Embedding(myword2vec_weights, output_dim=embed_size),
        rnn.LSTM(num_hiddens),
        nn.Dense(vocab_size))

This function produces the training data

train_x = nd.zeros([len(sentences), max_sentence_len], dtype='float32')
train_y = nd.zeros([len(sentences)], dtype='float32')
for i, sentence in enumerate(sentences):
    for j, word in enumerate(sentence[:-1]):
        train_x[i, j] = word2idx(word)
        train_y[i] = word2idx(sentence[-1])
print('train_x shape:', train_x.shape)
print('train_y shape:', train_y.shape)

Dataloader so far no issues with this.

train_data = gluon.data.DataLoader(gluon.data.ArrayDataset(train_x, train_y), batch_size=64, shuffle=False) 

Train is very simple for now until I figure out how to freeze my embed weights since this were trained and produce nice results when used just like word2vec in gluonnlp which won’t recognize my word2vec I have tried making a vocab and everything else shown in the tutorials and api’s I’ve read.

lr = 0.5
num_epochs = 10
metric = mx.metric.Accuracy()


for epoch in range(num_epochs):
    metric.reset()
    
    for (source, target) in train_data:
        source = source.as_in_context(ctx)
        target = target.as_in_context(ctx)
    
        with autograd.record():
            output = net(source)
        
            l = loss(output, target)
        
        l.backward()

Is their a way to make a next char rnn using gluonnlp and produce text similar to want its trained with. I think its clear now that I like to produce text perplexity is necessary for comparing one model to another but has little meaning to most people where as apposed to saying this model has perplexity x and here’s some output and this model has perplexity y and here’ some better output now they have some idea of perplexity and why its important. This opens up the possibility to talk about AI and the impact it is having on our society (benefits and risks).

@hskramer, net.embedding_layer is just one hypothetical way to access your embedding layer. This would work for example if you had set your embedding layer as an attribute of your network.

in your specific case you would need to use:

net[0].collect_params().set_attr('grad_req', null)

“Is their a way to make a next char rnn using gluonnlp and produce text similar to want its trained with.”
=> The code I shared with you here: Language models and BeamSearch/SeqGenaration does that

Still have not been able to get it to work have even gone so far as to create a new virtual environment for mxnet numpy since gensim creates the word vectors as numpy vectors.
This is a list of the other things I have tried and the errors received standard mxnet:

initialize whole net then set weight to pretrained and set grad_req = null like you showed receive error during out = self.foward(*args) followed by iteration through the params
TypeError: ‘NDArray’ object is not callable.

set weights then I have to initialize net[1].init… or I recieve another error
ValueError: The truth value of an NDArray with multiple elements is ambiguous.

with net[1].init and net[0] set to pretrained weights with grad_req = null, looks like it going to train but then I receive this error:
TypeError: ‘NDArray’ object is not callable on the forward pass

With the numpy version receive similar error just replace NDArray with numpy.ndarray.

Despite all the time and effort I have put into this I’m OK with it, I have learned so much and for me that’s enough leaning is always my end goal. If I can get it to work that would be even better, but I’m out of ideas. Thank so you for all the help you have provided. If you have any more suggestions that be great.

@hskramer, can you please share your entire code? I’ll see if I can fix it.

import string
import re
from gensim.models import Word2Vec

import numpy as np
import mxnet as mx
import gluonnlp as nlp
from mxnet import nd, gluon, autograd
from mxnet.gluon import nn, rnn

url = 'https://raw.githubusercontent.com/maxim5/stanford-tensorflow-tutorials/master/data/arxiv_abstracts.txt'
path = download(url)
path = 'data/nlp_data/texts/arxiv_abstracts.txt'

print('\nPreparing the sentences...')
max_sentence_len = 50
with open(path) as file_:
    docs = file_.readlines()
sentences = [[word for word in doc.lower().translate(string.punctuation).split()[:max_sentence_len]] for doc in docs]
print('Num sentences:', len(sentences))

print('\nTraining word2vec...')
model = Word2Vec(sentences, size=100, min_count=5, window=5, workers=16, iter=100)
pretrained_weights = model.wv.vectors
vocab_size, emdedding_size = pretrained_weights.shape
print('Result embedding shape:', pretrained_weights.shape)
print('Checking similar words:')
for word in ['model', 'network', 'train', 'learn']:
    most_similar = ', '.join('%s (%.2f)' % (similar, dist) for similar, dist in model.wv.most_similar(word)[:8])
    print('  %s -> %s' % (word, most_similar))

model.save('model')
model = 'model' 
model = Word2Vec.load(model)

pretrained_weights = model.wv.vectors
pretrained_weights.shape

def word2idx(word):
    return model.wv.vocab[word].index
def idx2word(idx):
    return model.wv.index2word[idx]

train_x = nd.zeros([len(sentences), max_sentence_len], dtype='float32')
train_y = nd.zeros([len(sentences)], dtype='float32')
for i, sentence in enumerate(sentences):
    for j, word in enumerate(sentence[:-1]):
        train_x[i, j] = word2idx(word)
    train_y[i] = word2idx(sentence[-1])
print('train_x shape:', train_x.shape)
print('train_y shape:', train_y.shape)


vocab_size = pretrained_weights.shape[0]
embed_size = pretrained_weights.shape[1]

net = nn.Sequential()
net.add(nn.Embedding(pretrained_weights, output_dim=embed_size),
        rnn.LSTM(num_hiddens),
        nn.Dense(vocab_size))


def sample(preds, temperature=1.0):
    if temperature <= 0:
        return nd.argmax(preds)
    preds = nd.asarray(preds).astype('int32')
    preds = nd.log(preds) / temperature
    exp_preds = nd.exp(preds)
    preds = exp_preds / nd.sum(exp_preds)
    probas = nd.random.multinomial(1, preds, 1)
    return np.argmax(probas)

def generate_next(text, num_generated=10):
    word_idxs = [word2idx(word) for word in text.lower().split()]
    for i in range(num_generated):
        prediction = net[1](nd.array(word_idxs))
        idx = sample(prediction[-1], temperature=0.7)
        word_idxs.append(idx)
    return ' '.join(idx2word(idx) for idx in word_idxs)


train_data = gluon.data.DataLoader(gluon.data.ArrayDataset(train_x, train_y), batch_size=64, shuffle=False)

net[1].initialize(mx.init.Xavier(), ctx=mx.cpu())

loss = gluon.loss.SoftmaxCrossEntropyLoss()

trainer = gluon.Trainer(net[1].collect_params(), 'sgd', {'learning_rate': 0.8})
ctx = mx.cpu()

net[0].weight.data = pretrained_weights
net[0].collect_params().setattr('grad_req', 'null')

lr = 0.5
num_epochs = 10
metric = mx.metric.Accuracy()


for epoch in range(num_epochs):
    metric.reset()
    
    for (source, target) in train_data:
        source = source.as_in_context(ctx)
        target = target.as_in_context(ctx)
    
        with autograd.record():
            output = net(source)
        
            l = loss(output, target)
        
        l.backward()

I was just working in with cpu context until working then planned on moving to gpu.

1 Like

Since I was waiting I decided to just use all gluonnlp and mxnet finally have a nice embedding using word2vec 6B.100d. Might need help with this hopefully not. Just out of curiosity I ran the original tf code (I’m using as a template) on colab and it ran with some deprecation warnings. Thanks again for all the help if fixing the above becomes too much trouble I can work with gluonnlp. Still it would be nice to know how to integrate gensim with mxnet/glunnlp.

You don’t need to fix what I was working on I moved in an entirely different direction. Probably will go with word2vec but have also used the tokenizer used in the ELMo example along with the ELMoCharVocab. Too much help can hurt the learning process it acts like a crutch, but I could use some pointers in the future if that’s OK.

I’m not sure why you haven’t responded to any of my post what ever it is I hope it’s not too serious. It was nice to finally have someone respond to all my posts I had almost given up. This will be my last post regarding this issue unless I hear something.