Word embedding training example


#1

I am following the steps in

https://gluon-nlp.mxnet.io/examples/word_embedding/word_embedding_training.html

and I get results similar to the tutorial's until the following step:

example_token = "vector"
get_k_closest_tokens(vocab, embedding, 10, example_token)

which does not return similar tokens to “vector”.

closest tokens to “vector”: is, in, zero, a, one, two, of, the, and, to

It appears that the word vector for the example token is all zeros, and presumably the vectors for all other words in the vocabulary are too. I thought the initialization was based on each word's ngrams?
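For reference, here is a minimal sketch of how the all-zeros suspicion could be checked, and of why all-zero vectors might produce exactly this kind of list. It is plain Python and does not use the tutorial's code: `is_all_zero`, `cosine_scores`, and the toy data are hypothetical names made up for illustration. It also assumes that the tutorial's similarity is a cosine-style score with a small epsilon in the denominator, and that ties are broken in favor of lower vocabulary indices, which tend to be the most frequent tokens. Neither assumption is confirmed in this thread.

```python
import math


def is_all_zero(vec, tol=1e-12):
    """Return True if every component of vec is numerically zero."""
    return all(abs(x) <= tol for x in vec)


def cosine_scores(query, matrix, eps=1e-10):
    """Cosine similarity of query against each row of matrix.

    eps keeps the division defined when a vector has zero norm
    (assumed to mirror what the tutorial does; not verified here).
    """
    q_norm = math.sqrt(sum(x * x for x in query)) + eps
    scores = []
    for row in matrix:
        r_norm = math.sqrt(sum(x * x for x in row)) + eps
        dot = sum(a * b for a, b in zip(query, row))
        scores.append(dot / (q_norm * r_norm))
    return scores


# Hypothetical toy data standing in for the real embedding matrix:
# four tokens, each with an all-zero 3-dimensional vector.
toy_vectors = [[0.0, 0.0, 0.0] for _ in range(4)]
query = toy_vectors[2]

all_zero = is_all_zero(query)
scores = cosine_scores(query, toy_vectors)

# sorted() is stable, so tied scores keep their original index order.
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])

print("query is all zero:", all_zero)
print("scores:", scores)
print("ranking by similarity:", ranking)
```

If every score is tied at zero, the "closest" tokens are just whichever indices come first, which would be consistent with a list of very common words such as the one above. On the real model, the same `is_all_zero` check could be applied to the looked-up vector after converting it to a plain list.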

When I train the model, the result is still the same at the end of training.


#2

I just downloaded and ran the example and it worked fine for me. Do you have the latest gluonnlp pip package installed? I’m using gluonnlp-0.5.0.post0 with mxnet 1.3.1.


#3

Yes, I have gluonnlp==0.5.0 and mxnet==1.3.1. I am running this on Python 2 on Ubuntu 16.04 LTS. Maybe it's an issue with other libraries?
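To make the two environments easier to compare, something like the following sketch could be run on both machines and the output pasted here. `report_versions` is a hypothetical helper written for this thread, not part of gluonnlp or mxnet, and the list of packages to check (including numpy) is only a guess at what might matter. It uses only the standard library and should run on both Python 2.7 and Python 3.

```python
import importlib
import platform


def report_versions(names):
    """Collect the Python version and the version of each named package.

    Packages that cannot be imported are reported as "not installed";
    packages without a __version__ attribute are reported as "unknown".
    """
    report = {"python": platform.python_version()}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "not installed"
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    # The package list is an assumption; add whatever else seems relevant.
    versions = report_versions(["gluonnlp", "mxnet", "numpy"])
    for name in sorted(versions):
        print("{}: {}".format(name, versions[name]))
```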


#4

That’s very strange. What’s your development environment? I used a SageMaker notebook.