Seeking Synonyms and Analogies

https://d2l.ai/chapter_natural-language-processing-pretraining/similarity-analogy.html

Hi, I found this chapter really interesting. I want to evaluate how well this performs on a larger-scale dataset of analogies. Does anyone know where I might find such a dataset?

I have tried it on some cases and see that it fails. For example:

“eagle” is to “bird” as “beagle” is to “____” (expected: “dog”). The model returns “influenza”.

I’m trying to get a better grasp on how one might evaluate these analogies more generally. It seems like any answer will be biased by distance in the distributional space: closer targets are more likely to be hit. If the target happens to be farther away, then even if the analogy is completely valid, it is very unlikely to be predicted correctly.

If anyone could give me a tip on how to set up a small experiment to explore this further I’d be very grateful!

Am I the only one having issues with the imports? I can’t run this code.

How long does it take to run on average?

Analogy datasets include the Google analogy test set and the Bigger Analogy Test Set (BATS).

To make it easy to run experiments, we used 50-dimensional GloVe in this section. Increasing its dimension usually helps.
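For the small experiment asked about above, here is a minimal sketch rather than the chapter's own code. It assumes a plain dict mapping each word to its vector and a list of (a, b, c, d) quadruples. The names `emb`, `analogy_rank`, and `evaluate` are made up for illustration, and the toy vectors are hand-written, not real GloVe values. For each quadruple it records the rank of the expected answer together with the cosine similarity between c and d, so you can check whether farther targets really are missed more often. Excluding the three query words from the candidates is a choice made in this sketch.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def analogy_rank(emb, a, b, c, d):
    """Rank of the expected answer d for 'a is to b as c is to ?' (1 = top hit)."""
    # Query vector: vec(b) - vec(a) + vec(c)
    query = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    # Assumption of this sketch: the three query words are not valid answers.
    candidates = [w for w in emb if w not in (a, b, c)]
    candidates.sort(key=lambda w: cosine(query, emb[w]), reverse=True)
    return candidates.index(d) + 1

def evaluate(emb, quads):
    """Return one record per quadruple whose four words are all in the vocabulary."""
    rows = []
    for a, b, c, d in quads:
        if not all(w in emb for w in (a, b, c, d)):
            continue  # skip out-of-vocabulary quadruples
        rows.append({
            "quad": (a, b, c, d),
            "rank": analogy_rank(emb, a, b, c, d),
            "cos_c_d": cosine(emb[c], emb[d]),  # how close the target is to c
        })
    return rows

# Hand-made toy vectors (hypothetical values, NOT real GloVe embeddings).
emb = {
    "man":   [1.0, 0.0, 0.2],
    "woman": [-1.0, 0.0, 0.2],
    "king":  [1.0, 1.0, 0.2],
    "queen": [-1.0, 1.0, 0.2],
    "apple": [0.0, -1.0, 1.0],
}
quads = [("man", "woman", "king", "queen")]

rows = evaluate(emb, quads)
top1_accuracy = sum(r["rank"] == 1 for r in rows) / len(rows)
for r in rows:
    print(r["quad"], "rank:", r["rank"], "cos(c, d):", round(r["cos_c_d"], 3))
print("top-1 accuracy:", top1_accuracy)
```

With real embeddings you would fill `emb` from the pretrained vectors and `quads` from one of the datasets above, then compare `rank` against `cos_c_d` across all rows. Sorting the whole vocabulary per query is slow for large vocabularies, so a vectorized similarity computation would be the natural next step.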

Have you followed the installation section?
https://d2l.ai/chapter_installation/index.html

Running this section should take less than two minutes, depending on your hardware.