I have merged the LSTM-based language model with sequential (greedy) and beam search decoding. The model is trained on the Sherlock Holmes text and generates a continuation of a given input prompt; it would be nice to see if it can produce better text. I know perplexity is the standard measure for language models, since they can't be compared reliably by the text they produce, but generated text is more interesting to look at. I have used the d2l and Straight Dope examples for text generation, but I know that LSTM LMs and transformer models can do much better.
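For reference, the beam search decoding mentioned above can be sketched in a few lines of plain Python. This is a minimal sketch, not the d2l implementation: `next_log_probs` is a hypothetical callback standing in for a forward pass through the LSTM, returning log-probabilities for the next token given the sequence so far.

```python
import math

def beam_search(next_log_probs, start_token, beam_size=3, max_len=5):
    """Toy beam search. `next_log_probs(seq)` is a stand-in for the model:
    it returns a dict {token: log_prob} for the next step. In a real
    decoder this would be an LSTM forward pass on `seq`."""
    beams = [([start_token], 0.0)]  # (sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, lp in next_log_probs(seq).items():
                candidates.append((seq + [tok], score + lp))
        # keep only the beam_size highest-scoring partial sequences
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams
```

With `beam_size=1` this reduces to greedy (sequential) decoding; larger beams keep several hypotheses alive, which usually yields higher-probability (though often less diverse) text.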
To produce a high-quality generative text model, a few tricks are useful:
- Use long sentences / paragraphs / documents when training your language model. That's mainly why GPT-2 gave such good results with the WebText dataset, and why text generated by models trained on the Google 1 Billion Word dataset is so poor (it contains only shuffled short sentences).
- Use a very large model (200M+ parameters).
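Since perplexity comes up above as the measure to compare against: it is just the exponential of the average per-token negative log-likelihood on held-out text. A minimal sketch (assuming natural-log probabilities from the model):

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-likelihood per token.
    `token_log_probs` are the model's natural-log probabilities of the
    actual next tokens in a held-out sequence."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)
```

As a sanity check, a model that assigns probability 0.25 to every held-out token has perplexity 4: it is as uncertain as a uniform choice among 4 tokens.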