Does anyone have working examples of how to use the Attention cell in the gluon NLP package? How general purpose is it and can they be stacked to make a hierarchical attention model?
You can find an example usage of the
AttentionCell in the Google Neural Machine Translation (GNMT) example found here.
@Sergey has been working on a model using attention, so he may be able to give you some additional advice if you need it.
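In the meantime, here is a minimal NumPy sketch of what a scaled dot-product attention cell computes, i.e. the operation GluonNLP's `DotProductAttentionCell` implements: it takes a query, key, and value, and returns a context vector along with the attention weights. This is an illustrative sketch, not the library's actual code, and the function name and shapes here are my own choices:

```python
import numpy as np

def scaled_dot_attention(query, key, value, mask=None):
    """Scaled dot-product attention sketch.

    query: (batch, q_len, d)
    key:   (batch, k_len, d)
    value: (batch, k_len, v_dim)
    mask:  optional boolean (batch, q_len, k_len); False entries are ignored.
    Returns (context, weights) with shapes (batch, q_len, v_dim) and
    (batch, q_len, k_len).
    """
    # Similarity scores, scaled by sqrt of the query/key dimension
    scores = query @ key.transpose(0, 2, 1) / np.sqrt(query.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    # Softmax over the key axis gives the attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Context vector is the weight-averaged value
    context = weights @ value
    return context, weights

batch, q_len, k_len, d = 2, 3, 4, 8
rng = np.random.default_rng(0)
q = rng.standard_normal((batch, q_len, d))
k = rng.standard_normal((batch, k_len, d))
v = rng.standard_normal((batch, k_len, d))
ctx, w = scaled_dot_attention(q, k, v)
print(ctx.shape, w.shape)  # (2, 3, 8) (2, 3, 4)
```

In GluonNLP the call looks similar: you pass the query, key, and (optionally) value arrays to the cell and get back the context vector and attention weights, so in principle one cell's output can feed another cell's query, which is the basic ingredient of a hierarchical attention model.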