Seq2seq Attention

http://d2l.ai/chapter_attention-mechanisms/seq2seq-attention.html


Does seq2seq with attention work better than attending directly over the input embeddings?

If, instead of using a transformer, we simply employ self-attention as a pooling mechanism over the input embeddings, would the performance differ markedly from seq2seq with attention? Or is this dataset-specific?
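
For concreteness, here is a minimal sketch (in PyTorch, which the linked chapter also uses) of what I mean by "self-attention as a pooling mechanism over the input embeddings". The class name and layer sizes are just illustrative, not from the chapter; the seq2seq-with-attention baseline would instead run an RNN encoder and let the decoder attend over its hidden states at every step.

```python
import torch
from torch import nn

class SelfAttentionPooling(nn.Module):
    """Illustrative sketch: scaled dot-product self-attention applied
    directly to raw token embeddings, with no recurrent encoder."""
    def __init__(self, embed_dim):
        super().__init__()
        # Learned projections for queries, keys, and values
        self.W_q = nn.Linear(embed_dim, embed_dim)
        self.W_k = nn.Linear(embed_dim, embed_dim)
        self.W_v = nn.Linear(embed_dim, embed_dim)

    def forward(self, X):
        # X: (batch_size, seq_len, embed_dim) -- input embeddings
        Q, K, V = self.W_q(X), self.W_k(X), self.W_v(X)
        # Scaled dot-product attention weights over the input positions
        scores = Q @ K.transpose(1, 2) / (X.shape[-1] ** 0.5)
        weights = torch.softmax(scores, dim=-1)
        # Each output position is a weighted pooling of the value vectors
        return weights @ V  # (batch_size, seq_len, embed_dim)

# Toy usage: 2 sequences, 5 tokens each, 32-dim embeddings
embeddings = torch.randn(2, 5, 32)
pooled = SelfAttentionPooling(32)(embeddings)
print(pooled.shape)  # torch.Size([2, 5, 32])
```

The question is whether replacing the recurrent encoder with this kind of pooling (before feeding a decoder) would noticeably change translation quality, or whether any gap would depend mostly on the dataset.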