Attention

https://d2l.ai/chapter_attention-mechanisms/attention.html

The link does not work; it returns an error:

403 Forbidden

Code: AccessDenied
Message: Access Denied

@TristonC, I think that link is now this:

http://d2l.ai/chapter_attention-mechanisms/index.html


It should work now! Thanks for pointing it out!

I think there is a problem between the multilayer perceptron (MLP) attention definition and the implementation:

\exists\, x, y \in \mathbb{R}: \tanh(x+y) \neq \tanh(x) + \tanh(y)

The reason I say this is that the implementation effectively adds \tanh(x) and \tanh(y) together, rather than first adding x and y and then applying \tanh to the sum.

Please let me know if my understanding is wrong!
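
A quick numerical check (plain Python) makes the gap obvious:

import math

x, y = 1.0, 1.0
print(math.tanh(x + y))             # 0.9640...
print(math.tanh(x) + math.tanh(y))  # 1.5232..., so the two differ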

Is there a problem with notation inside the code of MLPAttention?
The code is listed as
query, key = self.W_k(query), self.W_q(key)
But the math notation tells me it should be
query, key = self.W_k(key), self.W_q(query)
It would not break anything, but for clarity's sake: is there anything I am missing?
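
For reference, here is a minimal sketch of additive (MLP) attention that follows the math notation, with W_q applied to the query, W_k applied to the key, and tanh applied after the sum. It assumes the chapter's masked_softmax helper is in scope; the class name and the units parameter (the hidden size of the scoring MLP) are mine:

from mxnet import np, npx
from mxnet.gluon import nn
npx.set_np()

class MLPAttentionSketch(nn.Block):
    def __init__(self, units, dropout, **kwargs):
        super(MLPAttentionSketch, self).__init__(**kwargs)
        self.W_q = nn.Dense(units, use_bias=False, flatten=False)  # projects queries
        self.W_k = nn.Dense(units, use_bias=False, flatten=False)  # projects keys
        self.v = nn.Dense(1, use_bias=False, flatten=False)
        self.dropout = nn.Dropout(dropout)

    def forward(self, query, key, value, valid_len=None):
        query, key = self.W_q(query), self.W_k(key)
        # Broadcast to (batch_size, #queries, #kv_pairs, units) and sum
        # BEFORE applying tanh, as the definition requires.
        features = np.expand_dims(query, axis=2) + np.expand_dims(key, axis=1)
        scores = np.squeeze(self.v(np.tanh(features)), axis=-1)
        attention_weights = self.dropout(masked_softmax(scores, valid_len))
        return npx.batch_dot(attention_weights, value)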


Can someone please explain why, in the code below, valid_len is set equal to the batch_size? From the masked_softmax code, I can understand the role of valid_len, but it is not clear why it should be equal to the batch_size. An example would be great.

import math
from mxnet import npx
from mxnet.gluon import nn

#@save
class DotProductAttention(nn.Block):
    def __init__(self, dropout, **kwargs):
        super(DotProductAttention, self).__init__(**kwargs)
        self.dropout = nn.Dropout(dropout)

    # query: (batch_size, #queries, d)
    # key: (batch_size, #kv_pairs, d)
    # value: (batch_size, #kv_pairs, dim_v)
    # valid_len: either (batch_size, ) or (batch_size, xx)
    def forward(self, query, key, value, valid_len=None):
        d = query.shape[-1]
        # Set transpose_b=True to swap the last two dimensions of key
        scores = npx.batch_dot(query, key, transpose_b=True) / math.sqrt(d)
        # masked_softmax is defined earlier in the chapter
        attention_weights = self.dropout(masked_softmax(scores, valid_len))
        return npx.batch_dot(attention_weights, value)

@vahuja4, recall that a valid length is the length of a sequence without appended padding tokens. In our code, valid_len is not a single number but an array (or even a matrix). The comments describe the shapes of the formal parameters of forward, not their values. In the 1-D case, valid_len must be of shape (batch_size,), i.e. it specifies a valid length for every entry in the batch; e.g. batch_size = 4, valid_len = np.array([3, 5, 5, 2]).
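
To make this concrete, here is a minimal usage sketch, assuming the DotProductAttention class quoted above and the chapter's masked_softmax are in scope (the values are arbitrary):

from mxnet import np, npx
npx.set_np()

atten = DotProductAttention(dropout=0.5)
atten.initialize()
keys = np.ones((2, 10, 2))                                  # (batch_size=2, #kv_pairs=10, d=2)
values = np.arange(40).reshape(1, 10, 4).repeat(2, axis=0)  # (2, 10, dim_v=4)
# valid_len has shape (batch_size,): example 0 attends only to the
# first 2 key-value pairs, example 1 to the first 6.
atten(np.ones((2, 1, 2)), keys, values, np.array([2, 6]))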

What about the case when valid_len is a matrix? I guess it is expected to be of shape (batch_size, #queries).
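
If that is right, each query would get its own valid length. Continuing the sketch above with hypothetical values:

# 2-D case: batch_size=2 and 2 queries per example; query j of example i
# attends to the first valid_len[i, j] key-value pairs.
valid_len = np.array([[2, 6], [3, 5]])
atten(np.ones((2, 2, 2)), keys, values, valid_len)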

The PyTorch version of MLP attention is missing the tanh operator on the sum.

Also, is there any reason why the bias term is not added in MLPAttention?