Transformer

gold_piggy · July 4, 2019, 1:35am

https://d2l.ai/chapter_attention-mechanisms/transformer.html

Siyang · January 3, 2020, 7:32pm

I found this implementation somewhat contradictory to the text description.

Shall we create num_heads copies of W_q, W_k, W_v? Or equivalently use hidden_size*num_heads as the first parameter to the dense layers like W_q, W_k, W_v?
Shall we have num_heads copies of attention layers, instead of one?

ZI_HE · February 23, 2020, 11:24pm

Yeah. I also think the implementation for multi-head attention is wrong here and what you suggest is correct.

aker218 · March 11, 2020, 3:01am

I think the implementation has the same effect as the text description.

1. For W_q, W_k, W_v, the implementation here uses a single dense layer with hidden_size*num_heads units, while the paper uses num_heads copies of a dense layer with hidden_size units. I think the i-th copy is equivalent to units [(i-1)*hidden_size : i*hidden_size] of the big dense layer. Backpropagation during training has the same effect whether you use one big layer or num_heads copies of a small layer.
2. For the attention layers, the Transformer uses dot-product attention, which has no trainable parameters; it just computes matrix multiplications and dot products of its inputs. Using one attention layer or several makes no difference.
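
The two points above can be checked numerically. The following is a minimal NumPy sketch, not the book's code: the sizes, the helper names (`project_big`, `project_per_head`, `dot_product_attention`), and the random weights are all assumptions made for illustration, and heads are indexed from 0 rather than from 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
batch, seq_len, d_model = 2, 5, 8
hidden_size, num_heads = 4, 3

X = rng.standard_normal((batch, seq_len, d_model))
# One "big" weight matrix per projection, hidden_size * num_heads columns each.
W_q, W_k, W_v = (
    rng.standard_normal((d_model, hidden_size * num_heads)) for _ in range(3)
)


def project_big(X, W):
    """One large dense layer, then split the output into heads."""
    out = (X @ W).reshape(batch, seq_len, num_heads, hidden_size)
    return out.transpose(0, 2, 1, 3)  # (batch, num_heads, seq_len, hidden_size)


def project_per_head(X, W):
    """num_heads small dense layers, each using one column slice of W."""
    heads = [
        X @ W[:, i * hidden_size:(i + 1) * hidden_size] for i in range(num_heads)
    ]
    return np.stack(heads, axis=1)  # (batch, num_heads, seq_len, hidden_size)


def dot_product_attention(q, k, v):
    """Scaled dot-product attention; it has no trainable parameters."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


# Point 1: the big layer and the per-head layers give the same projections.
q, k, v = (project_big(X, W) for W in (W_q, W_k, W_v))
q_ph, k_ph, v_ph = (project_per_head(X, W) for W in (W_q, W_k, W_v))
projections_match = all(
    np.allclose(a, b) for a, b in ((q, q_ph), (k, k_ph), (v, v_ph))
)

# Point 2: one attention call over all heads equals one call per head.
out_one_call = dot_product_attention(q, k, v)
out_per_head = np.stack(
    [dot_product_attention(q[:, h], k[:, h], v[:, h]) for h in range(num_heads)],
    axis=1,
)
attention_match = np.allclose(out_one_call, out_per_head)

print(projections_match, attention_match)
```

Because the forward computations agree, the gradients with respect to each column slice of the big matrix are the same ones the corresponding small layer would receive, which is why training behaves identically under this assumption.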

TristonC · March 31, 2020, 9:01pm

@gold_piggy the link does not work. It shows the following error.

403 Forbidden

    Code: AccessDenied
    Message: Access Denied
    RequestId: 1FC5C3A1976FBE23
    HostId: I8mK3i9om5k4II980EvBAYXCdYigJcQgCr32AVziVpRDxsGu6kf8yVaikbMnU9F3dzYeAzWiib0=

An Error Occurred While Attempting to Retrieve a Custom Error Document

    Code: AccessDenied
    Message: Access Denied

gold_piggy · April 20, 2020, 4:14pm

Hi @TristonC, it should be working now. Apologies for the broken link; we are currently revising the contents and knowledge flow.

TristonC · April 24, 2020, 8:19pm

It works now. Thanks, @gold_piggy.
