Implementation of a Recurrent Neural Network from Scratch

https://en.diveintodeeplearning.org/chapter_recurrent-neural-networks/rnn-scratch.html

While it may be obvious to the observant reader, some readers could be confused as to why we need a separate init_rnn_state() function.

Let us assume that the first input to the RNN is called X_1 and the first output created by the RNN is called y_1.

Consider the general equation:

\mathbf{H}_t = \phi(\mathbf{X}_t \mathbf{W}_{xh} + \mathbf{H}_{t-1} \mathbf{W}_{hh} + \mathbf{b}_h)

Since t=1, we have:

\mathbf{H}_1 = \phi(\mathbf{X}_1 \mathbf{W}_{xh} + \mathbf{H}_{0} \mathbf{W}_{hh} + \mathbf{b}_h).

Now, what is this H_0? It is a state which MUST be present even before the RNN starts processing the first input, so for the code to work it is essential that we initialise it. Even if we choose to initialise it to 0, we have to make sure that the dimensions of this matrix are such that the above equation holds true.

If you look closely, you will realise that the dimensions of any H_t are in fact \text{batch_size} \times \text{num_hidden_units}.

Since batch_size is not known at the time the network is created, we need to initialise this H_0 matrix at runtime, hence the need for a separate function.
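
To make the shapes concrete, here is a minimal sketch of such a function in PyTorch (the chapter returns a tuple, presumably so the same interface also works later for models with more than one state tensor), followed by a check that the recurrence above is dimensionally consistent. The concrete sizes are just illustrative:

```python
import torch

def init_rnn_state(batch_size, num_hiddens, device):
    # The initial hidden state H_0: all zeros, shaped (batch_size, num_hiddens).
    # batch_size is only known at runtime, which is why this is a function
    # rather than a tensor allocated together with the model parameters.
    return (torch.zeros((batch_size, num_hiddens), device=device),)

# Shape check of H_1 = phi(X_1 W_xh + H_0 W_hh + b_h):
batch_size, num_inputs, num_hiddens = 32, 28, 512   # illustrative sizes
X_1 = torch.randn(batch_size, num_inputs)
W_xh = torch.randn(num_inputs, num_hiddens)
W_hh = torch.randn(num_hiddens, num_hiddens)
b_h = torch.zeros(num_hiddens)
H_0, = init_rnn_state(batch_size, num_hiddens, torch.device('cpu'))
H_1 = torch.tanh(X_1 @ W_xh + H_0 @ W_hh + b_h)
print(H_1.shape)  # torch.Size([32, 512]) == (batch_size, num_hiddens)
```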

param.grad[:] *= theta / norm

Upon trying to implement the original version, I received an error about being unable to “subscript a function”; perhaps it should be param.grad()[:].
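
That error would make sense if you iterate over gluon.Parameter objects, where grad is a method, rather than over raw NDArrays that had attach_grad() called on them, where grad is an attribute; param.grad()[:] is then the right fix. For comparison, here is a minimal sketch of the same clipping step in PyTorch, where .grad is a plain attribute (params is assumed to be a list of tensors with populated gradients):

```python
import torch

def grad_clipping(params, theta):
    # Rescale all gradients in place so their global L2 norm is at most theta.
    norm = torch.sqrt(sum(torch.sum(p.grad ** 2) for p in params))
    if norm > theta:
        for param in params:
            param.grad[:] *= theta / norm  # .grad is an attribute here, not a method

# Usage: clip after backward(), before the optimizer step.
w = torch.randn(3, 3, requires_grad=True)
(w ** 2).sum().backward()
grad_clipping([w], theta=1.0)
```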

I wonder if there is a convenient way to initialize the hidden state after full training, prior to making predictions. If our model is trained and fine-tuned and we wish to obtain multiple predictions starting from the beginning-of-sentence token <bos>, we expect them to differ. As far as I can tell, if the hidden state is initialized with zeros (the default) in predict_ch8, the model will predict the same sequence on every launch.
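
One workaround I can think of (my own sketch, not something the chapter does): rather than perturbing the zero state, sample the next token from the softmax over the outputs instead of taking the argmax, so repeated runs diverge even with a deterministic initial state. This assumes the net and vocab interfaces used by predict_ch8 (begin_state, idx_to_token):

```python
import torch

def predict_sampled(prefix, num_preds, net, vocab, device):
    # Like predict_ch8, but samples the next token from the model's softmax
    # distribution instead of taking the argmax, so repeated calls produce
    # different continuations even from an all-zero initial state.
    state = net.begin_state(batch_size=1, device=device)
    outputs = [vocab[prefix[0]]]
    get_input = lambda: torch.tensor([outputs[-1]], device=device).reshape((1, 1))
    for y in prefix[1:]:               # warm-up: feed the prefix through the net
        _, state = net(get_input(), state)
        outputs.append(vocab[y])
    for _ in range(num_preds):         # generate by sampling
        y, state = net(get_input(), state)
        probs = torch.softmax(y.reshape(-1), dim=0)
        outputs.append(int(torch.multinomial(probs, num_samples=1)))
    return ''.join(vocab.idx_to_token[i] for i in outputs)
```

Dividing the logits by a temperature before the softmax would additionally control how much the samples vary between runs.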