Implementation of a Recurrent Neural Network from Scratch

https://en.diveintodeeplearning.org/chapter_recurrent-neural-networks/rnn-scratch.html

It may not be obvious at first why the implementation needs a separate init_rnn_state() function.

Let us assume that the first input to the RNN is called X_1 and the first output created by the RNN is called y_1.

Consider the general equation:

\mathbf{H}_t = \phi(\mathbf{X}_t \mathbf{W}_{xh} + \mathbf{H}_{t-1} \mathbf{W}_{hh} + \mathbf{b}_h)

For the first input, t=1, so we have:

\mathbf{H}_1 = \phi(\mathbf{X}_1 \mathbf{W}_{xh} + \mathbf{H}_{0} \mathbf{W}_{hh} + \mathbf{b}_h).

What, then, is H_0? It is the hidden state that MUST exist before the RNN starts processing the first input. For the code to work, we therefore have to initialise it explicitly. Even if we choose to initialise it to 0, its dimensions must be such that the above equation holds.
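
As a small worked case, if H_0 is taken to be the zero matrix, the recurrent term \mathbf{H}_{0} \mathbf{W}_{hh} vanishes and the first step reduces to:

\mathbf{H}_1 = \phi(\mathbf{X}_1 \mathbf{W}_{xh} + \mathbf{b}_h).

The zero matrix still needs a definite shape, because the product with W_hh and the sum with X_1 W_xh are only defined when the dimensions agree.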

If you look closely, you will see that every H_t has dimensions \text{batch_size} \times \text{num_hidden_units}.
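
The shape claim can be checked with a rough NumPy sketch of the first time step. The sizes, the random weights, and the choice of tanh for \phi are all assumptions made for illustration, and num_hiddens stands in for num_hidden_units. None of this is the code from the linked chapter.

```python
import numpy as np

# Illustrative sizes only (assumptions, not values from the chapter).
batch_size, num_inputs, num_hiddens = 4, 10, 8

rng = np.random.default_rng(0)
X_1 = rng.normal(size=(batch_size, num_inputs))     # first input
W_xh = rng.normal(size=(num_inputs, num_hiddens))   # input-to-hidden weights
W_hh = rng.normal(size=(num_hiddens, num_hiddens))  # hidden-to-hidden weights
b_h = np.zeros(num_hiddens)                         # bias, broadcast over the batch

# State that exists before the first input is processed.
H_0 = np.zeros((batch_size, num_hiddens))

# phi is assumed to be tanh here.
H_1 = np.tanh(X_1 @ W_xh + H_0 @ W_hh + b_h)
h1_shape = H_1.shape  # same shape as H_0: (batch_size, num_hiddens)

# An H_0 built for a different batch size cannot be added to X_1 @ W_xh.
bad_H_0 = np.zeros((batch_size + 1, num_hiddens))
try:
    np.tanh(X_1 @ W_xh + bad_H_0 @ W_hh + b_h)
    shape_error = False
except ValueError:
    shape_error = True
```

Here H_1 comes out with the same shape as H_0, and the mismatched state is rejected when the two terms are added.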

Since batch_size is not fixed when the network is created, the matrix H_0 has to be initialised at runtime, once the batch size is known. That is why a separate function is needed.
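
A minimal sketch of such a function, written in NumPy, might look like the following. The signature and the tuple return value are assumptions for illustration and need not match the version in the linked chapter.

```python
import numpy as np

def init_rnn_state(batch_size, num_hiddens):
    """Return the initial hidden state H_0 as a one-element tuple of zeros.

    Hypothetical signature: returning a tuple is a design choice assumed
    here, so that a model with more than one state could use the same
    interface.
    """
    return (np.zeros((batch_size, num_hiddens)),)

num_hiddens = 8  # fixed when the network is created (illustrative value)

# The batch size is only known at call time, so H_0 is rebuilt for each one.
for batch_size in (1, 32):
    (H_0,) = init_rnn_state(batch_size, num_hiddens)
    print(batch_size, H_0.shape)
```

The same network parameters can then be used with any batch size, because only the state is rebuilt per call.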