https://en.diveintodeeplearning.org/chapter_recurrent-neural-networks/rnn-scratch.html

# Implementation of a Recurrent Neural Network from Scratch

While it will be clear to the observant reader, some people may be confused about why we need a separate `init_rnn_state()` function.

Let us assume that the first input to the RNN is called $\mathbf{X}_1$ and the first output created by the RNN is called $y_1$.

Consider the general equation:

$$\mathbf{H}_t = \phi(\mathbf{X}_t \mathbf{W}_{xh} + \mathbf{H}_{t-1} \mathbf{W}_{hh} + \mathbf{b}_h)$$

For $t = 1$, we have:

$$\mathbf{H}_1 = \phi(\mathbf{X}_1 \mathbf{W}_{xh} + \mathbf{H}_{0} \mathbf{W}_{hh} + \mathbf{b}_h).$$

So what is $\mathbf{H}_0$? It is a state that must exist before the RNN starts processing the first input. Hence, for the code to work, we have to initialise it explicitly. Even if we choose to initialise it to zeros, its dimensions must be such that the equation above holds.

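If you look closely, you will realise that every $\mathbf{H}_t$ has shape `batch_size` × `num_hidden_units`: the product $\mathbf{X}_t \mathbf{W}_{xh}$ has one row per example in the batch and one column per hidden unit, and $\mathbf{H}_{t-1} \mathbf{W}_{hh}$ must match it for the sum to be defined.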

Since `batch_size` is not fixed when the network is created, we need to create the $\mathbf{H}_0$ matrix at runtime, hence the need for a separate function.
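To make the shape argument concrete, here is a minimal NumPy sketch. It is not the book's code: the single-array return value of `init_rnn_state`, the helper name `rnn_step`, the choice of `tanh` for $\phi$, and the sizes are all assumptions made for illustration.

```python
import numpy as np

def init_rnn_state(batch_size, num_hiddens):
    # H_0 is all zeros, with one row per example in the batch.
    return np.zeros((batch_size, num_hiddens))

def rnn_step(X, H, W_xh, W_hh, b_h):
    # One application of H_t = phi(X_t W_xh + H_{t-1} W_hh + b_h), with phi = tanh (assumed).
    return np.tanh(X @ W_xh + H @ W_hh + b_h)

rng = np.random.default_rng(0)
num_inputs, num_hiddens = 5, 4  # illustrative sizes

# The parameters depend only on the layer sizes, never on the batch size.
W_xh = rng.normal(scale=0.01, size=(num_inputs, num_hiddens))
W_hh = rng.normal(scale=0.01, size=(num_hiddens, num_hiddens))
b_h = np.zeros(num_hiddens)

shapes = []
for batch_size in (2, 7):
    X_1 = rng.normal(size=(batch_size, num_inputs))
    H_0 = init_rnn_state(batch_size, num_hiddens)  # created at runtime
    H_1 = rnn_step(X_1, H_0, W_xh, W_hh, b_h)
    shapes.append(H_1.shape)
```

The same weights serve both batch sizes; only the state has to be rebuilt, which is why it lives in its own function rather than alongside the parameters.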

param.grad[:] *= theta / norm

Upon trying to run the original version, I received an error about being unable to “subscript a function”. Perhaps it should be `param.grad()[:]`.
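The error is consistent with `grad` being a method rather than an attribute on the objects being clipped. The sketch below reproduces the situation with a hypothetical `ToyParam` class (a stand-in written for this example, not a real library type) and a NumPy version of gradient clipping, so it only illustrates the mechanism and does not confirm how any particular framework behaves.

```python
import numpy as np

class ToyParam:
    """Hypothetical stand-in: the gradient is exposed through a method."""
    def __init__(self, grad):
        self._grad = np.asarray(grad, dtype=float)

    def grad(self):
        return self._grad  # returns the underlying buffer, not a copy

def grad_clipping(params, theta):
    # Rescale all gradients in place so that their joint L2 norm is at most theta.
    norm = np.sqrt(sum((p.grad() ** 2).sum() for p in params))
    if norm > theta:
        for p in params:
            p.grad()[:] *= theta / norm  # call grad() first, then slice
    return norm

params = [ToyParam([3.0, 0.0]), ToyParam([0.0, 4.0])]

message = None
try:
    params[0].grad[:] *= 0.5  # subscripts the method object itself
except TypeError as err:
    message = str(err)

norm_before = grad_clipping(params, theta=1.0)
norm_after = np.sqrt(sum((p.grad() ** 2).sum() for p in params))
```

If `param` is a plain array that carries its gradient as an attribute, `param.grad[:]` is the right spelling; the parentheses are only needed when `grad` has to be called.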