How to use hybridization for Language modelling (RNN)


#1

I’ll keep it light~
Using an RNN layer like LSTM or GRU in a hybridized network gives the ‘non-hybrid children’ error.

How can I work around it to build an RNN language model that can be hybridized?

Thanks


#2

Currently, RNN blocks are not hybridizable because the fused RNN operator is not implemented for CPU. Work is in progress here (https://github.com/apache/incubator-mxnet/pull/10104) and here (https://github.com/apache/incubator-mxnet/pull/10311).


#3

Hi, thanks for the answer. I have a follow-up question (this may sound really stupid): if we train the RNN layer on GPUs, will hybridization work?

Or is there a way to build a language model (using RNN cells) that is hybridizable?


#4

Yes, you can certainly create a hybridizable RNN by chaining cells together. However, the performance is going to be significantly worse than with the fused RNN operator. I have implemented a hybridizable LSTM block that uses LSTM cells. If you really need hybridization, I can share the code.


#5

In case there is a misunderstanding, I want to clarify one thing (sorry, if you already know it).

Hybridization and working on GPU are not related. You can still train your model on GPU and gain from all its glory without hybridization.

A non-hybridized version is still slower than a hybridized one, but that gap is far smaller than the difference between running your model on CPU and on GPU.


#6

Thanks for the response. So once the fused RNN operator for CPU is complete (https://github.com/apache/incubator-mxnet/pull/10104), will the hybridized layer also run on the GPU?

The part that slightly confuses me is the ‘for CPU’ bit.

I am pretty new to this so thank you for all the help :smiley:


#7

I see. Thanks for clearing that up! And thanks for the offer to share the code, but I would like to give it a try myself first :smiley:

If I end up failing, I hope the offer will still stand :grin:


#8

Well, I gave it a go and I doubt my attempt is close to a solution :sweat:
I would really appreciate it if you could share that code :sweat_smile:


#9

There you go. Keep in mind that even though this block is hybridizable, it is significantly less efficient than gluon.rnn.LSTM on GPU. So I would only use this if you have to have an end-to-end hybridizable network (for example you want to save your model in json format and run it in C++). Either way, I’d recommend waiting for the hybridizable version of gluon.rnn.LSTM to be released.

from mxnet import gluon


class LstmHybrid(gluon.HybridBlock):
    def __init__(self, hidden_dim, seq_len, layout='NTC'):
        """
        :param int hidden_dim: hidden size of the LSTM
        :param int seq_len: sequence length of the unrolled LSTM
        :param str layout: valid options: NTC, TNC, or NCT
        """
        super(LstmHybrid, self).__init__()

        with self.name_scope():
            # T=sequence_length, N=batch_size, C=feature dimension
            self.seq_len = seq_len
            self.layout = layout
            self.lstmcell = gluon.rnn.LSTMCell(hidden_size=hidden_dim)
            self.begin_state_h = self.params.get(
                'begin_state_h', shape=(0, hidden_dim), init='zeros', allow_deferred_init=True)
            self.begin_state_c = self.params.get(
                'begin_state_c', shape=(0, hidden_dim), init='zeros', allow_deferred_init=True)

    def hybrid_forward(self, F, x, begin_state_c, begin_state_h):
        """
        :param F: mx.nd or mx.sym, supplied by Gluon depending on whether the block is hybridized
        :param mx.NDArray or mx.Symbol x: data in correct layout (N must be before C)
        :param mx.NDArray or mx.Symbol begin_state_c: begin cell state parameter
        :param mx.NDArray or mx.Symbol begin_state_h: begin hidden state parameter
        :return: per-step LSTM outputs stacked along the T axis
        """
        t_axis = self.layout.index('T')
        # gluon.rnn.LSTMCell expects its states ordered as [hidden, cell]
        states = [begin_state_h, begin_state_c]
        outputs = []
        # One slice per time step; each slice still has a length-1 T axis
        x = F.split(x, num_outputs=self.seq_len, axis=t_axis)
        for i in range(self.seq_len):
            # Drop the T axis so the cell receives a 2-D (N, C) input
            output, states = self.lstmcell(F.squeeze(x[i], axis=t_axis), states)
            outputs.append(output)
        return F.stack(*outputs, axis=t_axis)