Different architecture per batch & parameter sharing across batches for Gluon RNN cells


#1

The standard Bucketing module in MXNet's symbolic interface for RNNs/LSTMs supports a different architecture per batch while sharing parameters across batches. This works because MXNet uses the same internal memory buffers for all executors in the bucketing module.
http://mxnet-yuweiw.readthedocs.io/en/stable/how_to/bucketing.html

If we want to implement similar seq2seq models based on a bucketing iterator using MXNet Gluon, how can we get a different architecture per batch while sharing parameters across batches?

If I initialize a gluon.rnn.LSTMCell in my encoder class (extending the Block/HybridBlock class) and then, in the forward method of the Block class, call LSTMCell.unroll(length_for_specific_batch) for every batch, will the parameters be shared across batches the way they are for the symbolic graph with the Bucketing module?

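A rough sketch of what I have in mind:

from mxnet import gluon

class BucketingLSTM(gluon.Block):
    def __init__(self, num_hidden, **kwargs):
        super().__init__(**kwargs)
        self.cell = gluon.rnn.LSTMCell(num_hidden)  # created once, so one set of parameters

    def forward(self, inputs, seq_len):
        # unroll the same cell to the length of the current batch
        return self.cell.unroll(seq_len, inputs, layout='NTC', merge_outputs=True)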

Update 1: I tried this and it does not seem to work. The LSTM cell infers the shape from the first batch and keeps using it. Any suggestions on how to implement a different architecture per batch along with parameter sharing?

Update 2: I had made a mistake in how I declared the dimensions. Here’s a working example which shows that the RNN can work with batches of different lengths, and as far as I can tell the parameters are shared. It would still be good if someone could verify that my understanding is right.

Notebook: https://gist.github.com/orchidmajumder/dca0fa16882c458fc85b5f139bf37164


#2

Yes, parameters go with Blocks. As long as you are using the same Block, the parameters are shared.
Sharing parameters between different Blocks can be done with the params argument during construction.


#3

Thanks Eric for confirming.