Dropout in LSTM blocks


#1

Apologies if I am missing something obvious; I have an LSTM-related question.

In Keras, the LSTM layer (https://keras.io/layers/recurrent/#lstm) has:

dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs.
recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state.
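
For concreteness, here is a minimal Keras sketch using both arguments (the sizes and hyperparameters are made up for illustration, not from my actual model):

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Illustrative sizes only
timesteps, features, units = 10, 8, 16

model = Sequential()
# dropout=0.2           -> drops units in the input-to-hidden transformation
# recurrent_dropout=0.2 -> drops units in the hidden-to-hidden transformation
model.add(LSTM(units,
               input_shape=(timesteps, features),
               dropout=0.2,
               recurrent_dropout=0.2))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
```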

If my understanding is correct, the Gluon LSTM dropout argument doesn't correspond to either of these:

dropout (float, default 0) – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer.
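
If I read that correctly, it would behave like this minimal sketch (sizes are made up; with num_layers=1 the argument would do nothing):

```python
import mxnet as mx
from mxnet import nd
from mxnet.gluon import rnn

# Illustrative sizes only
seq_len, batch_size, input_size, hidden_size = 10, 4, 8, 16

# With num_layers=2, dropout=0.5 is applied to the outputs of the first LSTM
# layer before they are fed into the second layer. With num_layers=1 it has
# no effect at all.
lstm = rnn.LSTM(hidden_size, num_layers=2, dropout=0.5)  # layout='TNC' by default
lstm.initialize()

x = nd.random.uniform(shape=(seq_len, batch_size, input_size))
out = lstm(x)   # shape: (seq_len, batch_size, hidden_size)
```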

Could anyone please shed some more light on this? Can we match the Keras functionality somehow?


#2

The RNN (LSTM) implementation in MXNet/Gluon comes from cuDNN, and the dropout semantics are similar to what cuDNN supports: dropout is applied between layers, so a single-layer network will have no dropout applied.
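
As a rough (not exact) way to get closer to Keras' dropout argument, you can drop the inputs yourself before the fused LSTM. Note the caveats in the comments: Keras reuses one mask per sequence, while nn.Dropout samples a fresh mask per element, and there is no direct equivalent of recurrent_dropout with the fused cuDNN kernel. Sizes below are illustrative:

```python
import mxnet as mx
from mxnet import nd, autograd
from mxnet.gluon import nn, rnn

# Illustrative sizes only
seq_len, batch_size, input_size, hidden_size = 10, 4, 8, 16

# Approximation of Keras' `dropout`: drop the inputs feeding the LSTM.
# This is not an exact match -- Keras applies the same dropout mask at every
# timestep, whereas nn.Dropout here samples a new mask for each element.
net = nn.Sequential()
net.add(nn.Dropout(0.2))                      # input dropout
net.add(rnn.LSTM(hidden_size, num_layers=1))  # layout='TNC' by default
net.initialize()

x = nd.random.uniform(shape=(seq_len, batch_size, input_size))
with autograd.record():   # dropout is only active in training mode
    y = net(x)
print(y.shape)            # (seq_len, batch_size, hidden_size)
```

For something like recurrent_dropout you would have to give up the fused kernel and unroll an rnn.LSTMCell yourself, applying dropout to the recurrent state at each step.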

@piiswrong do you have any thoughts on this?