I’m running a Keras LSTM-based sequence classifier for a recommender system, where the goal is to predict the next item consumed, given a sequence of items.
The data consists of sequences of strings, which are pre-processed using sklearn and
sequence.pad_sequences from Keras.
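
To make the input format concrete, here is a minimal, dependency-free sketch of this kind of preprocessing. It is not my exact pipeline: `encode_items` and `pad_pre` are hypothetical helpers standing in for the sklearn encoding step and for Keras `pad_sequences` (whose defaults pad and truncate at the front with 0), and reserving id 0 for padding is an assumption of the sketch.

```python
def encode_items(sequences):
    """Map each item string to an integer id, starting at 1.

    Id 0 is reserved for padding (an assumption of this sketch).
    """
    vocab = {}
    encoded = []
    for seq in sequences:
        # setdefault returns the existing id, or assigns the next free one
        encoded.append([vocab.setdefault(item, len(vocab) + 1) for item in seq])
    return encoded, vocab


def pad_pre(sequences, maxlen, value=0):
    """Left-pad (and left-truncate) each sequence to exactly maxlen entries,
    mirroring the default behaviour of Keras pad_sequences."""
    padded = []
    for seq in sequences:
        seq = seq[-maxlen:]  # keep only the most recent maxlen items
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded


# Hypothetical toy sessions; real item ids would come from the dataset.
sessions = [["a", "b", "c"], ["b", "d"], ["c"]]
encoded, vocab = encode_items(sessions)
padded = pad_pre(encoded, maxlen=3)
print(padded)
```

Each row of `padded` has the same length, so the batch can be fed to an embedding layer followed by the LSTM.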
With a given architecture and optimizer setting, just switching the Jupyter kernel in a SageMaker notebook makes keras-MXNet 25% faster than keras-TF on a single GPU (1 V100 on P3.16xl) and 60% faster on multiple GPUs (8 V100s on P3.16xl). Are there MXNet-specific optimizations that can be used to push keras-MXNet further? In particular, is it possible to use the following MXNet features in keras-MXNet?
- mixed precision?
- multi-processed loaders?