We are upgrading our production model-hosting system from MXNet 1.1 (built with MKL but without MKLDNN) to MXNet 1.3 (built with both MKL and MKLDNN, and with MXNET_MKLDNN_ENABLED=1 as the default). With MKLDNN enabled we saw a significant latency regression, from 20 ms/doc to 100 ms/doc. After disabling MKLDNN by setting MXNET_MKLDNN_ENABLED=0, latency returned to 20 ms/doc.
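For reference, the A/B comparison we ran boils down to toggling the environment variable before launching the process (here `score.py` stands in for our actual inference benchmark):

```shell
# MXNet 1.3 default: MKLDNN enabled
MXNET_MKLDNN_ENABLED=1 python score.py

# Same binary with MKLDNN disabled, falling back to the MKL code paths
MXNET_MKLDNN_ENABLED=0 python score.py
```

The variable has to be set in the environment before the MXNet library is loaded, which is why we set it on the command line rather than inside the script.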
We also tested the LSTM-PTB model (from the mxnet-model-server examples) in the same environment. With that model, MKLDNN actually improved latency rather than degrading it. We therefore suspect the latency problem we saw is specific to how our model is built.
Our model uses the following Symbol operators, which do not appear to be supported by MKLDNN (according to http://intel.github.io/mkl-dnn/):

- mx.sym.SliceChannel()
- mx.sym.ElementWiseSum()
- mx.sym.Activation(..., act_type="sigmoid")
- mx.sym.broadcast_mul()
- mx.sym.Dropout()
We also use other operators, such as mx.sym.SoftmaxActivation(), but those appear to be supported by MKLDNN, so we assume they are not the problem.
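To narrow down which operators regress, one option (a sketch we have not fully validated) is MXNet's built-in profiler, which can be auto-started via an environment variable and dumps per-operator timings; `score.py` is again a placeholder for the actual inference script:

```shell
# Auto-start the profiler for the whole process; per-operator timings
# are written to profile.json in the working directory.
MXNET_PROFILER_AUTOSTART=1 MXNET_MKLDNN_ENABLED=1 python score.py

# Repeat with MKLDNN disabled to get a baseline trace for comparison.
MXNET_PROFILER_AUTOSTART=1 MXNET_MKLDNN_ENABLED=0 python score.py
```

The resulting profile.json files can be opened in chrome://tracing to compare per-operator latencies between the two runs.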
My questions are:
- Is the latency issue caused by our model using operators such as mx.sym.broadcast_mul() that are seemingly not supported by MKLDNN?
- In the short term, should we simply disable MKLDNN and use MKL instead?
- In the long term, should we expect something like https://github.com/apache/incubator-mxnet/issues/13598 to fix our problem by using MKLDNN for the supported operators and falling back to MKL for the others?