MXNet compiled from the latest master branch is slower than the pip-installed version?


#1

Hi,

To improve performance, I cloned the master branch and recompiled from source following the tutorial here: https://mxnet.incubator.apache.org/get_started/install.html. The make command I used was:

make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1 USE_PROFILER=1

After installing the compiled MXNet, the training script ran even slower. Previously, every 500 batches took about 3 minutes 55 seconds. With the recompiled version, the same script on the same dataset took about 4 minutes 10 seconds. I also uninstalled it and recompiled without the profiler enabled, but the result was the same.
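To put the gap in perspective, here is a small Python sketch of the arithmetic. It uses only the two timings reported above; the variable names are illustrative.

```python
# Timings reported above, converted to seconds per 500 batches.
pip_build = 3 * 60 + 55      # pip-installed mxnet-cu80: ~3 min 55 s
source_build = 4 * 60 + 10   # compiled from master: ~4 min 10 s

batches = 500
# Relative slowdown of the source build compared with the pip build.
slowdown = (source_build - pip_build) / pip_build

print(f"pip build:    {pip_build / batches * 1000:.0f} ms/batch")
print(f"source build: {source_build / batches * 1000:.0f} ms/batch")
print(f"slowdown:     {slowdown:.1%}")
```

That works out to roughly 470 ms vs 500 ms per batch, so the source build is about 6.4% slower.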

Then I uninstalled it and reinstalled mxnet-cu80 with pip. The speed returned to about 3 minutes 55 seconds. The script trains a neural network model on a single GPU. In terms of performance, are there any optimizations I can apply when compiling from source?


#2

Which version did you pip install?

Can you check whether the pre-release pip version also has the performance issue? If so, that could be a regression.

pip install mxnet-cu80 --pre --user
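To answer the version question, one option is to ask Python's package metadata which MXNet distributions are installed. This is a sketch using the standard-library importlib.metadata (Python 3.8+). The candidate names are assumptions: "mxnet-cu80" is the wheel from this thread, and "mxnet" is assumed to be the name a source build registers under. The helper installed_mxnet_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names to look for. "mxnet-cu80" is the pip wheel from this
# thread; "mxnet" is ASSUMED to be the name a source build installs under.
CANDIDATES = ("mxnet-cu80", "mxnet")


def installed_mxnet_versions(names=CANDIDATES):
    """Hypothetical helper: return {distribution name: version} for installed candidates."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            pass  # this distribution is not installed; skip it
    return found


if __name__ == "__main__":
    print(installed_mxnet_versions() or "no MXNet distribution found")
```

Running it before and after switching builds makes it easy to post the exact versions being compared.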