Slow GPU memory allocation when using MXNet built from source


#1

When I use MXNet built from source, training does not start right away: GPU memory is allocated slowly first (as observed with nvidia-smi). There is no such problem with the pip version of MXNet.

Test example: mxnet/example/ctc
Hardware: Tesla P4 (I will test on a 1080 later)
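
To put a number on the delay rather than watching nvidia-smi, something like the sketch below could be run under both the source build and the pip build. It assumes an MXNet install with GPU support and that GPU 0 is the device in use. `time_calls` and `first_gpu_alloc` are hypothetical helper names made up for this illustration, not part of MXNet.

```python
import time


def time_calls(fn, repeats=2):
    """Return the wall-clock seconds taken by each of `repeats` consecutive calls to fn."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def first_gpu_alloc():
    """Allocate a tiny array on the GPU and wait until it is really there.

    Assumption: MXNet is installed with GPU support and GPU 0 is visible.
    The import is inside the function so the timing helper above can be
    used without MXNet.
    """
    import mxnet as mx

    x = mx.nd.zeros((1,), ctx=mx.gpu(0))
    # MXNet executes asynchronously; block until the array is materialized.
    x.wait_to_read()
```

Calling `time_calls(first_gpu_alloc)` in a fresh Python process returns two durations: the first includes whatever one-time GPU initialization the build performs, the second is a plain allocation. Comparing the first value between the two builds should show how large the startup gap is.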


#3

I found a solution here: https://github.com/apache/incubator-mxnet/issues/3239

But it is still slower than the pip version.