Multi-core CPU usage


Dear all,

sometimes I want to run inference on my laptop, where the GPU cannot handle the memory load of models trained on HPC clusters. In that case I use [ctx = mx.cpu()], but with this configuration I do not see all threads being used. Is there something I can do so that mxnet takes full advantage of all 8 threads on my laptop?

Thank you very much for your time.


Hey @feevos,

The first thing to do is make sure you are using mxnet-mkl, so that you benefit from the parallelization offered by MKL-DNN.

pip install mxnet-mkl

You can read more in this Medium post:

The article suggests setting these environment variables to get the maximum performance:

export KMP_AFFINITY=granularity=fine,compact,1,0
export vCPUs=`cat /proc/cpuinfo | grep processor | wc -l`
export OMP_NUM_THREADS=$((vCPUs / 2))

If the problem persists, try setting the thread count to the number of logical processors instead:

export OMP_NUM_THREADS=`cat /proc/cpuinfo | grep processor | wc -l`
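
If you would rather set these from Python than from the shell, here is a minimal sketch of the same idea. It assumes the variables need to be in place before mxnet is imported so that the OpenMP runtime picks them up, and the helper name `recommended_omp_env` is hypothetical (it is not part of mxnet). I have not benchmarked it, so treat it as a starting point only.

```python
import os


def recommended_omp_env(logical_cpus=None, use_all=False):
    """Hypothetical helper: build the environment settings quoted above.

    logical_cpus: number of logical processors; detected if not given.
    use_all: if True, use every logical processor instead of half.
    """
    if logical_cpus is None:
        # os.cpu_count() can return None, so fall back to 1
        logical_cpus = os.cpu_count() or 1
    # Half the logical processors by default, but never fewer than 1
    threads = logical_cpus if use_all else max(1, logical_cpus // 2)
    return {
        "KMP_AFFINITY": "granularity=fine,compact,1,0",
        "OMP_NUM_THREADS": str(threads),
    }


# Apply the settings to the current process
os.environ.update(recommended_omp_env())

# Import mxnet only after the variables are set (assumption, see above):
# import mxnet as mx
# ctx = mx.cpu()
```

Passing `use_all=True` corresponds to the second variant above, where OMP_NUM_THREADS is set to the full processor count.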


Thank you very much @ThomasDelteil. Indeed, after switching to mxnet-cu90mkl I got a boost, and I now see 4 threads being used (just as the authors suggest for my 8-thread laptop).