The GPU memory usage is not stable


#1

When I train my network, the GPU memory usage is not stable. Because I share the GPU with other users, a sudden rise in memory usage causes my program to run out of memory (OOM).


#2

Have you ever set MXNET_CUDNN_AUTOTUNE_DEFAULT?
The default setting (1) runs cuDNN auto-tuning with a limited workspace, and this caused unstable memory usage in my case.

Maybe you can try setting it to 0 (no auto-tuning) or 2 (pick the fastest algorithm regardless of workspace size), if you can accept the trade-off.


#3

Hello, I would like to know how to change the value of MXNET_CUDNN_AUTOTUNE_DEFAULT. My environment is Ubuntu 16.04 LTS. Thank you very much!


#4

There are two ways to do that.

  1. Set it on the command line before running the script, like:
    export MXNET_CUDNN_AUTOTUNE_DEFAULT=0; python my_code.py
  2. Set it in your code using os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '0'. The value must be a string, and it is safest to set it before MXNet is imported.
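
For the second option, here is a minimal sketch of how it could look in a script. The helper name set_cudnn_autotune is just something I made up for illustration, it is not part of MXNet. I am also assuming the safe order is to set the variable before importing MXNet, so the import is left as a comment at the end.

```python
import os


def set_cudnn_autotune(mode: int) -> None:
    """Set MXNET_CUDNN_AUTOTUNE_DEFAULT for this process (hypothetical helper).

    0 = no auto-tuning, 1 = limited-workspace auto-tuning (default),
    2 = fastest algorithm regardless of workspace size.
    """
    if mode not in (0, 1, 2):
        raise ValueError("mode must be 0, 1, or 2")
    # Environment variables are strings, so convert the integer.
    os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = str(mode)


set_cudnn_autotune(0)

# Import MXNet only after the variable is set (assumed safe ordering):
# import mxnet as mx
```

Assigning a plain integer to os.environ raises a TypeError, which is why the value is converted with str().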