How to allocate a fixed amount of GPU memory?


Is there a way to allocate a fixed amount of GPU memory for a program in MXNet (like in TensorFlow)? Currently, I have a training script which takes days to run, and its GPU memory usage fluctuates from time to time. When other users accidentally use the same GPU, my program crashes because of OOM.


There are currently only a limited number of things you can do to manage GPU memory. Take a look at the available environment variables -
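
As a rough sketch of what that could look like: MXNet's memory pool reads MXNET_GPU_MEM_POOL_RESERVE (percentage of GPU memory kept out of the pool) and MXNET_GPU_MEM_POOL_TYPE at startup. Which variables the answer above had in mind is an assumption on my part, and the helper name mxnet_pool_env is made up for illustration. Note that these tune the pool's behaviour; they do not give you a hard, TensorFlow-style fixed allocation.

```python
import os


def mxnet_pool_env(reserve_percent=5, pool_type="Naive"):
    """Build environment settings for MXNet's GPU memory pool.

    Hypothetical helper, not part of MXNet itself.
    """
    if not 0 <= reserve_percent <= 100:
        raise ValueError("reserve_percent must be between 0 and 100")
    if pool_type not in ("Naive", "Round", "Unpooled"):
        raise ValueError("unknown pool type: %r" % (pool_type,))
    return {
        # Percentage of GPU memory left outside the pool.
        "MXNET_GPU_MEM_POOL_RESERVE": str(reserve_percent),
        # Pooling strategy used by the GPU allocator.
        "MXNET_GPU_MEM_POOL_TYPE": pool_type,
    }


# The variables are read when MXNet starts, so set them before `import mxnet`.
os.environ.update(mxnet_pool_env(reserve_percent=10, pool_type="Round"))
```

The same thing can be done from the shell by exporting the variables before launching the training script.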


If you are using Gluon, you can try net.hybridize(static_alloc=True).
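
A minimal sketch of that suggestion, assuming MXNet 1.x with GPU support installed. The network layout, layer sizes and the function name build_static_net are placeholders, not anything from the original script. As far as I understand, static_alloc makes the hybridized graph allocate its buffers once and reuse them, which should make usage more stable, but it is not a hard cap on memory.

```python
def build_static_net(gpu_id=0):
    """Create a small hybridized Gluon network with static memory allocation.

    Placeholder model for illustration; swap in your own HybridBlock.
    """
    # Imported lazily so this file can be loaded on a machine without MXNet.
    import mxnet as mx
    from mxnet.gluon import nn

    net = nn.HybridSequential()
    net.add(nn.Dense(128, activation="relu"), nn.Dense(10))
    net.initialize(ctx=mx.gpu(gpu_id))
    # Allocate the graph's memory once and reuse it across forward/backward calls.
    net.hybridize(static_alloc=True)
    return net


# Usage (requires a GPU):
#   import mxnet as mx
#   net = build_static_net(gpu_id=0)
#   out = net(mx.nd.ones((32, 64), ctx=mx.gpu(0)))
```

This only helps for the parts of the model that are HybridBlocks; anything that runs imperatively still allocates dynamically.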


For those who have root privileges, you can change the Compute Mode of the target GPU to “EXCLUSIVE_PROCESS”, which means only one context is allowed per device, usable from multiple threads at a time. Please refer to man nvidia-smi, e.g.:

sudo nvidia-smi -c 3 -i 1
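
If you would rather do this from a setup script, here is a small Python sketch that builds the same command. The helper names compute_mode_command and set_compute_mode are made up for illustration; the numeric codes are the ones listed in man nvidia-smi (3 is EXCLUSIVE_PROCESS, as in the command above).

```python
import subprocess

# Compute mode names and the numeric codes nvidia-smi accepts for -c.
COMPUTE_MODES = {
    "DEFAULT": 0,
    "PROHIBITED": 2,
    "EXCLUSIVE_PROCESS": 3,
}


def compute_mode_command(gpu_index, mode="EXCLUSIVE_PROCESS", use_sudo=True):
    """Return the nvidia-smi argument list that sets `mode` on one GPU."""
    cmd = ["nvidia-smi", "-c", str(COMPUTE_MODES[mode]), "-i", str(gpu_index)]
    return ["sudo"] + cmd if use_sudo else cmd


def set_compute_mode(gpu_index, mode="EXCLUSIVE_PROCESS"):
    """Run the command; raises CalledProcessError if nvidia-smi fails."""
    subprocess.run(compute_mode_command(gpu_index, mode), check=True)


# Usage (needs root and an NVIDIA driver):
#   set_compute_mode(1)             # same as: sudo nvidia-smi -c 3 -i 1
#   set_compute_mode(1, "DEFAULT")  # put the GPU back into shared mode
```

Keep in mind that in exclusive mode the other users' jobs will fail to create a context on that GPU instead of yours running out of memory.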