How to limit GPU memory usage

We plan to share one GPU machine,
so I need to cap each process's GPU memory usage at runtime
(for example, A: 40%, B: 40%).
TensorFlow supports this: gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.4)
Is there a way to do the same with MXNet?
Any advice would be appreciated!

According to this file (gpu memory storage manager - github):

If the MXNET_GPU_MEM_POOL_TYPE environment variable is not set, you can set the MXNET_GPU_MEM_POOL_RESERVE environment variable to a value x (0 to 100), meaning that x% of GPU memory will not be used.

However, if this value is too large, more time will be spent re-allocating GPU memory.
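A minimal sketch of applying that setting from Python rather than from the shell. The helper name configure_mxnet_gpu_pool is hypothetical, and setting the variable before importing mxnet is a conservative assumption, not something confirmed in this thread.

```python
import os


def configure_mxnet_gpu_pool(reserve_percent):
    """Set MXNET_GPU_MEM_POOL_RESERVE for the current process.

    Hypothetical helper for illustration. Call it before `import mxnet`
    so the value is in place before any GPU memory is allocated.
    """
    if not 0 <= reserve_percent <= 100:
        raise ValueError("reserve_percent must be between 0 and 100")
    # Leave the pool type unset, matching the condition described above.
    os.environ.pop("MXNET_GPU_MEM_POOL_TYPE", None)
    os.environ["MXNET_GPU_MEM_POOL_RESERVE"] = str(reserve_percent)
    return os.environ["MXNET_GPU_MEM_POOL_RESERVE"]


configure_mxnet_gpu_pool(60)
# import mxnet as mx  # import only after the variable has been set
```

Setting the variable inside the script keeps the configuration next to the code that depends on it, which can be easier to verify than a shell-level setting.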


Thank you for your reply.
Following your advice, I tried running with these settings:

MXNET_GPU_MEM_POOL_TYPE: not set
MXNET_GPU_MEM_POOL_RESERVE: 60
(I confirmed this with the set command in Anaconda.)

However, the usage rate did not drop to 40% or below; it did not change at all (81%).
What is wrong with my setup?

I have attached the result of checking the GPU usage rate.

I believe the MXNET_GPU_MEM_POOL_RESERVE environment variable is just a hint that tells MXNet to release GPU memory that was allocated earlier and has since been FREED, which is otherwise kept for possible reuse in the future. If your network is large (for example, the GPU cannot run two such networks simultaneously), one way to handle this is to share memory between the computations of shallow layers and deeper layers, or to use in-place operators more aggressively. Either of these two methods could require some work inside the DL framework.

BTW, have you tried running the network in TF with the same GPU memory budget, and does it run without raising an OOM error? As far as I know, MXNet already has a good implementation for keeping GPU memory usage low.

Also, in my opinion the 81% is GPU Load, which measures how much of the GPU's compute capability (for example, the CUDA cores) the current application is using, not how much memory is used; higher means the GPU is being put to better use. Memory Used is the value that indicates GPU memory usage, so check whether that value changes after you modify the environment variable mentioned above.
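To make the distinction between the two readings concrete, here is a rough sketch that reads both from nvidia-smi and reports them separately. The function names and the sample line are made up for illustration, and it assumes an NVIDIA driver with nvidia-smi on the PATH.

```python
import subprocess

# nvidia-smi query that prints one CSV line per GPU, without header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text):
    """Parse 'used, total, load' lines into one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        used, total, load = (float(field.strip()) for field in line.split(","))
        stats.append({
            "memory_used_mib": used,
            "memory_total_mib": total,
            "memory_used_pct": 100 * used / total,  # share of memory in use
            "gpu_load_pct": load,                   # share of time the GPU was busy
        })
    return stats


def read_gpu_stats():
    """Run nvidia-smi and parse its output (needs an NVIDIA GPU and driver)."""
    return parse_gpu_stats(subprocess.check_output(QUERY, text=True))


# Hypothetical sample line, not a measurement from this thread.
for gpu in parse_gpu_stats("4000, 8000, 81\n"):
    print(gpu)
```

In this made-up sample the load is 81% while only half of the memory is in use, which shows that the two numbers move independently.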

@TriLoon Any other thoughts on this? It seems like you have quite a bit more experience, but we’re facing a similar issue.

MXNET on InsightFace is taking up around 7GB memory on inference. That seems large?

You can have a look at this paper, sublinear memory usage, which covers some common solutions that DL frameworks use to lower GPU memory usage.

If you only care about forward inference, you can reduce the batch size (at the cost of speed) or quantize the network (int8, float16, etc.). As far as I know, the MKL-DNN backend of MXNet supports this, and TensorRT also has good support for model quantization. There are some other methods you could try as well.
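As a back-of-the-envelope illustration of why batch size and data type matter, the sketch below estimates the size of a single activation tensor. The shape is hypothetical and this is not a measurement of InsightFace or of any real network, whose total memory also includes weights, workspace, and many other tensors.

```python
from math import prod

# Bytes per element for the data types mentioned above.
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "int8": 1}


def tensor_mib(shape, dtype):
    """Size in MiB of one dense tensor with the given shape and dtype."""
    return prod(shape) * BYTES_PER_ELEMENT[dtype] / (1024 ** 2)


# Hypothetical activation shape: (batch, channels, height, width).
for batch in (32, 8):
    for dtype in ("float32", "float16", "int8"):
        size = tensor_mib((batch, 64, 112, 112), dtype)
        print(f"batch={batch:2d} dtype={dtype:8s} {size:6.2f} MiB")
```

The estimate scales linearly with both the batch size and the bytes per element, so halving either one halves the size of this tensor.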

How does the GPU memory cost of the same model compare when using TF or PyTorch?

I strongly recommend running AT MOST ONE training instance per GPU.

If two or more instances run on the same GPU, the training time will be longer than training those instances one after another.

(That's what I found on Linux. On Windows, training is often slower than on Linux. If you're training a big network, it is better to install a Linux system; I'm using Manjaro and training a network right now.)

You should look at Memory Used (how much space is in use) rather than GPU Load (what percentage of the time the GPU is busy).
Notice that 7626 / 0.6 = 12710, so the environment variable MXNET_GPU_MEM_POOL_RESERVE: 60 actually works.