Memory leak when running CPU inference

memory
python
gluon-cv
#1

I’m running into a memory leak when performing inference with an MXNet model (i.e. converting an image buffer to a tensor and running one forward pass through the model).

A minimal reproducible example is below:

import mxnet
from gluoncv import model_zoo
from gluoncv.data.transforms.presets import ssd

model = model_zoo.get_model('ssd_512_resnet50_v1_coco')
model.initialize()

for _ in range(100000):
    # note: an example imgbuf string is too long to post;
    # see the gist, or use requests etc. to obtain one
    imgbuf = 
    ndarray = mxnet.image.imdecode(imgbuf, to_rgb=1)
    tensor, orig = ssd.transform_test(ndarray, 512)
    labels, confidences, bboxs = model.forward(tensor)

The result is a linear increase in RSS memory (from 700 MB up to 10 GB+).

Libraries used: gluoncv==0.3.0, mxnet-mkl==1.3.1

The problem persists with other pretrained models and with a custom model that I am trying to use. Inspecting the garbage collector does not show any increase in the number of Python objects.
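i.e. a check along these lines with the gc module (a sketch of the kind of check I mean):

import gc

gc.collect()
# the object count stays flat even while RSS keeps growing
print(len(gc.get_objects()))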

This gist has the full code snippet including an example imgbuf.

#2

This is very likely due to you enqueuing ops faster than MXNet is able to process them.
MXNet is fundamentally asynchronous: the Python frontend is imperative, but operations are queued on the backend engine and executed lazily. When you call forward, you effectively say “compute this forward pass as soon as possible”. The Python call returns immediately, which allows very simple and intuitive parallelism.
To properly benchmark, you need to add a synchronous call.
For example mx.nd.waitall(), labels.wait_to_read(), or bboxs.asnumpy().
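Concretely, the loop from the first post with a blocking call added (a sketch; imgbuf obtained as in your snippet):

for _ in range(100000):
    ndarray = mxnet.image.imdecode(imgbuf, to_rgb=1)
    tensor, orig = ssd.transform_test(ndarray, 512)
    labels, confidences, bboxs = model.forward(tensor)
    # block until the forward pass has actually finished, so work is
    # consumed as fast as it is produced and buffers can be reclaimed
    bboxs.asnumpy()  # alternatively: mxnet.nd.waitall() or labels.wait_to_read()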

#3

Hey, thanks for the quick reply.

You are right: in the example above, adding the synchronous call stops the memory from increasing.

In my actual use case (which I tried to simplify above, but clearly not properly!) I already had this in place, and I am still seeing a constant memory increase. My program uses a queue system to feed image buffers to a function that does the tensor transformation and forward pass, then puts the result back on a different queue, roughly as sketched below. If I run this without the MXNet component (e.g. the function returns a fake result, or does some ML work using a different library such as PyTorch), then memory is stable.
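A simplified sketch of that structure (not my real code; the queue/threading details here are invented for illustration):

import queue
import threading

import mxnet
from gluoncv import model_zoo
from gluoncv.data.transforms.presets import ssd

in_queue = queue.Queue()
out_queue = queue.Queue()

model = model_zoo.get_model('ssd_512_resnet50_v1_coco')
model.initialize()

def worker():
    while True:
        imgbuf = in_queue.get()
        ndarray = mxnet.image.imdecode(imgbuf, to_rgb=1)
        tensor, orig = ssd.transform_test(ndarray, 512)
        labels, confidences, bboxs = model.forward(tensor)
        # synchronize before handing the result to the output queue
        out_queue.put((labels.asnumpy(), confidences.asnumpy(), bboxs.asnumpy()))

threading.Thread(target=worker, daemon=True).start()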

Any ideas on what may be causing this? Or do you know if there is a way to force MXNet to release all memory?

Thanks!

#4

Could you share a bigger snippet of your code?
MXNet should release the memory once it is out of scope; it gets garbage collected.
My hunch is that you are calling nd.array somewhere and keeping a reference to that object.
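i.e. a pattern like this (a hypothetical sketch, not your code), where every output NDArray stays reachable from Python so its buffer can never be freed:

import mxnet as mx

results = []
for i in range(100000):
    x = mx.nd.random.uniform(shape=(1, 3, 224, 224))
    # appending keeps a reference to every NDArray alive, so memory
    # grows linearly even though nothing is leaking in the engine
    results.append(x)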

#5

Sorry for hijacking.
I have a similar problem, where I repeatedly call a function that loads a model and returns a prediction, and memory keeps increasing with the number of calls to that function.

Here is some stripped-down example code:

import mxnet as mx
import numpy as np
import cv2
from IPython import embed

CTX = mx.cpu()

def resize(img, img_dims):
    # resize to the network input size, reorder HWC -> CHW,
    # add a batch dimension and scale pixel values to [0, 1]
    img = cv2.resize(img, (img_dims[0], img_dims[1]))
    img = np.swapaxes(img, 0, 2)
    img = np.swapaxes(img, 1, 2)
    img = img[np.newaxis, :].astype(np.float32) / 255.0
    return mx.nd.array(img)

def predict():
    img = cv2.imread('/path/to/some/image.jpg')
    small_img = resize(img.copy(), (224, 224))
    # load the model inside the function so it goes out of scope afterwards
    model_name = "/path/to/model.json"
    model_params = "path/to/model.params"
    model = mx.gluon.nn.SymbolBlock.imports(model_name, ['data'], model_params, ctx=CTX)
    return model(small_img).asnumpy()

def main(repeats=3):
    for i in range(repeats):
        print(i)
        result = predict()

if __name__ == '__main__':
    main(40)
    embed()

mxnet = 1.3.0
python = 3.6.6

The idea was to load and predict inside a function so that memory would be freed once the function call is done and the model/data are out of scope.

thanks!

#6

Hi, two things.

First, regarding the loop: it’s the same issue as above. The engine is asynchronous, so what’s happening is that you’re giving it work faster than it can complete it.

Add a synchronous call in the loop, e.g. print(i, result) or mx.nd.waitall()

Also see the discussion above by Thomas et al.

Second, you shouldn’t re-load your model on each invocation. It’s better to have a class that loads it once and re-uses it for each prediction, along the lines of the sketch below. 🙂
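Something like this (an illustrative sketch, re-using the resize() helper from your snippet; paths are placeholders):

import mxnet as mx
import cv2

class Predictor:
    def __init__(self, symbol_file, params_file, ctx=mx.cpu()):
        # load the model once, at construction time
        self.model = mx.gluon.nn.SymbolBlock.imports(
            symbol_file, ['data'], params_file, ctx=ctx)

    def predict(self, img):
        small_img = resize(img, (224, 224))
        # asnumpy() blocks until the forward pass has completed
        return self.model(small_img).asnumpy()

predictor = Predictor("/path/to/model.json", "path/to/model.params")
for i in range(40):
    result = predictor.predict(cv2.imread('/path/to/some/image.jpg'))
    print(i, result.shape)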

Vishaal

#7

Hi @VishaalKapoor,

thanks for the reply.
.asnumpy() and mx.nd.waitall() do not prevent this problem from happening, unfortunately. As for the load-model-once-make-several-predictions approach: that reduces the problem to some extent, in that memory still increases continuously but at a lower rate than when the model load also happens inside the loop.
Secondly, our use case is server-ish in nature, i.e. depending on the input/request a different model is loaded and used for prediction (which, agreed, is a debatable design decision), so keeping all models in memory at all times is not ideal from a resource point of view.

Could it be an issue with the nn-model itself?

Cheers
Andre

#8

Are you seeing out-of-memory errors followed by segfaults?

MXNet will re-use memory, but usage may appear to be going up if you look at nvidia-smi. If you see an eventual OOM error then something is wrong. It’s unlikely to be a memory leak, and more likely that you’re hanging on to references to memory somehow.

If your model has a fixed number of params it shouldn’t be causing the issue you’re seeing.

I can’t say further without seeing the model. If it’s something you can attach to the post, it would be helpful to debug.
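In the meantime, one way to confirm whether memory really grows without bound is to log RSS from inside the loop, e.g. with psutil (a suggestion on my part, not something used in the thread so far):

import os
import psutil
import mxnet as mx

proc = psutil.Process(os.getpid())
for i in range(1000):
    result = predict()  # the per-item work from the snippets above
    mx.nd.waitall()     # make sure no work is still queued on the engine
    if i % 100 == 0:
        # resident set size in MB
        print(i, proc.memory_info().rss / 1e6)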