I use Gluon to implement SSD, and when I run the SSD and Faster R-CNN examples in the official folder, GPU memory usage keeps increasing until it runs out


#1

I use Gluon to implement SSD. When I run the SSD and Faster R-CNN examples in the official folder, GPU memory usage keeps increasing until it runs out. When I re-run the code from the point where it stopped, memory usage drops back to what it was at the beginning. How can I release the GPU memory? Any advice will be appreciated, thank you.


#2

Hey @hdjsjyl,

Space on the graphics card is released when the process terminates. Is your program seg faulting, so that you have to checkpoint before the seg fault and restart from the checkpoint? Can you give explicit repro steps (which notebook), tell us which graphics card you are using, and say whether you made any changes to the notebooks?

Vishaal


#3

I use mxnet-cu90 on a GTX TITAN X with 12 GB of memory. My program doesn't seg fault. The error is:
Traceback (most recent call last):
  File "/home/cougarnet.uh.edu/lshi22/mxnet-SCFD/train.py", line 140, in
    args.save_folder, args.annoPath)
  File "/home/cougarnet.uh.edu/lshi22/mxnet-SCFD/train.py", line 77, in train
    conf_losses += nd.mean(conf_loss).asscalar()
  File "/home/cougarnet.uh.edu/lshi22/dl-mxnet/lib/python3.5/site-packages/mxnet/ndarray/ndarray.py", line 1990, in asscalar
    return self.asnumpy()[0]
  File "/home/cougarnet.uh.edu/lshi22/dl-mxnet/lib/python3.5/site-packages/mxnet/ndarray/ndarray.py", line 1972, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/home/cougarnet.uh.edu/lshi22/dl-mxnet/lib/python3.5/site-packages/mxnet/base.py", line 251, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [14:22:24] src/storage/./pooled_storage_manager.h:143: cudaMalloc failed: out of memory

Stack trace returned 10 entries:
[bt] (0) /home/cougarnet.uh.edu/lshi22/dl-mxnet/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x39008a) [0x7ff25405908a]
[bt] (1) /home/cougarnet.uh.edu/lshi22/dl-mxnet/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x3906c1) [0x7ff2540596c1]
[bt] (2) /home/cougarnet.uh.edu/lshi22/dl-mxnet/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x31af813) [0x7ff256e78813]
[bt] (3) /home/cougarnet.uh.edu/lshi22/dl-mxnet/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x31b3e65) [0x7ff256e7ce65]
[bt] (4) /home/cougarnet.uh.edu/lshi22/dl-mxnet/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x31b7051) [0x7ff256e80051]
[bt] (5) /home/cougarnet.uh.edu/lshi22/dl-mxnet/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x3391585) [0x7ff25705a585]
[bt] (6) /home/cougarnet.uh.edu/lshi22/dl-mxnet/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x2d54d24) [0x7ff256a1dd24]
[bt] (7) /home/cougarnet.uh.edu/lshi22/dl-mxnet/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x2b8ffa0) [0x7ff256858fa0]
[bt] (8) /home/cougarnet.uh.edu/lshi22/dl-mxnet/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x2b90616) [0x7ff256859616]
[bt] (9) /home/cougarnet.uh.edu/lshi22/dl-mxnet/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x2af0279) [0x7ff2567b9279]
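
A note on reading this error: MXNet runs operations asynchronously, so an out-of-memory failure is reported at the next synchronization point (here `asscalar()`), which is not necessarily the line that allocated the memory. To find out whether usage really climbs with every iteration, it can help to log GPU memory from inside the training loop. The following is only a rough sketch: it assumes `nvidia-smi` is on the PATH, and the helper names `parse_memory_csv`, `query_gpu_memory` and `grows_steadily` are made up for illustration, not part of MXNet.

```python
import subprocess


def parse_memory_csv(text):
    """Parse nvidia-smi CSV output into a list of (used_mib, total_mib), one per GPU."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for the used and total memory of every GPU, in MiB."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        universal_newlines=True)
    return parse_memory_csv(out)


def grows_steadily(samples, min_increase_mib=1):
    """Return True if each sample exceeds the previous one by at least min_increase_mib."""
    return len(samples) > 1 and all(
        b - a >= min_increase_mib for a, b in zip(samples, samples[1:]))
```

Calling `query_gpu_memory()` every few hundred iterations and keeping the `used` values for one GPU in a list gives a record that `grows_steadily` can check. A steady climb suggests something in the loop is holding on to GPU arrays; a flat curve followed by a sudden failure points more toward a single oversized batch or input.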


#4

@hdjsjyl And which notebook was it in the examples folder, and did you make any changes or just run it verbatim? Thanks!


#5

I have now written the code myself, following the Gluon documentation. Thanks.