I noticed that a loop of dummy forward passes (arrays of zeros) uses up nearly all available GPU memory. Why is that? I assume it is some kind of optimization built into MXNet to use available resources to speed things up. Is there a way to stop MXNet from doing that?
Here is some sample code. If I run this, nearly all of my GPU's memory gets allocated, even though the network is very small and the data is tiny!
import mxnet as mx
from mxnet import gluon
import mxnet.autograd as ag

class Model(gluon.Block):
    def __init__(self, **kwargs):
        super(Model, self).__init__(**kwargs)
        with self.name_scope():
            self.dense0 = gluon.nn.Dense(20)
            self.dense1 = gluon.nn.Dense(20)
            self.mydense = gluon.nn.Dense(20, prefix='mydense_')

    def forward(self, x):
        x = mx.nd.relu(self.dense0(x))
        x = mx.nd.relu(self.dense1(x))
        return mx.nd.relu(self.mydense(x))

ctx = [mx.gpu()]
net = Model()
net.initialize(mx.init.Xavier(), ctx=ctx)

repeat_dummy = 1000000
for i in range(repeat_dummy):
    with ag.record():
        data = mx.nd.zeros((64, 32, 32, 1), ctx[0])  # nd.zeros expects a single Context, not a list
        output = net(data)
    del output
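To make the growth visible, the loop can be instrumented roughly like this (just a sketch continuing from the sample above; mx.context.gpu_memory_info may not be available in older MXNet versions):

# Sketch: print free GPU memory every few iterations to watch the allocation grow.
# Assumes mx.context.gpu_memory_info() exists in your MXNet version.
for i in range(repeat_dummy):
    with ag.record():
        data = mx.nd.zeros((64, 32, 32, 1), ctx[0])
        output = net(data)
    del output
    if i % 100 == 0:
        mx.nd.waitall()  # make sure queued operations have actually run before measuring
        free_mem, total_mem = mx.context.gpu_memory_info(0)
        print('iteration %d: %.1f / %.1f MiB free' % (i, free_mem / 2**20, total_mem / 2**20))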
Is there a way to force MXNet to free GPU memory that is no longer needed once the for loop is done?
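Something along these lines is what I am hoping for (just a sketch; I am not sure whether Context.empty_cache is the right call here, or whether it affects this pooled memory at all):

# Sketch of what I am hoping exists: explicitly release cached GPU memory
# once the dummy passes are done. empty_cache() is my best guess at such an
# API (it appears in newer MXNet versions); I don't know if it helps here.
for i in range(repeat_dummy):
    with ag.record():
        data = mx.nd.zeros((64, 32, 32, 1), ctx[0])
        output = net(data)
    del output

mx.nd.waitall()        # wait for all queued operations to finish
ctx[0].empty_cache()   # ask MXNet to return cached memory to the GPU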
The problem is that I need to run several dummy forward passes before training. This works with the example code above (albeit with high memory consumption), but with my own network it results in CUDA out-of-memory exceptions, even though the actual training runs fine with the memory I have available.
Thanks for any replies!