My MxNet.module implementation runs at:
Speed: 716.69 samples/sec rmse=0.176675
While my Gluon implementation runs at (after calling hybridize()):
speed: 267.994337 samples/s, training: rmse=0.2200
Is there anything I can do to close the gap?
There should be no reason why Gluon would run slower, unless something isn’t properly configured. Without knowing details of your implementation, I can only give you some tips:
- Make sure your network is a HybridBlock end to end.
- If you use a DataLoader, make sure num_workers is set to the number of CPUs available.
- When you call hybridize(), set static_alloc=True and static_shape=True (i.e. hybridize(static_alloc=True, static_shape=True)); see the sketch after this list.

Could you share your implementation?
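A minimal sketch of those two settings (my_dataset, batch_size, and net are placeholders, not from the original post):

import multiprocessing
from mxnet import gluon

# Parallel data loading: one worker process per available CPU core.
loader = gluon.data.DataLoader(
    my_dataset,                               # placeholder dataset
    batch_size=64,                            # placeholder batch size
    shuffle=True,
    num_workers=multiprocessing.cpu_count(),
)

# net must be a HybridBlock end to end for this to take effect.
net.hybridize(static_alloc=True, static_shape=True)  # MXNet 1.3+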
Here is my code:
import mxnet as mx
from mxnet import gluon

def build_net(hyper_params):
    from mxnet.gluon.model_zoo import vision as models
    # Pretrained ResNet-152 backbone
    res_net = models.resnet152_v1(pretrained=True)
    new_net = gluon.nn.HybridSequential()
    with new_net.name_scope():
        pretrained_features = res_net.features
        # New fully connected head
        new_tail = gluon.nn.HybridSequential()
        new_tail.add(
            gluon.nn.Dense(hyper_params.NUM_HIDDENS1, activation=hyper_params.ACTIVATION),
            gluon.nn.Dropout(hyper_params.DROPOUT),
            gluon.nn.Dense(hyper_params.NUM_HIDDENS2, activation=hyper_params.ACTIVATION),
            gluon.nn.Dropout(hyper_params.DROPOUT),
            gluon.nn.Dense(hyper_params.NUM_OUTPUTS)
        )
        # Only the new head needs initialization; the backbone is pretrained.
        new_tail.initialize(mx.init.Xavier(magnitude=hyper_params.MAGNITUDE))
        new_net.add(
            pretrained_features,
            new_tail
        )
    return new_net

bin_net = build_net(hyper_params)
bin_net.hybridize()
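As a sanity check (not in the original post), a warm-up forward pass with an assumed ImageNet-sized input confirms the graph hybridizes cleanly:

import mxnet as mx

# The first call after hybridize() builds and caches the symbolic graph.
dummy = mx.nd.random.uniform(shape=(1, 3, 224, 224))  # assumed input shape
out = bin_net(dummy)
mx.nd.waitall()  # wait for the asynchronous computation to finish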
This looks OK to me. Did you try what @safrooze suggested above?
It does not work for me:
MXNetError: Cannot find argument ‘static_shape’, Possible Arguments:
inline_limit : int (non-negative), optional, default=2
Maximum number of operators that can be inlined.
forward_bulk_size : int (non-negative), optional, default=15
Segment size of bulk execution during forward pass.
backward_bulk_size : int (non-negative), optional, default=15
Segment size of bulk execution during backward pass.
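A version guard along these lines (a sketch, not from the thread) avoids passing the unsupported flags on older releases:

import mxnet as mx

# static_alloc/static_shape are only accepted by newer MXNet releases,
# so fall back to a plain hybridize() on anything older than 1.3.
major, minor = (int(x) for x in mx.__version__.split('.')[:2])
if (major, minor) >= (1, 3):
    bin_net.hybridize(static_alloc=True, static_shape=True)
else:
    bin_net.hybridize()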
The training code:
from mxnet import autograd

def forward_backward(net, data, label, metric):
    # data/label are lists of shards, one per device
    losses, outputs = [], []
    with autograd.record():
        for X, Y in zip(data, label):
            Z = net(X)
            losses.append(loss(Z, Y))  # loss is defined elsewhere, e.g. gluon.loss.L2Loss()
            outputs.append(Z)
    for l in losses:
        l.backward()
    metric.update(label, outputs)
    return losses
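For context, a typical driver for forward_backward looks like this (a sketch; loader, ctx_list, trainer, loss, and metric are assumed to be defined elsewhere):

from mxnet import gluon

for batch_data, batch_label in loader:
    # Split each batch across devices, e.g. ctx_list = [mx.gpu(0), mx.gpu(1)]
    data = gluon.utils.split_and_load(batch_data, ctx_list)
    label = gluon.utils.split_and_load(batch_label, ctx_list)
    losses = forward_backward(net, data, label, metric)
    trainer.step(batch_data.shape[0])  # normalize the gradient by batch size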
I am using MXNet 1.1; could that be the cause of the problem?
static_alloc is only on the master branch and will be released soon as part of the 1.3.0 release. However, static_alloc only helps with the last ~10% of the gap between Gluon and Symbolic. How much difference in performance are you seeing?
Gluon runs at less than half the speed of the Symbolic version.
If you are using 1.1, then there's no way to use static_alloc until you upgrade.
I think the I/O is the root cause.
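One way to test that hypothesis (a sketch, reusing the assumed loader from above): time the DataLoader on its own, with no network in the loop, and compare the samples/sec against the training numbers.

import time

tic = time.time()
n = 0
for batch_data, batch_label in loader:  # assumed DataLoader from earlier
    n += batch_data.shape[0]
print('IO only: %.1f samples/sec' % (n / (time.time() - tic)))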