GluonCV, Faster RCNN, normalization layers


#1

The gluon-cv Faster R-CNN model uses a special resnet50 base network that “denormalizes” the input image, if I understand it correctly. I assume this was added so the pretrained weights from the mxnet model zoo could be reused.

However, I was wondering: could this step actually be left out by simply not normalizing the image in the dataloader?
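
Concretely, I mean something like this in the dataloader (a minimal sketch using plain mxnet image ops; the mean/std values are the standard ImageNet statistics used by the presets):

import mxnet as mx

def load_normalized(img):
    # img: uint8 HWC NDArray in [0, 255]
    x = mx.nd.image.to_tensor(img)  # float32 CHW in [0, 1]
    return mx.nd.image.normalize(x, mean=(0.485, 0.456, 0.406),
                                 std=(0.229, 0.224, 0.225))

def load_raw(img):
    # skip normalization entirely, which would make the network's
    # built-in "denormalize" layer redundant
    return mx.nd.image.to_tensor(img)  # still in [0, 1], not [0, 255]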


#2

@ifeherva It seems you are correct that it is doing the inverse transformation, with the *255 factor missing.
I am not entirely sure what the reasoning is behind the decision to proceed that way rather than just multiplying the initial image by 255.
@Hang_Zhang @zhreshold could you advise on the reason behind this hard-coded rescale layer?
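
To make the algebra explicit, here is a minimal sketch (my reading of it, not the library's actual code) of what the dataloader computes versus what the hard-coded layer appears to undo:

import mxnet as mx

mean = mx.nd.array([0.485, 0.456, 0.406]).reshape((3, 1, 1))
std = mx.nd.array([0.229, 0.224, 0.225]).reshape((3, 1, 1))

def normalize(x):
    # dataloader: x is a float32 CHW image already scaled to [0, 1]
    return (x - mean) / std

def denormalize(x_norm):
    # hard-coded layer: undoes the mean/std shift but stays around [0, 1];
    # a full round-trip to raw pixels would also need the *255 factor
    return x_norm * std + mean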


#3

I tried plugging in the resnet50v2 model from the mxnet model zoo, which worked, but the performance was much worse.


#4

The reason is fairly simple: in our experiments we observed that different input scales affected performance quite a lot. Whether that is due to the initialization scale or the pre-trained model is still unknown.

Until we figure out a generic solution, we use these hard-coded scaling layers for consistency throughout the gluon-cv package.
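
Conceptually, such a layer is just a fixed, non-trainable affine rescale of the input. A minimal sketch of the idea (the class name and constants here are illustrative, not the actual gluon-cv implementation):

import mxnet as mx
from mxnet.gluon import nn

class InputRescale(nn.Block):
    # fixed, non-trainable rescale so every model in the package
    # sees inputs at a consistent scale
    def __init__(self, mean=(0.485, 0.456, 0.406),
                 std=(0.229, 0.224, 0.225), **kwargs):
        super(InputRescale, self).__init__(**kwargs)
        self._mean = mx.nd.array(mean).reshape((1, 3, 1, 1))
        self._std = mx.nd.array(std).reshape((1, 3, 1, 1))

    def forward(self, x):
        return (x * self._std.as_in_context(x.context)
                + self._mean.as_in_context(x.context))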


#5

Thanks for the reply. If I use the resnet50v2 model without the rescaling, I get considerably worse recall on my validation set.
See the code below:

import mxnet as mx
from gluoncv import model_zoo

def get_net(base_net, dataset, pretrained_base=True):
    # slice the pretrained backbone into feature extractor and top features
    base_network = mx.gluon.model_zoo.vision.get_model(base_net, pretrained=pretrained_base)
    features = base_network.features[:8]
    top_features = base_network.features[8:11]
    # only train the dense, RPN, and later-stage conv layers
    train_patterns = '|'.join(['.*dense', '.*rpn', '.*stage(2|3|4)_conv'])
    return model_zoo.get_faster_rcnn(base_net, features, top_features,
                                     scales=(2, 4, 8, 16, 32), ratios=(0.5, 1, 2),
                                     classes=['BG', 'CLASS1'], dataset=dataset,
                                     roi_mode='align', roi_size=(14, 14), stride=16,
                                     rpn_channel=1024, train_patterns=train_patterns,
                                     pretrained=False)

On the other hand, it is at least 3 times faster at inference time. Where does this speedup come from?


#6

Fewer classes, therefore less non-maximum-suppression time.

We use per-class NMS for the best recall, so the complexity of NMS is O(N), where N is the number of foreground classes.
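
For illustration, a conceptual numpy sketch of per-class NMS (not the actual gluon-cv kernel): one suppression pass runs per foreground class, so total cost grows linearly with the number of classes.

import numpy as np

def nms_single_class(boxes, scores, iou_thresh=0.5):
    # standard greedy NMS on (N, 4) boxes [x1, y1, x2, y2]
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[order[1:], 2] - boxes[order[1:], 0])
                 * (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thresh]
    return keep

def per_class_nms(boxes, scores, class_ids, num_classes):
    keep = []
    for c in range(num_classes):  # one NMS pass per class -> O(num_classes)
        idx = np.where(class_ids == c)[0]
        keep.extend(idx[k] for k in nms_single_class(boxes[idx], scores[idx]))
    return keep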


#7

I turned off normalization in the dataloader and used the mxnet resnet50v2 model. I got better recall (still not as good as with the gluoncv resnet50). Still, my model is 2-3 times faster at inference time on GPU. The only difference I see is that

self.layer0.add(nn.BatchNorm(scale=False, epsilon=2e-5, use_global_stats=True))

vs

self.features.add(nn.BatchNorm(scale=False, center=False))

Could use_global_stats be responsible for the speed difference?
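
One way to check would be a rough micro-benchmark of a single BatchNorm layer in both configurations (a sketch only; the shape, iteration count, and inference-mode assumption are arbitrary choices on my part):

import time
import mxnet as mx
from mxnet.gluon import nn

ctx = mx.gpu(0)
x = mx.nd.random.uniform(shape=(1, 256, 64, 64), ctx=ctx)

for kwargs in ({'scale': False, 'epsilon': 2e-5, 'use_global_stats': True},
               {'scale': False, 'center': False}):
    bn = nn.BatchNorm(**kwargs)
    bn.initialize(ctx=ctx)
    bn(x)  # warm-up, triggers deferred shape inference
    mx.nd.waitall()
    start = time.time()
    for _ in range(1000):
        y = bn(x)
    mx.nd.waitall()  # force pending GPU work to finish before timing
    print(kwargs, 'took', time.time() - start, 's')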


#8

Yes, we are investigating the bad performance of BatchNorm without cuDNN.