The gluon-cv Faster R-CNN model uses a special resnet50 model that, if I understand it correctly, “denormalizes” the input image. I assume this was added so the pretrained weights from the mxnet model zoo could be reused.
However, I was wondering whether this step could simply be left out by not normalizing the image in the dataloader in the first place?
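For context, this is the normalization step I mean. A minimal sketch in numpy, assuming the dataloader uses the usual ImageNet mean/std (the values below are the common model-zoo defaults; whether this particular preset uses exactly these numbers is my assumption):

```python
import numpy as np

# Commonly used ImageNet channel statistics (assumption: the
# dataloader in question normalizes with these exact values).
MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

def normalize(img_uint8):
    # Typical dataloader step: scale raw pixels to [0, 1],
    # then standardize each channel.
    x = img_uint8.astype(np.float64) / 255.0
    return (x - MEAN) / STD

# Example: a small random HWC image.
img = np.random.randint(0, 256, size=(4, 4, 3))
out = normalize(img)
```

Skipping this step would mean the network sees raw pixel values instead of zero-centered, unit-variance inputs.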
@ifeherva It seems you are correct: it is doing the inverse transformation, but with a *255 factor missing.
I am not entirely sure what the reasoning is behind doing it that way rather than just multiplying the initial image by 255. @Hang_Zhang @zhreshold could you advise on the reason behind this hard-coded rescale layer?
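To make the “inverse transformation with a *255 factor missing” concrete, here is a sketch of what such a denormalize step amounts to arithmetically (numpy only, with the same assumed ImageNet statistics as above; `denormalize` is my own illustrative name, not a gluon-cv API):

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

def denormalize(x):
    # Inverse of the per-channel standardization: x * std + mean.
    # Note this only brings values back to the [0, 1] range;
    # recovering raw pixel intensities would still require a
    # further * 255 -- the factor observed to be missing.
    return x * STD + MEAN

# Round trip: normalize, then denormalize, gives back img / 255.
img = np.random.randint(0, 256, size=(2, 2, 3)).astype(np.float64)
norm = (img / 255.0 - MEAN) / STD
recovered = denormalize(norm)
```

So a backbone with such a built-in rescale layer effectively undoes the dataloader’s standardization, up to that constant factor.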
The reason is fairly simple: in our experiments we observed that the input scale affected performance quite a lot. Whether this is due to the initialization scale or the pre-trained model is still unknown.
Until we figure out a generic solution, we use these hard-coded scaling layers for consistency throughout the gluon-cv package.
I turned off normalization in the dataloader and used the mxnet resnet50v2 model. I got better recall (still not as good as the gluoncv resnet50). Still, my model is 2-3 times faster at inference time on GPU. The only difference I see is that