MobileNetV1 as backbone network in Faster R-CNN in Gluon


I did something like this to use MobileNet as the base network, using this repo.

But I am getting an mAP of 0 even after 5 epochs.
The weights are randomly initialized.
My input images are 31*512 (I also have a 512*512 set, which is a different dataset). I want to train the network on both datasets; my first target is to get 512*512 working.
I have not changed anything else in the source code.

Please help me get a good mAP, and let me know what mistake I have made in the network.

from mxnet import initializer
from mxnet.gluon import nn, HybridBlock


class MobileNet_mod(nn.HybridBlock):
    def __init__(self, base_model, multiplier=0.25, classes=3, **kwargs):
        super(MobileNet_mod, self).__init__(**kwargs)
        with self.name_scope():
            self.features = nn.HybridSequential(prefix='')
            # Keep every backbone layer except the last 35
            for layer in base_model.features[:-35]:
                self.features.add(layer)

    def hybrid_forward(self, F, x, *args, **kwargs):
        x = self.features(x)
        #x = self.output(x)
        return x


class MObFastRCNNHead(HybridBlock):
    def __init__(self, base_model, num_classes, feature_stride, **kwargs):
        super(MObFastRCNNHead, self).__init__(**kwargs)
        self.feature_stride = feature_stride
        self.bottom = nn.HybridSequential()
        # Include last 2 mobilenet feature layers
        for layer in base_model.features[-2:]:
            self.bottom.add(layer)
        # in_units must match the number of channels coming out of self.bottom
        self.cls_score = nn.Dense(in_units=128, units=num_classes, weight_initializer=initializer.Normal(0.01))
        self.bbox_pred = nn.Dense(in_units=128, units=num_classes * 4, weight_initializer=initializer.Normal(0.001))

    def hybrid_forward(self, F, feature_map, rois):
        # Crop each ROI from the feature map and pool it to a fixed 3x3 grid
        x = F.ROIPooling(data=feature_map, rois=rois, pooled_size=(3, 3), spatial_scale=1.0 / self.feature_stride)
        x = self.bottom(x)

        cls_score = self.cls_score(x)
        cls_prob = F.softmax(data=cls_score)  # shape(roi_num, num_classes)
        bbox_pred = self.bbox_pred(x)  # shape(roi_num, num_classes*4)
        return cls_prob, bbox_pred


You need to modify a few files. I have created a fork of the repo and pushed the solution.
After 1 epoch it reached an mAP of 0.09, and after 2 epochs 0.12, so it is training.

FYI, the Faster R-CNN implementation is now maintained in GluonCV.

Have a look at my fork:

python ./ --network "mobilenetv2_0.5" --gpus 0 1 2 3

INFO:root:[Epoch 2][Batch 339], Speed: 61.444763 samples/sec, RPNLogLoss=3.177017, RPNSmoothL1Loss=0.499732, RCNNLogLoss=0.674288, RCNNSmoothL1Loss=0.493353
INFO:root:[Epoch 2][Batch 349], Speed: 59.817366 samples/sec, RPNLogLoss=3.171431, RPNSmoothL1Loss=0.499433, RCNNLogLoss=0.672364, RCNNSmoothL1Loss=0.493662

Note that I am training on 4 GPUs. If you train on only 1, consider reducing the batch size and the learning rate in the file.
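
A common heuristic for this kind of adjustment is to scale the learning rate linearly with the total batch size. The sketch below only illustrates that rule: `scale_for_gpus` and the example numbers are hypothetical and are not taken from the training script in the fork.

```python
def scale_for_gpus(base_lr, base_batch_size, base_gpus, target_gpus):
    """Shrink (or grow) batch size and learning rate with the GPU count.

    Linear scaling heuristic: the learning rate keeps the same ratio to
    the total batch size as in the reference configuration.
    """
    if base_gpus <= 0 or target_gpus <= 0:
        raise ValueError("GPU counts must be positive")
    if base_batch_size <= 0:
        raise ValueError("batch size must be positive")
    ratio = target_gpus / base_gpus
    # Never go below one sample per step
    batch_size = max(1, int(round(base_batch_size * ratio)))
    lr = base_lr * batch_size / base_batch_size
    return batch_size, lr
```

For instance, a hypothetical reference setup of 4 GPUs, total batch size 4 and learning rate 0.004 would map to a batch size of 1 and a learning rate of 0.001 on a single GPU. Treat the result as a starting point and tune from there.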


Thank you so much for this solution. I will work on it and let you know the results. I have one more question on this topic.
What should be changed if I want to use an input size of 512*31 instead of 512*512?
Should I use a smaller pooling size?


The first naive thing you can try is to pad with 0s so that you get 512x512 images, and see whether it trains.
Otherwise yes, you’ll have issues because of the dimensionality reduction: at some point your pooling kernels are going to be larger than your feature maps.
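
To make the dimensionality-reduction concern concrete, here is a small sketch that tracks one spatial dimension through successive stride-2 convolutions. It assumes 3x3 kernels with padding 1, which is how MobileNet-style networks usually downsample; the function names are hypothetical, and the exact layer layout of the network in the fork may differ.

```python
def conv_out(size, kernel=3, stride=2, pad=1):
    """Output length of one convolution along a single spatial dimension."""
    return (size + 2 * pad - kernel) // stride + 1


def sizes_after_downsampling(size, num_stride2_layers):
    """Return the length after each of the stride-2 layers, input included."""
    sizes = [size]
    for _ in range(num_stride2_layers):
        sizes.append(conv_out(sizes[-1]))
    return sizes


def fits_roi_pooling(size, num_stride2_layers, pooled_size=3):
    """True if the final feature map is at least as large as the pooling grid."""
    return sizes_after_downsampling(size, num_stride2_layers)[-1] >= pooled_size
```

Under these assumptions, a side of 512 shrinks to 32 after four downsampling steps, while a side of 31 shrinks to 2, which is already smaller than a 3x3 ROI pooling grid.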

FYI, after 34 epochs I reached an mAP of 0.41, which is not bad for such a small model.


If I pad with 0s, should I then change the kernel, padding, and strides in each convolution layer to fit that data? After padding a 512*31 image, the information will sit in the centre part of the frame. (I don’t know whether I am right or wrong.)

Also, how many classes were there in your case?

And what are the dimensions of those images?


I am using the VOC2012 dataset

If you pad your 512x31 dataset with 0s to get 512x512 images, you won’t need to change anything in your network.


But when we provide an input of size 512*31, the input dimensions are converted to 600*600 by the resize step, right?
That means my data will be 600*600 when it reaches the first convolution layer, right?

Do I really need to pad with zeros even then?
If yes, how can I do that?


Best advice I can give you is to try and experiment. If your data doesn’t lose too much spatial correlation after being upsampled 15 times on the height dimension, then just roll with it.

To pad with zeros, the simplest way is to create an NDArray of zeros and add your data into it, for example:

padded = mx.nd.zeros((512,512), ctx=ctx)
padded[0:31, :] += input_data

Adjust the shapes according to what you are doing. You could also do this one batch at a time.
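
The same idea can be sketched with NumPy, which makes it easy to try outside of a training loop. The helpers `pad_to_square` and `pad_batch` are hypothetical names, and the sketch assumes images are stored height-first, e.g. (31, 512) or (31, 512, 1). The function also returns the placement offset, because ground-truth boxes have to be shifted by the same amount whenever the image is not placed at the top-left corner.

```python
import numpy as np


def pad_to_square(image, size=512, center=False):
    """Zero-pad an (H, W) or (H, W, C) array to (size, size[, C]).

    Returns the padded array and the (x, y) offset at which the original
    image was placed.
    """
    h, w = image.shape[:2]
    if h > size or w > size:
        raise ValueError("image is larger than the target size")
    padded = np.zeros((size, size) + image.shape[2:], dtype=image.dtype)
    top = (size - h) // 2 if center else 0
    left = (size - w) // 2 if center else 0
    padded[top:top + h, left:left + w] = image
    return padded, (left, top)


def pad_batch(images, size=512):
    """Pad every image in a list and stack the results into one array."""
    return np.stack([pad_to_square(img, size)[0] for img in images])
```

With `center=False` the data sits in the first rows, as in the two-line snippet above, so box coordinates stay unchanged.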


I have tried doing this, but I did not get any result. I have a question for you.
Can I use a Pascal-format dataset with .npy files instead of .jpg? I have single-channel data (512,512,1). If I save it as .jpg I get (512,512,3), which I do not want because some information will be lost. Should I change the mean and std anywhere if I use .npy files, or should I keep them at 0?
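
Whether the repo's data loader accepts .npy files is not settled in this thread, so the following is only a sketch of the data-side pieces: a lossless .npy round trip that keeps the (H, W, 1) shape, and a mean/std computed from the single-channel data itself rather than reused from RGB statistics. All helper names are hypothetical.

```python
import io

import numpy as np


def npy_roundtrip(image):
    """Save an array in .npy format to memory and load it back unchanged."""
    buffer = io.BytesIO()
    np.save(buffer, image)
    buffer.seek(0)
    return np.load(buffer)


def channel_stats(images):
    """Mean and std over every pixel of a list of single-channel arrays."""
    pixels = np.concatenate([img.astype(np.float64).ravel() for img in images])
    return float(pixels.mean()), float(pixels.std())


def normalize(image, mean, std):
    """Standardize an image with dataset-level statistics."""
    if std == 0:
        raise ValueError("std must be non-zero")
    return (image.astype(np.float32) - mean) / std
```

In a real pipeline the in-memory buffer would be replaced by file paths, and the statistics would be computed once over the training set and then passed wherever the loader currently expects its mean and std.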