Bounding box ratios and scales in MultiBoxPrior and MultiBoxTarget functions

So I am trying to implement SSD and have various anchor sizes as defined below.

SSD_ANCHOR_SIZE = [[0.03, 0.1], [0.04, 0.1], [0.05, 0.1], [0.06, 0.1], [0.07, 0.1], [0.08, 0.1], [0.085, 0.1], [0.09, 0.1],
                   [0.095, 0.1], [0.1, 0.1], [0.11, 0.1], [0.12, 0.1], [0.13, 0.1], [0.14, 0.1], [0.15, 0.1]]
SSD_ANCHOR_RATIO = [[1, 1.1, 1.2], [1, 1.5, 1.75, 2], [1, 2.5, 3], [1, 3.5, 4, 4.5], [1, 4.5, 5, 5.5, 6], [1, 6.5, 7, 8],
                    [1, 7, 7.5, 8], [1, 8, 9], [1, 8, 8.5, 9], [1, 9, 9.5, 10], [1, 10, 10.5, 11, 12],
                    [1, 10, 11, 12, 13, 14], [1, 12, 13, 14, 15, 16, 17], [1, 16, 17, 18, 19], [1, 17, 17.5, 18, 19]]
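For context, each sizes/ratios pair is meant to feed MultiBoxPrior for one prediction layer, roughly along the lines of the sketch below (the feature map shapes here are just placeholders, not my actual network):

```python
import mxnet as mx
from mxnet.contrib.ndarray import MultiBoxPrior

# Placeholder feature maps, one per prediction layer (shapes are made up).
feature_maps = [mx.nd.zeros((1, 64, s, s)) for s in (128, 64, 32)]

all_anchors = []
for i, feat in enumerate(feature_maps):
    # One sizes/ratios pair per prediction layer, taken from the lists above.
    anchors = MultiBoxPrior(feat,
                            sizes=SSD_ANCHOR_SIZE[i],
                            ratios=SSD_ANCHOR_RATIO[i])
    all_anchors.append(anchors)

# Concatenate anchors from all layers: shape (1, total_num_anchors, 4),
# each row being (xmin, ymin, xmax, ymax) in normalised [0, 1] coordinates.
all_anchors = mx.nd.concat(*all_anchors, dim=1)
```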

The boxes generated by these anchor sizes and ratios are either square, or they have a height ranging from 45 px to 55 px and a width ranging from 50 px to 1000 px in steps of about 20 px, on an image of size 1650x1650 px.

So in summary, the bounding boxes proposed by Gluon's MultiBoxPrior function would mimic a line (a horizontal rectangle). This rectangle would have the dimensions described above.

But when I train the network, use the MultiBoxTarget function, and visualize the bounding boxes, I do not get any bounding boxes that are of horizontal orientation or square. All the bounding boxes I get are of vertical orientation.

So the question is: how is this possible? My anchor sizes and ratios only allow for bounding boxes that are either horizontal rectangles or perfect squares. Am I missing something? Does SSD work differently?

Hi @aakashpatel,

Anchor boxes are just starting points from which the network learns the adjustments required to surround an object. So it’s certainly possible for the output bounding boxes to be portrait when the anchor boxes are landscape, but it raises the question of why this is happening.

Are your objects actually vertical? If so, why aren’t you adding vertical anchor boxes? Smaller adjustments are easier to learn, so model performance should improve.

Also, your specification of anchor boxes is a little different from what I’ve seen before. With MultiBoxPrior you specify n sizes and m aspect ratios, and for every pixel you get n + m - 1 anchor boxes generated. Can you plot the anchor boxes for a given pixel? And then plot the bounding boxes returned by the model for that same pixel? It might help diagnose the problem. See this tutorial for an example of how to do this.
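For example, something like this (an untested sketch, using made-up sizes and ratios rather than your actual config) would plot the n + m - 1 anchors generated for a single pixel:

```python
import mxnet as mx
import matplotlib.pyplot as plt
from mxnet.contrib.ndarray import MultiBoxPrior

# Dummy feature map; only its spatial shape matters for anchor generation.
feature_map = mx.nd.zeros((1, 3, 40, 40))

# 2 sizes and 3 ratios -> 2 + 3 - 1 = 4 anchors per pixel (values are examples).
anchors = MultiBoxPrior(feature_map, sizes=[0.1, 0.15], ratios=[1, 4, 8])

# Reshape to (height, width, anchors_per_pixel, 4) and pick the centre pixel.
boxes = anchors.reshape((40, 40, -1, 4))
pixel_boxes = boxes[20, 20, :, :].asnumpy()   # (xmin, ymin, xmax, ymax) in [0, 1]

# Draw the anchors for that pixel.
fig, ax = plt.subplots()
for xmin, ymin, xmax, ymax in pixel_boxes:
    ax.add_patch(plt.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin,
                               fill=False, edgecolor='red'))
ax.set_xlim(0, 1)
ax.set_ylim(1, 0)   # flip y so the origin is at the top left, like image coords
plt.show()
```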

Hi @thomelane

Thank you for the clarification. I am trying to do object detection for printed text, hence all my objects have a fixed vertical size but a variable horizontal size.

Using the tutorial, I came up with bounding boxes of varying size that have a fixed vertical size and a variable horizontal size.

I have followed the same tutorial. In my case I see a decline in the classification loss, but my L1 loss (for bounding box prediction) somehow doesn’t reduce after a certain point. The surprising part is that the L1 loss is as low as 0.000x right from the start, which doesn’t make sense.

Also, all bounding boxes produced by MultiBoxTarget somehow have the same starting point, which is the top left of the page, and a variable vertical size. For visualization purposes, please see this other question of mine. I have illustrated my loss function graph and a sample result produced by my network after 100 epochs.

So what I am trying to do is debug this issue. One question that came up naturally was why it is producing vertical boxes. Your explanation is clear, but the output is very strange and doesn’t fit the intuition of learning horizontal boxes.

I’d reduce the complexity of your problem to diagnose the issue: try a much smaller number of anchor box variants to begin with. And to confirm, you’re creating these anchor boxes across the whole image, and not just once in the top corner of the image? Also limit the task to a single class (plus background), and then plot the predicted bounding boxes over time.
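As a sanity check, you could also look directly at what MultiBoxTarget matches on a toy example, something along these lines (all the numbers here are made up):

```python
import mxnet as mx
from mxnet.contrib.ndarray import MultiBoxPrior, MultiBoxTarget

# Tiny toy setup: one feature map, one size, two ratios.
feature_map = mx.nd.zeros((1, 3, 20, 20))
anchors = MultiBoxPrior(feature_map, sizes=[0.1], ratios=[1, 4])   # (1, 20*20*2, 4)

# One ground truth box: class 0, wide/horizontal, in normalised coordinates.
labels = mx.nd.array([[[0, 0.1, 0.45, 0.7, 0.55]]])                # (batch, num_gt, 5)

# Dummy class predictions: (batch, num_classes + 1, num_anchors).
cls_preds = mx.nd.zeros((1, 2, anchors.shape[1]))

box_target, box_mask, cls_target = MultiBoxTarget(anchors, labels, cls_preds)

# Anchors with a non-zero mask were matched to the ground truth box;
# if hardly anything gets matched, the L1 loss will be near zero from the start.
matched = box_mask.reshape((-1, 4)).asnumpy().sum(axis=1) > 0
print('matched anchors:', matched.sum())
print(anchors.asnumpy()[0][matched])   # coordinates of the matched anchors
```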

Check out GluonCV for this task too. See this example.

@thomelane Thanks. I have plotted the result and visualized it before, as can be seen here.

But for sure, I will now try to get it working on a single class before moving to the advanced version. From my understanding the problem lies somewhere else, but I will confirm that my network works fine on a small dataset before moving up.