The `sizes` argument of `MultiBoxPrior` defines a set of square anchor boxes, with each size given as a fraction of the total image width and height.
The `ratios` argument defines a set of aspect ratios (width / height) to apply to the first element of `sizes`. The first ratio element is not part of that set: it is applied to every element of `sizes` instead, so it is best to set it to 1 to keep those boxes square.
Ratios > 1 define horizontal rectangles.
Ratios < 1 define vertical rectangles.
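The combination rule can be sketched in a few lines of plain Python. `anchor_shapes` is a hypothetical helper, not an MXNet function. It assumes a square input image, the convention width = size * sqrt(ratio) and height = size / sqrt(ratio) (so the area stays size² and width / height = ratio), and that `ratios[0]` is paired with every size while the remaining ratios are paired with `sizes[0]` only.

```python
import math


def anchor_shapes(sizes, ratios):
    """Hypothetical helper: (width, height) of each anchor, as image fractions.

    Assumes width = size * sqrt(ratio) and height = size / sqrt(ratio),
    with ratios[0] paired with every size and the remaining ratios
    paired with sizes[0] only.
    """
    shapes = []
    r0 = math.sqrt(ratios[0])
    for s in sizes:                  # every size with the first ratio
        shapes.append((s * r0, s / r0))
    for r in ratios[1:]:             # the other ratios with the first size
        sr = math.sqrt(r)
        shapes.append((sizes[0] * sr, sizes[0] / sr))
    return shapes


shapes = anchor_shapes([0.2, 0.5], [1, 4, 0.25])
print(len(shapes))  # len(sizes) + len(ratios) - 1 = 4
for w, h in shapes:
    print(f"w={w:.2f}, h={h:.2f}")
```

With `ratios[0] = 1` the first loop produces plain squares, which is why 1 is the usual choice for the first ratio.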
For example, if you have
sizes = [0.2, 0.5]
ratios = [1, 4, 0.25]
you will get 4 anchor boxes (len(sizes) + len(ratios) - 1) at each position:
- A square anchor box of size 0.2 x 0.2
- A square anchor box of size 0.5 x 0.5
- A horizontal anchor box 0.4 wide and 0.1 high (ratio 4)
- A vertical anchor box 0.1 wide and 0.4 high (ratio 0.25)
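To see where these four boxes end up in the image, the sketch below tiles them over a small feature map, placing one group of anchors at the center of every cell and returning corners (xmin, ymin, xmax, ymax) in [0, 1] image coordinates. The function name `tile_anchors` and the cell-center placement are assumptions for illustration, not MXNet's source; it pairs `ratios[0]` with every size and the remaining ratios with `sizes[0]`, which reproduces the four boxes listed above.

```python
import math


def tile_anchors(fmap_h, fmap_w, sizes, ratios):
    """Hypothetical sketch: anchors as (xmin, ymin, xmax, ymax) in [0, 1].

    One group of anchors is centered on each feature-map cell. Corners are
    not clipped, so they may fall outside [0, 1] near the image border.
    """
    r0 = math.sqrt(ratios[0])
    shapes = [(s * r0, s / r0) for s in sizes]                      # every size, first ratio
    shapes += [(sizes[0] * math.sqrt(r), sizes[0] / math.sqrt(r))   # first size, other ratios
               for r in ratios[1:]]
    anchors = []
    for i in range(fmap_h):
        cy = (i + 0.5) / fmap_h          # cell center, row i
        for j in range(fmap_w):
            cx = (j + 0.5) / fmap_w      # cell center, column j
            for w, h in shapes:
                anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = tile_anchors(2, 2, [0.2, 0.5], [1, 4, 0.25])
print(len(anchors))  # 2 * 2 cells * 4 anchors = 16
print(anchors[0])    # 0.2 x 0.2 square around the first cell center (0.25, 0.25)
```

This is also why the anchor count grows as H * W * (len(sizes) + len(ratios) - 1): every feature-map cell gets the full set.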
(Using the code from the object detection section of chapter 8 with the values above, you get the following result.)
To answer your second question: for each anchor box, the network predicts offsets relative to that anchor, namely how far to shift its center (dx, dy) and how much to scale its width and height (dw, dh). Applying these offsets to the anchor gives a box in normalized [0, 1] coordinates of the original image, so even with only a 16x16 feature map we can predict, for example, a box with (x_center, y_center, w, h) = (0.123, 0.432, 0.12, 0.23) in the original image.
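For reference, SSD-style detectors usually encode each prediction as an offset from its anchor rather than as raw coordinates. The sketch below decodes such offsets, assuming the common encoding (center shift scaled by the anchor size, width and height corrected in log space) with "variances" of (0.1, 0.1, 0.2, 0.2). The name `decode` and those variance values are assumptions, so check the implementation you use for its exact convention.

```python
import math


def decode(anchor, offsets, variances=(0.1, 0.1, 0.2, 0.2)):
    """Hypothetical SSD-style decoder (encoding and variances are assumptions).

    anchor:  (cx, cy, w, h) of the anchor, in [0, 1] image coordinates
    offsets: (dx, dy, dw, dh) predicted by the network for that anchor
    Returns (cx, cy, w, h) of the predicted box.
    """
    acx, acy, aw, ah = anchor
    dx, dy, dw, dh = offsets
    vx, vy, vw, vh = variances
    cx = acx + dx * vx * aw        # center shift, scaled by the anchor size
    cy = acy + dy * vy * ah
    w = aw * math.exp(dw * vw)     # size correction applied in log space
    h = ah * math.exp(dh * vh)
    return cx, cy, w, h


# Anchor centered on cell (row 3, column 5) of a 16x16 feature map, size 0.2
anchor = ((5 + 0.5) / 16, (3 + 0.5) / 16, 0.2, 0.2)
box = decode(anchor, (0.5, -0.3, 0.1, 0.0))
print(box)  # center moves off the 1/16 grid to roughly (0.354, 0.213)
```

Because the offsets are continuous, the decoded box is not tied to the coarse grid of anchor centers.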
Alternatively, there is another implementation of SSD and Faster R-CNN in pure Gluon, available here:
Note that it follows different conventions for the ratios and other parameters, so look at the code directly to understand how its anchor boxes are calculated.