The `sizes` argument of `MultiBoxPrior` defines a set of square bounding boxes. Each size is expressed as a fraction of the total image width and height.

The `ratios` argument defines a set of aspect ratios applied to the first `sizes` element (`sizes[0]`). As far as I can tell, **the first ratio element is ignored**, so it is always a good idea to set it to 1.

Ratios `>1` define horizontal (wide) rectangles, while ratios `<1` define vertical (tall) rectangles.

For example, if you have

```
sizes = [0.2, 0.5]
ratios = [1, 4, 0.25]
```

you will get 4 anchor boxes (`len(sizes) + len(ratios) - 1`):

- A square anchor box of size [0.2, 0.2]
- A square anchor box of size [0.5, 0.5]
- A rectangular anchor box of size [0.1, 0.4]
- A rectangular anchor box of size [0.4, 0.1]
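The pairing rule above can be sketched in a few lines of plain Python. Note this is a hypothetical `anchor_shapes` helper for illustration, not the actual `MultiBoxPrior` implementation; it assumes the common SSD convention `w = s * sqrt(r)`, `h = s / sqrt(r)`:

```python
import math

def anchor_shapes(sizes, ratios):
    # Hypothetical sketch: MultiBoxPrior generates len(sizes) + len(ratios) - 1
    # boxes per position, pairing every size with ratios[0] and sizes[0] with
    # every remaining ratio.
    pairs = [(s, ratios[0]) for s in sizes] + [(sizes[0], r) for r in ratios[1:]]
    # Assumed convention: width = s * sqrt(r), height = s / sqrt(r),
    # so the box area stays s**2 while the aspect ratio changes.
    return [(s * math.sqrt(r), s / math.sqrt(r)) for s, r in pairs]

shapes = anchor_shapes([0.2, 0.5], [1, 4, 0.25])
# → [(0.2, 0.2), (0.5, 0.5), (0.4, 0.1), (0.1, 0.4)]  (up to float precision)
```

Here the tuples are (width, height); the ratio 4 gives the wide 0.4 x 0.1 box and the ratio 0.25 gives the tall 0.1 x 0.4 box, matching the list above.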

(Using the code provided in chapter 8 on object detection with the above values, you get the following result.)

To answer your second question: for each anchor box, we predict the real width and height in [0, 1], and the offsets `dw`, `dh` of the box center from the anchor box center, also in [0, 1]. That way, even if we only have a 16x16 feature map, we can predict a bounding box of, for example, (h, w, dw, dh) = (0.123, 0.432, 0.12, 0.23) in the original image.
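To make the decoding step concrete, here is a minimal sketch with a hypothetical `decode` helper. It assumes a simple parameterization where the network outputs raw center offsets and the box size directly in normalized image coordinates; the actual SSD/Faster-RCNN implementations typically also scale offsets by the anchor size and use log-space width/height, so treat this only as an illustration of the idea:

```python
def decode(anchor_cx, anchor_cy, dx, dy, w, h):
    # Hypothetical decoding: shift the anchor center by the predicted
    # offsets, then expand to corner coordinates. All values are
    # fractions of the original image size, so the 16x16 feature map
    # resolution does not limit the box precision.
    cx, cy = anchor_cx + dx, anchor_cy + dy
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

# An anchor centered at (0.5, 0.5) with the example predictions above:
box = decode(0.5, 0.5, 0.12, 0.23, 0.432, 0.123)
# box is (xmin, ymin, xmax, ymax) in [0, 1] image coordinates
```

The point is that the anchor grid only provides coarse starting positions; the continuous offsets let the network place the final box anywhere in the image.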

Alternatively, there is another implementation of SSD and Faster-RCNN in pure Gluon that you can find here:

https://gluon-cv.mxnet.io/build/examples_detection/index.html

Note that they follow different conventions for ratios etc., so have a look at the code directly to understand how the anchor boxes are calculated.