Single Shot Multibox Detection (SSD)

http://en.diveintodeeplearning.org/chapter_computer-vision/ssd.html


Hi, can we ask questions about the tutorial here, or should I create a new thread? (If so, where?)

If your question is relevant to SSD, please post it here. Otherwise, you can post it in one of the other related topics.


My question is: how are the bounding boxes adjusted during training?
Are the anchor boxes created randomly each time, or are they created at fixed coordinates?
I know the network learns the offsets of the ground-truth bounding boxes relative to these anchor boxes, but how is that done?
That seems possible only if the anchor boxes are fixed. Am I wrong?
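
To make the question concrete, here is a minimal NumPy sketch of what "fixed" anchor boxes and offset labels could look like. It is an illustration under stated assumptions, not the tutorial's code: `make_anchors` and `encode_offsets` are hypothetical helper names, every size is paired with every ratio for simplicity, and the offset encoding is the common center/log-size form without the extra scaling constants some implementations apply.

```python
import numpy as np

def make_anchors(fmap_h, fmap_w, sizes, ratios):
    """Hypothetical helper: anchors as normalized (xmin, ymin, xmax, ymax).

    The result depends only on the arguments, so the same feature-map
    shape always yields the same anchors (nothing is random).
    """
    # Centers of the feature-map cells, normalized to [0, 1].
    cy = (np.arange(fmap_h) + 0.5) / fmap_h
    cx = (np.arange(fmap_w) + 0.5) / fmap_w
    boxes = []
    for y in cy:
        for x in cx:
            for s in sizes:
                for r in ratios:
                    w = s * np.sqrt(r)   # wider when r > 1
                    h = s / np.sqrt(r)
                    boxes.append([x - w / 2, y - h / 2, x + w / 2, y + h / 2])
    return np.array(boxes)

def to_center(boxes):
    """Convert corner boxes to (cx, cy, w, h) arrays."""
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    return boxes[:, 0] + w / 2, boxes[:, 1] + h / 2, w, h

def encode_offsets(anchors, gt):
    """Offset labels of ground-truth boxes relative to their assigned anchors.

    Assumes row i of `gt` is the box already assigned to row i of `anchors`.
    """
    acx, acy, aw, ah = to_center(anchors)
    gcx, gcy, gw, gh = to_center(gt)
    return np.stack([(gcx - acx) / aw,
                     (gcy - acy) / ah,
                     np.log(gw / aw),
                     np.log(gh / ah)], axis=1)

anchors = make_anchors(2, 2, sizes=[0.5], ratios=[1.0])
gt = np.array([[0.1, 0.1, 0.6, 0.6]])        # made-up ground-truth box
offsets = encode_offsets(anchors[:1], gt)    # label for the first anchor
```

In a setup like this, the network would be trained to regress `offsets` for each anchor, which only makes sense because the anchors themselves are reproducible from the feature-map shape.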

At test time, do we still create anchor boxes and use NMS to choose the best one, or does the network directly produce final coordinates for each object it finds? Or does it output offsets that we need to scale to get the final coordinates?
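
For reference, one common way test-time prediction is organized is sketched below: predicted offsets are applied to the same anchors to recover boxes, and non-maximum suppression (NMS) then drops near-duplicates. This is a hedged illustration rather than the tutorial's implementation. `decode_offsets`, `iou`, and `nms` are hypothetical names, the offset encoding is assumed to be the plain center/log-size form, and the 0.5 threshold and the example boxes and scores are made up.

```python
import numpy as np

def decode_offsets(anchors, offsets):
    """Apply (dx, dy, dw, dh) offsets to corner-format anchors."""
    aw = anchors[:, 2] - anchors[:, 0]
    ah = anchors[:, 3] - anchors[:, 1]
    acx = anchors[:, 0] + aw / 2
    acy = anchors[:, 1] + ah / 2
    cx = offsets[:, 0] * aw + acx
    cy = offsets[:, 1] * ah + acy
    w = np.exp(offsets[:, 2]) * aw
    h = np.exp(offsets[:, 3]) * ah
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)

def iou(box, boxes):
    """IoU of one box against an (n, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes overlapping it, repeat."""
    order = np.argsort(-scores)          # indices by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_threshold]
    return keep

anchors = np.array([[0.00, 0.00, 0.50, 0.50],
                    [0.05, 0.05, 0.55, 0.55],
                    [0.50, 0.50, 1.00, 1.00]])
pred_offsets = np.zeros((3, 4))          # pretend the network predicted zeros
scores = np.array([0.9, 0.8, 0.7])       # made-up class confidences
boxes = decode_offsets(anchors, pred_offsets)
kept = nms(boxes, scores)
```

Here the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box does not overlap and survives.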

I also noticed that the predicted bounding boxes are simply subtracted from the ground-truth labels without being scaled. Are the ground-truth labels scaled? As far as I know, the coordinates are relative to each feature map, and to get the actual coordinates on the real image they need to be scaled by the image dimensions (I believe this was explained in previous sections).
However, I don't see anything like this being done here.
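
One possible explanation, offered as an assumption rather than a confirmed reading of the tutorial: if both the anchor boxes and the ground-truth labels are stored in coordinates normalized to [0, 1] relative to the image, the two can be compared directly during training, and scaling is needed only when boxes are drawn on the image. The sketch below shows that final scaling step; `to_pixels` is a hypothetical helper and the image size is made up.

```python
import numpy as np

def to_pixels(boxes, img_h, img_w):
    """Scale normalized (xmin, ymin, xmax, ymax) boxes to pixel coordinates."""
    # x coordinates scale with the width, y coordinates with the height.
    scale = np.array([img_w, img_h, img_w, img_h], dtype=float)
    return boxes * scale

normalized = np.array([[0.1, 0.2, 0.5, 0.6]])   # made-up normalized box
pixel_boxes = to_pixels(normalized, img_h=256, img_w=512)
```

Under that assumption, the loss never needs the image dimensions, because predictions and labels live in the same normalized space.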

I'd greatly appreciate any clarification on this, as this part is missing from this section.
