I am trying to modify notebook 12.7, "Single Shot Multibox Detection (SSD)", to do multi-class object detection on the Pascal VOC dataset. I am having trouble getting the predicted class ID out of the prediction output.
I am referring to this code from the notebook:
def predict(X):
    anchors, cls_preds, bbox_preds = net(X.as_in_context(ctx))
    cls_probs = cls_preds.softmax().transpose((0, 2, 1))
    output = contrib.nd.MultiBoxDetection(cls_probs, bbox_preds, anchors)
    idx = [i for i, row in enumerate(output[0]) if row[0].asscalar() != -1]
    return output[0, idx]

output = predict(X)

def display(img, output, threshold):
    d2l.set_figsize((5, 5))
    fig = d2l.plt.imshow(img.asnumpy())
    for row in output:
        score = row[1].asscalar()
        if score < threshold:
            continue
        h, w = img.shape[0:2]
        print(row)
        bbox = [row[2:6] * nd.array((w, h, w, h), ctx=row.context)]
        d2l.show_bboxes(fig.axes, bbox, '%.2f' % score, 'w')

display(img, output, threshold=0.3)
My questions are about this line:
bbox = [row[2:6] * nd.array((w, h, w, h), ctx=row.context)]
Question 1)
row[2:6] denotes the bounding box. Based on my observation, row[1] denotes the predicted class's probability. What does row[0] denote?
Question 2)
How do I get each anchor box's predicted class out of the output?
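
To make both questions concrete, here is a minimal sketch of how one row of the output could be unpacked. It assumes each row is laid out as [class_id, score, xmin, ymin, xmax, ymax], with class_id == -1 marking background or a box removed by non-maximum suppression. The filter on row[0] != -1 in predict suggests this layout, but it is an assumption here and should be checked against the MultiBoxDetection documentation. The names decode_row and VOC_CLASSES are hypothetical and not part of the notebook, and the class order must match however the training labels were encoded. The sketch works on plain Python lists, so an NDArray row would first need converting, for example with row.asnumpy().

```python
# Hypothetical helper, not part of the notebook.
# ASSUMED row layout: [class_id, score, xmin, ymin, xmax, ymax],
# with box coordinates normalized to [0, 1] and class_id == -1
# meaning "background or suppressed".

# Assumed label order (alphabetical Pascal VOC names); it must match
# the order used when the training labels were created.
VOC_CLASSES = [
    'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car',
    'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike',
    'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor',
]


def decode_row(row, width, height, class_names=VOC_CLASSES):
    """Return (label, score, pixel_box) for one row, or None if it is background."""
    class_id = int(row[0])
    if class_id < 0:
        return None  # background or removed box under the assumed layout
    score = float(row[1])
    xmin, ymin, xmax, ymax = (float(v) for v in row[2:6])
    # Scale normalized coordinates to pixels, as display() does with (w, h, w, h).
    pixel_box = [xmin * width, ymin * height, xmax * width, ymax * height]
    return class_names[class_id], score, pixel_box


# Made-up row for illustration only (not real model output).
example_row = [11.0, 0.9, 0.1, 0.2, 0.5, 0.8]
label, score, box = decode_row(example_row, width=200, height=100)
print(label, score, box)
```

If that assumption holds, the label could be passed to d2l.show_bboxes in place of (or alongside) the score string.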