I am using Gluon’s pretrained object detection model. I was wondering if there is a way to gather data on which objects were detected by the algorithm. I want to write a computer program that will do something based on what objects are detected in the photo, without me seeing the photo. I tried some print statements of class_IDs, scores, etc. but none give much useful information. Interestingly, printing class_IDs prints a nested list composed of the numbers 0 or -1 inside individual brackets. Online, it says the class_IDs variable holds the predicted class IDs detected by the model… that doesn’t seem to align. If anyone has any advice please let me know! Thank you!
Yes, this is certainly possible. Can you link to the specific model you’re using here? I presume it’s from GluonCV. One of the most important factors when using pre-trained models (without fine-tuning) is the dataset that was used for pre-training, since that determines the number and type of classes detected by the model. I recommend models pre-trained on COCO.
When you run the network on an image, a tuple of 3 arrays will be returned which, as you correctly point out, will be #1 class ids, #2 scores, and #3 bounding boxes. You can use the class ids array to determine the class of each object. An image with 3 detected objects might return something like…
[5, 14, 3, -1, -1, -1, ...]
Our model predicts objects with class indexes of 5, 14 and 3. -1 is just used to pad the array when no more objects have been found. You can use net.classes (i.e. the classes property of your network) to get a list of class labels, and then index into that list with a class id to find out what class has been detected by the model.
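A minimal sketch of that lookup, written against plain Python lists so it runs without a model. The label list is the 20 Pascal VOC names standing in for net.classes, and the ids, scores, the helper name detected_labels and the 0.5 threshold are all made up for illustration. With a real network you would first convert the outputs, for example with class_ids[0].asnumpy().flatten().tolist().

```python
# Stand-in for net.classes: the 20 Pascal VOC labels (illustration only).
VOC_CLASSES = [
    'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat',
    'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person',
    'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor',
]


def detected_labels(class_ids, scores, classes, threshold=0.5):
    """Return (label, score) pairs for real detections at or above threshold."""
    results = []
    for cid, score in zip(class_ids, scores):
        cid = int(cid)
        if cid < 0:
            # -1 marks a padding slot, not a detection.
            continue
        if score < threshold:
            # Skip low-confidence detections (threshold is an assumption).
            continue
        results.append((classes[cid], float(score)))
    return results


# Hypothetical flattened outputs for one image.
ids = [5, 14, 3, -1, -1, -1]
confs = [0.97, 0.88, 0.21, -1, -1, -1]
labels = detected_labels(ids, confs, VOC_CLASSES)
print(labels)
```

Filtering on the score as well as the -1 padding is a design choice: detectors often return many low-confidence boxes that you probably do not want to act on.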
Hi @thomelane! Thank you so much for your help. I fully agree with what you’re saying and would expect the same to happen. I run into confusion when running this GluonCV object detection tutorial: https://gluon-cv.mxnet.io/build/examples_detection/demo_faster_rcnn.html . When I add the line print(box_ids) as the second to last line (before plt.show()) of the code given in the tutorial, I get this array in return:
<NDArray 1x6000x1 @cpu(0)>
This would imply that there are no detected objects, which is not the case. I notice that this is a nested array, which doesn’t seem logical either. Let me know your thoughts, I really appreciate your help!
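On the nesting: the shape 1x6000x1 can be read as (images in the batch, detection slots, one id per slot). Unused slots hold -1, so a printout dominated by -1 does not mean nothing was detected. The sketch below uses a small hypothetical nested list in that layout, of the kind box_ids.asnumpy().tolist() would give, and flattens it. The helper name flatten_ids is made up for illustration.

```python
def flatten_ids(nested_ids):
    """Turn a (1, N, 1) nested list into a flat list of N ints."""
    first_image = nested_ids[0]          # drop the batch dimension
    return [int(row[0]) for row in first_image]  # drop the trailing dimension


# Hypothetical output for one image: two detections, then padding.
nested = [[[14.0], [1.0], [-1.0], [-1.0]]]
flat = flatten_ids(nested)
real = [i for i in flat if i != -1]
print(flat, len(real))
```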
Please try this code.
import gluoncv
from gluoncv import model_zoo, data, utils

# Faster R-CNN pre-trained on Pascal VOC
net = model_zoo.get_model('faster_rcnn_resnet50_v1b_voc', pretrained=True)

im_fname = utils.download('https://github.com/dmlc/web-data/blob/master/' +
                          'gluoncv/detection/biking.jpg?raw=true',
                          path='biking.jpg')
x, orig_img = data.transforms.presets.rcnn.load_test(im_fname)

# Outputs are batched: box_ids and scores have shape (1, N, 1)
box_ids, scores, bboxes = net(x)
n_box = box_ids.shape[1]
for n in range(n_box):
    class_id = int(box_ids[0][n].asscalar())
    # -1 marks a padding slot, so skip it
    if class_id != -1:
        print('id = %d, score=%f' % (class_id, scores[0][n].asscalar()))
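Once you have the detected labels as strings, acting on them without looking at the photo can be as simple as a lookup table of handlers. This is only a sketch: the function react_to, the handler table and its messages are hypothetical, and the input list stands in for labels you would collect from the loop above via net.classes.

```python
def react_to(labels, handlers):
    """Call the handler registered for each distinct detected label."""
    results = []
    for label in sorted(set(labels)):    # each label triggers at most once
        handler = handlers.get(label)
        if handler is not None:          # labels without a handler are ignored
            results.append(handler())
    return results


# Hypothetical handlers keyed by class label.
handlers = {
    'person': lambda: 'person detected: send alert',
    'bicycle': lambda: 'bicycle detected: log event',
}

outcome = react_to(['person', 'bicycle', 'person', 'dog'], handlers)
print(outcome)
```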