I am trying to run object detection on a video using a pre-trained YOLO model:
net = model_zoo.get_model('yolo3_darknet53_voc', pretrained=True)
Everything works, but it detects every class. I want to detect only people.
How can I do that?
Thank you.
@emesssii the simplest way is to filter the outputs to only consider the person class.
You can get the index of the person class like this:
person_ind = [i for i, cls in enumerate(net.classes) if cls == 'person'][0]
Then you simply loop through your results and keep only the ones that predicted that class
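A minimal sketch of that loop, using plain Python lists in place of the network outputs. The helper name `keep_class`, the `min_score` threshold, and the sample values are illustrative assumptions, not part of GluonCV. With real outputs you would first convert one image's results to lists, for example via `class_IDs[0].asnumpy().ravel().tolist()`.

```python
def keep_class(class_ids, scores, boxes, wanted_id, min_score=0.5):
    """Return (score, box) pairs for detections of one class.

    class_ids, scores: flat lists with one entry per detection.
    boxes: list of [xmin, ymin, xmax, ymax] lists in the same order.
    """
    kept = []
    for cls, score, box in zip(class_ids, scores, boxes):
        # Unused detection slots are assumed to carry an id of -1,
        # so they never match wanted_id.
        if int(cls) == wanted_id and score >= min_score:
            kept.append((score, box))
    return kept


# Illustrative values only: two people (id 14), one other class, one empty slot.
class_ids = [14.0, 11.0, 14.0, -1.0]
scores = [0.92, 0.88, 0.31, -1.0]
boxes = [[10, 20, 110, 220], [200, 50, 300, 150], [5, 5, 50, 90], [-1, -1, -1, -1]]

people = keep_class(class_ids, scores, boxes, wanted_id=14)
print(people)
```

The score threshold is optional; setting `min_score=0.0` keeps every detection of the requested class.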
@ThomasDelteil Now I know that the ‘person’ index is 14. What should I do next? Here is my code:
import time

import cv2
import gluoncv as gcv
import matplotlib.pyplot as plt
import mxnet as mx

# Load the model
net = gcv.model_zoo.get_model('yolo3_darknet53_voc', pretrained=True)
cap = cv2.VideoCapture('test.mp4')
time.sleep(1)  # letting the camera autofocus
axes = None
while cap.isOpened():
    # Load frame from the camera
    ret, frame = cap.read()
    if not ret:
        # Stop when the video ends or a frame cannot be read
        break
    # Image pre-processing
    frame = mx.nd.array(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)).astype('uint8')
    rgb_nd, frame = gcv.data.transforms.presets.ssd.transform_test(frame, short=512, max_size=700)
    # Run frame through network
    class_IDs, scores, bounding_boxes = net(rgb_nd)
    # Display the result
    plt.cla()
    axes = gcv.utils.viz.plot_bbox(frame, bounding_boxes[0], scores[0], class_IDs[0], class_names=net.classes, ax=axes)
    plt.draw()
    plt.pause(0.0001)
cap.release()
cv2.destroyAllWindows()
Thank you.
Something like this after the predictions should do the trick:
for i in range(len(class_IDs[0])):
    # Zero the score of every detection that is not a person (class index 14)
    if class_IDs[0][i].asscalar() != 14.:
        scores[0][i] = 0
Hi @ThomasDelteil,
Thanks a lot for your answers.
The problem is that if I only filter the outputs, it breaks my next steps: counting people and tracking their trajectories. Is it possible to detect only the person class? I need the (x, y) coordinates to compute the trajectories.
Could you just slice the arrays? Using @ThomasDelteil’s suggestion you have the class ID for people (14), so you can filter down to just the bounding boxes that contain people with something like the following:
import numpy as np
# convert to numpy and take the first (only) sample in the batch
class_IDs = class_IDs.asnumpy()[0]
scores = scores.asnumpy()[0]
bounding_boxes = bounding_boxes.asnumpy()[0]
# filter to only people
person_id = 14
# row indices of the detections whose class is person
selection = np.argwhere(class_IDs == person_id)[:, 0]
class_IDs_subset = class_IDs[selection]
scores_subset = scores[selection]
bounding_boxes_subset = bounding_boxes[selection]
You can’t do this with a batch of images, though, because each image may contain a different number of detected people (you’d end up with a jagged array). For a single image at a time this should work okay.
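For batches, one way around the jagged-array issue is to keep one Python list per image instead of a single array. The sketch below does that and also returns box centers, which could serve as the (x, y) points for counting and trajectories. The function name `person_centers_per_image`, the `min_score` threshold, and the sample values are assumptions for illustration; boxes are assumed to be in `[xmin, ymin, xmax, ymax]` order, and the inputs are nested lists (for example from `.asnumpy().tolist()`, with the trailing size-1 axis of the ids and scores flattened).

```python
def person_centers_per_image(batch_ids, batch_scores, batch_boxes,
                             person_id=14, min_score=0.5):
    """Return one list of (x, y) box centers per image in the batch."""
    results = []
    for ids, scores, boxes in zip(batch_ids, batch_scores, batch_boxes):
        centers = []
        for cls, score, (xmin, ymin, xmax, ymax) in zip(ids, scores, boxes):
            if int(cls) == person_id and score >= min_score:
                # Center of the bounding box
                centers.append(((xmin + xmax) / 2.0, (ymin + ymax) / 2.0))
        # Lists may differ in length from image to image
        results.append(centers)
    return results


# Illustrative batch of two images with made-up detections.
batch_ids = [[14, 14, 6], [14, -1, -1]]
batch_scores = [[0.9, 0.8, 0.7], [0.6, -1, -1]]
batch_boxes = [
    [[0, 0, 10, 20], [30, 40, 50, 80], [5, 5, 9, 9]],
    [[100, 100, 140, 200], [-1, -1, -1, -1], [-1, -1, -1, -1]],
]

centers = person_centers_per_image(batch_ids, batch_scores, batch_boxes)
counts = [len(c) for c in centers]  # number of people per image
print(counts, centers)
```

Because each image gets its own list, the per-image person count is simply the length of that list.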
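Actually, GluonCV just released a tutorial and some utilities that make this a breeze:
https://gluon-cv.mxnet.io/build/examples_detection/skip_fintune.html
Check it out, it does exactly what you need. The key call there is net.reset_class(classes=['person'], reuse_weights=['person']), which makes the network predict only the person class while reusing the pre-trained weights for it.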