Detecting only person using pre-trained YOLO



I am trying to run object detection on a video using a pre-trained YOLO model:
net = model_zoo.get_model('yolo3_darknet53_voc', pretrained=True)

Everything works, but it detects every class. I want to detect only people.
How can I do that?

Thank you.


@emesssii the simplest way is to filter the outputs to only consider the person class.

You can get the index of the person class like this:

person_ind = [i for i, cls in enumerate(net.classes) if cls == 'person'][0]

Then you simply loop through your results and keep only the ones that predicted that class.


@ThomasDelteil Now I know that the ‘person’ index is 14. What should I do next? :sweat_smile:

import time

import cv2
import gluoncv as gcv
import mxnet as mx

# Load the model
net = gcv.model_zoo.get_model('yolo3_darknet53_voc', pretrained=True)

cap = cv2.VideoCapture('test.mp4')
time.sleep(1)  ### letting the camera autofocus

axes = None

while True:
    # Load frame from the camera
    ret, frame = cap.read()
    if not ret:
        # No more frames to read
        break

    # Image pre-processing (YOLO preset assumed)
    frame = mx.nd.array(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)).astype('uint8')
    rgb_nd, frame = gcv.data.transforms.presets.yolo.transform_test(frame, short=512, max_size=700)

    # Run frame through network
    class_IDs, scores, bounding_boxes = net(rgb_nd)

    # Display the result
    axes = gcv.utils.viz.plot_bbox(frame, bounding_boxes[0], scores[0], class_IDs[0], class_names=net.classes, ax=axes)


Thank you.


Something like this after the predictions should do the trick:

for i in range(len(class_IDs[0])):
    if class_IDs[0][i].asscalar() != 14.:  # 14 is 'person' in the VOC class list
        scores[0][i] = 0
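
One thing to keep in mind with this approach: zeroing scores leaves every array at its original length, so any later step that counts rows still sees all detections. Here is a minimal sketch of counting with a boolean mask instead. It uses made-up NumPy arrays shaped (N, 1) like one frame's output; the 0.5 threshold and the sample values are assumptions for illustration only.

```python
import numpy as np

PERSON_ID = 14      # index of 'person' in the VOC class list, per this thread
SCORE_THRESH = 0.5  # assumed confidence cut-off, tune for your video

# Hypothetical detections for one frame, shaped (N, 1) like class_IDs[0] / scores[0]
class_ids = np.array([[14.], [6.], [14.], [11.], [-1.]])
scores = np.array([[0.92], [0.88], [0.31], [0.75], [-1.]])

# True only for rows that are 'person' AND confident enough
person_mask = (class_ids[:, 0] == PERSON_ID) & (scores[:, 0] >= SCORE_THRESH)

num_people = int(person_mask.sum())        # how many people in this frame
person_rows = np.flatnonzero(person_mask)  # row indices of those detections
print(num_people, person_rows)
```

With real outputs you would first convert one frame to NumPy, for example with class_IDs[0].asnumpy().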


Hi @ThomasDelteil,
Thanks a lot for your answers.

The problem I ran into is that just filtering the outputs messes up my next steps: counting people and tracking their trajectories. Is it possible to detect only the person class? I need the (x, y) coordinates to build the trajectories.


Could you just slice the arrays? Using @ThomasDelteil’s suggestion you have the class ID for people (14), so you can filter down to just the bounding boxes containing people with something like the following:

import numpy as np

# to numpy and take first sample 
class_IDs = class_IDs.asnumpy()[0]
scores = scores.asnumpy()[0]
bounding_boxes = bounding_boxes.asnumpy()[0]

# filter to only people
person_id = 14
selection = np.argwhere(class_IDs == person_id)[:, 0]
class_IDs_subset = class_IDs[selection]
scores_subset = scores[selection]
bounding_boxes_subset = bounding_boxes[selection]

You can’t do this with a batch of images though, because you might have a different number of people detected in each image (you’d end up with a jagged array), but for a single image at a time this should work okay.
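
For batches, one way around the jagged-array problem is to keep a Python list with one array per image instead of a single stacked array. Below is a rough sketch with NumPy inputs shaped (B, N, 1) for IDs and scores and (B, N, 4) for boxes. The function names and sample values are hypothetical, and the box layout (xmin, ymin, xmax, ymax) is an assumption. The second helper turns boxes into centre points, which could feed the counting and trajectory steps mentioned above.

```python
import numpy as np

def filter_class_per_image(class_ids, scores, boxes, class_id=14):
    """Return one (scores, boxes) pair per image, keeping only `class_id` rows."""
    results = []
    for ids_i, scores_i, boxes_i in zip(class_ids, scores, boxes):
        keep = ids_i[:, 0] == class_id  # boolean mask, shape (N,)
        results.append((scores_i[keep, 0], boxes_i[keep]))
    return results

def box_centers(boxes):
    """Centre (x, y) of each box, assuming (xmin, ymin, xmax, ymax) rows."""
    return np.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                     (boxes[:, 1] + boxes[:, 3]) / 2], axis=1)

# Hypothetical batch of 2 images with 3 detection slots each
class_ids = np.array([[[14.], [6.], [14.]],
                      [[11.], [14.], [-1.]]])
scores = np.array([[[0.9], [0.8], [0.7]],
                   [[0.6], [0.95], [-1.]]])
boxes = np.array([[[0, 0, 10, 20], [5, 5, 15, 15], [20, 10, 40, 30]],
                  [[1, 1, 2, 2], [10, 10, 30, 50], [-1, -1, -1, -1]]], dtype=float)

per_image = filter_class_per_image(class_ids, scores, boxes)
counts = [len(s) for s, _ in per_image]       # people found in each image
centers_first = box_centers(per_image[0][1])  # (x, y) centres for image 0
print(counts)
print(centers_first)
```

Each list entry can have a different number of rows, so nothing needs padding.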


Actually, GluonCV just released a tutorial and some utilities that make this a breeze.
Check it out; it does exactly what you need.