Would YOLOv3 and other object detection networks gain better performance if I reduce the number of objects to detect?


#1

What I mean is: the pretrained networks of GluonCV, like yolo3_darknet53_coco, can detect 80 kinds of objects, but in the real world I do not need to detect so many things. Is it possible to improve performance by reducing the number of objects to detect when training?

For example, if I remove aeroplane, boat, bird, bear, elephant, wine glass and other objects, would it improve the mAP and F1 score of the objects that are not removed (e.g. person, car, truck, bus)?
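Independent of retraining, a class subset can also be applied as a post-filter on the detector output. A minimal pure-Python sketch (the class list and detections below are made-up placeholders, not real model output):

```python
# Hypothetical class list (a small subset in COCO-like order) and the
# classes we actually want to keep.
CLASSES = ['person', 'bicycle', 'car', 'motorcycle', 'aeroplane', 'bus']
WANTED = {'person', 'car', 'bus'}

def filter_detections(detections, classes=CLASSES, wanted=WANTED):
    """Drop detections whose class name is not in `wanted`.

    `detections` is a list of (class_id, score, box) tuples, the usual
    post-NMS output shape of a detector head.
    """
    return [d for d in detections if classes[d[0]] in wanted]

dets = [(0, 0.9, (0, 0, 10, 10)),   # person    -> kept
        (4, 0.8, (5, 5, 20, 20)),   # aeroplane -> dropped
        (2, 0.7, (8, 8, 30, 30))]   # car       -> kept
kept_dets = filter_detections(dets)
print([d[0] for d in kept_dets])
```

If you only want fewer classes at inference time (no retraining), GluonCV detectors also expose a `reset_class` method with a `reuse_weights` option to remap a pretrained model onto a class subset; see the GluonCV docs for the exact signature.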

Has anyone done this kind of experiment? Thanks


#2

Hi @stereomatchingkiss,

I haven’t seen such experiments, but that’s an interesting idea. I’d expect a different effect for the two types of loss you have for object detection, so it depends on whether detection or classification is more important for you.

  1. Object detection

Are you talking about removing the objects at the pre-training stage or the fine-tuning stage? I would expect worse performance if you remove data from other classes from pre-training, because even this data can be useful for learning how to perform object detection more generally. Similar to (but not the same as) the idea of semi-supervised learning.

  2. Classification

I’d expect better performance here, just because you’ve simplified the problem. Guessing classes uniformly at random would be correct more often with fewer classes to select from.
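To make the arithmetic concrete: with uniform random guessing, chance accuracy is 1/K, so shrinking from 80 COCO classes to, say, 6 raises the chance baseline from 1.25% to roughly 16.7%:

```python
# Chance accuracy of uniformly random class guessing is 1/K.
for k in (80, 6):
    print(f'{k:2d} classes -> chance accuracy {1 / k:.4f}')
```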

I’d be really interested to see your results if you do try this experiment.


#3

I tried with 6 classes on the COCO dataset. I removed all images that do not contain those 6 classes from the training and validation sets, but the results are not exciting at all.
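The image-filtering step can be sketched like this (a toy stand-in for the real COCO annotation file; with the actual dataset, note that pycocotools’ `COCO.getImgIds(catIds=[...])` returns images containing *all* of the given categories, so you would take the union over per-class queries instead):

```python
# Toy stand-in for COCO annotations: image id -> set of class names present.
annotations = {
    1: {'person', 'car'},
    2: {'aeroplane'},
    3: {'truck', 'dog'},
    4: {'bird', 'boat'},
}
TARGET = {'bicycle', 'car', 'motorcycle', 'bus', 'train', 'truck'}

def keep_images(annotations, target):
    """Keep images containing at least one target class; record which."""
    return {img: cls & target for img, cls in annotations.items() if cls & target}

kept = keep_images(annotations, TARGET)
print(sorted(kept))
```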
I trained the network with:

  1. 25 epochs
  2. enable random shape
  3. enable mixup
  4. learning rate starts at 0.0001, drops to 0.00001 at epoch 12, and to 0.000001 at epoch 18
  5. other hyper-parameters use the default values of the training script on GitHub
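The schedule in item 4 is a plain step decay; a minimal sketch of the rule:

```python
def step_lr(epoch, base_lr=1e-4, drops=(12, 18), factor=0.1):
    """Step learning-rate schedule: multiply by `factor` at each drop epoch."""
    lr = base_lr
    for d in drops:
        if epoch >= d:
            lr *= factor
    return lr

for e in (0, 12, 18):
    print(f'epoch {e:2d}: lr = {step_lr(e):.0e}')
```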

The AP at IoU 0.5 for the 6 classes:

bicycle = 0.5109787835003734
car = 0.6040473095262737
motorcycle = 0.6899253562204593
bus = 0.7718191967075589
train = 0.8497070977584869
truck = 0.5477797650304865
mAP = 0.6623762514572732

The AP of the model-zoo model on these classes:

bicycle = 0.4963655879178486
car = 0.5742504924554275
motorcycle = 0.6583807437693533
bus = 0.7768372808061054
train = 0.8552888916750468
truck = 0.5550842642475577
mAP = 0.6527012101452232
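For a quick comparison, the per-class differences between the two runs (values rounded from the numbers above) can be computed directly:

```python
# Per-class AP@0.5 from the two runs: 6-class retrain vs. model zoo.
retrained = {'bicycle': 0.5110, 'car': 0.6040, 'motorcycle': 0.6899,
             'bus': 0.7718, 'train': 0.8497, 'truck': 0.5478}
model_zoo = {'bicycle': 0.4964, 'car': 0.5743, 'motorcycle': 0.6584,
             'bus': 0.7768, 'train': 0.8553, 'truck': 0.5551}

delta = {c: retrained[c] - model_zoo[c] for c in retrained}
for c, d in sorted(delta.items(), key=lambda kv: -kv[1]):
    print(f'{c:12s} {d:+.4f}')
mean_delta = sum(delta.values()) / len(delta)
print(f'{"mean":12s} {mean_delta:+.4f}')
```

So the small classes (bicycle, car, motorcycle) gained a little while bus, train and truck lost slightly, and the mean change is under one point of AP.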

Looks like it did not help much. Maybe training with more epochs, or not lowering the learning rate so quickly, could improve the results, but since my GPU is only a GTX 1060, the cost of these experiments is very high for me (too long to wait).

I am planning to buy a new card and waiting for the GTX 11 series release. I hope they remove the ray-tracing cores (most users do not need them) to lower the price, or add more VRAM; the RTX cards are overpriced.