YOLOv3 evaluation on custom dataset

Hi there, I prepared a custom vehicle detection dataset in VOC2007 format. It loads correctly with the VOCLike() API, but when I evaluate the model with the eval_yolo.py sample script, my mAP is extremely low. I cannot figure out why. Any help would be appreciated. Thanks.

This is the code I used to load and visualize one sample from my dataset:

Visualization result:

Detection result using demo_yolo.py:

Evaluation result using eval_yolo.py: