Is this a correct way to prepare custom data for the YOLO v3 detector?

My solution

  1. Convert the data to rec format, as mentioned in this post

  2. Change the get_dataset function to the following (the training set and validation set are both the pikachu_train.rec from this example)

    def get_dataset(args):
        train_dataset = gcv.data.RecordFileDetection(args.train_dataset)
        val_dataset = gcv.data.RecordFileDetection(args.validate_dataset)
        classes = read_classes(args)  # this function reads the classes from a txt file (sketch below)
        val_metric = VOC07MApMetric(iou_thresh=0.5, class_names=classes)

        if args.num_samples < 0:
            args.num_samples = len(train_dataset)
        if args.mixup:
            from gluoncv.data import MixupDetection
            train_dataset = MixupDetection(train_dataset)
        return train_dataset, val_dataset, val_metric
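(read_classes is a small helper of mine; a minimal sketch of it, assuming one class name per line in the txt file, looks like this:)

    def read_classes(args):
        """Read class names from the txt file given by --classes_list, one per line."""
        with open(args.classes_list) as f:
            return [line.strip() for line in f if line.strip()]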

Everything else is the same as in train_yolo3.py.
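To double-check that a rec file is usable before training, you can load a sample from it directly (a quick sanity check; the label layout follows the RecordFileDetection convention):

import gluoncv as gcv

# each sample is (image, label); label rows should be [xmin, ymin, xmax, ymax, class_id]
dataset = gcv.data.RecordFileDetection('pikachu_train.rec')
img, label = dataset[0]
print(img.shape, label.shape)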

The training results look fine(?), although not as good as SSD; I wonder if I've introduced any bugs.

P.S.: the validate function doesn't work at all; I'm still looking for a way to fix it. The error message is
“ValueError: zero-dimensional arrays cannot be concatenated”

Hi @stereomatchingkiss,

One reason for the drop in performance compared to SSD could be the lack of augmentation being applied. I see that in the SSD example you’ve linked to, SSDDefaultTrainTransform is used. You could try YOLO3DefaultTrainTransform for your case.

As for the error, it would be great if you could provide the full stack trace. You mentioned that you’re using pikachu_train.rec for the validation set — is this what you intended?

Yes, I use YOLO3DefaultTrainTransform in my case. Changing the detection size to 608 detects more Pikachu, but still fewer than SSD; maybe one reason is that the Pikachu are too small, and YOLOv3 is worse at detecting small objects compared with SSD.
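(To be precise, “detection size” here is just the --data-shape argument of train_yolo3.py, so with 608 the train transform roughly becomes the following — a sketch; the custom-class model name is only an example:)

import gluoncv as gcv
from gluoncv.data.transforms.presets.yolo import YOLO3DefaultTrainTransform

# the transform needs the net in order to pre-generate training targets from its anchors
net = gcv.model_zoo.get_model('yolo3_darknet53_custom', classes=['pikachu'], pretrained_base=False)
net.initialize()
# 608 instead of the default 416; YOLOv3 input sizes must be multiples of 32
train_transform = YOLO3DefaultTrainTransform(608, 608, net)

Here is the get_dataloader I use: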

def get_dataloader(net, train_dataset, val_dataset, data_shape, batch_size, num_workers, args):
    """Get dataloader."""
    width, height = data_shape, data_shape
    # stack the image and all 6 generated training targets; pad the variable-length gt boxes
    batchify_fn = Tuple(*([Stack() for _ in range(6)] + [Pad(axis=0, pad_val=-1) for _ in range(1)]))
    if args.no_random_shape:
        print("no random shape")
        train_loader = gluon.data.DataLoader(
            train_dataset.transform(YOLO3DefaultTrainTransform(width, height, net, mixup=args.mixup)),
            batch_size, True, batchify_fn=batchify_fn, last_batch='rollover', num_workers=num_workers)
    else:
        print("with random shape")
        # pick a new input size from 320 to 608 (multiples of 32) every 10 batches
        transform_fns = [YOLO3DefaultTrainTransform(x * 32, x * 32, net, mixup=args.mixup) for x in range(10, 20)]
        train_loader = RandomTransformDataLoader(
            transform_fns, train_dataset, batch_size=batch_size, interval=10, last_batch='rollover',
            shuffle=True, batchify_fn=batchify_fn, num_workers=num_workers)
    val_batchify_fn = Tuple(Stack(), Pad(pad_val=-1))
    val_loader = gluon.data.DataLoader(
        val_dataset.transform(YOLO3DefaultValTransform(width, height)),
        batch_size, False, batchify_fn=val_batchify_fn, last_batch='keep', num_workers=num_workers)  # no shuffle for validation
    return train_loader, val_loader
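In the script this is wired up the same way as in train_yolo3.py (roughly):

    train_data, val_data = get_dataloader(net, train_dataset, val_dataset, args.data_shape, args.batch_size, args.num_workers, args)

And here is the full stack trace: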
Traceback (most recent call last):
  File "train_yolo3_custom.py", line 364, in <module>
    validate(net, val_data, ctx, eval_metric)
  File "train_yolo3_custom.py", line 181, in validate
    eval_metric.update(det_bboxes, det_ids, det_scores, gt_bboxes, gt_ids, gt_difficults)
  File "C:\Users\yyyy\Anaconda3\lib\site-packages\gluoncv\utils\metrics\voc_detection.py", line 107, in update
    gt_bboxes, gt_labels, gt_difficults]]):
  File "C:\Users\yyyy\Anaconda3\lib\site-packages\gluoncv\utils\metrics\voc_detection.py", line 106, in <listcomp>
    *[as_numpy(x) for x in [pred_bboxes, pred_labels, pred_scores,
  File "C:\Users\yyyy\Anaconda3\lib\site-packages\gluoncv\utils\metrics\voc_detection.py", line 97, in as_numpy
    return np.concatenate(out, axis=0)
ValueError: zero-dimensional arrays cannot be concatenated

My goal is to make the script support any rec file; I use pikachu_train.rec in this post because I want to make sure the data is fine.

The full code is posted here: pastebin

Thanks for your help

Given you’re using YOLO v3, I’d expect the opposite, actually! It uses a Feature Pyramid Network, which is supposed to give improved performance on small objects.

Many thanks for sharing your code by the way. I’ll try running it and get back to you. Cheers, Thom

Weird, maybe the training part has some bugs.

Thanks to you too. By the way, the following is the command I use:

python train_yolo3_custom.py --epochs 1 --lr 0.0001 --train_dataset pikachu_train.rec --validate_dataset pikachu_train.rec --classes_list pikachu_list.txt --batch-size 4

In order to make the training code work, I commented out the validation code.
You can see the results applied to “pikachu_test.jpg” by enabling the following code (the last 4 lines):

x, image = gcv.data.transforms.presets.yolo.load_test('pikachu_test.jpg')
cid, score, bbox = net(x)
ax = viz.plot_bbox(image, bbox[0], score[0], cid[0], class_names=classes)
plt.show()

You can download pikachu_train.rec with the following code:

from gluoncv.utils import download

url = 'https://apache-mxnet.s3-accelerate.amazonaws.com/gluon/dataset/pikachu/train.rec'
idx_url = 'https://apache-mxnet.s3-accelerate.amazonaws.com/gluon/dataset/pikachu/train.idx'
download(url, path='pikachu_train.rec', overwrite=False)
download(idx_url, path='pikachu_train.idx', overwrite=False)

pikachu_list.txt has only one line of text:

pikachu

After making some changes to the training options, the results can now compete with SSD.

The updated code is at pastebin.

The command I use:

python train_yolo3_custom.py --epochs 10 --lr 0.001 --train_dataset pikachu_train.rec --validate_dataset pikachu_train.rec --classes_list pikachu_list.txt --batch-size 8 --no-random-shape

Notes:

  1. YOLOv3 converges more slowly than SSD at the same learning rate (0.001).
  2. If random-shape is on, it eats a lot of memory, and the learning rate needs to be smaller (0.0001), otherwise the loss becomes NaN.

About the validate function, I am still looking for a way to make it work; if possible, I do not want to manipulate the arrays manually, but rather use the functions from the library.

Edit: Found the bug in validate.

It looks like validate hits a bug in gluoncv (I am using gluoncv on Windows). My solution is:

  1. Copy voc_detection.py from GitHub
  2. Rename the file to voc_detection_2.py
  3. Move it to the gluoncv.utils.metrics folder (mine is C:\my_folder\Anaconda3\Lib\site-packages\gluoncv\utils\metrics)
  4. Change the import from gluoncv.utils.metrics.voc_detection to gluoncv.utils.metrics.voc_detection_2, as shown below
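After step 4, the metric import at the top of train_yolo3_custom.py looks like this:

    # was: from gluoncv.utils.metrics.voc_detection import VOC07MApMetric
    from gluoncv.utils.metrics.voc_detection_2 import VOC07MApMetric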

Glad you managed to get competitive results! Do you still need me to run the code?

Was there a particular GitHub issue you found referencing this validation problem?
If not, it might be a good idea for us to add one for this.

And an alternative to copying files from the repository and renaming is to install the nightly build using:

pip install gluoncv --pre --upgrade

Thanks, I don't think that's needed anymore.

The issue is in the as_numpy function: the original implementation did not consider the case where the array shapes cannot be concatenated.

def as_numpy(a):
    """Convert a (list of) mx.NDArray into numpy.ndarray"""
    if isinstance(a, (list, tuple)):
        out = [x.asnumpy() if isinstance(x, mx.nd.NDArray) else x for x in a]
        out = np.array(out)
        return np.concatenate(out, axis=0)
    elif isinstance(a, mx.nd.NDArray):
        a = a.asnumpy()
    return a

It should be changed to:

def as_numpy(a):
    """Convert a (list of) mx.NDArray into numpy.ndarray"""
    if isinstance(a, (list, tuple)):
        out = [x.asnumpy() if isinstance(x, mx.nd.NDArray) else x for x in a]
        try:
            out = np.concatenate(out, axis=0)
        except ValueError:
            # fall back when the per-batch arrays cannot be concatenated
            out = np.array(out)
        return out
    elif isinstance(a, mx.nd.NDArray):
        a = a.asnumpy()
    return a

Just catch the exception and the problem is solved.
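For anyone curious, the error itself is easy to reproduce in isolation, since np.concatenate refuses zero-dimensional (scalar) inputs:

import numpy as np

# minimal repro of the ValueError from the traceback above
np.concatenate([np.array(1.0), np.array(2.0)], axis=0)
# ValueError: zero-dimensional arrays cannot be concatenated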

Thanks, but I would prefer to stick with the “stable” version.

Thank you for the great conversation. I have a question about SSDDefaultTrainTransform and YOLO3DefaultTrainTransform: do they perform the augmentation? Is it possible to select which data augmentations are applied? I saw some different (maybe parallel) functions for augmentation, such as CreateDetAugmenter. Is there a tutorial or example showing how to use data augmentation in object detection?
Thanks
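Edit: for example, would writing a small transform by hand, like the sketch below, be the intended way to pick specific augmentations? (My own sketch: only a random horizontal flip, using gluoncv.data.transforms.bbox.flip; resizing and normalization are omitted, and the class name is mine.)

import random
import mxnet as mx
from gluoncv.data.transforms import bbox as tbbox

class MyDetTrainTransform:
    """Hand-picked augmentation: random horizontal flip only."""
    def __call__(self, src, label):
        h, w, _ = src.shape
        if random.random() < 0.5:
            src = mx.nd.flip(src, axis=1)                   # flip the image left-right
            label = tbbox.flip(label, (w, h), flip_x=True)  # keep the boxes in sync
        return src, label

# usage: train_dataset.transform(MyDetTrainTransform())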