Finetuning YOLO and FRCNN

Hi there,

I want to finetune SSD, YOLO and FRCNN detection models, all pretrained on COCO, using images of trains taken from higher angles. I made a custom .rec dataset and followed the Gluon tutorial to finetune an SSD model, which worked.

I am now wondering how to finetune the other models. A few people have pointed me to this code here for finetuning, but how can I modify it so that it accepts my .rec data and finetunes a model already trained on COCO?

Thanks, David

Hi David,

If I understand correctly, you have already finetuned SSD and now want to modify the YOLOv3 and Faster R-CNN scripts, and you are using the RecordFileDetection format. I managed to finetune SSD and YOLOv3 with their tutorials, so I can help with YOLOv3, but I am struggling to finetune Faster R-CNN myself.

So far, I have identified 2 main steps:

  1. Modify the DataLoader so that the network receives the appropriate targets during training. My dataloader only builds the training loader, but you can extend it with a validation loader as well.
  2. Modify the training loop.
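Most of step 1 comes down to the batchify function. As a rough, NumPy-only illustration of what `Tuple(Stack() x 6, Pad(axis=0, pad_val=-1))` does in the YOLOv3 loader: the image and the five fixed-size target arrays are stacked, while the ground-truth boxes, whose count varies per image, are padded with -1. The helpers `stack`, `pad` and `batchify` below are hypothetical stand-ins for illustration, not the GluonCV classes.

```python
import numpy as np


def stack(arrays):
    """Stack same-shaped per-sample arrays into one batch (like Stack())."""
    return np.stack(arrays, axis=0)


def pad(arrays, pad_val=-1.0):
    """Pad variable-length (N_i, K) arrays along axis 0 to the longest N_i,
    then stack them (roughly what Pad(axis=0, pad_val=-1) does)."""
    max_n = max(a.shape[0] for a in arrays)
    width = arrays[0].shape[1]
    out = np.full((len(arrays), max_n, width), pad_val, dtype=np.float32)
    for i, a in enumerate(arrays):
        out[i, : a.shape[0]] = a
    return out


def batchify(samples):
    """samples: list of 7-tuples (image, 5 fixed-size targets, gt_boxes)."""
    columns = list(zip(*samples))
    stacked = [stack(col) for col in columns[:6]]
    return stacked + [pad(columns[6])]


# Two fake samples: a 3x4x4 "image", five dummy targets, and 1 vs 3 boxes.
rng = np.random.default_rng(0)
samples = []
for n_boxes in (1, 3):
    image = rng.random((3, 4, 4), dtype=np.float32)
    targets = [np.zeros((10, 1), dtype=np.float32) for _ in range(5)]
    boxes = rng.random((n_boxes, 4), dtype=np.float32)
    samples.append((image, *targets, boxes))

batch = batchify(samples)
print(batch[0].shape, batch[6].shape)
print(batch[6][0])  # first row is the real box, the rest is -1 padding
```

The -1 padding is what lets the loss ignore boxes that do not exist, which is also why the training loop below treats `batch[6]` separately from `batch[1:6]`.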

To help you get started, here is my training code for YOLOv3:

DataLoader:

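import gluoncv as gcv
from mxnet import gluon
from gluoncv.data.batchify import Tuple, Stack, Pad
# YOLO3PaddedTrainTransform is my own modification of
# gluoncv.data.transforms.presets.yolo.YOLO3DefaultTrainTransform

def get_dataloader(net, train_dataset, data_shape, batch_size, num_workers):
    width, height = data_shape, data_shape
    # stack the image and the 5 fixed-size targets; pad the variable-length gt boxes with -1
    batchify_fn = Tuple(*([Stack() for _ in range(6)] + [Pad(axis=0, pad_val=-1) for _ in range(1)]))
    # like YOLO3DefaultTrainTransform, the transform runs fake data through a copy of net
    # to obtain the fixed anchors used for target generation
    # shuffle (third positional argument) is disabled for now
    train_loader = gluon.data.DataLoader(
        train_dataset.transform(YOLO3PaddedTrainTransform(width, height, net, fill=-1)),
        batch_size, False, batchify_fn=batchify_fn, last_batch='rollover', num_workers=num_workers)
    return train_loader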

# interchangeable with the .rec format: train_dataset = gcv.data.RecordFileDetection('val.rec', coord_normalized=True)
# I tried RecordFileDetection before determining that LstDetection was faster for my setup
# GCVRECORD_PATH, DATASET and ROOT_PATH are my own path settings
train_dataset = gcv.data.LstDetection(GCVRECORD_PATH + f"{DATASET}.lst", ROOT_PATH)

#obtain the dataloader
train_batcher = get_dataloader(net, train_dataset, data_size, batch_size, 0)
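Since `LstDetection` and `RecordFileDetection` read the same per-image label layout, it may help to see how one `.lst` line is built. The sketch follows the layout from the GluonCV "prepare custom datasets" tutorial as I understand it: an index, a header of `4`, `5`, image width and height, then one `class xmin ymin xmax ymax` group per object with coordinates normalized to [0, 1] (matching `coord_normalized=True`), and finally the image path. `make_lst_line` is a hypothetical helper; please check the column order against the tutorial before building a real dataset.

```python
def make_lst_line(idx, img_path, img_w, img_h, objects):
    """Build one tab-separated detection .lst line.

    objects: list of (class_id, xmin, ymin, xmax, ymax) in pixel coordinates.
    Coordinates are divided by the image width/height so the file can be
    read with coord_normalized=True.
    """
    header_len, label_width = 4, 5  # header is [4, 5, width, height]
    fields = [str(idx), str(header_len), str(label_width), str(img_w), str(img_h)]
    for cls_id, xmin, ymin, xmax, ymax in objects:
        fields += [
            str(float(cls_id)),
            str(xmin / img_w), str(ymin / img_h),
            str(xmax / img_w), str(ymax / img_h),
        ]
    fields.append(img_path)
    return "\t".join(fields) + "\n"


line = make_lst_line(0, "data/xyz.jpg", 640, 480,
                     [(1, 64, 96, 512, 432), (2, 320, 144, 384, 384)])
print(line.split("\t"))
```

If I recall correctly, the tutorial then packs such a file into `.rec`/`.idx` with MXNet's `im2rec.py` using `--pass-through --pack-label`, which is how a custom `.rec` like the one in the question is produced.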

Training Loop:

import time
import mxnet as mx
from mxnet import autograd, gluon

# net, trainer, ctx (list of devices), train_batcher, start_epoch, end_epoch and
# batch_reporter are assumed to be defined earlier in the script
obj_metrics = mx.metric.Loss('ObjLoss')
center_metrics = mx.metric.Loss('BoxCenterLoss')
scale_metrics = mx.metric.Loss('BoxScaleLoss')
cls_metrics = mx.metric.Loss('ClassLoss')

for epoch in range(start_epoch, end_epoch):
    # timers
    tic = time.time()  # epoch timer, currently unused
    btic = time.time()  # batch timer
    # hybridize the network into a static computation graph for faster computation
    net.hybridize(static_alloc=True, static_shape=True)
    # main training loop over batches
    for i, batch in enumerate(train_batcher):
        # get the batch size
        batch_size = batch[0].shape[0]
        # split the images, the 5 fixed targets and the gt boxes across the devices in ctx
        data = gluon.utils.split_and_load(batch[0], ctx_list=ctx, batch_axis=0)
        fixed_targets = [gluon.utils.split_and_load(batch[it], ctx_list=ctx, batch_axis=0) for it in range(1, 6)]
        gt_boxes = gluon.utils.split_and_load(batch[6], ctx_list=ctx, batch_axis=0)
        # record the forward pass, then backpropagate
        sum_losses = []
        obj_losses = []
        center_losses = []
        scale_losses = []
        cls_losses = []
        with autograd.record():
            for ix, x in enumerate(data):
                obj_loss, center_loss, scale_loss, cls_loss = net(x, gt_boxes[ix], *[ft[ix] for ft in fixed_targets])
                sum_losses.append(obj_loss + center_loss + scale_loss + cls_loss)
                obj_losses.append(obj_loss)
                center_losses.append(center_loss)
                scale_losses.append(scale_loss)
                cls_losses.append(cls_loss)
            autograd.backward(sum_losses)
        # apply the gradients; this must be done outside of record() and after backward()
        # the losses are per sample, so step(batch_size) rescales the summed gradients by 1/batch_size
        trainer.step(batch_size)

        # mx.metric.Loss ignores the label argument (0) and keeps a running mean of the per-sample losses
        obj_metrics.update(0, obj_losses)
        center_metrics.update(0, center_losses)
        scale_metrics.update(0, scale_losses)
        cls_metrics.update(0, cls_losses)

        # retrieve the name of each metric and its running loss value
        name1, loss1 = obj_metrics.get()
        name2, loss2 = center_metrics.get()
        name3, loss3 = scale_metrics.get()
        name4, loss4 = cls_metrics.get()
        if i % batch_reporter == 0:
            # the running total loss is loss1 + loss2 + loss3 + loss4
            print('[Epoch {}][Batch {}], LR: {:.7f}, Speed: {:.3f} samples/sec, {}={:.3f}, {}={:.3f}, {}={:.3f}, {}={:.3f}'.format(
                    epoch, i, trainer.learning_rate, batch_size/(time.time()-btic), name1, loss1, name2, loss2, name3, loss3, name4, loss4))
            # print(f"{loss1},{loss2},{loss3},{loss4}")
        btic = time.time()
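The normalization comments in this loop are easy to get wrong, so here is a NumPy-only check of the two facts it relies on, assuming (as I believe is the case for the GluonCV YOLOv3 loss) that each loss component is returned per sample. First, `backward` on per-sample losses sums their gradients and `trainer.step(batch_size)` scales that sum by `1/batch_size`, which equals the gradient of the mean loss. Second, an `mx.metric.Loss`-style metric keeps a running mean over every loss element it sees. The linear least-squares model and the `RunningLoss` class are illustrative stand-ins, not MXNet code.

```python
import numpy as np

rng = np.random.default_rng(42)
batch_size, dim = 8, 3
x = rng.normal(size=(batch_size, dim))
y = rng.normal(size=batch_size)
w = rng.normal(size=dim)

# Per-sample loss 0.5 * (w.x_i - y_i)^2 and its gradient (w.x_i - y_i) * x_i.
residual = x @ w - y
per_sample_loss = 0.5 * residual ** 2
per_sample_grad = residual[:, None] * x

# Backward over per-sample losses sums their gradients; step(batch_size)
# then multiplies the sum by 1/batch_size.
stepped_grad = per_sample_grad.sum(axis=0) / batch_size
# Gradient of the mean loss, computed directly.
mean_loss_grad = (x.T @ residual) / batch_size
assert np.allclose(stepped_grad, mean_loss_grad)


class RunningLoss:
    """Running mean of every loss element seen, mimicking mx.metric.Loss."""

    def __init__(self, name):
        self.name, self.total, self.count = name, 0.0, 0

    def update(self, _labels, preds):
        for pred in preds:  # one array per device
            self.total += float(np.sum(pred))
            self.count += np.size(pred)

    def get(self):
        return self.name, self.total / self.count


metric = RunningLoss("ObjLoss")
metric.update(0, [per_sample_loss[:4], per_sample_loss[4:]])  # two "devices"
name, value = metric.get()
print(name, value, per_sample_loss.mean())
```

This is why no extra multiplication by `batch_size` is needed before updating the metrics: the reported values are already per-sample averages.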

These were derived from the full training script linked at the end of the tutorial, so you can also refer to it. Note that YOLO3PaddedTrainTransform is my modification of YOLO3DefaultTrainTransform to suit my dataset.
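The padded transform itself is not shown in this thread. Purely as a guess at what a "padded" variant might do, a common approach is letterboxing: scale the image to fit the square network input while keeping its aspect ratio, fill the leftover border with a constant (the `fill=-1` argument above hints at this), and shift the boxes to match. The sketch below is hypothetical and is not the author's code; `letterbox` and its arguments are made up for illustration, and it uses nearest-neighbour resizing to stay NumPy-only.

```python
import numpy as np


def letterbox(image, boxes, size, fill=-1.0):
    """Hypothetical: fit an HxWxC image into a size x size canvas without distortion.

    boxes: (N, 4) array of [xmin, ymin, xmax, ymax] in pixels.
    Returns the padded canvas and the boxes mapped onto it.
    """
    h, w = image.shape[:2]
    scale = min(size / w, size / h)
    new_w, new_h = max(1, int(round(w * scale))), max(1, int(round(h * scale)))
    # Nearest-neighbour resize via index lookup.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = image[rows[:, None], cols[None, :]]
    # Centre the resized image on a canvas filled with the pad value.
    canvas = np.full((size, size) + image.shape[2:], fill, dtype=np.float32)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    # Boxes are scaled by the same factor, then shifted by the offsets.
    out = boxes.astype(np.float32) * scale
    out[:, [0, 2]] += left
    out[:, [1, 3]] += top
    return canvas, out


img = np.ones((200, 400, 3), dtype=np.float32)  # a wide 400x200 image
canvas, new_boxes = letterbox(img, np.array([[0, 0, 400, 200]]), size=100)
print(canvas.shape, new_boxes)
```

A real YOLOv3 train transform also applies random augmentation and generates the training targets; this sketch only covers the resize-and-pad geometry.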

If you do manage to get Faster R-CNN working, please let me know 🙂. I am struggling with it too and have opened a forum thread here.


Hi Lee, thanks so much.

I’ll give these a go and will no doubt keep working on FRCNN. If I have a breakthrough, I’ll let you know!

Thanks,
David


Thank you so much!