GluonCV YOLOv3 training

Hi,
I can't make sense of the official GluonCV YOLOv3 tutorial.

At the beginning of the tutorial, it says to use:

from gluoncv.data.batchify import Tuple, Stack, Pad
from mxnet.gluon.data import DataLoader

batchify_fn = Tuple(Stack(), Pad(pad_val=-1))
train_loader = DataLoader(train_dataset.transform(train_transform), batch_size, shuffle=True,
                          batchify_fn=batchify_fn, last_batch='rollover', num_workers=num_workers)
val_loader = DataLoader(val_dataset.transform(val_transform), batch_size, shuffle=False,
                        batchify_fn=batchify_fn, last_batch='keep', num_workers=num_workers)

Then, seemingly out of nowhere, in the training loop at the bottom of the page, the code changes to this:

from gluoncv.data.transforms import presets

train_transform = presets.yolo.YOLO3DefaultTrainTransform(width, height, net)
# return stacked images, center_targets, scale_targets, gradient weights, objectness_targets, class_targets
# additionally, return padded ground truth bboxes, so there are 7 components returned by dataloader
batchify_fn = Tuple(*([Stack() for _ in range(6)] + [Pad(axis=0, pad_val=-1) for _ in range(1)]))
train_loader = DataLoader(train_dataset.transform(train_transform), batch_size, shuffle=True,
                          batchify_fn=batchify_fn, last_batch='rollover', num_workers=num_workers)

What is the difference between these two snippets? They seem to conflict.

Hi Olivier,

I think it has to do with the fact that at the beginning of the tutorial, it's just going through the basic transformations for the network in inference mode, i.e. that transform only operates on the image you pass in (normalization, conversion to tensor, maybe some random augmentations, etc.).
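
As a rough sketch (reusing the val_dataset, width and height from the tutorial), the inference/val transform returns just two arrays per sample, which is why that first batchify_fn only needs a Stack() for the images and a Pad() for the variable-length labels:

from gluoncv.data.transforms import presets

# quick check of what the val transform produces for a single sample
val_transform = presets.yolo.YOLO3DefaultValTransform(width, height)
img, label = val_transform(*val_dataset[0])   # just (image, label)
print(img.shape, label.shape)                 # roughly (3, height, width) and (num_objects, 6) for VOC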

However, when you want to train the model, for YOLO specifically (and for other object detection models), you need to generate some extra targets for the YOLO loss, e.g. objectness scores and scale targets. This is because the YOLO loss consists of comparing what the model predicts for each of those (objectness, scale, center) against the generated targets. In order to do this, you have to pass the net to the train transform so it can use the network (its anchors and feature-map sizes) to pre-generate those targets. On top of that, the output of the network changes in training mode: instead of just predicting bounding boxes and class labels, it returns the losses.
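
By contrast (again just a sketch, reusing the train_dataset, width, height and net from the tutorial), a single sample pushed through the train transform comes back as seven arrays, which is exactly what that second batchify_fn with six Stack()s and one Pad() is built for:

from gluoncv.data.transforms import presets

train_transform = presets.yolo.YOLO3DefaultTrainTransform(width, height, net)
sample = train_transform(*train_dataset[0])
print(len(sample))   # 7: image, objectness, center, scale, weights, class targets, padded gt boxes
for arr in sample:
    print(arr.shape)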

See the source code for YOLO3DefaultTrainTransform for more detail: https://gluon-cv.mxnet.io/_modules/gluoncv/data/transforms/presets/yolo.html#YOLO3DefaultTrainTransform.
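
To make the last point concrete, here is a minimal sketch of the training-mode forward pass (adapted from the tutorial's training-loop snippet; net and train_loader as in your second block): under autograd's training scope the network takes the image, the padded ground-truth boxes and the five prefetched targets, and returns the four loss terms instead of box predictions.

from mxnet import autograd

for ib, batch in enumerate(train_loader):
    # batch: [image, objectness, center, scale, weights, class targets, padded gt boxes]
    with autograd.record():
        input_order = [0, 6, 1, 2, 3, 4, 5]   # image, gt boxes, then the five fixed targets
        obj_loss, center_loss, scale_loss, cls_loss = net(*[batch[o] for o in input_order])
        sum_loss = obj_loss + center_loss + scale_loss + cls_loss
        # in a real loop: autograd.backward(sum_loss); trainer.step(batch_size)
    break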