Data Augmentation for Semantic Segmentation


We are training a ResNet-based network for semantic image segmentation.
Input: an RGB image. Output/ground truth: a label mask.
We are using a RecordIO data iterator and would like to add image augmentation to it (e.g. flips, rotations, etc.).
We know that there is a built-in MXNet tool for augmenting image data, where the type of augmentation is randomly selected for every image in a batch.
The question is how we can use it for augmentation in our case, when the same augmentation is to be applied for both input image and ground truth mask?
The images and the ground truth masks are saved as two separate RecordIO files.


Hi, I’ve just come up against the same problem. I want to implement an augmentation for degree-wise rotation. Has there been any progress on this, or is it already possible to implement with NDArrays?



Hi @gnamor, @dfferstl ,

Currently there’s no rotation augmentation, as for most use cases flips and crops are sufficient. Given augmentation happens on the CPU, you’re free to use another library for this such as OpenCV, and wrap this as a transform function. You should provide this to whichever Gluon Dataset you’re using for images.
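As a rough sketch of what such a transform wrapper could look like (using NumPy's `rot90` as a dependency-free stand-in for an arbitrary-angle OpenCV rotation via `cv2.getRotationMatrix2D` + `cv2.warpAffine`; the function name is illustrative):

```python
import random
import numpy as np

def random_rotate_transform(image):
    """Rotate by a random multiple of 90 degrees.

    Stand-in for an arbitrary-angle OpenCV rotation
    (cv2.getRotationMatrix2D + cv2.warpAffine); np.rot90 keeps
    this example free of extra dependencies.
    """
    k = random.randint(0, 3)          # number of 90-degree turns
    return np.rot90(image, k).copy()  # copy() gives a contiguous array

# With Gluon you would pass such a function to your Dataset, e.g.:
#   dataset = dataset.transform(lambda img, lbl: (random_rotate_transform(img), lbl))

img = np.zeros((480, 640, 3), dtype=np.uint8)
out = random_rotate_transform(img)
print(out.shape)
```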

You’re doing semantic segmentation, so you’ll want to apply the same rotation to the label as to the image. We’ve got a tutorial available here that shows how to apply the same augmentation to both the image and the label.
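The core idea can be sketched without MXNet at all: sample the random parameters once, then apply the identical transform to both arrays (the function name is illustrative, and `fliplr`/`rot90` stand in for richer augmentations):

```python
import random
import numpy as np

def joint_augment(image, mask):
    """Apply the SAME random flip/rotation to image and mask.

    The key point: sample the random parameters once, then apply
    them to both arrays, so pixels and labels stay aligned.
    """
    if random.random() < 0.5:               # horizontal flip
        image, mask = np.fliplr(image), np.fliplr(mask)
    k = random.randint(0, 3)                # 90-degree rotations
    return np.rot90(image, k).copy(), np.rot90(mask, k).copy()

img = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)
mask = np.array([[0, 1], [2, 3]], dtype=np.uint8)
aug_img, aug_mask = joint_augment(img, mask)
# Wherever mask value 0 ended up, the original img[0, 0] pixel
# ended up with it, because both saw the same transform.
print(aug_img.shape, aug_mask.shape)
```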


Ok, thanks for the quick answer. I’ve already implemented it with an OpenCV affine warp, but my intention in using NDArrays was to get GPU support, making augmentation faster and avoiding spending a lot of time on it. But since, as you said, this isn’t supported with NDArrays either, there is no reason to use them for augmentation.
Do you know if this feature will be added at some point in the future?


Hey Guys!

The way I implemented simultaneous data augmentation of image and mask for image segmentation task works as follows:

  1. I changed the crop-selection algorithm to select crops with the original aspect ratio of the image (let’s say 640x480) instead of random square crops.
  2. I packed the RGB image and the mask together as an RGBA image, and when a random rotation is chosen inside the C++ code, I applied it to both the image and the mask. There are at least two options here: one is applying the rotation to the whole RGBA image, which OpenCV supports. The other is to split it into RGB and alpha, rotating the RGB with bilinear interpolation and the mask (alpha) with nearest-neighbour (NN) interpolation.
    You can then use the regular class from Python MXNet and define the rotation and crop range there.
    The file which I changed is: src/io/
    The only requirement is to pack image+mask together as one RGBA file, which is quite easy: use the regular RecordIO pack on an RGBA image instead of an RGB image.
    We were able to use this on-the-fly augmentation during training, and we were able to scale linearly to 8 GPUs along with these augmentations.
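The pack/augment/unpack idea in step 2 can be sketched in NumPy (with `np.rot90` standing in for the rotation done in the C++/OpenCV code; for arbitrary angles you would rotate the RGB channels bilinearly and the alpha channel with nearest-neighbour, as described above):

```python
import numpy as np

def pack_rgba(image, mask):
    # The mask rides along as the alpha channel -> one array to augment.
    return np.dstack([image, mask])

def unpack_rgba(rgba):
    return rgba[..., :3], rgba[..., 3]

img = np.random.randint(0, 255, (4, 6, 3), dtype=np.uint8)
mask = np.random.randint(0, 5, (4, 6), dtype=np.uint8)

rgba = pack_rgba(img, mask)
rotated = np.rot90(rgba)          # stand-in for the OpenCV rotation
aug_img, aug_mask = unpack_rgba(rotated)

print(aug_img.shape, aug_mask.shape)  # (6, 4, 3) (6, 4)
```

Because a single geometric transform is applied to the packed array, the mask can never drift out of alignment with the image.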


@cheboris Cheers for pointing out the rotation augmentation (max_rotate_angle); I didn’t realize this was implemented, as rotation isn’t one of the Augmenter classes and it’s not included as a Gluon transformation quite yet.

@dfferstl if you’re using Gluon, you can set num_workers of the DataLoader to the number of CPUs, to take advantage of multiprocessing for your augmentation. GPU versions wouldn’t necessarily be much quicker for rotation augmentation.


@thomelane Thanks! I am not using Gluon, but the MXNet API. The augmentation runs multithreaded in MXNet when using ImageRecordIter, and I am using the number of my CPUs as the number of workers.


Hi guys,

Any tips on performance? The bottleneck in my training is data augmentation (for semantic segmentation). I’ve written custom algorithms based on scikit-image, and I’ve parallelized the application of functions to the images in a batch using the pathos library. However, random rotations/shifts/zooms are slow. I am using cubic interpolation for this (so far … it’s actually faster than quadratic in skimage at the moment), as I’ve found it gives better results with respect to preserving masks. I’m experimenting to see if I can lower it.

The basis of my transforms of NumPy arrays is skimage.transform.warp.
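For what it’s worth, the batch-parallel mapping can be sketched with the standard library in place of pathos (the `augment` body here is a trivial placeholder for the warp-based transforms; the parallelism only pays off when the per-image work is heavy):

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def augment(image):
    """Placeholder per-image augmentation; a flip stands in for the
    warp-based rotations/shifts/zooms described above."""
    return np.fliplr(image).copy()

def augment_batch(batch, workers=4):
    # Map the augmentation over the batch in parallel, as one would
    # with pathos; ThreadPoolExecutor keeps the sketch stdlib-only.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(augment, batch))

batch = [np.random.rand(8, 8, 3) for _ in range(16)]
out = augment_batch(batch)
print(len(out))  # 16
```

With process-based pools (as pathos uses), the per-image arrays are pickled across workers, so the transform cost needs to dominate that overhead to see a speed-up.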

Many thanks,

edit: with some first tests I see a huge speed-up with OpenCV, getting there …


@feevos I would indeed suggest using either OpenCV or, if you can, having a look at the Gluon transforms package or the mx.image package, which has some transforms available as well. I believe you are already aware of this, but make sure you are using num_workers >> 1 on your DataLoader.

The next step, if it is still too slow, is to pre-process your data offline. First make sure you are resizing your images close to the size you need; resizing large images is a typical CPU hog. If resizing is not enough, the next step is to prepare N versions of your dataset with your transforms applied, cycle through them, and load them already processed using, for example, the ImageFolderDataset.
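The "N pre-augmented versions" idea might be sketched like this (in-memory for brevity; in practice each version would be written to disk and loaded with e.g. ImageFolderDataset, and `np.fliplr` is just a placeholder transform):

```python
import numpy as np

def make_versions(dataset, n_versions, augment):
    """Pre-compute n_versions augmented copies of the dataset offline."""
    return [[augment(x) for x in dataset] for _ in range(n_versions)]

def version_for_epoch(versions, epoch):
    # Cycle through the pre-augmented copies, one per epoch, so no
    # augmentation work happens on the training critical path.
    return versions[epoch % len(versions)]

dataset = [np.full((2, 2), i, dtype=np.uint8) for i in range(3)]
versions = make_versions(dataset, n_versions=4, augment=np.fliplr)
print(len(version_for_epoch(versions, epoch=6)))  # 3
```

The trade-off is disk space for CPU time: with N versions you lose some augmentation diversity compared to fully random on-the-fly transforms, but the per-epoch cost drops to plain loading.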