MXNet terminated by ImageRecordIter augmentations


#1

Hi, I’m training a Gluon ResNet in the AWS SageMaker MXNet 1.3 container, and I’m applying some augmentation via mxnet.io.ImageRecordIter (https://mxnet.incubator.apache.org/api/python/io/io.html).

When using this, things train correctly:

import mxnet as mx

def get_data(path, augment, num_cpus, batch_size, data_shape, resize=-1, num_parts=1, part_index=0):
    
    return mx.io.ImageRecordIter(
        path_imgrec=path,
        resize=resize,
        data_shape=data_shape,
        batch_size=batch_size,
        rand_crop=augment,
        #random_resized_crop=augment,
        #max_rotate_angle=25,
        #max_aspect_ratio=0.2,
        #max_shear_ratio=0.2,
        #brightness=0.2,
        #contrast=0.2,
        #saturation=0.2,
        #pca_noise=0.2,
        rand_mirror=augment,
        preprocess_threads=num_cpus,
        num_parts=num_parts,
        part_index=part_index)

When using the version below (a couple of extra augmentations), the whole thing crashes:

def get_data(path, augment, num_cpus, batch_size, data_shape, resize=-1, num_parts=1, part_index=0):
    
    return mx.io.ImageRecordIter(
        path_imgrec=path,
        resize=resize,
        data_shape=data_shape,
        batch_size=batch_size,
        rand_crop=augment,
        random_resized_crop=augment,
        max_rotate_angle=25,
        max_aspect_ratio=0.2,
        max_shear_ratio=0.2,
        brightness=0.2,
        contrast=0.2,
        saturation=0.2,
        pca_noise=0.2,
        rand_mirror=augment,
        preprocess_threads=num_cpus,
        num_parts=num_parts,
        part_index=part_index)

The only error logged is:

  terminate called recursively
  terminate called after throwing an instance of 'dmlc::Error'

Let me know if the question is more appropriate for AWS. Cheers


#2

This specific parameter, random_resized_crop=True, is what kills the kernel.


#3

I’m not sure why you don’t get the full error; maybe SageMaker isn’t showing it to you. When I run your call, I get this error:

mxnet.base.MXNetError: [18:13:22] src/io/image_aug_default.cc:363: Check failed: param_.min_random_scale == 1.0f && param_.max_random_scale == 1.0f && param_.min_crop_size == -1 && param_.max_crop_size == -1 && !param_.rand_crop 
Setting random_resized_crop to true conflicts with min_random_scale, max_random_scale, min_crop_size, max_crop_size, and rand_crop.

This is because you cannot set both random_resized_crop and rand_crop to True. Removing the rand_crop=augment line should fix your problem. The documentation misleadingly states that rand_crop is ignored when random_resized_crop is True, but, as the check in src/io/image_aug_default.cc shows, the code validates the configuration and raises an error if conflicting params are set.
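
For reference, here is a rough sketch of how the call could be restructured so the conflict is caught in Python before the iterator is built. The helper names get_data_kwargs and check_crop_conflict are made up for illustration and are not part of MXNet. The default values in CONFLICT_DEFAULTS are taken only from the check quoted in the error message above, so treat them as an assumption if your MXNet version differs.

```python
# Values the failing check in the error message expects whenever
# random_resized_crop is enabled (assumption: same for your MXNet version).
CONFLICT_DEFAULTS = {
    "min_random_scale": 1.0,
    "max_random_scale": 1.0,
    "min_crop_size": -1,
    "max_crop_size": -1,
    "rand_crop": False,
}


def check_crop_conflict(kwargs):
    """Hypothetical helper: raise ValueError if random_resized_crop is
    combined with any option the backend check rejects."""
    if not kwargs.get("random_resized_crop", False):
        return
    bad = [name for name, default in CONFLICT_DEFAULTS.items()
           if kwargs.get(name, default) != default]
    if bad:
        raise ValueError(
            "random_resized_crop=True conflicts with: " + ", ".join(sorted(bad)))


def get_data_kwargs(path, augment, num_cpus, batch_size, data_shape,
                    resize=-1, num_parts=1, part_index=0):
    """Hypothetical helper: build the ImageRecordIter keyword arguments
    from the original post, minus the conflicting rand_crop."""
    kwargs = dict(
        path_imgrec=path,
        resize=resize,
        data_shape=data_shape,
        batch_size=batch_size,
        # rand_crop is left out on purpose: it conflicts with random_resized_crop.
        random_resized_crop=augment,
        max_rotate_angle=25,
        max_aspect_ratio=0.2,
        max_shear_ratio=0.2,
        brightness=0.2,
        contrast=0.2,
        saturation=0.2,
        pca_noise=0.2,
        rand_mirror=augment,
        preprocess_threads=num_cpus,
        num_parts=num_parts,
        part_index=part_index,
    )
    check_crop_conflict(kwargs)
    return kwargs
```

With MXNet installed, the iterator would then be created with mx.io.ImageRecordIter(**get_data_kwargs(...)). The point of doing the check in Python is that you get a readable ValueError even in environments that only surface "terminate called" from the backend.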