Non-square input image into SSD

Hello,
I want to feed a non-square image into SSD. I would like to make it square and load it with gcv.data.transforms.presets.ssd.load_test(), but that function only takes the size of the short side. How can I make the image square without using the imresize function from mxnet.image? Also, load_test() takes a file path and loads the file itself. Is there an equivalent function that accepts an already-loaded image? My image is a NumPy array, but I still want to use this function for the normalization and whatever other preprocessing it does.

@kargarisaac, SSD should support rectangular images by default as well. Are you sure you want to crop the image to a square?

You can use transform_test to preprocess an already-loaded image and feed the result to your model:

import mxnet as mx
import gluoncv as gcv
import cv2

MIN_SIZE=300
MAX_SIZE=500

mx.test_utils.download('https://helpx.adobe.com/in/stock/how-to/visual-reverse-image-search/_jcr_content/main-pars/image.img.jpg/visual-reverse-image-search-v2_1000x560.jpg', 'test.jpg')
# OpenCV loads images as BGR; reverse the channel axis to get RGB
img = cv2.imread('test.jpg')[:,:,::-1]
print('Original', img.shape)
# transform_test takes an already-loaded NDArray instead of a file path
img_tf, img_np = gcv.data.transforms.presets.ssd.transform_test(mx.nd.array(img), short=MIN_SIZE, max_size=MAX_SIZE)
print('NDArray for MXNet', img_tf.shape)
print('Numpy image', img_np.shape)

Output:

Original (560, 1000, 3)
NDArray for MXNet (1, 3, 280, 500)
Numpy image (280, 500, 3)
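
To see why a 560x1000 image comes out as 280x500 rather than with a short side of 300, here is a minimal pure-Python sketch of the resize rule as I understand it: scale the short side to `short`, unless that would push the long side past `max_size`, in which case the long side is scaled to `max_size` instead. The helper name `resized_shape` is hypothetical and this is not GluonCV's own code, only an approximation of its behaviour.

```python
def resized_shape(height, width, short, max_size):
    """Return (new_height, new_width) for a short-side resize capped at max_size.

    Assumed rule (a sketch, not GluonCV's implementation): scale so the short
    side equals `short`; if the long side would then exceed `max_size`, scale
    so the long side equals `max_size` instead.
    """
    short_side, long_side = min(height, width), max(height, width)
    scale = short / short_side
    if round(scale * long_side) > max_size:
        # The long side would be too large, so it becomes the limiting side.
        scale = max_size / long_side
    return int(round(height * scale)), int(round(width * scale))


print(resized_shape(560, 1000, short=300, max_size=500))  # (280, 500)
```

With MAX_SIZE raised to at least 536, the same image would keep a short side of 300.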

However, if you really want a square image, I would suggest simply writing your own transform:

SIZE = 300

transform = mx.gluon.data.vision.transforms.Compose([
    mx.gluon.data.vision.transforms.Resize(size=SIZE, keep_ratio=True),
    mx.gluon.data.vision.transforms.CenterCrop(SIZE),
    mx.gluon.data.vision.transforms.ToTensor(),
    mx.gluon.data.vision.transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
])

mx.test_utils.download('https://helpx.adobe.com/in/stock/how-to/visual-reverse-image-search/_jcr_content/main-pars/image.img.jpg/visual-reverse-image-search-v2_1000x560.jpg', 'test.jpg')
img = cv2.imread('test.jpg')[:,:,::-1]
img_tf = transform(mx.nd.array(img)).expand_dims(axis=0)

print('Original', img.shape)
print('Transformed', img_tf.shape)

Output:

Original (560, 1000, 3)
Transformed (1, 3, 300, 300)

However, this calls mx.image under the hood. If you really don't want to use the OpenCV-based functions, you can use the mx.nd.contrib.BilinearResize2D operator instead, but you would need to compute the target width and height yourself in order to keep the aspect ratio. You would also need to do the center crop yourself, but both are simple to derive from img.shape.

You can define a function like this:

def resize_and_crop(x):
    ...

and then add it to the pipeline like this:

transform = mx.gluon.data.vision.transforms.Compose([
    mx.gluon.data.vision.transforms.ToTensor(),
    mx.gluon.data.vision.transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    # ToTensor yields (C, H, W); add a batch axis so the resize sees (N, C, H, W)
    mx.gluon.nn.Lambda(lambda x: resize_and_crop(x.expand_dims(axis=0)))
])
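
The body of resize_and_crop is left open above. As a rough illustration of the arithmetic it would need, the sketch below computes the resized height and width that keep the aspect ratio, plus the top-left corner of the center crop. The helper name `resize_and_crop_plan` is hypothetical, and the MXNet calls in the trailing comment are an assumption about how the numbers would be used, not tested code.

```python
def resize_and_crop_plan(height, width, size):
    """Return (new_h, new_w, top, left) for a ratio-preserving resize
    followed by a size x size center crop."""
    # Scale so that the short side becomes exactly `size`.
    scale = size / min(height, width)
    # max(...) guards against rounding the short side to just below `size`.
    new_h = max(size, int(round(height * scale)))
    new_w = max(size, int(round(width * scale)))
    # Offsets of the crop window that centers it in the resized image.
    top = (new_h - size) // 2
    left = (new_w - size) // 2
    return new_h, new_w, top, left


new_h, new_w, top, left = resize_and_crop_plan(560, 1000, size=300)
print(new_h, new_w, top, left)  # 300 536 0 118

# Inside resize_and_crop(x), with x laid out as (N, C, H, W), these values
# would be used roughly like this (assumed usage):
#   x = mx.nd.contrib.BilinearResize2D(x, height=new_h, width=new_w)
#   x = x[:, :, top:top + size, left:left + size]
```

For the 560x1000 test image this resizes to 300x536 and then drops 118 columns on each side.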

@ThomasDelteil
Thank you for your answer.
When a model is trained on 300x300 input images, can we feed it images of a different size? I suspect this might hurt performance, but I'm not sure.

Two things to consider:

  • Within a batch, all images must have the same size so that they can be stacked and laid out contiguously in memory.
  • If you use different sizes, MXNet runs its cuDNN convolution autotuning for every new input shape it encounters. Consider disabling it by setting the environment variable MXNET_CUDNN_AUTOTUNE_DEFAULT=0, then check what performance you get.
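
One way to set that variable from Python rather than from the shell is sketched below. It assumes the variable has to be in place before MXNet runs its first convolution, so it is set before the import here; the import is left commented out so the snippet runs even where MXNet is not installed.

```python
import os

# Set the variable before MXNet is imported so it is in place before the
# first convolution runs (assumed to be the safest ordering).
os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "0"

# import mxnet as mx  # import MXNet only after the variable is set

print(os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"])  # 0
```

Timing a few forward passes with and without this setting should show whether autotuning is what slows down inference on varying input shapes.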