Segmentation tutorial outputs all masks as single colour

Hi,
I’ve gotten the segmentation sample code working by following the tutorial:

Sample Code

I’m confused about how to control the level of instance detail in the masks the model generates. For example, if I have an image with two people and a bird, I can generate a mask that shows all three objects; however, the two people are assigned the same colouring, so if they are overlapping I can’t distinguish between them. The bird is assigned another colour.

Looking at the code I see:

```python
import mxnet as mx
import gluoncv

##############################################################################
# Load the pre-trained model and make prediction
# ----------------------------------------------
#
# get pre-trained model
model = gluoncv.model_zoo.get_model('fcn_resnet50_voc', pretrained=True)

##############################################################################
# make prediction using single scale
# (img is the transformed input image from the earlier steps of the tutorial)
output = model.evaluate(img)
predict = mx.nd.squeeze(mx.nd.argmax(output, 1)).asnumpy()

##############################################################################
# Add color pallete for visualization
from gluoncv.utils.viz import get_color_pallete
import matplotlib.image as mpimg
mask = get_color_pallete(predict, 'pascal_voc')
mask.save('output.png')
```

So the colourings are assigned by the get_color_pallete function. How would I go about iterating over the detected instances so I could output (for example) one object mask per image? i.e. although it found [person, person, bird], I might like to only generate a mask for the largest person detected.

Very new to this, so I’m looking through the documentation, but any pointers would be much appreciated.

Hi @davmx,

You’re looking at an example of Semantic Segmentation, where each pixel is classified as a certain class (including a background class), but individual objects are not separated. Check out ‘Fully Convolutional Networks for Semantic Segmentation’ for more details on this particular model. You’d need to design a system that works with this segmentation map and finds the largest person, along the lines of the sketch below.
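As a rough illustration, here’s a minimal sketch of the “find the largest person” idea, assuming the `predict` array from the tutorial above and the standard Pascal VOC label map (where class 15 is ‘person’). It uses scipy’s connected-component labelling; note that people whose masks touch will still merge into one blob, which is exactly the limitation described above:

```python
import numpy as np
from scipy import ndimage

# `predict` is the (H, W) array of per-pixel class indices from the tutorial.
# In the Pascal VOC label map, class 0 is background and class 15 is 'person'.
PERSON_CLASS = 15

person_pixels = (predict == PERSON_CLASS)

# Group person pixels into connected blobs. Overlapping/touching people
# collapse into a single blob -- semantic segmentation can't tell them apart.
labelled, num_blobs = ndimage.label(person_pixels)

if num_blobs > 0:
    # Pixel area of each blob, then keep only the largest one.
    areas = ndimage.sum(person_pixels, labelled, index=range(1, num_blobs + 1))
    largest_blob = (labelled == (np.argmax(areas) + 1))
    # `largest_blob` is now a boolean mask of the biggest person region.
```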

What you need is Object Detection, which aims to identify individual objects. You can find examples of this in Gluon CV here. Given bounding boxes, you can then calculate each box’s approximate area and find the largest person, for example (see the sketch below), although you lose the exact outlines of the objects.
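For instance, a minimal sketch with a pre-trained SSD detector from the Gluon CV model zoo could look like this (‘input.jpg’ is a placeholder for your own image, and the 0.5 score threshold is an arbitrary choice):

```python
from gluoncv import model_zoo, data

# Pre-trained SSD detector, trained on Pascal VOC.
net = model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained=True)
x, img = data.transforms.presets.ssd.load_test('input.jpg', short=512)

# The network returns per-detection class ids, confidence scores and boxes.
class_ids, scores, boxes = net(x)

person_id = net.classes.index('person')
largest_area, largest_box = 0.0, None
for i in range(class_ids.shape[1]):
    if int(class_ids[0, i].asscalar()) != person_id:
        continue
    if float(scores[0, i].asscalar()) < 0.5:  # skip low-confidence detections
        continue
    xmin, ymin, xmax, ymax = boxes[0, i].asnumpy()
    area = (xmax - xmin) * (ymax - ymin)
    if area > largest_area:
        largest_area, largest_box = area, (xmin, ymin, xmax, ymax)

# `largest_box` is the bounding box of the biggest person found (or None).
```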

If exact outlines of individual instances are important for your application, what you really need is an Instance Segmentation model. I don’t think there are any Instance Segmentation models in the Gluon CV library quite yet, but there are a few other references for this. Check out ‘Mask R-CNN’, and the MXNet implementation by TuSimple. It looks like they only provide models pre-trained on the Cityscapes dataset, which may work for people since ‘person’ is a class in that dataset, but if your images don’t look like dashcam footage I wouldn’t expect good performance. You’d need to train the model on a more appropriate dataset, possibly MS COCO.

Hi,

Thanks for the advice - looks like I’ve got some reading up to do! I’ll definitely want to keep the exact outlines, so I think Mask R-CNN will be my next thing to try out.

As an aside, you mention the semantic segmentation identifying a background class in the image. How might one go about assigning the background class a solid colour? (Or I suppose you could simply invert the combined mask of all the other classes.)

One of the classes in your model will be the background class, so you can treat it just like any other class. Usually you’d set its colour to black, but you can choose something else.
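For example, a quick sketch that paints the background a solid colour, assuming the `predict` array and `mask` image from the tutorial above (class 0 is background in Pascal VOC):

```python
import numpy as np

# Convert the palette-mode mask image to an RGB array we can edit directly.
mask_rgb = np.array(mask.convert('RGB'))

# Paint every background pixel (class 0 in Pascal VOC) a solid colour.
mask_rgb[predict == 0] = (70, 70, 70)  # any RGB triple works here
```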

And it’s also common to visualize only the non-background classes and overlay them on the original image, something like:
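Here’s one way to do that overlay, again a sketch assuming `predict` and `mask` from the tutorial, with ‘example.jpg’ standing in for the original image; it builds an RGBA layer whose background pixels are fully transparent:

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# The original input image (should match the mask's resolution).
img = mpimg.imread('example.jpg')

# RGBA overlay: mask colours at alpha 0.6, fully transparent background.
h, w = predict.shape
overlay = np.zeros((h, w, 4), dtype=np.float32)
overlay[..., :3] = np.array(mask.convert('RGB'), dtype=np.float32) / 255.0
overlay[..., 3] = np.where(predict == 0, 0.0, 0.6)

plt.imshow(img)
plt.imshow(overlay)
plt.axis('off')
plt.show()
```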