Any example show us how to train the FCN/PSP/DeepLab segmentation network by custom data?

Hi @stereomatchingkiss,

the PSPNet has two output layers that do the same job, i.e. they try to classify the same output. This is evident from the definition of the MixSoftmaxCrossEntropyLoss2D, where the same label is used for both prediction outputs (see the hybrid_forward call). One of the reasons for using multiple loss functions at different levels of the network has to do with vanishing gradients - especially when the goal is the same in both tasks (see An Overview of Multi-Task Learning for Deep Learning for an overview on multitasking). If you look at this slide, you’ll notice that each vector (which represents the gradient flow) becomes dashed at some point. This represents the fact that if the network is too deep, the number of terms in the chain rule for the partial derivative grows and the gradients diminish. It is like calculating an exponential, something_small^n: the larger the n, the smaller the output, assuming something_small < 1. By the chain rule, the more layers you have in the network, the more factors appear in the product for the partial derivative. Using complementary losses at different depths “keeps the gradients flowing”. Caution is needed in how you combine the different loss functions (hence the term self.aux_weight):

    def _aux_mixup_forward(self, F, pred1, pred2, label1, label2, lam):
        """Compute loss including auxiliary output"""
        loss1 = self._mixup_forward(F, pred1, label1, label2, lam)
        loss2 = self._mixup_forward(F, pred2, label1, label2, lam)
        return loss1 + self.aux_weight * loss2
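
To make the weighting concrete, here is a small framework-agnostic sketch (plain Python, with made-up loss values) of the same combination, plus the chain-rule shrinkage described above. The `combine_losses` name and the numbers are my own illustrative assumptions; the 0.4 weight is the value used for the auxiliary loss in the PSPNet paper, not necessarily the library default:

```python
def combine_losses(main_loss, aux_loss, aux_weight=0.4):
    """Weighted sum of main and auxiliary losses,
    mirroring `loss1 + self.aux_weight * loss2` above."""
    return main_loss + aux_weight * aux_loss

# Illustrative (made-up) per-batch loss values:
total = combine_losses(main_loss=1.2, aux_loss=1.5)

# The chain-rule shrinkage: a product of many factors
# smaller than 1 collapses towards zero.
shallow = 0.9 ** 10    # still a usable gradient magnitude
deep = 0.9 ** 100      # practically vanished
```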
  1. get_segmentation_dataset: If you look at the definition in the source code, you will see that this function only returns a predefined dataset. So you just need to create your own custom dataset (subclass the gluon.data.Dataset class); you can find an example here, and an analogous one using hdf5 files here, on how to create your own dataset from a set of (images, masks). Then, as long as the output of the model has the correct number of channels (i.e. the number of classes), the algorithm will learn on its own the mapping of each channel to a specific class. That is, it will learn, for example, that the car mask is always in the first channel (if that is the case in the one-hot representation of the ground truth).

  2. I understand that it is difficult to port the examples; I’ve crossed the same path many times. My best advice - from my experience - is to try and write everything on your own, using the examples as a road map. It is time-consuming, I know, but I am afraid I cannot help more on this, as it would require a substantial portion of my time (I have duties at work).
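
To illustrate point 1: with MXNet you would subclass mxnet.gluon.data.Dataset, which only requires `__getitem__` and `__len__`. The sketch below shows that interface with randomly generated (image, mask) pairs so it runs standalone; the class name, the shapes, and `num_classes=3` are my own placeholder assumptions, not GluonCV API:

```python
import numpy as np

# With MXNet installed, declare this as:
#   class RandomSegDataset(mx.gluon.data.Dataset):
class RandomSegDataset:
    """Toy segmentation dataset yielding (image, mask) pairs.

    A real implementation would collect image/mask file paths in
    __init__ and decode them lazily in __getitem__.
    """

    def __init__(self, length=8, num_classes=3, height=32, width=32):
        self.length = length
        self.num_classes = num_classes
        self.height = height
        self.width = width

    def __getitem__(self, idx):
        # Placeholder data; swap for actual image/mask loading.
        image = np.random.rand(3, self.height, self.width).astype('float32')
        # The mask holds a class index per pixel; the model's output
        # must then have `num_classes` channels, and the channel/class
        # correspondence is learned from the ground truth.
        mask = np.random.randint(0, self.num_classes,
                                 size=(self.height, self.width)).astype('int32')
        return image, mask

    def __len__(self):
        return self.length

ds = RandomSegDataset()
image, mask = ds[0]
```

You can then feed such a dataset to a gluon.data.DataLoader for batching, exactly as with the predefined datasets.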

The loss function is not used during inference, so I am not sure I understand this question.

Hope the above helps,
All the best
