Hi,

I need to implement a simpler version of the loss in the following papers (using gluon):

- Associative Embedding: End-to-End Learning for Joint Detection and Grouping
- Semantic Instance Segmentation via Deep Metric Learning

The first one has a PyTorch implementation here: https://github.com/umich-vl/pose-ae-train.

The network is a segmentation network (U-net) and its output is an embedding vector for each pixel in the input image.

For training we have a “tag” label for each pixel in the network output. If two pixels have the same tag, we want the L2 distance between their embedding vectors to be close to zero; if the tags differ, we want the distance to be large.

The loss inputs are:

- network predictions (an embedding vector per pixel): a feature map of size `[N] x [channels] x [height] x [width]`
- labels “tags” mask: an array of ints of size `[N] x [width] x [height]`
To compute the loss in practice, we would like to sample a small set of pixels `(x1, x2, x3, ...)` from the network output feature map and apply the loss to pairs of pixels according to their tag values: if the tags are the same we apply `L2_loss(x1 - x2)`, otherwise we apply `Exp(-L2_loss(x1 - x2))`.
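To make the question concrete, here is a minimal NumPy sketch of what I have in mind (function and parameter names are my own, and this is framework-agnostic rather than Gluon code):

```python
import numpy as np

def pairwise_embedding_loss(embeddings, tags, num_samples=64, rng=None):
    """Sketch of the pairwise pull/push loss on sampled pixels.

    embeddings: [channels, height, width] float array (one image).
    tags:       [height, width] int array of instance tags.
    Returns the mean loss over all sampled pixel pairs.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    c, h, w = embeddings.shape

    # Sample a small set of pixel coordinates.
    ys = rng.integers(0, h, size=num_samples)
    xs = rng.integers(0, w, size=num_samples)
    emb = embeddings[:, ys, xs].T           # [num_samples, channels]
    t = tags[ys, xs]                        # [num_samples]

    # Squared L2 distance between every pair of sampled embeddings.
    diff = emb[:, None, :] - emb[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)           # [num_samples, num_samples]

    # Pull same-tag pairs together, push different-tag pairs apart.
    same = (t[:, None] == t[None, :])
    loss = np.where(same, d2, np.exp(-d2))

    # Exclude the diagonal (a pixel paired with itself).
    mask = ~np.eye(num_samples, dtype=bool)
    return loss[mask].mean()
```

I assume the Gluon version would follow the same structure with `mx.nd` ops (broadcasting, `exp`, `where`) inside a custom loss block, with the pixel sampling done outside the computation graph, but I'm not sure of the idiomatic way to do this.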

Can anyone point me to where to start?