Potential Bug using nd.tile after Convolutional Layers

I’ve encountered an issue using mxnet.ndarray.tile to upscale CNN embeddings so that they may be used as features in another convolutional map.

The issue is reproduced in the code below. First, embeddings are computed for a context set using a CNN; these embeddings are tiled to match their original shape and then appended as channels to a query set. Finally, a CNN is used to make predictions from the query set.

## Setup ##

import numpy as np
import mxnet as mx
import mxnet.gluon as gluon
import mxnet.gluon.nn as nn

embedding_block = nn.Sequential()
# make a small CNN to embed the "context"
embedding_block.add(nn.Conv2D(channels=6, kernel_size=5, strides=1, activation='relu', padding=(2,2)))
embedding_block.add(nn.AvgPool2D(pool_size=(2,2), strides=2))

# make a CNN classifier for the query set
query_block = nn.Sequential()
query_block.add(nn.Conv2D(channels=6, kernel_size=5, strides=1, activation='relu'))
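query_block.add(nn.AvgPool2D(pool_size=(2,2), strides=2))

# initialize the parameters of both blocks before the first forward pass
embedding_block.initialize()
query_block.initialize()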


## Data Generation ##

# create a simple squared loss problem
w = np.random.normal(size=(28,28))

# features should be multi-channel images
features = np.random.normal(size=(200, 6, 28, 28))
temp = np.sum(w * features, axis=(1,2,3))
targets = np.sign(np.add(temp[:100], temp[100:]))
context_features = mx.nd.array(features[:100])
query_features = mx.nd.array(features[100:])
targets = mx.nd.array(targets)

## Forward Pass and Loss ##

loss = 0.
with mx.autograd.record():
    # Add features via nd.tile
    context_embedding = mx.nd.sum(embedding_block(context_features), axis=0)
    channel = context_embedding.tile((100, 1, 2, 2))
    # append new channel to image features
    task_features = mx.nd.concat(query_features, channel)

    preds = query_block(task_features)
    loss = loss + mx.nd.sum(mx.nd.square(mx.nd.squeeze(preds) - targets))


The following error is thrown when loss.asscalar() is called.

Too many reduction axes from [100,1,1,6,2,14,2,14] to [1,1,1,6,1,14,1,14]

As far as I know, this error is only thrown when context_embedding is computed using convolutional layers. I initially tried nd.tile on a pre-defined nd.array and was not able to replicate the issue. The error is also not thrown if the context embeddings are not tiled (e.g. when they are already the same size as the query image channels).
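For reference, the shapes in the snippet above can be traced with plain arithmetic. This is only a sketch in pure Python: `conv_out` is a hypothetical helper implementing the usual output-size formula, not an MXNet function, and the trace stops at the concatenation.

```python
def conv_out(n, kernel, stride=1, pad=0):
    # usual output-size formula for a convolution or pooling window
    return (n + 2 * pad - kernel) // stride + 1

batch, channels, height, width = 100, 6, 28, 28

# embedding_block: Conv2D(kernel_size=5, padding=2), then AvgPool2D(pool_size=2, strides=2)
h = conv_out(conv_out(height, kernel=5, pad=2), kernel=2, stride=2)
w = conv_out(conv_out(width, kernel=5, pad=2), kernel=2, stride=2)
embedding_shape = (batch, 6, h, w)

# summing over axis 0 drops the batch dimension
context_shape = embedding_shape[1:]

# tile pads the input shape with leading 1s to match len(reps),
# then multiplies each dimension by its repetition count
reps = (100, 1, 2, 2)
padded = (1,) * (len(reps) - len(context_shape)) + context_shape
channel_shape = tuple(r * d for r, d in zip(reps, padded))

# concat joins along the channel axis (dim=1 is the default)
task_shape = (batch, channels + channel_shape[1], height, width)

print(embedding_shape, context_shape, channel_shape, task_shape)
```

So every tensor created explicitly in the forward pass has at most four dimensions.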

Can anyone shed some light on this issue?

The problem is the number of dimensions of your tensor (see the code below, where the error is thrown). Special operators like broadcast support at most MXNET_SPECIAL_MAX_NDIM dimensions, which is currently 5.

    const int ndim = (j <= 2? 2 : MXNET_SPECIAL_MAX_NDIM);
    new_small->assign(new_small->begin(), new_small->begin() + ndim);
    new_big->assign(new_big->begin(), new_big->begin() + ndim);
  } else {
    LOG(FATAL) << "Too many reduction axes from " << big << " to " << small;

None of the tensors that I manually create in the example above has more than five dimensions. I suppose the implementations of the backward operators produce tensors with more dimensions (the shapes in the error message have eight), which causes the issue?

How can I avoid reaching this ceiling?
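The eight-dimensional shapes in the error message are consistent with a tile gradient that views the output as interleaved (repetition, dimension) pairs and sums over the repetition axes. The NumPy sketch below illustrates that idea only; it is an assumption about what the operator does internally, not a copy of the MXNet implementation.

```python
import numpy as np

x = np.random.normal(size=(6, 14, 14))   # same shape as context_embedding
reps = (100, 1, 2, 2)

# pad the input shape with leading 1s so there is one axis per entry in reps
padded = (1,) * (len(reps) - x.ndim) + x.shape

# interleave (rep, dim) pairs: this matches the "big" shape in the error
big = tuple(v for pair in zip(reps, padded) for v in pair)
# the "small" shape keeps the dims and sets every rep to 1
small = tuple(v for d in padded for v in (1, d))

# forward: tile is a broadcast over the rep axes followed by a reshape
out_shape = tuple(r * d for r, d in zip(reps, padded))
tiled = np.broadcast_to(x.reshape(small), big).reshape(out_shape)

# backward: the gradient w.r.t. x sums the output gradient over the rep axes
grad_out = np.ones(out_shape)
rep_axes = tuple(range(0, len(big), 2))
grad_x = grad_out.reshape(big).sum(axis=rep_axes).reshape(x.shape)

print(big, small)
```

Under this view a four-dimensional tile needs an eight-axis reduction, which would exceed a five-dimension limit even though no tensor in the user code has more than four dimensions.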

I tried running your code and you are right: after mx.nd.tile the shape is (100, 6, 28, 28), but mx.nd.concat throws an error. I will open a GitHub issue.

For the time being, you could try using mx.nd.repeat instead of tile:

    channel = mx.nd.repeat(context_embedding, repeats=2, axis=1) 
    channel = mx.nd.repeat(channel, repeats=2, axis=2)
    channel = channel.expand_dims(axis=0)
    channel = mx.nd.repeat(channel, repeats=100, axis=0)

Unfortunately, repeat expands the matrix dimensions by repeating entries adjacent to each other, which is not the behavior that I am looking for.
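The difference in layout is easy to see on a small array. The sketch below uses NumPy rather than MXNet, and it also shows that concatenating copies reproduces the tile layout. Whether the MXNet counterparts (for example mx.nd.concat with an explicit dim) avoid the error above is an untested assumption.

```python
import numpy as np

x = np.arange(4).reshape(2, 2)   # [[0, 1], [2, 3]]

# repeat duplicates each entry next to itself
repeated = np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

# tile lays out whole copies of the array side by side
tiled = np.tile(x, (2, 2))

# concatenating copies gives the tile layout without calling tile
row = np.concatenate([x, x], axis=1)
by_concat = np.concatenate([row, row], axis=0)

print(repeated)
print(tiled)
```

Here `repeated` has rows `[0, 0, 1, 1]` and `[2, 2, 3, 3]` (each twice), while `tiled` and `by_concat` both alternate rows `[0, 1, 0, 1]` and `[2, 3, 2, 3]`.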

I’ve opened an issue on the GitHub page here. It would be excellent if we could follow up on this discussion there.