Reshaping/broadcasting without hardcoding target dimensions


#1

Hi all,

I’ve identified a pattern of issues that comes up for us when translating Mathematica’s high-level net representations into MXNet symbols.

Consider a net that takes a matrix M of size (n, 5) and a vector V of size 5. Let’s say we wish to catenate these, by first broadcasting the vector V from size (5) to size (n, 5), so that it is compatible with the matrix M. Then we catenate to produce a matrix of size (n, 10).
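Just to pin down the shapes, here is the operation in plain NumPy with a fixed n (the whole point of the post is that n must stay symbolic in the MXNet graph, which NumPy doesn’t capture):

```python
import numpy as np

n = 4                                                   # dynamic in the real net
M = np.arange(n * 5, dtype=np.float32).reshape(n, 5)    # matrix of size (n, 5)
V = np.ones(5, dtype=np.float32)                        # vector of size (5,)

# Broadcast V from (5,) to (n, 5), then catenate with M along the feature axis.
V_b = np.broadcast_to(V, (n, 5))
out = np.concatenate([M, V_b], axis=1)                  # size (n, 10)
```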

Fundamentally, n here is a dynamic dimension: we wish to take the exact same MX symbol and via MXSymbolInferShape create new symbols for various values of n.

This is not possible currently as far as I know. While you could imagine a broadcast_catenate symbol that makes this possible, this is not the only place that this particular issue comes up.

Another example is when using ‘batch-flattening’ to map a FC layer over a sequence. In this example, say you have a sequence X of size (b, t, 3). Here, b is the batch dimension, t is the sequence (time) dimension, and 3 is the feature count.

Then let’s say you have an FC layer that wants an input of size (b, 3). By using the reshape spec {-3, -2}, we can use Reshape to obtain a version of X in which the batch and time dimensions have been flattened together. This reshaped tensor X’ has dimensions (b * t, 3).

Next, we apply the FC layer as normal, to obtain, say, an output tensor Y with shape (b * t, 4). Now, we must reshape this to have dimension (b, t, 4). Again, there is no way of doing this such that the same MX symbol will work for all values of t via appropriate calls to MXSymbolInferShape.
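For concreteness, the whole round trip looks like this in NumPy, with b and t fixed and the FC layer stubbed as a plain matrix multiply (a sketch only; in the symbolic graph t would need to stay dynamic):

```python
import numpy as np

b, t = 2, 5                                      # t is the dimension we'd like dynamic
X = np.random.rand(b, t, 3).astype(np.float32)   # sequence input: (b, t, 3)
W = np.random.rand(3, 4).astype(np.float32)      # stand-in FC weights: 3 -> 4 features

X_flat = X.reshape(b * t, 3)   # merge batch and time:  (b*t, 3)
Y_flat = X_flat @ W            # apply the FC layer:    (b*t, 4)
Y = Y_flat.reshape(b, t, 4)    # the problematic step: needs t explicitly
```

The final reshape is the step with no dynamic-shape equivalent in the symbolic graph: it needs t, which is only known at inference time.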

We wish to do this via InferShape for two reasons. First, the compilation process from Mathematica’s high-level networks to MXNet is expensive, and repeating it for different n is wasteful given that the resulting graphs are otherwise identical. Second, when deploying the net outside of our high-level framework, InferShape is a very easy way to create new graphs for a specific input length from e.g. C or C++ code; calling Mathematica’s MXNet compiler is not possible in that case.

Does the MXNet community have any suggestions for how to handle this general class of issue? It seems like there is a missing operator: a more flexible version of reshape_like that would let us supply a second ‘shape target’ but take only specific dimensions from it to produce the target shape, rather than taking all of its dimensions.

Thanks,
Tali


#2

Here’s another example of where something like this is needed; it came up just today:

The current “pad” operation doesn’t allow padding on dims 0 and 1. Let’s say we wish to pad on level 1 (say we are using an NHWC representation of images). The expensive way to work around this is to transpose axes to NCHW representation, pad that, and swap back. It would be much cheaper to simply trick the “pad” layer into doing the right thing by reshaping to (N, 1, H, W*C) and then padding on axis 2 (the H axis) via the spec (0, 0, 0, 0, p, p, 0, 0).

We can again use the merge code of the “reshape” op to accomplish this, e.g. using the spec (0, 1, -1, -3, -2). But this reshape operation cannot be undone, even though all the required dimensions are right there in the graph: specifically, we cannot separate the final dimension W*C back into W and C. If we had a more flexible “reshape_like” (or more codes for “reshape”), we could do this by referring to the C dimension of the original input as part of the split code.
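With fixed shapes, the trick (and the irreversible step) can be sketched in NumPy like so:

```python
import numpy as np

N, H, W, C, p = 2, 8, 8, 3, 1
x = np.random.rand(N, H, W, C).astype(np.float32)    # NHWC image batch

# Merge W and C so that H sits on an axis the pad operator can reach.
x2 = x.reshape(N, 1, H, W * C)                       # (N, 1, H, W*C)
x2 = np.pad(x2, ((0, 0), (0, 0), (p, p), (0, 0)))    # pad H by p on each side

# Undoing the merge needs C explicitly -- exactly the information that is
# no longer recoverable from the symbolic shape alone.
y = x2.reshape(N, H + 2 * p, W, C)
```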


#3

There are two ways I have experimented with for solving this:

  1. split_like, a simple layer that works in many cases and is basically just a bookkeeping mechanism for shape inference. This PR was rejected: https://github.com/apache/incubator-mxnet/pull/8949. We are using a fork of MXNet internally that has this operator.

  2. adding a code to reshape, along with a second optional input that serves as the source of the dimensions.

I didn’t get 2 to work, which may have been due to bugs elsewhere or to my own unfamiliarity with the framework and with how optional inputs are supposed to work. I also have a philosophical problem with 2: the reshape layer is at this point really overcomplicated. It already has the split and join codes (which I originally introduced for very similar reshape-flexibility reasons), and adding an optional second input makes it even more complicated.

I would prefer that complexity be siloed in another operator that most users never have to see. I don’t want to pursue a PR that just gets rejected again; that’s a waste of my time and others’. So I need productive discussion, hopefully a recognition that this is a problem that needs to be solved, and a process for figuring out the optimal solution that involves all the relevant stakeholders. Either we should reconsider split_like, or we should design a new operator, called e.g. reshape_like, that explicitly takes two inputs and reshapes the first input according to a mixture of dimensions from the first and second.


#4

Hey @taliesinb, thanks for your write-ups and contributing to MXNet. I’ll bring it up with others and hopefully we can get back to you with some suggestions as to the best way forward!

I personally don’t have anything against 1), though I agree with the feedback, which was:

  • naming could lead to confusion
  • the layer needs testing

#5

@taliesinb I spoke with @piiswrong and the best way forward is probably to extend the reshape_like operator to be able to target specific dimensions of the input tensors, for example:
reshape_like(x, y, src_begin=0, src_end=2, target_begin=0, target_end=1)

That should provide you the flexibility needed to reshape back to your original tensor if needed.

What are your thoughts about that?


#6

I like the idea of putting this functionality in reshape_like.

How will {src,target}_{begin,end} work exactly?


#7

My understanding is that you would reshape x’s dimensions between src_begin and src_end following y’s dimensions between target_begin and target_end.

If x1’s shape is (10, 3, 20), e.g. (n, t, c1), and x2’s is (30, 100), e.g. (n*t, c2),

you want to reshape the first dimension of x2 according to the first and second dimensions of x1:

reshape_like(x2, x1, src_begin=0, src_end=1, target_begin=0, target_end=2)

would give you an x2 of shape (10, 3, 100).
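A shape-level sketch of those proposed semantics in plain Python (the parameter names follow the suggestion above; this is just an illustration with NumPy, not the actual symbolic operator):

```python
import numpy as np

def reshape_like(src, target, src_begin, src_end, target_begin, target_end):
    """Replace src's dims [src_begin, src_end) with target's dims
    [target_begin, target_end), keeping src's remaining dims unchanged."""
    new_shape = (src.shape[:src_begin]
                 + target.shape[target_begin:target_end]
                 + src.shape[src_end:])
    return src.reshape(new_shape)

x1 = np.zeros((10, 3, 20))   # (n, t, c1)
x2 = np.zeros((30, 100))     # (n*t, c2)

y = reshape_like(x2, x1, src_begin=0, src_end=1, target_begin=0, target_end=2)
print(y.shape)  # (10, 3, 100)
```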


#8

@ThomasDelteil That design would solve all cases I can think of, and seems pretty easy to understand compared to the alternative designs that I’ve pondered :sweat_smile:.

Thanks for your help in getting this issue un-stuck!