NWC layout for Conv1D

Is the NWC layout not supported for Conv1D?

model.add(Conv1D(channels=128, kernel_size=3, layout='NWC', padding=1))

Getting error:

MXNetError: Invalid Input: 'NWC', valid values are: {None, 'NCDHW', 'NCHW', 'NCW', 'NDHWC', 'NHWC'}, in operator Convolution(name="", layout="NWC", pad="(1,)", num_filter="128", dilate="(1,)", num_group="1", stride="(1,)", no_bias="False", kernel="(3,)")

Only NCW is supported for Conv1D. You can reshape the data so that it fits a 2D convolution and use cudnn with the NHWC format.

Thanks! Would it be equivalent if I add a layer doing SwapAxis(1,2) before Conv1D?

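from mxnet.gluon import HybridBlock

class SwapAxis(HybridBlock):
    """Swaps two axes of the input, e.g. (1, 2) turns NWC into NCW."""
    def __init__(self, dim1, dim2, **kwargs):
        super(SwapAxis, self).__init__(**kwargs)
        self.dim1 = dim1
        self.dim2 = dim2

    def hybrid_forward(self, F, x):
        # F is mx.nd or mx.sym, depending on whether the block is hybridized.
        return F.SwapAxis(x, dim1=self.dim1, dim2=self.dim2)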

I think you need to transpose NWC into NCW in order to use Conv1D (I'm not sure whether your implementation of SwapAxis really transposes the data). But I suggest reshaping the data so that you can use a 2D convolution backed by cudnn, because transposing has heavy overhead and cudnn is faster.
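
To make the difference between transposing and reshaping concrete, here is a minimal sketch in plain Python (nested lists rather than MXNet arrays, so it runs anywhere). The helper name `swap_w_and_c` is hypothetical and only illustrates what swapping axes 1 and 2 does to an (N, W, C) batch.

```python
def swap_w_and_c(batch):
    """Swap axes 1 and 2 of a nested list shaped (N, W, C), giving (N, C, W)."""
    return [[list(channel) for channel in zip(*sample)] for sample in batch]


# Toy batch: N=1 sample, W=3 words, C=2 embedding values per word.
nwc = [[[1, 2], [3, 4], [5, 6]]]

# Transpose: each output row now holds one channel across all words.
ncw = swap_w_and_c(nwc)
print(ncw)

# A plain reshape to (1, 2, 3) keeps the flat element order instead,
# so values from different channels end up in the same row.
flat = [value for word in nwc[0] for value in word]
reshaped = [[flat[0:3], flat[3:6]]]
print(reshaped)
```

The two results differ, which is why a transpose has to move data while a reshape does not.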

Thanks @reminisce! Are you referring to using cudnn directly? We might end up doing this, though it would be nice to have a Gluon (or core MXNet) alternative, as our ML people don’t have C++ experience.

I meant that in order to achieve the best performance, you can just reshape the layout NWC into NHWC and use Gluon Conv2D, instead of transposing the layout into NCW and using Gluon Conv1D. By default, Gluon Conv2D calls cudnn convolution in the backend for contexts on GPUs. You don’t need to interact with the cudnn C++ interfaces. Note that if you are going to use the NWC layout, you need to make sure that all the other layout-sensitive operators in the model (such as pooling, deconv, etc.) support this layout as well.

Thanks again! To clarify:

We use Conv1D for text document classification. N is samples, C is elements of GloVe embedding vectors (300 channels), and W is words. The 1D kernels (of sizes 2, 3, 4, etc.) slide along the words dimension to capture bigrams, trigrams, etc. and cover all 300 channels.

The initial input was NWC (samples, words, embeddings). If my understanding is correct, you are proposing to use NHWC (samples, words, embeddings, channels=1). The kernel size would be e.g. 3x300, right?

Also, is it correct that Conv1D doesn’t use cudnn convolution, but Conv2D does, which makes it more efficient?

@avolozin If my understanding is correct, the input to your convolution layer has the NWC layout, with shape=(32, 1000, 300) and kernel=(2,) for example. In order to use Conv2D, which employs the CUDNN 2D convolution in the backend, you need to reshape the input data into shape=(32, 1, 1000, 300) and use kernel=(1, 2), i.e. insert a dummy axis H with dim_size=1 between N and W. After the convolution is finished, you also need to reshape the 4D output back into 3D. Because the reshape operation only changes the view of the NDArray without actually copying data, this approach is much more efficient than transposing NWC into NCW and using Conv1D.
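
The equivalence described above can be checked on a toy example. The following is a minimal sketch in plain Python with nested lists, not MXNet code: the functions `conv1d_nwc` and `conv2d_nhwc` are hypothetical reference implementations with a single output filter, no padding and stride 1, written only to show that a (K,) kernel over NWC gives the same numbers as a (1, K) kernel over NHWC with H=1.

```python
def conv1d_nwc(x, kernel):
    """Valid 1D cross-correlation. x: (N, W, C), kernel: (K, C) -> (N, W-K+1)."""
    k = len(kernel)
    out = []
    for sample in x:
        row = []
        for w in range(len(sample) - k + 1):
            row.append(sum(sample[w + i][c] * kernel[i][c]
                           for i in range(k)
                           for c in range(len(kernel[i]))))
        out.append(row)
    return out


def conv2d_nhwc(x, kernel):
    """Valid 2D cross-correlation. x: (N, H, W, C), kernel: (KH, KW, C)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for sample in x:
        plane = []
        for h in range(len(sample) - kh + 1):
            row = []
            for w in range(len(sample[0]) - kw + 1):
                row.append(sum(sample[h + i][w + j][c] * kernel[i][j][c]
                               for i in range(kh)
                               for j in range(kw)
                               for c in range(len(kernel[i][j]))))
            plane.append(row)
        out.append(plane)
    return out


# Toy input: N=1, W=4 words, C=2 embedding values; the kernel spans 2 words.
x_nwc = [[[1, 0], [2, 1], [0, 3], [4, 1]]]
k_1d = [[1, -1], [2, 0]]                  # shape (K=2, C=2)

# Insert a dummy H axis of size 1: (N, W, C) -> (N, 1, W, C), kernel -> (1, K, C).
x_nhwc = [[sample] for sample in x_nwc]
k_2d = [k_1d]

y_1d = conv1d_nwc(x_nwc, k_1d)            # shape (N, W-K+1)
y_2d = conv2d_nhwc(x_nhwc, k_2d)          # shape (N, 1, W-K+1)
y_back = [plane[0] for plane in y_2d]     # drop the dummy axis again
print(y_1d, y_back)
```

Note that the embedding dimension stays in the channel axis in both cases, so the 2D kernel is (1, K) over 300 channels rather than K x 300 over a single channel.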

Because CUDNN convolution only has implementations for 2D and 3D data input, Conv1D in MXNet employs its own implementation by default. In most cases, CUDNN convolution is more efficient than the alternatives, but that’s not always the case. You can run some benchmarks to decide which version of convolution to use, taking into account the overhead of making the layout compatible with the convolution layer.
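
A benchmark along these lines could be sketched with the standard `timeit` module. Everything named here is an assumption for illustration: `best_time` is a hypothetical helper, and the two candidate functions are placeholders to be replaced by the real forward passes (transpose plus Conv1D versus reshape plus Conv2D).

```python
import timeit


def best_time(fn, repeat=5, number=10):
    """Return the best average time per call of fn, in seconds."""
    return min(timeit.repeat(fn, repeat=repeat, number=number)) / number


# Placeholder workloads (assumptions). Replace the bodies with the two
# pipelines to compare. MXNet executes operators asynchronously, so a real
# candidate should block until the result is ready (for example by calling
# mx.nd.waitall()) before it returns, or the timing will be misleading.
def transpose_then_conv1d():
    return sum(i * i for i in range(1000))


def reshape_then_conv2d():
    return sum(i * i for i in range(1000))


results = {
    "transpose + Conv1D": best_time(transpose_then_conv1d),
    "reshape + Conv2D": best_time(reshape_then_conv2d),
}
for name, seconds in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds * 1e6:.1f} us per call")
```

Timing the whole pipeline, including the layout conversion, is what makes the comparison fair.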

@reminisce, thanks for the detailed explanation! I’ll give it a try early next week.