Implement custom operator with tile operation

I have a custom operator defined in Python, and I need to implement it in C++. I followed the tutorial, but it does not provide enough information for writing my forward function.
The Python code for the forward function is:

    def forward(self, is_train, req, in_data, out_data, aux):
        bbox_pred = in_data[0]
        tile_shape = (bbox_pred.shape[0], self._num_anchors, bbox_pred.shape[2], bbox_pred.shape[3])
        bbox_mean = mx.ndarray.tile(self._bbox_mean.as_in_context(bbox_pred.context), reps=tile_shape)
        bbox_std = mx.ndarray.tile(self._bbox_std.as_in_context(bbox_pred.context), reps=tile_shape)
        bbox_pred = bbox_pred * bbox_std + bbox_mean
        self.assign(out_data[0], req[0], bbox_pred)
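To be concrete about what this forward pass computes: tiling the small `bbox_mean` / `bbox_std` buffers up to the prediction's shape and then applying `pred * std + mean` element-wise is equivalent to indexing the small buffers with a modulo over the flat output index. A plain-C++ sketch (assuming, for illustration, that the mean/std buffers are 1-D and the output length is a multiple of their length):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the Python forward's math, without any MXNet dependency.
// Tiling repeats mean/std to the output's size, so the tiled element at
// flat index i is simply mean[i % k] (k = length of the mean buffer).
std::vector<float> inv_normalize(const std::vector<float>& pred,
                                 const std::vector<float>& mean,
                                 const std::vector<float>& stdv) {
    assert(mean.size() == stdv.size());
    assert(pred.size() % mean.size() == 0);
    std::vector<float> out(pred.size());
    for (std::size_t i = 0; i < pred.size(); ++i) {
        // i % k picks the element the tile operation would have placed here
        out[i] = pred[i] * stdv[i % stdv.size()] + mean[i % mean.size()];
    }
    return out;
}
```

This is why the C++ version does not need to materialize the tiled arrays at all; the kernel can read the small buffers directly.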

Currently, I have:

    template<typename xpu>
    void RpnInvNormalizeOpForward(const nnvm::NodeAttrs& attrs,
                                  const OpContext& ctx,
                                  const std::vector<TBlob>& inputs,
                                  const std::vector<OpReqType>& req,
                                  const std::vector<TBlob>& outputs) {
        mshadow::Stream<xpu> *s = ctx.get_stream<xpu>();
        const TBlob& in_data = inputs[0];
        const TBlob& out_data = outputs[0];
        const RpnInvNormalizeParam& param = nnvm::get<RpnInvNormalizeParam>(attrs.parsed);
        using namespace mxnet_op;
        MSHADOW_TYPE_SWITCH(out_data.type_flag_, DType, {
            MXNET_ASSIGN_REQ_SWITCH(req[0], req_type,  {

How can I implement the same forward function in C++? Are there any materials where I can learn more about defining custom operators?

@yizhao, thanks for your post. Indeed, there is little in the way of tutorials apart from the one you linked to. The best way to learn how to define custom operators is to get inspired by the operators that are already defined. You can find them here: @ifeherva any advice?

I think the tutorial is a good start, but it gives very little information about the ‘big picture’ of how mxnet works. I read most of the relevant source code sections to understand it.

In the OP’s case, he is missing the fact that he needs to write the shape-inference code, which tells mxnet the size of the output array. The forward code is then used to write that memory block. For speed, you typically launch a kernel (one invocation per value in the output tensor) that takes a ‘position’ index and the pointers to the input memory blocks. There are of course many ways to optimize this further.
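To illustrate that pattern, here is a self-contained sketch modeled on mxnet’s `mxnet_op::Kernel<OP, xpu>::Launch` style (the names here are illustrative, not the real MXNet API): an operator struct whose static `Map` computes one output element from its flat index and raw pointers, and a launcher that runs it over every element.

```cpp
#include <vector>

// Illustrative kernel struct: Map() is called once per output element.
// i is the flat position in the output tensor; k is the length of the
// (small, un-tiled) mean/std buffers.
struct InvNormalizeKernel {
    static void Map(int i, float* out, const float* in,
                    const float* mean, const float* stdv, int k) {
        out[i] = in[i] * stdv[i % k] + mean[i % k];
    }
};

// On CPU a "launch" is just a loop over all n output elements; on GPU the
// same Map body would run once per thread with i derived from the thread id.
template <typename Kernel, typename... Args>
void Launch(int n, Args... args) {
    for (int i = 0; i < n; ++i) Kernel::Map(i, args...);
}
```

In the real operator you would call the equivalent launch inside the `MSHADOW_TYPE_SWITCH` / `MXNET_ASSIGN_REQ_SWITCH` blocks, passing `out_data.dptr<DType>()` and `in_data.dptr<DType>()` instead of raw `float*`.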

For the above problem I would take a look at similar operators first and learn how those work.
