Custom layer - infer shape after first forward pass


#1

Dear all,

I am designing a custom convolution layer (a HybridBlock), and I am finding it hard to work out how to initialize the weight parameter (specifically, its number of channels) after the first forward pass. I have been looking at the source code of the private _Conv class, but it's a bit tricky. Any ideas?

My custom convolution operator (currently deriving from gluon.Block) is something like this:

import mxnet as mx
from mxnet import nd
from mxnet.gluon import Block

class Conv2DS(Block):

    def __init__(self, nchannels, nfilters, kernel_size=3, kernel_effective_size=5,
                 degree=2, pad=None, dilation_rate=(1, 1), **kwards):
        Block.__init__(self, **kwards)

        self.nchannels = nchannels
        self.nfilters = nfilters

        self.kernel_eff = kernel_effective_size
        self.dilation_rate = dilation_rate
        self.Bijkl = None  # placeholder: some custom 4D matrix I use in the convolution

        # Ensures padding = 'SAME' for an ODD kernel selection.
        if pad is None:
            p0 = self.dilation_rate[0] * (self.kernel_eff - 1) // 2
            p1 = self.dilation_rate[1] * (self.kernel_eff - 1) // 2
            pad = (p0, p1)

        self.pad = pad
        with self.name_scope():
            # This is where I define the custom weight variable.
            self.weight = self.params.get(
                'weight',
                shape=(nfilters, self.nchannels, kernel_size, kernel_size))

    def forward(self, _x):
        # I would like the number of channels (self.nchannels) to be
        # inferred here from the input _x.
        # Any pointers / ideas / easy small example?
        weight = nd.sum(nd.dot(self.weight.data(), self.Bijkl), axis=[2, 3])
        conv = nd.Convolution(data=_x,
                              weight=weight,
                              no_bias=True,
                              num_filter=self.nfilters,
                              kernel=(self.kernel_eff, self.kernel_eff),
                              pad=self.pad)

        return conv

Thank you very much.


#2

Your sample code uses Block, not HybridBlock. The necessary steps are a bit different between the two, and I'd recommend that you stick with HybridBlock. In __init__(), in both cases, you want to call self.params.get() with any unknown dimension of the shape argument set to 0. For HybridBlock, everything is then done for you: by the time hybrid_forward is called with data, the necessary shapes have already been inferred. For Block, you instead need to set the shape of the weight (self.weight.shape = (...)) from the shape of the passed-in data (in your case _x) and then call self.weight._finish_deferred_init() to initialize it.

Under the hood, HybridBlock will construct a symbolic graph of the block in order to infer the shape of the unspecified dimensions the first time data is passed into the block.
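
For reference, the built-in layers already use this pattern, so you can see the deferred initialization in action with a stock nn.Conv2D (a minimal sketch; the shapes are arbitrary, just for illustration):

import mxnet as mx
from mxnet import nd
from mxnet.gluon import nn

# in_channels defaults to 0, so the weight shape is only partially
# known and its initialization is deferred.
net = nn.Conv2D(channels=12, kernel_size=3)
net.initialize(mx.initializer.Xavier())

x = nd.random_uniform(shape=(25, 7, 128, 128))
y = net(x)                # first forward pass infers the missing dimension
print(net.weight.shape)   # (12, 7, 3, 3): in_channels inferred as 7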


#3

Hi @safrooze, thank you very much for your reply, extremely appreciated!

Based on your suggestions, I’ve tried the following things (I’ve simplified the example):

Block version (works)

import mxnet as mx
from mxnet import nd, gluon
from mxnet.gluon import HybridBlock, Block

class Conv2DS(Block):
    # nchannels now defaults to zero; this is the dimension I need inferred.
    def __init__(self, nfilters, nchannels=0, kernel_size=3, kernel_effective_size=5, **kwards):
        Block.__init__(self, **kwards)

        self.nchannels = nchannels
        self.nfilters = nfilters
        self.kernel_size = kernel_size
        self.kernel_eff = kernel_effective_size
        # Some custom operation that creates a "deprojection" matrix; for now a simple random NDArray.
        self.Bijkl = nd.random_uniform(shape=[kernel_size, kernel_size, kernel_effective_size, kernel_effective_size])

        with self.name_scope():
            self.weight = self.params.get(
                'weight', allow_deferred_init=True,
                shape=(nfilters, nchannels, kernel_size, kernel_size))

    def forward(self, _x):
        # Set the missing dimension from the input, then finish the
        # deferred initialization by hand.
        self.weight.shape = (self.nfilters, _x.shape[1], self.kernel_size, self.kernel_size)
        self.weight._finish_deferred_init()

        weight = nd.sum(nd.dot(self.weight.data(), self.Bijkl), axis=[2, 3])
        conv = nd.Convolution(data=_x,
                              weight=weight,
                              no_bias=True,
                              num_filter=self.nfilters,
                              kernel=[self.kernel_eff, self.kernel_eff])

        return conv


nbatch=25
nfilters=12
nchannels=7


myConv = Conv2DS(nfilters, kernel_size=3, kernel_effective_size=5)
myConv.initialize(mx.initializer.Xavier())

So far so good, and the forward pass now works:

xx = nd.random_uniform(shape=[nbatch, nchannels, 128, 128])
temp1 = myConv(xx)
print(temp1.shape)

Output:

(25L, 12L, 124L, 124L)

HybridBlock version (doesn’t work)

import mxnet as mx
from mxnet import nd, gluon
from mxnet.gluon import HybridBlock, Block


class Conv2DS(HybridBlock):
    # nchannels again defaults to zero; this is the dimension I need inferred.
    def __init__(self, nfilters, nchannels=0, kernel_size=3, kernel_effective_size=5, **kwards):
        HybridBlock.__init__(self, **kwards)

        self.nchannels = nchannels
        self.nfilters = nfilters
        self.kernel_size = kernel_size
        self.kernel_eff = kernel_effective_size
        # Some custom operation that creates a "deprojection" kernel; for now a simple random NDArray.
        self.Bijkl = nd.random_uniform(shape=[kernel_size, kernel_size, kernel_effective_size, kernel_effective_size])

        with self.name_scope():
            self.weight = self.params.get(
                'weight', allow_deferred_init=True,
                shape=(nfilters, nchannels, kernel_size, kernel_size))

    def hybrid_forward(self, F, _x):
        weight = F.sum(F.dot(self.weight.data(), self.Bijkl), axis=[2, 3])
        conv = F.Convolution(data=_x,
                             weight=weight,
                             no_bias=True,
                             num_filter=self.nfilters,
                             kernel=[self.kernel_eff, self.kernel_eff])

        return conv

Then I can initialize the Conv2DS layer:


nbatch=25
nfilters=12
nchannels=7


myConv = Conv2DS(nfilters, kernel_size=3, kernel_effective_size=5)
myConv.initialize(mx.initializer.Xavier())

So far so good, but when I try to do a forward pass:

xx = nd.random_uniform(shape=[nbatch,nchannels,128,128])
temp1= myConv(xx)

I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-75-6a0caa7e4241> in <module>()
----> 1 temp1= myConv(xx)
      2 #temp2 = myConv_std(xx)

/home/dia021/anaconda2/lib/python2.7/site-packages/mxnet/gluon/block.pyc in __call__(self, *args)
    302     def __call__(self, *args):
    303         """Calls forward. Only accepts positional arguments."""
--> 304         return self.forward(*args)
    305 
    306     def forward(self, *args):

/home/dia021/anaconda2/lib/python2.7/site-packages/mxnet/gluon/block.pyc in forward(self, x, *args)
    507                     params = {i: j.data(ctx) for i, j in self._reg_params.items()}
    508                 except DeferredInitializationError:
--> 509                     self._finish_deferred_init(self._active, x, *args)
    510 
    511                 if self._active:

/home/dia021/anaconda2/lib/python2.7/site-packages/mxnet/gluon/block.pyc in _finish_deferred_init(self, hybrid, *args)
    401 
    402     def _finish_deferred_init(self, hybrid, *args):
--> 403         self.infer_shape(*args)
    404         if hybrid:
    405             for is_arg, i in self._cached_op_args:

/home/dia021/anaconda2/lib/python2.7/site-packages/mxnet/gluon/block.pyc in infer_shape(self, *args)
    460     def infer_shape(self, *args):
    461         """Infers shape of Parameters from inputs."""
--> 462         self._infer_attrs('infer_shape', 'shape', *args)
    463 
    464     def infer_type(self, *args):

/home/dia021/anaconda2/lib/python2.7/site-packages/mxnet/gluon/block.pyc in _infer_attrs(self, infer_fn, attr, *args)
    448     def _infer_attrs(self, infer_fn, attr, *args):
    449         """Generic infer attributes."""
--> 450         inputs, out = self._get_graph(*args)
    451         args, _ = _flatten(args)
    452         arg_attrs, _, aux_attrs = getattr(out, infer_fn)(

/home/dia021/anaconda2/lib/python2.7/site-packages/mxnet/gluon/block.pyc in _get_graph(self, *args)
    369             params = {i: j.var() for i, j in self._reg_params.items()}
    370             with self.name_scope():
--> 371                 out = self.hybrid_forward(symbol, *grouped_inputs, **params)  # pylint: disable=no-value-for-parameter
    372             out, self._out_format = _flatten(out)
    373 

TypeError: hybrid_forward() got an unexpected keyword argument 'weight'

Any ideas what I am doing wrong and how to fix it? The input to Conv2DS will be the output of another convolution / image operator of size (nfilters, nchannels, height, width); the dimension I need to infer at run time is nchannels.

Thanks!


#4

This is because hybrid_forward is passed the data as well as all the parameters of your block. The reason is that hybrid_forward may be called with F as either NDArray or Symbol, and if F is Symbol, you must have access to a Symbol that represents your parameter (which you cannot get through self.weight).

So the solution to your problem is to either add **kwargs to your hybrid_forward signature, or simply add a weight argument to it.
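
For example, a minimal sketch of the pattern (Scale is a made-up toy block; the key point is that the argument name matches the attribute under which the parameter is registered):

import mxnet as mx
from mxnet import nd
from mxnet.gluon import HybridBlock

# Every parameter registered on the block is passed into
# hybrid_forward as a keyword argument.
class Scale(HybridBlock):
    def __init__(self, **kwards):
        HybridBlock.__init__(self, **kwards)
        with self.name_scope():
            self.weight = self.params.get('weight', shape=(1, 1))

    def hybrid_forward(self, F, x, weight):   # 'weight' is injected by Gluon
        return F.broadcast_mul(x, weight)

net = Scale()
net.initialize()
print(net(nd.ones((2, 3))))   # input scaled by the (randomly initialized) weight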

Another problem with your HybridBlock code is self.Bijkl. This is an NDArray instance, which cannot be used in hybrid_forward(). Remember that hybrid_forward may be called with Symbol or NDArray, so you cannot have a dependency on one or the other (and that's why F is passed in). The operation that creates self.Bijkl must be moved into hybrid_forward() and changed to use F instead of nd.

It seems like in your case Bijkl is really a constant. In that case, you can add a parameter for it, set differentiable to False, and use a Constant initializer.
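
Putting both fixes together, a sketch along these lines should work (untested against every MXNet version; bijkl is random here purely for illustration, and in recent MXNet versions self.params.get_constant('bijkl', bijkl) is an even more direct way to register the constant):

import mxnet as mx
from mxnet import nd
from mxnet.gluon import HybridBlock

class Conv2DS(HybridBlock):
    def __init__(self, nfilters, nchannels=0, kernel_size=3, kernel_effective_size=5, **kwards):
        HybridBlock.__init__(self, **kwards)

        self.nfilters = nfilters
        self.kernel_eff = kernel_effective_size
        bijkl = nd.random_uniform(
            shape=[kernel_size, kernel_size, kernel_effective_size, kernel_effective_size])

        with self.name_scope():
            self.weight = self.params.get(
                'weight', allow_deferred_init=True,
                shape=(nfilters, nchannels, kernel_size, kernel_size))
            # Constant, non-differentiable parameter holding Bijkl.
            self.bijkl = self.params.get(
                'bijkl', shape=bijkl.shape,
                init=mx.init.Constant(bijkl),
                differentiable=False)

    # Both registered parameters arrive as keyword arguments.
    def hybrid_forward(self, F, _x, weight, bijkl):
        w = F.sum(F.dot(weight, bijkl), axis=[2, 3])
        return F.Convolution(data=_x, weight=w, no_bias=True,
                             num_filter=self.nfilters,
                             kernel=[self.kernel_eff, self.kernel_eff])

myConv = Conv2DS(12, kernel_size=3, kernel_effective_size=5)
myConv.initialize(mx.initializer.Xavier())
out = myConv(nd.random_uniform(shape=[25, 7, 128, 128]))
print(out.shape)   # (25, 12, 124, 124)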


#5

Hi @safrooze, again many thanks for your answer. I've been playing around with your suggestions, but haven't had much luck getting my custom layer to hybridize.

Could you please provide a simple example of a HybridBlock wrapper around an nd.array object? I need to create the matrix in numpy (it would be very time-consuming to build it from scratch in nd.array; it's basically B-spline definitions that already exist in Python) and then transfer it to nd.array. Something like:

class ndarray_wrap(HybridBlock):
    def __init__(self, const_ndarray, **kwards):
        HybridBlock.__init__(self, **kwards)

        # Some operation that takes the constant const_ndarray and
        # turns it into a layer with no differentiation.
        self.constant_layer = ...

    def hybrid_forward(self, F, x):
        return self.constant_layer

such that it can be used in combination with other HybridBlocks and eventually let me hybridize the whole network?
Again, many thanks for your time.