Dependency injection in Gluon blocks


#1

I’m trying to solve the following problem. I want to create a Gluon Block that depends on a user-provided Block, i.e. its constructor takes another Block. Think of a seq2seq model that lets the user provide a custom encoder. Just passing an already created Block to the constructor is not ideal, because the passed Block would end up with the “wrong” name scope.

My workaround is lazy Block creation: instead of passing an already created Block, I pass a callable that creates it (e.g. partial(Block, **args)). This is a slight annoyance for my Block’s users, though.
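To make the difference concrete, here is a minimal plain-Python sketch of the factory idea. ToyBlock and ToySeq2Seq are hypothetical stand-ins that only imitate how a name scope prefixes child names; they are not MXNet classes, and real Gluon names would also carry counters such as dense0_.

```python
from functools import partial


class ToyBlock:
    """Hypothetical stand-in for a Gluon Block: it only tracks a name prefix."""

    _scope = []  # prefixes of the blocks whose scope is currently open

    def __init__(self, name):
        parent_prefix = ToyBlock._scope[-1] if ToyBlock._scope else ""
        self.prefix = parent_prefix + name + "_"

    def __enter__(self):  # plays the role of `with self.name_scope():`
        ToyBlock._scope.append(self.prefix)
        return self

    def __exit__(self, *exc):
        ToyBlock._scope.pop()
        return False


class ToySeq2Seq(ToyBlock):
    def __init__(self, encoder_factory):
        super().__init__("seq2seq")
        with self:  # children created here inherit the parent's prefix
            self.encoder = encoder_factory()


eager = ToyBlock("encoder")                      # created outside any scope
lazy = ToySeq2Seq(partial(ToyBlock, "encoder"))  # created inside the parent's scope

print(eager.prefix)         # encoder_
print(lazy.encoder.prefix)  # seq2seq_encoder_
```

The only cost to the user is wrapping the constructor call in partial (or a lambda) instead of calling it directly.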

Can you think of a better solution to this problem?


#2

This depends on what you’re trying to do in the internals of your own Block. You should be able to pass an already created Block to the constructor, assign it to an attribute, and call it from the forward function of your Block. Did this not work for you?

Can you post a snippet of your own Block so we can see how you’re currently using the callable that creates the block?


#3

It works, obviously. The only annoyance is that the passed Block is not created under proper name scope. It might not seem like a big deal but it can be a source of confusion and I’d rather avoid it.


#4

Ah I see. I can’t really think of a less cumbersome solution than the one you already have with the callable.


#5

Forgive this silly question: how important is it that the passed in block is under the same namespace? I’ve done this kind of thing a bunch of times and usually it’s fine if they’re different namespaces.

If the concern is that the autogenerated parameter names of the passed in block change depending on runtime creation, then the simple solution to that is simply passing in a prefix when you create the encoder block so that it always has the same namespace regardless of creation time.

If there is some reason the namespace is really important, then there are maybe three options. One is what you’re already doing with a factory. The second is not to pass the block in at construction, but through a setter, creating the child inside the parent’s name scope as you set it, quite similarly to how Sequential works.

For example:

parent = MyParentBlock()
with parent.name_scope():
    parent.set_encoder(MyEncoder())

Naturally, the limitation here is that the encoder has to be created in the same part of the code as the parent block. If your encoder comes in as an argument to a function, that won’t work.

The third option is to use a prefix when you create the encoder that will match the parent. This option is pretty brittle though.

Actually, I guess there is a fourth option: you reassign the parameter dict of your encoder to a new one with a proper namespace, but give it the same Parameter object references as the original parameter dict. Feels ugly, but I think that would work.


#6

First I have to say I love how you’re diving deep into Gluon!

Your fourth option won’t work because the Parameter object itself stores the name. You can, however, create a new Parameter object and take the internals of the old Parameter object, but that’s even uglier.
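A tiny plain-Python illustration of why re-keying the dict is not enough. ToyParameter is a hypothetical stand-in, not Gluon's Parameter class; the only property it copies is that the object carries its own name.

```python
class ToyParameter:
    """Hypothetical stand-in: like the real thing, it stores its own name."""

    def __init__(self, name):
        self.name = name


original = {"encoder_weight": ToyParameter("encoder_weight")}

# Re-key the dict under the parent's prefix while reusing the same objects.
rekeyed = {"seq2seq_" + key: param for key, param in original.items()}

param = rekeyed["seq2seq_encoder_weight"]
print(param is original["encoder_weight"])  # True
print(param.name)                           # encoder_weight
```

The dict key changes, but anything that reads the name from the object still sees the old one.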

There is no particular reason for namespaces to follow proper hierarchy. That’s why the Block.save_parameters() call does not save the parameters by their name, but rather by their actual hierarchy within Gluon Block chain.
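As a rough sketch of what saving by hierarchy means, the plain-Python toy below builds keys from attribute paths instead of prefixes. ToyBlock and structural_keys are hypothetical, and the dotted key format (encoder.weight) is an assumption about how MXNet 1.x lays out the saved file, not something taken from this thread.

```python
class ToyBlock:
    """Hypothetical stand-in for a Gluon Block; not MXNet code."""

    def __init__(self, prefix, param_attrs):
        # attribute name -> prefixed parameter name (what an export would show)
        self.params = {attr: prefix + attr for attr in param_attrs}
        self.children = {}  # attribute name -> child block


def structural_keys(block, path=""):
    """Map hierarchy-based keys to the prefixed parameter names."""
    keys = {path + attr: name for attr, name in block.params.items()}
    for child_attr, child in block.children.items():
        keys.update(structural_keys(child, path + child_attr + "."))
    return keys


encoder = ToyBlock("dense0_", ["weight", "bias"])  # built outside the parent
parent = ToyBlock("seq2seq0_", ["out_weight"])
parent.children["encoder"] = encoder

for key, name in sorted(structural_keys(parent).items()):
    print(key, "->", name)
```

The keys depend only on where the encoder sits in the parent, so the mismatched dense0_ prefix does not affect saving or loading in this model.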

The only time you’d see the parameter names being used is when you do a HybridBlock.export() which exports the block as a symbolic graph and parameters are saved as a key:value pair. Again everything will work fine, but if someone were to open this file, the names won’t follow a logical hierarchy.

This problem is not unique to Gluon and in other platforms, like TF, you see the same issue. However in TF the task of creating the namespace hierarchy is completely manual, so it is a bit more explicit how parameters are named. Gluon tries to assign more readable parameter names for you, but that means if you diverge from the typical bottom-up network creation, your names won’t look pretty.


#7

Oh, I wanted to answer that proper namespaces are important for model saving / loading, but apparently I was not familiar with the newest API : (

But there’s one more thing where I think proper name spaces might be useful. Imagine you want to do something with a group of parameters (e.g. freeze them). It might be handy to be able to select them by a common prefix.


#8

> But there’s one more thing where I think proper name spaces might be useful. Imagine you want to do something with a group of parameters (e.g. freeze them). It might be handy to be able to select them by a common prefix.

Maybe, but I think if you wanted the dependency injection to receive the parent block’s namespace, then you can also just call collect_params on the parent block to get all of the params, rather than use names (collect_params will still get the params of its injected block, because assigning a block to a python attribute of the parent object will automatically register the block with the parent).

And in the opposite case where you only want the parent block’s params and not the ones from the injected block, you can do some simple set arithmetic.

I would personally prefer this way of using collect_params and set logic for special exclusions, since relying on string names makes me a bit uncomfortable from a software robustness standpoint.
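The set arithmetic could look roughly like this. It is a plain-Python sketch: ToyBlock is a hypothetical stand-in whose collect_params returns an ordinary dict, and it only imitates the attribute-registration behaviour described above.

```python
class ToyBlock:
    """Hypothetical stand-in for a Gluon Block that owns named parameters."""

    def __init__(self, prefix, param_names):
        self._children = []
        self._own = {prefix + name: object() for name in param_names}

    def __setattr__(self, name, value):
        # Imitate Gluon: assigning a block to an attribute registers it as a child.
        if isinstance(value, ToyBlock):
            self._children.append(value)
        super().__setattr__(name, value)

    def collect_params(self):
        params = dict(self._own)
        for child in self._children:
            params.update(child.collect_params())
        return params


encoder = ToyBlock("encoder_", ["weight", "bias"])  # created outside the parent
parent = ToyBlock("seq2seq_", ["out_weight"])
parent.encoder = encoder                            # injected; prefix stays "encoder_"

all_names = set(parent.collect_params())
parent_only = all_names - set(encoder.collect_params())

print(sorted(all_names))    # ['encoder_bias', 'encoder_weight', 'seq2seq_out_weight']
print(sorted(parent_only))  # ['seq2seq_out_weight']
```

No prefix matching is involved: the selection works whatever the injected block happens to be called.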