Potential bug in ParameterDict get from shared dict


#1

Hey all, as the title indicates, I think there’s a bug (or at least odd behavior) in how ParameterDict handles the parameters of the shared ParameterDict that it can be initialized with.

The Gluon Block’s collect_params method does not surface the parameters of “shared” ParameterDicts, and we need this functionality for some other work.

I think this is a bug in the ParameterDict class’s __getitem__ implementation, but I could definitely be missing something. The get method looks in the shared dict, but the __getitem__ method doesn’t, and __getitem__ is what the Gluon Block’s collect_params method uses. I don’t need it to add items to the shared dict, just for the shared items to show up when I loop through the overall ParameterDict.
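To make the distinction concrete, here is a toy model of that behavior in plain Python. ToyParameterDict is a hypothetical stand-in written for illustration, not MXNet's actual implementation; it assumes only what is described above, namely that get falls back to the shared dict while __getitem__ and iteration see just the local entries.

```python
from collections import OrderedDict


class ToyParameterDict:
    """Hypothetical stand-in for ParameterDict (not MXNet code)."""

    def __init__(self, shared=None):
        self._params = OrderedDict()  # entries owned by this dict
        self._shared = shared         # optional dict to fall back to

    def get(self, name, value=None):
        # get() checks the local entries first, then the shared dict.
        if name in self._params:
            return self._params[name]
        if self._shared is not None and name in self._shared._params:
            return self._shared._params[name]
        # Nothing found anywhere: create the entry locally.
        self._params[name] = value
        return value

    def __getitem__(self, name):
        # __getitem__ only ever looks at the local entries.
        return self._params[name]

    def keys(self):
        # Iteration, like __getitem__, ignores the shared dict.
        return list(self._params.keys())


outer = ToyParameterDict()
outer.get("a", value="param-a")

inner = ToyParameterDict(shared=outer)
found_via_get = inner.get("a")  # falls back to the shared dict
visible_keys = inner.keys()     # shared entries are not listed
```

In this model, inner.get("a") finds the shared entry, while inner["a"] raises KeyError and inner.keys() is empty, which mirrors the mismatch I am seeing.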

Minimum reproducible example

from mxnet.gluon import HybridBlock, ParameterDict

# Create a ParameterDict that holds a single parameter "a".
outer_params = ParameterDict()
outer_params.get("a", shape=(2, 2))
outer_params.initialize()

class ExampleBlock(HybridBlock):
    def __init__(self, param_dict, prefix=''):
        # Hand the existing ParameterDict to the block as its shared params.
        super(ExampleBlock, self).__init__(prefix=prefix, params=param_dict)

eb = ExampleBlock(outer_params)

# Inspect the parameters the block reports.
print(eb.collect_params())

Running this, I would expect the output to include the parameter a, but instead it is just an empty parameter dictionary, because collect_params doesn’t look inside the shared param_dict.

What have you tried to solve it?

To unblock ourselves temporarily, we’re using a workaround in our code: manually overriding the Block’s _params with the correct ParameterDict. That’s not a real option, even in the short term. The obvious solution I see would be to make __getitem__ also loop over the shared ParameterDict’s items, but I don’t know what issues that might cause elsewhere.
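As a rough sketch of what I mean by that solution, here is a plain-Python toy (SharedAwareDict is a hypothetical name, not an MXNet class) in which both lookup and iteration fall through to the shared dict. It assumes local entries should shadow shared ones with the same name, which may or may not be what MXNet would want.

```python
from collections import OrderedDict


class SharedAwareDict:
    """Hypothetical toy dict whose lookups also cover a shared dict."""

    def __init__(self, shared=None):
        self._params = OrderedDict()  # entries owned by this dict
        self._shared = shared         # optional dict to fall back to

    def __setitem__(self, name, value):
        self._params[name] = value

    def __getitem__(self, name):
        # Prefer the local entry, then defer to the shared dict.
        if name in self._params:
            return self._params[name]
        if self._shared is not None:
            return self._shared[name]  # raises KeyError if absent there too
        raise KeyError(name)

    def items(self):
        # Local entries first, then shared entries not shadowed locally.
        local = list(self._params.items())
        if self._shared is None:
            return local
        extra = [(k, v) for k, v in self._shared.items()
                 if k not in self._params]
        return local + extra


outer = SharedAwareDict()
outer["a"] = 1

inner = SharedAwareDict(shared=outer)
inner["b"] = 2

looked_up = inner["a"]           # resolved through the shared dict
all_items = inner.items()        # includes the shared entry "a"
```

With this shape, looping over inner yields both b and the shared a without adding anything to either dict. Whether existing code relies on iteration being local-only is the part I can't judge.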

Thanks all,
Eric

P.S. Related issue I filed here.


#2

This sounds like a bug to me. I can think of no good reason why shared params should be available from get but not returned to a Gluon Block when calling collect_params. If this is by design, I’d like to understand the reasoning behind it, because it seems like a recipe for subtle and hard-to-detect bugs, particularly because it means that Gluon code may behave differently from other code.