Dense layer can't specify linear explicitly


I’m curious why gluon.nn.Dense doesn’t accept activation='linear' and treat it the same as not specifying an activation at all.

I’m mostly asking because a) in Python it’s usually considered better to be explicit, and b) for pedagogical purposes it seems to make the code nicer to read… Just curious why the decision was made.


Thanks for the suggestion. I guess this was simply done to keep the code looking clean (otherwise you’d have to intercept the ‘linear’ argument specifically in every layer that’s being coded up, which adds quite a bit of spurious code).


I’m not sure I understand, @smolix. A keyword argument means users wouldn’t have to specify it each time, but could if they wanted to. I’m thinking of something like:

def Dense(x, activation='linear'):

So, in case you did want to make it clear you could… And I don’t think it changes the current behavior, but I might be wrong?


I understand what you want. But at some point on the backend someone needs to take care of the argument. Not a big deal … if you feel strongly about it, why not create a pull request that implements it for all available layers? This is open source :slight_smile:.


Sure, I can have a go. I just wanted to make sure I wasn’t diving deep before discovering there was a systematic reason not to do that :slight_smile:


Best check with Junyuan. But AFAICT it’s just that nobody has asked for this so far.


@piiswrong I was thinking of just modifying this line:

if activation is not None and activation != 'linear':
    self.act = Activation(activation, prefix=activation+'_')
else:
    self.act = None
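
To try that check without MXNet installed, here is a minimal sketch. FakeActivation and DenseSketch are hypothetical stand-ins, not Gluon classes; only the if/else mirrors the line above.

```python
class FakeActivation:
    """Hypothetical stand-in for gluon.nn.Activation; it only records the name."""

    def __init__(self, name):
        self.name = name


class DenseSketch:
    """Hypothetical stand-in for gluon.nn.Dense; shows only the activation handling."""

    def __init__(self, units, activation=None):
        self.units = units
        # Treat 'linear' exactly like "no activation was given".
        if activation is not None and activation != 'linear':
            self.act = FakeActivation(activation)
        else:
            self.act = None


implicit = DenseSketch(10)                       # no activation specified
explicit = DenseSketch(10, activation='linear')  # explicit, same behavior
relu = DenseSketch(10, activation='relu')        # a real activation is kept
print(implicit.act, explicit.act, relu.act.name)
```

In this sketch both the implicit and the explicit 'linear' form end up with act set to None, so existing behavior is unchanged.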


Okay, it looks like there are three real options here. None seem too appealing to me, which might be why it was never done:

  1. What I suggested above, which adds lines of code to each layer file.
  2. Adding a decorator to the __init__ functions or (slightly preferable) using a common metaclass/base class and overriding the activation attribute to be None if it is 'linear'.
  3. Changing the underlying C/CUDA code to allow a linear activation. This, if possible, would be the best approach, since my guess is that it would apply to all the other language bindings and avoid creating an inconsistent interface. What I don’t see is how this might impact performance or create unnecessary overhead every time a new layer is created.

I’m happy to do any of the three, or to let it be if these complications are why it wasn’t done in the first place.
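
As a rough illustration of the decorator variant of option 2, assuming the layer takes activation as a keyword argument: linear_as_none and LayerSketch are hypothetical names made up for this sketch, not part of Gluon.

```python
import functools


def linear_as_none(init):
    """Hypothetical decorator: rewrite activation='linear' to None before __init__ runs."""

    @functools.wraps(init)
    def wrapper(self, *args, **kwargs):
        # Only handles activation passed by keyword, to keep the sketch short.
        if kwargs.get('activation') == 'linear':
            kwargs['activation'] = None
        return init(self, *args, **kwargs)

    return wrapper


class LayerSketch:
    """Hypothetical stand-in for a Gluon layer; not the real class."""

    @linear_as_none
    def __init__(self, units, activation=None):
        self.units = units
        self.activation = activation


layer = LayerSketch(10, activation='linear')
print(layer.activation)
```

The layer body never sees 'linear', so no per-layer check is needed, but every __init__ still has to be decorated, and a positionally passed activation would slip through this version.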


Quite honestly, I would probably let it slide. There’s no extra functionality that this offers. It’s just that it looks prettier by some standard. But it adds extra code that means that there’s extra space for introducing bugs. So, unless someone badly wants it, let’s keep it the way it is.