Sparsity penalty on hidden layers

I’d like to add a sparsity penalty on hidden layers, similar to how the penalty is applied in sparse AutoEncoders (ref: https://web.stanford.edu/class/cs294a/sparseAutoencoder.pdf).

I think this should be possible using mxnet.symbol.IdentityAttachKLSparseReg, but I'd like to do it with Gluon. Looking around at some of the other operators, and with some trial and error, I ended up with something like:

from mxnet import gluon

class SparsityPenalty(gluon.HybridBlock):
    def __init__(self, sparseness_target, penalty, momentum, **kwargs):
        super(SparsityPenalty, self).__init__(**kwargs)
        self._kwargs = {
            'sparseness_target': sparseness_target,
            'penalty': penalty,
            'momentum': momentum
        }

        # Auxiliary state holding the running average of activations;
        # the name has to match the one IdentityAttachKLSparseReg generates
        # (see below).
        self.moving_avg = self.params.get('fwd_moving_avg', grad_req='null',
                                          allow_deferred_init=True)

    def hybrid_forward(self, F, x, moving_avg):
        self._kwargs['moving_avg'] = moving_avg
        return F.IdentityAttachKLSparseReg(x, name='fwd', **self._kwargs)

I'm not sure if this is set up the right way, especially since moving_avg is used as an auxiliary state by IdentityAttachKLSparseReg.

I also had to name the param so that it matched the name generated within IdentityAttachKLSparseReg; otherwise, an exception is raised within net.hybridize(). Hence the 'fwd_moving_avg'. I think this should be fixable, but I haven't found out how.

Any guidance on this would be appreciated.

There shouldn’t be a constraint on naming. We’ll look into this.

As for the sparse regularization: in Gluon, the recommendation is to compute the regularizer yourself and add it to the final loss, which is more flexible than IdentityAttachKLSparseReg.
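For example, something along these lines (a rough, untested sketch; it assumes a sigmoid hidden layer so the mean activation rho_hat stays in (0, 1), with rho and beta matching sparseness_target and penalty):

import mxnet as mx

def kl_sparsity_penalty(hidden, rho=0.05, beta=0.001):
    # rho_hat: mean activation of each hidden unit over the batch
    rho_hat = mx.nd.mean(hidden, axis=0)
    # KL(rho || rho_hat) per hidden unit, as in the sparse autoencoder
    # notes linked above
    kl = (rho * mx.nd.log(rho / rho_hat)
          + (1 - rho) * mx.nd.log((1 - rho) / (1 - rho_hat)))
    return beta * mx.nd.sum(kl)

Compute this on the hidden activations inside autograd.record() and add it to the main loss; autograd then takes care of the backward pass.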

@ssbusc1 thanks for bringing this up. Could you paste the code that could produce the exception?

Since the penalty is defined as a function of the hidden layer activations, it is not obvious to me that this can be done with a reasonably simple gluon.loss.Loss implementation while still getting backprop to work as expected - if that was the suggestion.

Considering that the version using the symbolic API is more straightforward, i.e. something like

	data = mx.sym.Variable('data')
	net = mx.sym.FullyConnected(data=data, name='hidden1', num_hidden=hidden)
	net = mx.sym.Activation(data=net, name='act1', act_type='sigmoid')
	net = mx.sym.IdentityAttachKLSparseReg(data=net, sparseness_target=rho, penalty=beta, name='kl1')
	net = mx.sym.FullyConnected(data=net, name='hidden2', num_hidden=hidden)

it would be good to have similar support with Gluon as well.
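In the meantime, the closest workaround I can see is to use a plain Block so the hidden activations can be returned alongside the output, and to add the penalty to the loss inside autograd.record(). Roughly (untested sketch; kl_sparsity_penalty is the manual regularizer sketched above):

import mxnet as mx
from mxnet import autograd, gluon

class SparseMLP(gluon.Block):
    # Plain Block (not hybridized) so forward can return the hidden
    # activations alongside the output.
    def __init__(self, num_hidden, num_outputs, **kwargs):
        super(SparseMLP, self).__init__(**kwargs)
        with self.name_scope():
            self.hidden = gluon.nn.Dense(num_hidden, activation='sigmoid')
            self.out = gluon.nn.Dense(num_outputs, activation='sigmoid')

    def forward(self, x):
        h = self.hidden(x)
        return self.out(h), h

net = SparseMLP(num_hidden=128, num_outputs=10)
net.initialize()
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

data = mx.nd.random.uniform(shape=(32, 784))
label = mx.nd.zeros((32,))

with autograd.record():
    output, hidden = net(data)
    # Recording the penalty here lets its gradient flow back
    # through the hidden layer during loss.backward().
    loss = loss_fn(output, label).mean() + kl_sparsity_penalty(hidden)
loss.backward()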

This was probably just a mistake on my part, and if so, my bad.

In an earlier version, I missed passing the param as an argument to IdentityAttachKLSparseReg, and the exception message is what led to the param name change.

class SparsityPenalty(gluon.HybridBlock):
    def __init__(self, sparseness_target, penalty, momentum=0.9, **kwargs):
        super(SparsityPenalty, self).__init__(**kwargs)
        self._kwargs = {
            'sparseness_target': sparseness_target,
            'penalty': penalty,
            'momentum': momentum
        }

        self.moving_avg = self.params.get('fwd_moving_avg', grad_req='null',
                                          allow_deferred_init=True)

    def hybrid_forward(self, F, x, moving_avg):
        # missed setting moving_avg here <--------
        # self._kwargs['moving_avg'] = moving_avg
        return F.IdentityAttachKLSparseReg(data=x, name='fwd', **self._kwargs)

Then, with a model defined like this:

def model(num_outputs):
    num_hidden = 128
    net = gluon.nn.HybridSequential()
    with net.name_scope():
        net.add(gluon.nn.Dense(num_hidden, activation="sigmoid"))
        net.add(SparsityPenalty(0.05, 0.001))
        net.add(gluon.nn.Dense(num_outputs, activation="sigmoid"))
    
    net.hybridize()
    return net

During training, this results in:

KeyError: 'hybridsequential22_sparsitypenalty0_fwd_moving_avg'

Renaming the param to 'fwd_moving_avg' got rid of the exception, but having to match the generated name was confusing.