Growing network size incrementally


#1

I had an idea: it may be possible to increase neural network layer sizes incrementally during training. Could you take a look and tell me if it’s not worth pursuing?

OK, imagine you’re training a network and you want to see how it would perform with bigger layers. Changing the network architecture requires reinitializing the weights and training from scratch, which is a pain if you’ve spent a week training your model on cheap hardware.

Here is how this could be managed in theory:

(I found it hard to describe the problem using the current editor, so I linked an image instead.)

Explanations:
The linked image shows a way to augment a weight matrix (just with a zero matrix) so that the output is not affected. Even though the weight matrices have different sizes, the matrix multiplication followed by the bias addition yields the same result, because the zero entries contribute nothing to the product. If the pre-activation values don’t change as a result of the augmentation, then the activation function also yields the same values, and the output of the whole neural network stays the same.
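Since the image is not reproduced here, the following is only a minimal sketch of one reading of the idea, in plain Python. It assumes the widened layer gets extra zero rows and zero biases, and the layer after it gets matching zero columns. The layer sizes and the helper names `dense`, `relu`, and `forward` are made up for illustration.

```python
import random

def dense(weights, biases, x):
    """Compute W @ x + b for a weight matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def forward(w_a, b_a, w_b, b_b, x):
    """Two consecutive dense layers with a ReLU in between."""
    return dense(w_b, b_b, relu(dense(w_a, b_a, x)))

random.seed(0)
n_in, n_hidden, n_out, extra = 3, 4, 2, 3  # illustrative sizes only

w_a = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b_a = [random.uniform(-1, 1) for _ in range(n_hidden)]
w_b = [[random.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b_b = [random.uniform(-1, 1) for _ in range(n_out)]

# Widen the first of the two layers: extra zero rows and zero biases.
w_a_big = w_a + [[0.0] * n_in for _ in range(extra)]
b_a_big = b_a + [0.0] * extra
# The next layer needs one extra zero column per new unit.
w_b_big = [row + [0.0] * extra for row in w_b]

x = [random.uniform(-1, 1) for _ in range(n_in)]
before = forward(w_a, b_a, w_b, b_b, x)
after = forward(w_a_big, b_a_big, w_b_big, b_b, x)

# Largest absolute difference between the two outputs.
max_diff = max(abs(p - q) for p, q in zip(before, after))
print(max_diff)
```

The new units output `relu(0) = 0` and are multiplied by zero columns in the next layer, so they add nothing to any sum.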

There is a problem, however. Using exact zeros is not a good idea, because all of the new weights might receive the same deltas during training if the number of network outputs is low. So instead of padding with raw zeros, it is possible to pad with random values that are very close to zero. The output then changes only slightly rather than staying exactly the same.
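A small sketch of the near-zero variant, again in plain Python and under the same assumed reading (extra rows in the widened layer, extra columns in the next one). The magnitude `scale = 1e-4` and the helper `small` are hypothetical choices, not values from the post.

```python
import random

def dense(weights, biases, x):
    """Compute W @ x + b for a weight matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def small(rng, scale):
    """Draw one value uniformly from [-scale, scale]."""
    return rng.uniform(-scale, scale)

rng = random.Random(0)
scale = 1e-4  # hypothetical "very close to zero" magnitude
n_in, n_hidden, n_out, extra = 3, 4, 2, 3  # illustrative sizes only

w_a = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b_a = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w_b = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b_b = [rng.uniform(-1, 1) for _ in range(n_out)]

# Pad with tiny random values instead of exact zeros, so the new units
# start out different from one another.
w_a_big = w_a + [[small(rng, scale) for _ in range(n_in)] for _ in range(extra)]
b_a_big = b_a + [small(rng, scale) for _ in range(extra)]
w_b_big = [row + [small(rng, scale) for _ in range(extra)] for row in w_b]

x = [rng.uniform(-1, 1) for _ in range(n_in)]
before = dense(w_b, b_b, relu(dense(w_a, b_a, x)))
after = dense(w_b_big, b_b, relu(dense(w_a_big, b_a_big, x)))

# For inputs in [-1, 1] the change is bounded by roughly
# extra * (n_in + 1) * scale**2.
max_diff = max(abs(p - q) for p, q in zip(before, after))
print(max_diff)
```

The output is no longer exactly preserved, but the perturbation is much smaller than the outputs themselves, which is the trade-off described above.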

One more thing: the first layer’s weight matrix cannot be changed, or it will affect the size of the input. So this change is possible only on layers 2, 3, 4, …

Question:
To test this I need to create a new neural network layer based on the values of an existing one. This turned out to be an unbearable task in MXNet, leading to a re-implementation of the layer itself. Or maybe I’m missing something, and it is simple to create a new layer from custom weights and biases?

Another question is about inserting additional hidden layers in between those that are already defined. Is there a way to do that?

Thank you in advance!


#2

Hi @lu4,

Just cross-posting my answer from GitHub so that it’s recorded here too: https://github.com/apache/incubator-mxnet/issues/11133.

You can create a clone of your network, and then make adjustments during the copy. If you’re using a Sequential Block as a container for your network, you could create another Sequential Block and add all of the layers from one network to the other, which would save redefining the network. You would make changes to the necessary layers before adding them to the new Sequential Block.

As I understand the problem, you’ll need to change the weights and biases of the layer you want to expand, plus the weights of the next dense layer (its weight shape depends on the number of units in the preceding layer, which has changed). After constructing the new weights and biases (i.e. padding with 0s), you can call set_data on the parameters of interest before adding the layers to the new Sequential Block.

Unfortunately I don’t think you can mutate the original network like this, because you’re changing the shape of the parameters: you’ll hit shape assertion errors. And you can’t just swap out a single layer in the original Sequential Block, because it doesn’t support assignment.