I’m having difficulty extracting intermediate layers from a pre-trained model and loading the pre-trained parameters into the extracted subgraph. I’m posting to ask whether there is a clean way to do this.
For example, let’s suppose I have a graph g: A->B->C->D->E loaded using Gluon, and what I need is B->C->D.
Step 1: It is fairly easy to use
g_sym = g(mx.sym.var("input")) to get the symbol, and then
g_int = g_sym.get_internals() to get all internal symbols. In this way, I can get A->B->C->D, by
Step 2: When I try to trim it further to B->C->D, I find that the only way is to traverse the graph backwards using
get_children() until I reach B, and then reconstruct the whole graph. For example, I use the following code snippet to reconstruct a Convolution symbol programmatically.
# Suppose we have recursively generated the symbol graph B->C
f = getattr(mx.sym, D_op_type)  # suppose D_op_type = 'Convolution'
new_D = f(data=new_C, name=old_D.name, **old_D.list_attr())
subgraph = gluon.SymbolBlock(new_D, input)
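The backward traversal in Step 2 can be sketched in plain Python on a toy graph. This is only an illustration under stated assumptions: `Node`, `rebuild_from`, and `list_arguments` are hypothetical stand-ins rather than MXNet API, and the names `A_w`, `A_b`, `sub_input`, and so on are made up. The sketch also assumes that the first input of each layer is its data input and the remaining inputs are parameters. The point it shows is that if the rebuilt layers reuse the original parameter variables as explicit inputs, the parameter names survive the trimming.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Toy stand-in for a symbol; op == "null" marks a variable (input or parameter)."""
    name: str
    op: str = "null"
    inputs: List["Node"] = field(default_factory=list)


def rebuild_from(node: Node, cut_name: str, new_input: Node, cache: Dict[str, Node]) -> Node:
    """Copy the graph ending at `node`, feeding `new_input` into the layer `cut_name`."""
    if node.name in cache:
        return cache[node.name]
    if node.op == "null":
        # Reuse the original variable so its name is preserved.
        copy = node
    elif node.name == cut_name:
        # Assumption: input 0 is the data input, the rest are parameters.
        copy = Node(node.name, node.op, [new_input] + node.inputs[1:])
    else:
        children = [rebuild_from(i, cut_name, new_input, cache) for i in node.inputs]
        copy = Node(node.name, node.op, children)
    cache[node.name] = copy
    return copy


def list_arguments(node: Node, seen=None) -> List[str]:
    """Return the variable names reachable from `node`, depth first, inputs in order."""
    if seen is None:
        seen = []
    if node.op == "null":
        if node.name not in seen:
            seen.append(node.name)
    else:
        for child in node.inputs:
            list_arguments(child, seen)
    return seen


def layer(name: str, data: Node, weight: str, bias: str) -> Node:
    """Build a toy Convolution layer with explicitly named parameter variables."""
    return Node(name, "Convolution", [data, Node(weight), Node(bias)])


# Toy version of A->B->C->D with arbitrary (hypothetical) parameter names.
a = layer("A", Node("input"), "A_w", "A_b")
b = layer("B", a, "B_w", "B_b")
c = layer("C", b, "C_w", "C_b")
d = layer("D", c, "D_w", "D_b")

# Trim to B->C->D by cutting at B and attaching a fresh input variable.
new_d = rebuild_from(d, "B", Node("sub_input"), {})
print(list_arguments(new_d))
```

In the toy version the trimmed graph no longer references `input`, `A_w`, or `A_b`, while the parameters of B, C, and D keep their original names. Whether the same idea carries over to real symbols depends on how the parameter inputs are passed when each operator is re-created.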
Step 3: Now, when I try to load the pretrained parameters into the new model
subgraph, the parameters cannot be loaded because the names of the implicitly created parameters (weight and bias) can differ from the names in the pretrained model. In a pretrained model, the weight and bias of a given layer can have arbitrary names. But with the MXNet Symbol API, creating a Convolution layer named
conv1 always creates its weight and bias symbols automatically with fixed names such as
conv1_bias, and I cannot find a way to change them.
Therefore, my current workaround is to rename the corresponding parameters in the old graph (A->B->C->D->E) instead, so that they match the new graph. But this is tricky and inelegant, because it is not reliable to infer whether two parameters are equivalent simply by parsing their names and shapes. For example, suppose the pre-trained model has the weight and bias named
conv1_b, whereas the new model has them named
conv1_bias. If these two parameters have the same shape (although unlikely), it’s hard to draw the connection between the same parameters of the old and new graph.
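One way to make the renaming workaround less dependent on name or shape parsing is to pair parameters by their position among a layer's inputs. The sketch below is plain Python and makes several assumptions: `match_params_by_slot` and `rename_params` are hypothetical helpers, the names `conv1_w` and `conv1_weight` are illustrative, and both graphs are assumed to list a layer's parameter inputs in the same order (weight first, then bias).

```python
from typing import Dict, List


def match_params_by_slot(old_inputs: Dict[str, List[str]],
                         new_inputs: Dict[str, List[str]]) -> Dict[str, str]:
    """Map old parameter names to new ones by (layer name, input position).

    Both arguments map a layer name to the ordered list of parameter
    names feeding that layer.
    """
    mapping = {}
    for layer_name, old_names in old_inputs.items():
        if layer_name not in new_inputs:
            continue  # layer was trimmed away (for example A or E)
        new_names = new_inputs[layer_name]
        if len(old_names) != len(new_names):
            raise ValueError(f"input count mismatch for layer {layer_name!r}")
        mapping.update(zip(old_names, new_names))
    return mapping


def rename_params(params: dict, mapping: Dict[str, str]) -> dict:
    """Return a copy of `params` with keys renamed according to `mapping`."""
    return {mapping.get(name, name): value for name, value in params.items()}


# Hypothetical names: the old model uses short suffixes, the new one long ones.
old_inputs = {"conv0": ["conv0_w", "conv0_b"], "conv1": ["conv1_w", "conv1_b"]}
new_inputs = {"conv1": ["conv1_weight", "conv1_bias"]}

mapping = match_params_by_slot(old_inputs, new_inputs)
renamed = rename_params({"conv1_w": "W", "conv1_b": "b"}, mapping)
print(mapping)
print(renamed)
```

Because the pairing relies only on layer names and input order, two parameters with identical shapes are still told apart, as long as the stated ordering assumption holds.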
Has anyone done something similar before, or does anyone have an idea how to extract a subgraph from an existing symbol graph and load the parameters cleanly?