What happens when I group two identical models together and train them?


For example:

softmax1 = resnet_50()
softmax2 = resnet_50()
out = mx.sym.Group([softmax1, softmax2])
model = mx.module.Module(symbol=out, context=ctx, data_names=['data1', 'data2'], label_names=['softmax_label1','softmax_label2'])
# get_dataiter() produces DataBatches with [('data1', [N, 3, 224, 224]), ('data2', [N, 3, 224, 224])];
# the data and labels for data1 and data2 are exactly the same.
train_dataiter = get_dataiter()

model.fit(train_dataiter, ......)

My code is something like the above. Both ResNet-50 networks are initialized by the same initializer. However, the outputs of the two softmax heads differ at the beginning: something like 27.x vs. 21.x.

I find this very strange.


Hi @huyangc,

I think this is expected. Using the same mxnet.initializer.Initializer doesn't mean that you'll get the same initial values for the weights of each network, even if the networks are identical.

As an example, both networks might be initialized with mxnet.initializer.Normal, but the first network will get a different sample from that distribution for the first weight of its first layer than the second network gets for the corresponding weight. This gives different initial outputs for the same inputs, as you've found.
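
To make the point concrete, here is a minimal sketch in plain Python (standard library only; it does not use MXNet's API). The helpers init_weights and forward are hypothetical stand-ins for an initializer and a network: both "networks" draw from the same normal distribution, yet they end up with different weights and therefore different outputs for the same input.

```python
import random

def init_weights(rng, n, sigma=0.01):
    """Draw n weights from Normal(0, sigma) using the given random stream."""
    return [rng.gauss(0.0, sigma) for _ in range(n)]

def forward(weights, inputs):
    """A toy linear 'network': dot product of weights and inputs."""
    return sum(w * x for w, x in zip(weights, inputs))

# One shared random stream plays the role of the single initializer.
rng = random.Random(0)
w1 = init_weights(rng, 8)   # first network's weights
w2 = init_weights(rng, 8)   # second network: same distribution, fresh draws

x = [1.0] * 8               # identical input for both networks
out1 = forward(w1, x)
out2 = forward(w2, x)
print("weights differ:", w1 != w2)
print("outputs differ:", out1 != out2)

# Copying the first network's weights makes the two networks agree.
w2_copy = list(w1)
same = forward(w2_copy, x) == out1
print("outputs match after copying weights:", same)
```

If you need both branches to start from identical values, the equivalent step in your setup would be to copy one network's initial parameters into the other (or have them share parameters) rather than relying on the initializer alone.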