Hi, I am new to MXNet, and for my thesis I want to replicate the following architecture.
Notice the two branches. Depending on the value of t, only one branch is fitted: if, for a sample x, t equals 1, the h1 branch is trained (and analogously the h0 branch when t equals 0).
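In other words, the prediction for a sample is yhat = t * h1(rep(x)) + (1 - t) * h0(rep(x)), where rep is the shared representation, so each sample should only produce gradients in the branch its t selects.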
I have tried the following:
import mxnet as mx
from mxnet import gluon

def build_net(rep_hidden_size, hyp_hidden_size):
    # Shared representation layers
    data = mx.sym.Variable('data', dtype='float32')
    rep_fc1 = mx.sym.FullyConnected(data=data, name='rep_fc1', num_hidden=rep_hidden_size)
    rep_relu1 = mx.sym.Activation(data=rep_fc1, name='rep_relu1', act_type="relu")
    rep_fc2 = mx.sym.FullyConnected(data=rep_relu1, name='rep_fc2', num_hidden=rep_hidden_size)
    rep_relu2 = mx.sym.Activation(data=rep_fc2, name='rep_relu2', act_type="relu")
    rep_fc3 = mx.sym.FullyConnected(data=rep_relu2, name='rep_fc3', num_hidden=rep_hidden_size)
    rep_relu3 = mx.sym.Activation(data=rep_fc3, name='rep_relu3', act_type="relu")
    # Hypothesis layers for t = 1
    t1_hyp_fc1 = mx.sym.FullyConnected(data=rep_relu3, name='t1_hyp_fc1', num_hidden=hyp_hidden_size)
    t1_hyp_relu1 = mx.sym.Activation(data=t1_hyp_fc1, name='t1_hyp_relu1', act_type="relu")
    t1_hyp_fc2 = mx.sym.FullyConnected(data=t1_hyp_relu1, name='t1_hyp_fc2', num_hidden=hyp_hidden_size)
    t1_hyp_relu2 = mx.sym.Activation(data=t1_hyp_fc2, name='t1_hyp_relu2', act_type="relu")
    t1_hyp_fc3 = mx.sym.FullyConnected(data=t1_hyp_relu2, name='t1_hyp_fc3', num_hidden=hyp_hidden_size)
    t1_hyp_relu3 = mx.sym.Activation(data=t1_hyp_fc3, name='t1_hyp_relu3', act_type="relu")
    t1_hyp_fc4 = mx.sym.FullyConnected(data=t1_hyp_relu3, name='t1_hyp_fc4', num_hidden=1)
    # Hypothesis layers for t = 0
    t0_hyp_fc1 = mx.sym.FullyConnected(data=rep_relu3, name='t0_hyp_fc1', num_hidden=hyp_hidden_size)
    t0_hyp_relu1 = mx.sym.Activation(data=t0_hyp_fc1, name='t0_hyp_relu1', act_type="relu")
    t0_hyp_fc2 = mx.sym.FullyConnected(data=t0_hyp_relu1, name='t0_hyp_fc2', num_hidden=hyp_hidden_size)
    t0_hyp_relu2 = mx.sym.Activation(data=t0_hyp_fc2, name='t0_hyp_relu2', act_type="relu")
    t0_hyp_fc3 = mx.sym.FullyConnected(data=t0_hyp_relu2, name='t0_hyp_fc3', num_hidden=hyp_hidden_size)
    t0_hyp_relu3 = mx.sym.Activation(data=t0_hyp_fc3, name='t0_hyp_relu3', act_type="relu")
    t0_hyp_fc4 = mx.sym.FullyConnected(data=t0_hyp_relu3, name='t0_hyp_fc4', num_hidden=1)
    # Expose both branch outputs plus the shared representation
    rep_net = gluon.SymbolBlock(outputs=[t1_hyp_fc4, t0_hyp_fc4, rep_relu3], inputs=[data])
    return rep_net
This works “fine”: I compute the losses for both branches, combine them using t as a mask, and call backward on the combined loss. But this still runs both branches on every forward pass.
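Roughly, the training step I have now looks like this (the squared-error loss, the hidden sizes, and the variable names are just an example of what I do):

import mxnet as mx
from mxnet import autograd, gluon

net = build_net(rep_hidden_size=100, hyp_hidden_size=100)
net.initialize()
loss_fn = gluon.loss.L2Loss()

# x: (batch, features), y: (batch, 1), t: (batch,) with 0/1 treatment flags,
# all mx.nd.NDArrays
with autograd.record():
    y1_hat, y0_hat, _ = net(x)  # both branches run here
    # mask each branch's per-sample loss with t, so only the
    # "correct" branch contributes to the gradient for each sample
    loss = t * loss_fn(y1_hat, y) + (1 - t) * loss_fn(y0_hat, y)
loss.backward()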
Would you know of a way to train only one branch, depending on the value of “t”?
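What I have in mind is something like the following, where I would split the model into three separate blocks (shared_rep, h1_branch, h0_branch are placeholders, not code I have) and route each sample through only one hypothesis branch:

import numpy as np
import mxnet as mx

# split the batch by treatment (t_np is the 0/1 flags as a numpy array)
idx1 = mx.nd.array(np.flatnonzero(t_np == 1))
idx0 = mx.nd.array(np.flatnonzero(t_np == 0))

rep = shared_rep(x)                  # shared representation, computed once
y1_hat = h1_branch(rep.take(idx1))   # only rows with t == 1 go through h1
y0_hat = h0_branch(rep.take(idx0))   # only rows with t == 0 go through h0

Is this the right direction, or is there a more idiomatic way in MXNet?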
Thank you.