I want to train a neural network model with MXNet. It has 2 hidden layers, one with 1024 nodes and the other with 512 nodes. There are 250k input nodes and 200k output nodes. The network is fully connected, and I use roughly the following pseudo code for training:
```python
net = mx.sym.load(model_file)
ctx = [mx.gpu(i) for i in range(8)]
model = mx.mod.Module(
    symbol=net,
    context=ctx,
    data_names=['data'],
    label_names=['label']
)

# load train input data
# load train output data

for each epoch:
    for each batch:
        # prepare the train input/output data for the current batch
        train_iter = mx.io.NDArrayIter(train_in, train_out, batch_size)
        for batch in train_iter:
            model.forward(batch, is_train=True)
            model.backward()
            model.update()
```
The job is running on an 8-GPU host. There are about 2M training samples and the batch size is 256. A single epoch takes more than 3 hours. Profiling shows that nearly half of the time is spent preparing the train_iter for the current batch, while the other half goes to the model's forward/backward/update.
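Since roughly half the time goes to batch preparation, one common pattern is to overlap that preparation with the GPU work by producing the next batch on a background thread while the current one trains. The sketch below is plain Python with no MXNet dependency; `prepare_batch` and the loop body are hypothetical stand-ins for your per-batch data preparation and forward/backward/update.

```python
import queue
import threading

def prefetched_batches(batch_indices, prepare_batch, depth=2):
    """Yield prepared batches, keeping up to `depth` batches ready ahead.

    While the consumer (training loop) works on one batch, the producer
    thread prepares the next ones, hiding preparation latency.
    """
    q = queue.Queue(maxsize=depth)
    sentinel = object()  # marks end of the batch stream

    def producer():
        for idx in batch_indices:
            q.put(prepare_batch(idx))  # blocks once `depth` batches are queued
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            return
        yield item

# Illustrative usage with toy stand-ins for real data preparation:
def prepare_batch(i):          # stands in for building the batch arrays
    return [i * 2]

prepared = []
for batch in prefetched_batches(range(5), prepare_batch):
    prepared.append(batch[0])  # stands in for forward/backward/update

print(prepared)  # -> [0, 2, 4, 6, 8]
```

MXNet also ships an iterator wrapper for this idea (`mx.io.PrefetchingIter`, if your version has it), which may be preferable to hand-rolled threading.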
Besides running this on multiple hosts, are there any other suggestions to speed up training on a single host? Increase the batch size? Compile MXNet with NNPACK? Use a KVStore with the 'device' setting? Really appreciate it.
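On the batch-size idea: since part of the per-batch cost here is fixed (constructing the iterator, launching kernels), fewer and larger batches amortize it. A back-of-envelope illustration, with an assumed (not measured) fixed preparation cost per batch:

```python
# Illustrative arithmetic only: the overhead figure is a hypothetical
# assumption, not a measurement from the question.
SAMPLES = 2_000_000          # ~2M training samples, from the question
OVERHEAD_PER_BATCH_S = 0.7   # hypothetical fixed prep cost per batch

for batch_size in (256, 512, 1024):
    n_batches = (SAMPLES + batch_size - 1) // batch_size  # ceil division
    total_overhead_s = n_batches * OVERHEAD_PER_BATCH_S
    print(batch_size, n_batches, round(total_overhead_s))
```

Doubling the batch size halves the number of times the fixed cost is paid per epoch, though it may also change convergence behavior and GPU memory use.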