I want to train a neural network model with MXNet. It has 2 hidden layers, one with 1024 nodes and the other with 512 nodes. There are 250k input nodes and 200k output nodes. The network is fully connected, and I use roughly the following pseudo code for training:
```python
net = mx.sym.load(model_file)
ctx = [mx.gpu(i) for i in range(8)]
model = mx.mod.Module(
    symbol=net,
    context=ctx,
    data_names=['data'],
    label_names=['label']
)

# load train input data
# load train output data

for each epoch:
    for each batch:
        # prepare the train input/output data for the current batch
        train_iter = mx.io.NDArrayIter(train_in, train_out, batch_size)
        for batch in train_iter:
            model.forward(batch, is_train=True)
            model.backward()
            model.update()
```
The job is running on an 8-GPU host. There are about 2M training samples and the batch size is 256. A single epoch takes more than 3 hours. Profiling shows that nearly half of the time is spent preparing the train_iter for the current batch, while the other half goes to the model's forward/backward/update.
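Since roughly half the time goes to batch preparation, one common pattern is to overlap that preparation with the GPU work by producing the next batch on a background thread while the current one trains. The sketch below is plain Python with no MXNet dependency; `prepare_batch` and the loop body are hypothetical stand-ins for your per-batch data preparation and forward/backward/update.

```python
import queue
import threading

def prefetched_batches(batch_indices, prepare_batch, depth=2):
    """Yield prepared batches, keeping up to `depth` batches ready ahead.

    While the consumer (training loop) works on one batch, the producer
    thread prepares the next ones, hiding preparation latency.
    """
    q = queue.Queue(maxsize=depth)
    sentinel = object()  # marks end of the batch stream

    def producer():
        for idx in batch_indices:
            q.put(prepare_batch(idx))  # blocks once `depth` batches are queued
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            return
        yield item

# Illustrative usage with toy stand-ins for real data preparation:
def prepare_batch(i):          # stands in for building the batch arrays
    return [i * 2]

prepared = []
for batch in prefetched_batches(range(5), prepare_batch):
    prepared.append(batch[0])  # stands in for forward/backward/update

print(prepared)  # -> [0, 2, 4, 6, 8]
```

MXNet also ships an iterator wrapper for this idea (`mx.io.PrefetchingIter`, if your version has it), which may be preferable to hand-rolled threading.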
Besides running this on multiple hosts, are there any other suggestions to speed up training on a single host? Increase the batch size? Compile MXNet with NNPACK? Use a KVStore with the 'device' setting? Really appreciate it.
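On the batch-size idea: since part of the per-batch cost here is fixed (constructing the iterator, launching kernels), fewer and larger batches amortize it. A back-of-envelope illustration, with an assumed (not measured) fixed preparation cost per batch:

```python
# Illustrative arithmetic only: the overhead figure is a hypothetical
# assumption, not a measurement from the question.
SAMPLES = 2_000_000          # ~2M training samples, from the question
OVERHEAD_PER_BATCH_S = 0.7   # hypothetical fixed prep cost per batch

for batch_size in (256, 512, 1024):
    n_batches = (SAMPLES + batch_size - 1) // batch_size  # ceil division
    total_overhead_s = n_batches * OVERHEAD_PER_BATCH_S
    print(batch_size, n_batches, round(total_overhead_s))
```

Doubling the batch size halves the number of times the fixed cost is paid per epoch, though it may also change convergence behavior and GPU memory use.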