Parallel execution on GoogLeNet

myz · March 28, 2018, 6:44am

Hi guys, I’m trying to figure out whether different branches of a layer can be executed in parallel. A simple example is GoogLeNet. If the 1x1 conv, 3x3 conv, etc. are placed on different devices, can they run in parallel? Because the input data has to be copied to the other devices, execution is always sequential in my tests (I used mx.profiler for visualization): the profile shows that the convolution is computed first, then the data is copied to the other devices and computed there. Is there any way to make it copy the data first, so that the convolutions can execute at the same time?

thomelane · March 30, 2018, 5:37am

Hi @myz,

With Gluon, you control when and where the data is sent, and where the operations (e.g. convolutions) are initialized (i.e. where the weights and biases are stored). You should be able to get parallel execution across GPUs by sending the data to both GPUs, initializing the convolutions on different GPUs, and then applying the convolutions.

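import mxnet as mx
from mxnet.gluon import nn

# This example needs two GPUs: mx.gpu(0) and mx.gpu(1).
# Input shape: (batch size, channels, height, width)
data = mx.nd.array([[[[1,0],
                      [0,1]]]])

# Branch 1: weights and input both live on the first GPU.
ctx1 = mx.gpu(0)
conv1 = nn.Conv2D(channels=3, kernel_size=(1,1))
conv1.initialize(ctx=ctx1)
data1 = data.as_in_context(ctx1)
# MXNet queues operations asynchronously, so this call does not block.
out1 = conv1(data1)

# Branch 2: weights and input both live on the second GPU.
ctx2 = mx.gpu(1)
conv2 = nn.Conv2D(channels=3, kernel_size=(1,1))
conv2.initialize(ctx=ctx2)
data2 = data.as_in_context(ctx2)
out2 = conv2(data2)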

You should be aware of the transfer costs associated with moving data between GPUs though.
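To get a rough feel for that cost, here is a minimal back-of-the-envelope sketch in plain Python. The helper names (`tensor_bytes`, `transfer_seconds`), the example shape, and the 10 GB/s bandwidth figure are all assumptions for illustration, not measurements and not part of the MXNet API. The real cost depends on your hardware and interconnect.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor (float32 by default)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


def transfer_seconds(shape, bandwidth_bytes_per_s, bytes_per_element=4):
    """Idealized copy time: size / bandwidth, ignoring latency and overhead."""
    return tensor_bytes(shape, bytes_per_element) / bandwidth_bytes_per_s


# Hypothetical branch input: batch of 32, 192 channels, 28 x 28 feature maps.
shape = (32, 192, 28, 28)

# Assumed effective device-to-device bandwidth (illustrative only).
ASSUMED_BANDWIDTH = 10e9  # bytes per second

size = tensor_bytes(shape)
seconds = transfer_seconds(shape, ASSUMED_BANDWIDTH)
print("bytes to copy per branch:", size)
print("idealized copy time (ms):", seconds * 1000)
```

If the estimated copy time is comparable to the time a single branch's convolution takes, splitting the branches across GPUs is unlikely to pay off.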
