How does mxnet allocate computation tasks to the devices

yemao · March 28, 2019, 3:57am

We know that computation tasks are executed by the dependency engine. However, for the code in base_module.py, as shown in the following figure, how does MXNet allocate computation and communication tasks to the devices in each iteration? Does allocating the tasks of the next iteration require all tasks from the previous iteration to be complete?

thomelane · April 4, 2019, 12:35am

Operations from the frontend language (e.g. Python) are queued for processing by the backend engine. Computation location will depend on the location of the input data for a given operator (i.e. CPU or specific GPU device). Computation ordering will depend on the dependencies, and this also enables parallel processing (if certain operators are independent).
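
To make the two rules above concrete (device follows the input data, ordering follows the dependencies), here is a minimal toy sketch in plain Python. It is not MXNet's engine: the `plan` function, the variable names, and the idea of grouping operators into "waves" are all assumptions made for illustration. Operators in the same wave have no dependency on each other, so a real engine would be free to run them in parallel.

```python
def plan(ops, device_of):
    """Toy dependency planner (hypothetical, not MXNet's implementation).

    ops:       list of (name, reads, writes) in the order the frontend queued them.
    device_of: dict mapping each initial variable to the device holding it.
    Returns a list of (wave, device, name); ops sharing a wave are independent.
    """
    device_of = dict(device_of)   # copy so the caller's dict is not modified
    last_write = {}               # variable -> wave of the op that last wrote it
    last_read = {}                # variable -> latest wave of any op that read it
    schedule = []
    for name, reads, writes in ops:
        wave = 0
        for v in reads:
            # Read-after-write: wait for the producer of each input.
            wave = max(wave, last_write.get(v, -1) + 1)
        for v in writes:
            # Write-after-write and write-after-read: wait for earlier users.
            wave = max(wave, last_write.get(v, -1) + 1, last_read.get(v, -1) + 1)
        # Assumption for this toy: the op runs where its first input lives.
        device = device_of[reads[0]]
        for v in reads:
            last_read[v] = max(last_read.get(v, -1), wave)
        for v in writes:
            last_write[v] = wave
            device_of[v] = device  # outputs stay on the device that computed them
        schedule.append((wave, device, name))
    return schedule


# Two data-parallel replicas: each forward/backward pair only touches its own data.
ops = [
    ("fwd_gpu0", ("x0", "w0"), ("y0",)),
    ("fwd_gpu1", ("x1", "w1"), ("y1",)),
    ("bwd_gpu0", ("y0",), ("g0",)),
    ("bwd_gpu1", ("y1",), ("g1",)),
]
device_of = {"x0": "gpu(0)", "w0": "gpu(0)", "x1": "gpu(1)", "w1": "gpu(1)"}

result = plan(ops, device_of)
for wave, device, name in result:
    print(wave, device, name)
```

In this sketch the two forward passes land in wave 0 on different devices, and each backward pass lands in wave 1 on the device that holds its input.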

thomelane · April 4, 2019, 12:41am

It depends. If operations in one iteration depend on all computation from the previous iteration, then yes. One example would be training a neural network that updates its parameters after each batch iteration. There are certainly cases where that dependency doesn't exist, though. Either way, you can keep queuing operations from the frontend even when the backend hasn't finished processing the previous iterations. It's often a good idea to have a blocking operation in the frontend (such as updating a metric or logging the loss) after each iteration, to prevent too many operations from being queued (and running out of memory).
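
As a rough illustration of why a per-iteration blocking call helps, here is a toy model in plain Python. It is a sketch under a strong assumption, not MXNet code: `ToyEngine` and `train` are hypothetical names, and the backend here only makes progress when the frontend blocks, which models the worst case of a frontend that queues much faster than the backend executes. In MXNet itself, the blocking point would be something like `asnumpy()` on the loss or `mx.nd.waitall()`.

```python
from collections import deque


class ToyEngine:
    """Worst-case toy backend: queued ops run only when the frontend blocks."""

    def __init__(self):
        self.queue = deque()
        self.peak = 0  # largest number of ops ever waiting in the queue

    def push(self, op):
        # Non-blocking: the frontend returns immediately after queuing.
        self.queue.append(op)
        self.peak = max(self.peak, len(self.queue))

    def wait_all(self):
        # Blocking: run everything queued so far, in order.
        while self.queue:
            self.queue.popleft()()


def train(engine, n_iters, sync_every_iter):
    state = {"w": 0.0}

    def step():
        state["w"] += 1.0

    for _ in range(n_iters):
        for _ in range(3):  # stand-ins for forward, backward, update
            engine.push(step)
        if sync_every_iter:
            engine.wait_all()  # e.g. reading the loss to log it
    engine.wait_all()  # final synchronization before reading the result
    return state["w"], engine.peak


no_sync = train(ToyEngine(), 100, sync_every_iter=False)
with_sync = train(ToyEngine(), 100, sync_every_iter=True)
print("without sync:", no_sync)
print("with sync:   ", with_sync)
```

Both runs compute the same final value; the difference is the peak queue length, which grows with the number of iterations when nothing blocks and stays at one iteration's worth of operations when each iteration ends with a blocking call.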

yemao · April 4, 2019, 1:00am

Thank you, thomelane. After reading the code that implements the dependency engine in MXNet, I figured out the relationships between different iterations. It is exactly as you described, and thanks again for your kindness.
