On some clusters, a single server has more than one network card (NIC).
MXNet's distributed training code can't choose the right network card by IP address to transfer data.
So I propose adding one more column to the hosts file: the second column is the network card name.
For example:
10.1.3.3 p6p1
10.1.3.104 p15p1
10.1.3.102 p15p1
10.1.3.2 p6p1
10.1.3.101 p15p1
10.1.3.4 p6p1
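To make the format concrete, here is a minimal sketch of how such a two-column hosts file could be read. The names `HostEntry` and `parse_hosts` are hypothetical and not taken from the MXNet or dmlc-core code; skipping blank lines and `#` comments, and allowing the second column to be missing, are my own assumptions.

```python
from typing import List, NamedTuple, Optional


class HostEntry(NamedTuple):
    """One line of the extended hosts file (hypothetical helper type)."""
    ip: str
    nic: Optional[str]  # None when the line has no second column


def parse_hosts(text: str) -> List[HostEntry]:
    """Parse 'IP [NIC]' lines; blank lines and '#' comments are skipped."""
    entries = []
    for raw in text.splitlines():
        # Drop anything after '#' (assumed comment syntax) and trim whitespace.
        line = raw.split('#', 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        nic = fields[1] if len(fields) > 1 else None
        entries.append(HostEntry(fields[0], nic))
    return entries


if __name__ == '__main__':
    sample = "10.1.3.3 p6p1\n10.1.3.104 p15p1\n"
    for entry in parse_hosts(sample):
        print(entry.ip, entry.nic)
```

Lines without a second column still parse, so an old single-column hosts file would keep working under this sketch.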
I made some changes to the distributed code so that it can choose the right network card for data transfer by card name.
The code is simple; see the following link from my GitHub and search for the keyword
"""
DMLC submission script by ssh

One need to make sure all slaves machines are ssh-able.
"""
from __future__ import absolute_import

import os, subprocess, logging
from threading import Thread
from . import tracker

def sync_dir(local_dir, slave_node, slave_dir):
    """
    sync the working directory from root node into slave node
    """
    remote = slave_node + ':' + slave_dir
    logging.info('rsync %s -> %s', local_dir, remote)
    prog = 'rsync -az --rsh="ssh -o StrictHostKeyChecking=no -p %s" --exclude-from %s %s %s' % (
        slave_node, local_dir + './exclude.list', local_dir, remote)
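Since the preview above is cut off before the part that handles the card name, here is a rough sketch of one way the per-host card name could be passed to the remote process. It is not the code from the linked file. It assumes the launched job reads the `DMLC_INTERFACE` environment variable (which ps-lite uses to pick a network interface), and `build_ssh_command` is a hypothetical helper name.

```python
import shlex
from typing import Dict, Optional


def build_ssh_command(ip: str, nic: Optional[str], envs: Dict[str, str],
                      program: str, port: int = 22) -> str:
    """Return an ssh command line that runs `program` on host `ip`.

    If `nic` is given, DMLC_INTERFACE is exported before the program starts
    (assumption: the remote job honours this variable).
    """
    env = dict(envs)  # copy so the caller's dict is not modified
    if nic:
        env['DMLC_INTERFACE'] = nic
    # Export each variable in the remote shell, in a stable order.
    exports = ' '.join('export %s=%s;' % (key, shlex.quote(str(value)))
                       for key, value in sorted(env.items()))
    remote = (exports + ' ' + program).strip()
    return 'ssh -o StrictHostKeyChecking=no -p %d %s %s' % (
        port, ip, shlex.quote(remote))


if __name__ == '__main__':
    print(build_ssh_command('10.1.3.3', 'p6p1',
                            {'DMLC_ROLE': 'worker'}, 'python train.py'))
```

With this approach each host gets its own interface name from the second column of the hosts file, and hosts without one fall back to whatever default the job picks.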