Thanks for your valuable answer!
I tried opening all TCP ports and setting PS_VERBOSE to 2.
With that, I can see that the logging call in train() runs successfully, and that barrier commands arrive from 2 servers and 2 workers with node IDs 8, 9, 10, and 11 (just like the video shows).
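To double-check which process each ID in the log belongs to, here is a small helper. It assumes ps-lite's usual numbering as I understand it (the scheduler is 1, servers get even IDs starting at 8, workers get odd IDs starting at 9); `describe_node` is a hypothetical name, not part of MXNet or ps-lite.

```python
# Hypothetical helper: decode a ps-lite node ID into a role/rank label.
# Assumed convention: scheduler = 1, server ID = 8 + 2*rank (even),
# worker ID = 9 + 2*rank (odd). IDs 2..7 are group IDs, not single nodes.

SCHEDULER_ID = 1


def describe_node(node_id: int) -> str:
    """Return a readable role/rank label for a ps-lite node ID."""
    if node_id == SCHEDULER_ID:
        return "scheduler"
    if node_id >= 8:
        role = "server" if node_id % 2 == 0 else "worker"
        rank = (node_id - 8) // 2
        return f"{role} rank {rank}"
    return f"group/unknown id {node_id}"


for nid in (1, 8, 9, 10, 11):
    print(nid, "->", describe_node(nid))
```

Under that assumption, 8 and 10 are servers 0 and 1, and 9 and 11 are workers 0 and 1.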
However, once training begins, I see the messages about decoding the dataset, and then the program gets stuck.
Here is the last part of the log:
[05:08:12] src/van.cc:398: 10 => 1. Meta: request=1, timestamp=1, control={ cmd=BARRIER, barrier_group=7 }
[05:08:12] src/van.cc:183: Barrier count for 7 : 1
[05:08:12] src/van.cc:373: ? => 1. Meta: request=1, timestamp=4, control={ cmd=BARRIER, barrier_group=7 }
[05:08:12] src/van.cc:398: 1 => 1. Meta: request=1, timestamp=4, control={ cmd=BARRIER, barrier_group=7 }
[05:08:12] src/van.cc:183: Barrier count for 7 : 2
[05:08:12] src/van.cc:398: 8 => 1. Meta: request=1, timestamp=1, control={ cmd=BARRIER, barrier_group=7 }
[05:08:12] src/van.cc:183: Barrier count for 7 : 3
[05:08:12] src/van.cc:398: 11 => 1. Meta: request=1, timestamp=1, control={ cmd=BARRIER, barrier_group=7 }
[05:08:12] src/van.cc:183: Barrier count for 7 : 4
[05:08:12] src/van.cc:398: 9 => 1. Meta: request=1, timestamp=1, control={ cmd=BARRIER, barrier_group=7 }
[05:08:12] src/van.cc:183: Barrier count for 7 : 5
[05:08:12] src/van.cc:373: ? => 9. Meta: request=0, timestamp=5, control={ cmd=BARRIER, barrier_group=1 }
[05:08:12] src/van.cc:373: ? => 11. Meta: request=0, timestamp=6, control={ cmd=BARRIER, barrier_group=1 }
[05:08:12] src/van.cc:373: ? => 8. Meta: request=0, timestamp=7, control={ cmd=BARRIER, barrier_group=1 }
[05:08:12] src/van.cc:373: ? => 10. Meta: request=0, timestamp=8, control={ cmd=BARRIER, barrier_group=1 }
[05:08:12] src/van.cc:373: ? => 1. Meta: request=0, timestamp=9, control={ cmd=BARRIER, barrier_group=1 }
[05:08:12] src/van.cc:398: 1 => 1. Meta: request=0, timestamp=9, control={ cmd=BARRIER, barrier_group=1 }
[05:08:12] src/van.cc:373: ? => 1. Meta: request=1, timestamp=10, control={ cmd=BARRIER, barrier_group=7 }
[05:08:12] src/van.cc:398: 1 => 1. Meta: request=1, timestamp=10, control={ cmd=BARRIER, barrier_group=7 }
[05:08:12] src/van.cc:183: Barrier count for 7 : 1
INFO:root:test logger
INFO:root:[debug]running get_model()
INFO:root:[debug]Ready to call main()
INFO:root:[debug]Calling main
INFO:root:Calling train()
[05:08:12] src/io/iter_image_recordio_2.cc:170: ImageRecordIOParser2: data/cifar/train.rec, use 1 threads for decoding…
[05:08:13] src/io/iter_image_recordio_2.cc:170: ImageRecordIOParser2: data/cifar/test.rec, use 1 threads for decoding…
[05:08:14] src/van.cc:398: 11 => 1. Meta: request=1, timestamp=2, control={ cmd=BARRIER, barrier_group=4 }
[05:08:14] src/van.cc:183: Barrier count for 4 : 1
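To see which nodes reached each barrier, the van.cc lines can be tallied with a short script. This is a sketch that assumes the log format shown above and ps-lite's group bits as I understand them (1 = scheduler, 2 = servers, 4 = workers, so 7 means all nodes); `barrier_senders` and `decode_group` are hypothetical helpers, and the sample is a few lines copied from the log above.

```python
import re
from collections import defaultdict

# Assumed ps-lite group bits (not verified against this exact build).
GROUP_BITS = {1: "scheduler", 2: "servers", 4: "workers"}

# Matches received BARRIER requests such as:
# "src/van.cc:398: 11 => 1. Meta: request=1, timestamp=2, control={ cmd=BARRIER, barrier_group=4 }"
# Outgoing lines ("? => 9") and replies (request=0) are skipped.
BARRIER_RE = re.compile(
    r"van\.cc:\d+: (\d+) => \d+\. Meta: request=1,.*cmd=BARRIER, barrier_group=(\d+)"
)


def decode_group(group: int) -> list:
    """Expand a barrier_group bitmask into role names."""
    return [name for bit, name in GROUP_BITS.items() if group & bit]


def barrier_senders(log_text: str) -> dict:
    """Map each barrier group to the sender node IDs, in log order."""
    senders = defaultdict(list)
    for line in log_text.splitlines():
        m = BARRIER_RE.search(line)
        if m:
            senders[int(m.group(2))].append(int(m.group(1)))
    return dict(senders)


sample = """\
[05:08:12] src/van.cc:398: 10 => 1. Meta: request=1, timestamp=1, control={ cmd=BARRIER, barrier_group=7 }
[05:08:12] src/van.cc:373: ? => 9. Meta: request=0, timestamp=5, control={ cmd=BARRIER, barrier_group=1 }
[05:08:14] src/van.cc:398: 11 => 1. Meta: request=1, timestamp=2, control={ cmd=BARRIER, barrier_group=4 }
"""

for group, nodes in sorted(barrier_senders(sample).items()):
    print(group, decode_group(group), nodes)
```

Run over the full excerpt, group 4 lists only node 11. If group 4 is indeed the worker group, that would suggest the other worker (node 9) had not reached that barrier yet when the log stopped.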