MXNet source build error: Leaving directory: mxnet/3rdparty/ps-lite

build-error
unix-based

#21

Thanks, I'm doing it now. I look forward to the results.


#22

It is the same problem. When the build reaches “Leaving directory: mxnet/3rdparty/ps-lite”, it stops, with no further output and no error messages.
Here is the log:
https://drive.google.com/file/d/1H-HJfngypW2Ex-hcjMoFKsNMhbyjisPy/view?usp=sharing

Thanks.


#23

To confirm:

  1. Are you compiling a clean MXNet, that is, one with no modifications (you mentioned you're working on some C++ code)?
  2. After compilation, have any files been generated in lib/*, such as libmxnet.so?
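The second check can be scripted. Below is a minimal Python sketch, assuming lib/ means the lib directory at the root of the MXNet source tree; `libmxnet_files` and `built_libraries` are hypothetical helper names, not part of MXNet.

```python
import os
from typing import Iterable, List


def libmxnet_files(names: Iterable[str]) -> List[str]:
    """Filter a directory listing down to libmxnet.* entries, sorted."""
    return sorted(n for n in names if n.startswith("libmxnet"))


def built_libraries(mxnet_root: str) -> List[str]:
    """List libmxnet.* files in <mxnet_root>/lib; empty if lib/ is missing."""
    lib_dir = os.path.join(mxnet_root, "lib")
    if not os.path.isdir(lib_dir):
        return []
    return libmxnet_files(os.listdir(lib_dir))
```

Call it on the source root, e.g. `built_libraries("/path/to/mxnet")` (placeholder path); an empty list means no libmxnet library was produced.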

If the answers are “yes” and then “no”, something is wrong with the build under your configuration, and we can create an issue. It would help to have the output of make -j1. With -j8, make runs 8 jobs in parallel, so output from different jobs is interleaved and the actual error may be much further up the log. With -j1, jobs run one at a time, so the error will appear right before the build stops. Apologies for asking for another build output, but this is the last time :wink:
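To pull the failing step out of a saved log, here is a minimal Python sketch. It assumes the log was captured with something like `make -j1 2>&1 | tee build.log` and that failures use GNU make's `make: *** [target] Error N` line format; `first_make_error` is a hypothetical helper. Because a -j1 build is sequential, the last non-empty line before the match usually holds the underlying error message, so the sketch returns it as well.

```python
import re
from typing import Optional, Tuple

# GNU make reports a failed recipe as, e.g.:
#   make: *** [build/src/foo.o] Error 127
#   make[1]: *** [Makefile:42: build/src/foo.o] Error 1
MAKE_ERROR = re.compile(
    r"^make(?:\[\d+\])?: \*\*\* \[(?P<target>[^\]]+)\] Error (?P<code>\d+)"
)


def first_make_error(log_text: str) -> Optional[Tuple[int, str, int, str]]:
    """Return (line_number, target, exit_code, previous_line) of the first
    make failure in the log, or None if make reported no failure."""
    previous = ""
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        match = MAKE_ERROR.match(line)
        if match:
            return lineno, match.group("target"), int(match.group("code")), previous
        if line.strip():
            previous = line  # remember the last non-empty line as context
    return None
```

Typical use: `first_make_error(open("build.log").read())`.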

Kind regards,
Vishaal


#24

No problem, I'm happy to help track down the problem. Thank you.

  1. There are some files in the lib/ folder, but no libmxnet.so. The lib/ folder contains the following:
    engines-1.1 libffi.a libgomp.so liblzma.so libncurses.so.6.1 libprotobuf-lite.a libquadmath.so libstdc++.so.6 libtsan.so.0 tclooConfig.sh
    itcl4.1.1 libffi.la libgomp.so.1 liblzma.so.5 libncurses++w.a libprotobuf-lite.so libquadmath.so.0 libstdc++.so.6.0.25 libtsan.so.0.0.0 tdbc1.0.6
    libasan.so libffi.so libgomp.so.1.0.0 liblzma.so.5.2.4 libncursesw.a libprotobuf-lite.so.17 libquadmath.so.0.0.0 libtcl8.6.so libubsan.so tdbcmysql1.0.6
    libasan.so.5 libffi.so.6 libhistory.a libmenu.a libncursesw.so libprotobuf-lite.so.17.0.0 libreadline.a libtclstub8.6.a libubsan.so.1 tdbcodbc1.0.6
    libasan.so.5.0.0 libffi.so.6.0.4 libhistory.so libmenu.so libncursesw.so.6 libprotobuf.so libreadline.so libtinfo.a libubsan.so.1.0.0 tdbcpostgres1.0.6
    libatomic.so libform.a libhistory.so.7 libmenu.so.6 libncursesw.so.6.1 libprotobuf.so.17 libreadline.so.7 libtinfo.so libz.a terminfo
    libatomic.so.1 libform.so libhistory.so.7.0 libmenu.so.6.1 libpanel.a libprotobuf.so.17.0.0 libreadline.so.7.0 libtinfo.so.6 libz.so thread2.8.2
    libatomic.so.1.2.0 libform.so.6 libitm.so libmenuw.a libpanel.so libprotoc.a libsqlite3.a libtinfo.so.6.1 libz.so.1 tk8.6
    libcrypto.a libform.so.6.1 libitm.so.1 libmenuw.so libpanel.so.6 libprotoc.so libsqlite3.so libtinfow.a libz.so.1.2.11 tkConfig.sh
    libcrypto.so libformw.a libitm.so.1.0.0 libmenuw.so.6 libpanel.so.6.1 libprotoc.so.17 libsqlite3.so.0 libtinfow.so pkgconfig
    libcrypto.so.1.1 libformw.so liblsan.so libmenuw.so.6.1 libpanelw.a libprotoc.so.17.0.0 libsqlite3.so.0.8.6 libtinfow.so.6 python3.7
    libedit.a libformw.so.6 liblsan.so.0 libncurses.a libpanelw.so libpython3.7m.a libssl.a libtinfow.so.6.1 sqlite3.21.0
    libedit.so libformw.so.6.1 liblsan.so.0.0.0 libncurses++.a libpanelw.so.6 libpython3.7m.so libssl.so libtk8.6.so tcl8
    libedit.so.0 libgcc_s.so liblzma.a libncurses.so libpanelw.so.6.1 libpython3.7m.so.1 libssl.so.1.1 libtkstub8.6.a tcl8.6
    libedit.so.0.0.59 libgcc_s.so.1 liblzma.la libncurses.so.6 libprotobuf.a libpython3.7m.so.1.0 libstdc++.so libtsan.so tclConfig.sh

  2. I put my custom C++ layers in the src/operator and contrib/ folders.

  3. I am compiling the source code with make -j1.

Let's see what happens. Thank you very much!


#25

Thanks,

make -j1 will make it easier to find the error in your case, but if you want to do a smoke test, make sure your build works without any modifications (a purely clean branch). :slight_smile:

Vishaal


#26

I see your point; I will do a smoke test.
Let's see what happens this time.
Thank you very much.


#27

I used make -j1, and it shows the following error:
/bin/sh: /usr/local/cuda/bin/nvcc: No such file or directory
make: *** [build/src/operator/nn/cudnn/cudnn_batch_norm_gpu.o] Error 127

Before compiling, I load some modules with the following commands:
module add Anaconda3/python-3.6
module add CUDA/9.1.85
module add cuDNN/7.0.5-CUDA-9.1.85
module add OpenBLAS/0.2.19-GCC-5.4.0-2.26-LAPACK-3.7.0

In config.mk I set:
USE_OPENCV = 0
USE_BLAS = openblas
USE_CUDA = 1
USE_CUDA_PATH = /usr/local/cuda
USE_CUDNN = 1
USE_NCCL = 0
USE_DIST_KVSTORE = 1

Any advice about that?
Thanks
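The error and this config.mk appear to line up: the path the build tried to run is USE_CUDA_PATH followed by bin/nvcc, and Error 127 is the status a POSIX shell returns when a command cannot be found (126 means it was found but is not executable). Below is a minimal Python sketch of that cross-check, assuming config.mk contains simple `KEY = VALUE` lines; `parse_config_mk`, `expected_nvcc` and `SHELL_EXIT_STATUS` are hypothetical names, and the sketch does not model what the build does when USE_CUDA_PATH is NONE.

```python
import os
from typing import Dict, Optional

# POSIX shell conventions for a command that could not be run.
SHELL_EXIT_STATUS = {
    126: "command found but not executable",
    127: "command not found",
}


def parse_config_mk(text: str) -> Dict[str, str]:
    """Parse simple 'KEY = VALUE' lines from a config.mk, skipping comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Tolerate ':=', '?=' and '+=' by trimming the operator prefix from the key.
        settings[key.strip().rstrip(":?+").strip()] = value.strip()
    return settings


def expected_nvcc(settings: Dict[str, str]) -> Optional[str]:
    """Return <USE_CUDA_PATH>/bin/nvcc, or None if no explicit path is set."""
    cuda_path = settings.get("USE_CUDA_PATH", "NONE")
    if cuda_path in ("", "NONE"):
        return None
    return os.path.join(cuda_path, "bin", "nvcc")
```

With the settings above, `expected_nvcc` yields /usr/local/cuda/bin/nvcc, which is exactly the file the shell reported as missing.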


#29

Great, that narrows it down: the build needs to be able to find nvcc. :slight_smile:

nvcc ships with the CUDA toolkit, so I would recommend debugging your CUDA installation. You've installed 9.1, right? Is it installed in /usr/local/cuda or elsewhere? Are there files in /usr/local/cuda/bin/? Is nvcc in there? Can you run it by typing /usr/local/cuda/bin/nvcc? Maybe CUDA is installed in a different directory?
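These questions can be answered without sudo. Below is a minimal Python sketch: it checks `<cuda_path>/bin/nvcc` first, then falls back to whatever `nvcc` is on PATH (it is an assumption that `module add CUDA/9.1.85` puts one there), and runs `nvcc --version`. The helper names are hypothetical.

```python
import os
import shutil
import subprocess
from typing import Optional


def find_nvcc(cuda_path: str = "/usr/local/cuda") -> Optional[str]:
    """Return <cuda_path>/bin/nvcc if it is an executable file, else the
    nvcc found on PATH, else None."""
    candidate = os.path.join(cuda_path, "bin", "nvcc")
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return shutil.which("nvcc")


def cuda_root_from_nvcc(nvcc_path: str) -> str:
    """Given .../bin/nvcc, return the directory above bin/."""
    return os.path.dirname(os.path.dirname(nvcc_path))


def nvcc_version(nvcc: str) -> str:
    """Run `nvcc --version` and return its output (raises if nvcc fails)."""
    result = subprocess.run(
        [nvcc, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # text output; works on Python 3.6
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    nvcc = find_nvcc()
    if nvcc is None:
        print("nvcc not found in /usr/local/cuda/bin or on PATH")
    else:
        print("nvcc:", nvcc, "| CUDA root:", cuda_root_from_nvcc(nvcc))
        print(nvcc_version(nvcc))
```

If nvcc turns up somewhere other than /usr/local/cuda, one option (an assumption, not confirmed in this thread) is to set USE_CUDA_PATH in config.mk to the reported CUDA root.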

Vishaal


#30

I am checking the CUDA and nvcc settings.
This cluster is managed by other people, and I do not have sudo rights to check the CUDA installation, so I will email the cluster administrator first.
I'll report back. Thanks for your help.


#31

My pleasure!

Vishaal


#32

The problem has been solved, thank you very much!