MXNet build source code error: Leaving directory: mxnet/3rdparty/ps-lite

Sorry about the unclear explanation.
First, when I try to compile the MXNet source code, I run into the following problem:
make[1]: Leaving directory `/home/lshi22/mxnet/3rdparty/ps-lite’
make: *** wait: No child processes. Stop.

At this point the build stops, with no further output.

Second, I ran “cd python; python setup.py install” to install mxnet, and it shows the following error:
Traceback (most recent call last):
  File "setup.py", line 46, in <module>
    LIB_PATH = libinfo['find_lib_path']()
  File "mxnet/libinfo.py", line 74, in find_lib_path
    'List of candidates:\n' + str('\n'.join(dll_path)))
RuntimeError: Cannot find the MXNet library.
List of candidates:
/project/cacds/apps/easybuild/software/OpenBLAS/0.2.19-GCC-5.4.0-2.26-LAPACK-3.7.0/lib/libmxnet.so
/project/cacds/apps/easybuild/software/binutils/2.26-GCCcore-5.4.0/lib/libmxnet.so
/project/cacds/apps/easybuild/software/GCCcore/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/libmxnet.so
/project/cacds/apps/easybuild/software/GCCcore/5.4.0/lib64/libmxnet.so
/project/cacds/apps/easybuild/software/GCCcore/5.4.0/lib/libmxnet.so
/project/cacds/apps/easybuild/software/cuDNN/7.0.5-CUDA-9.1.85/lib64/libmxnet.so
/project/cacds/apps/easybuild/software/CUDA/9.1.85/extras/CUPTI/lib64/libmxnet.so
/project/cacds/apps/easybuild/software/CUDA/9.1.85/lib/libmxnet.so
/project/cacds/apps/easybuild/software/CUDA/9.1.85/lib64/libmxnet.so
/home//mxnet/python/mxnet/libmxnet.so
/home//mxnet/python/mxnet/…/…/lib/libmxnet.so
/home//mxnet/python/mxnet/…/…/build/libmxnet.so
…/…/…/libmxnet.so

So I cannot import mxnet, because the MXNet library (libmxnet.so) does not exist.
Any advice will be appreciated, thanks.
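
The RuntimeError above says that setup.py looked for libmxnet.so in every listed location and found it in none of them, which is what you would expect if the make step never finished. A minimal sketch for checking this before running setup.py; the helper name `find_built_library` and the directories mentioned afterwards are illustrative assumptions, not part of MXNet:

```python
import os


def find_built_library(candidate_dirs, lib_name="libmxnet.so"):
    """Return the full paths under candidate_dirs where lib_name exists as a file.

    Hypothetical helper for illustration; MXNet's own lookup lives in
    python/mxnet/libinfo.py and searches more locations than this.
    """
    found = []
    for directory in candidate_dirs:
        path = os.path.join(directory, lib_name)
        if os.path.isfile(path):
            found.append(path)
    return found
```

For example, calling `find_built_library(["lib", "build"])` from the root of the checkout returns an empty list when the build did not produce the library, in which case the build log is the place to look rather than setup.py.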

I presume there are no files in lib/*? Would you be able to include your entire build log as an attachment?

Thanks!

Thanks for your reply.
The following Google Drive link has my log file, since my screen only shows the last part of the log output:
https://drive.google.com/file/d/1mnp7drLxKz_PivcpVupJMMtZjLRAFCiD/view?usp=sharing

The following files exist in /lib/ folder:
engines-1.1 libffi.a libgomp.so liblzma.so libncurses.so.6.1 libprotobuf-lite.a libquadmath.so libstdc++.so.6 libtsan.so.0 tclooConfig.sh
itcl4.1.1 libffi.la libgomp.so.1 liblzma.so.5 libncurses++w.a libprotobuf-lite.so libquadmath.so.0 libstdc++.so.6.0.25 libtsan.so.0.0.0 tdbc1.0.6
libasan.so libffi.so libgomp.so.1.0.0 liblzma.so.5.2.4 libncursesw.a libprotobuf-lite.so.17 libquadmath.so.0.0.0 libtcl8.6.so libubsan.so tdbcmysql1.0.6
libasan.so.5 libffi.so.6 libhistory.a libmenu.a libncursesw.so libprotobuf-lite.so.17.0.0 libreadline.a libtclstub8.6.a libubsan.so.1 tdbcodbc1.0.6
libasan.so.5.0.0 libffi.so.6.0.4 libhistory.so libmenu.so libncursesw.so.6 libprotobuf.so libreadline.so libtinfo.a libubsan.so.1.0.0 tdbcpostgres1.0.6
libatomic.so libform.a libhistory.so.7 libmenu.so.6 libncursesw.so.6.1 libprotobuf.so.17 libreadline.so.7 libtinfo.so libz.a terminfo
libatomic.so.1 libform.so libhistory.so.7.0 libmenu.so.6.1 libpanel.a libprotobuf.so.17.0.0 libreadline.so.7.0 libtinfo.so.6 libz.so thread2.8.2
libatomic.so.1.2.0 libform.so.6 libitm.so libmenuw.a libpanel.so libprotoc.a libsqlite3.a libtinfo.so.6.1 libz.so.1 tk8.6
libcrypto.a libform.so.6.1 libitm.so.1 libmenuw.so libpanel.so.6 libprotoc.so libsqlite3.so libtinfow.a libz.so.1.2.11 tkConfig.sh
libcrypto.so libformw.a libitm.so.1.0.0 libmenuw.so.6 libpanel.so.6.1 libprotoc.so.17 libsqlite3.so.0 libtinfow.so pkgconfig
libcrypto.so.1.1 libformw.so liblsan.so libmenuw.so.6.1 libpanelw.a libprotoc.so.17.0.0 libsqlite3.so.0.8.6 libtinfow.so.6 python3.7
libedit.a libformw.so.6 liblsan.so.0 libncurses.a libpanelw.so libpython3.7m.a libssl.a libtinfow.so.6.1 sqlite3.21.0
libedit.so libformw.so.6.1 liblsan.so.0.0.0 libncurses++.a libpanelw.so.6 libpython3.7m.so libssl.so libtk8.6.so tcl8
libedit.so.0 libgcc_s.so liblzma.a libncurses.so libpanelw.so.6.1 libpython3.7m.so.1 libssl.so.1.1 libtkstub8.6.a tcl8.6
libedit.so.0.0.59 libgcc_s.so.1 liblzma.la libncurses.so.6 libprotobuf.a libpython3.7m.so.1.0 libstdc++.so libtsan.so tclConfig.sh

Thank you very much!

I noticed you’re hitting a compiler warning in ps-lite that was fixed on Sept 28, 2018 (https://github.com/dmlc/ps-lite/commit/aa084a214550c30f01a6be9568de38e53424d015#diff-105837db4c293d1386590506cef16eac), which suggests your repo might be old. You might try re-cloning and building with a current repo. Out of curiosity, pre-built packages are available in many flavors; is there any reason not to use those?

Vishaal

Thanks for your reply. I have been checking for your reply every day. :grin:

I use the following commands to clone mxnet:
git clone https://github.com/apache/incubator-mxnet mxnet
cd mxnet
git checkout 1.3.1
git submodule update --init --recursive
Do you see any problem with these commands?

The reason I compile MXNet from source is that I want to add some customized C++ layers to MXNet and test them. Normally I use pip to install MXNet.

lol, no problem :slight_smile:

That git command looks good.

git clone --recursive https://github.com/apache/incubator-mxnet.git mxnet should work as well. Be sure to check each step against http://mxnet.incubator.apache.org/versions/master/install/ubuntu_setup.html just to be safe. And we’ll see what happens this time :slight_smile:

Vishaal

Thanks, I am doing it now. I am looking forward to the results.

It is the same problem. When the build reaches “Leaving directory” for 3rdparty/ps-lite, it stops, with no further output and no error message.
The log is here:
https://drive.google.com/file/d/1H-HJfngypW2Ex-hcjMoFKsNMhbyjisPy/view?usp=sharing

Thanks.

Confirming

  1. You’re compiling a clean MXNet, that is, with no modifications made to it (you mentioned you’re working on some C++ code)
  2. After compilation, are there files in lib/* that have been generated? Like libmxnet.so

If the answers are “yes” and then “no”, then something is wrong with the build under your configuration, and we can create an issue. It would be helpful to have the make -j1 log output. The reason for -j1 is that -j8 runs 8 jobs in parallel, so the error may be much further up the log. With -j1, the error will be right before the point where make stops. Apologies for asking for another build output, but this is the last time :wink:

Kind regards,
Vishaal
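
The same idea can be applied to a log that was already saved: search it for the first failure marker instead of reading only the tail. A minimal sketch, assuming the build output was captured to a text file; the function name `first_error` and the list of markers are illustrative assumptions and are not exhaustive:

```python
import re

# Markers that commonly appear on the first failing line of make or
# compiler output. This list is an assumption, not a complete catalogue.
ERROR_PATTERN = re.compile(r"error:|No such file or directory|\*\*\* ")


def first_error(log_text, context=2):
    """Return the first line matching ERROR_PATTERN plus `context` lines before it.

    Returns an empty list when no marker is found.
    """
    lines = log_text.splitlines()
    for index, line in enumerate(lines):
        if ERROR_PATTERN.search(line):
            start = max(0, index - context)
            return lines[start:index + 1]
    return []
```

With a parallel build the lines printed just before the first marker may belong to a different job, which is why a -j1 log is still easier to interpret.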

No problem, I am happy to help track down the problem, thank you.

  1. There are some files in the ‘lib/’ folder, but there is no libmxnet.so; the files in the /lib/ folder are the following:
    engines-1.1 libffi.a libgomp.so liblzma.so libncurses.so.6.1 libprotobuf-lite.a libquadmath.so libstdc++.so.6 libtsan.so.0 tclooConfig.sh
    itcl4.1.1 libffi.la libgomp.so.1 liblzma.so.5 libncurses++w.a libprotobuf-lite.so libquadmath.so.0 libstdc++.so.6.0.25 libtsan.so.0.0.0 tdbc1.0.6
    libasan.so libffi.so libgomp.so.1.0.0 liblzma.so.5.2.4 libncursesw.a libprotobuf-lite.so.17 libquadmath.so.0.0.0 libtcl8.6.so libubsan.so tdbcmysql1.0.6
    libasan.so.5 libffi.so.6 libhistory.a libmenu.a libncursesw.so libprotobuf-lite.so.17.0.0 libreadline.a libtclstub8.6.a libubsan.so.1 tdbcodbc1.0.6
    libasan.so.5.0.0 libffi.so.6.0.4 libhistory.so libmenu.so libncursesw.so.6 libprotobuf.so libreadline.so libtinfo.a libubsan.so.1.0.0 tdbcpostgres1.0.6
    libatomic.so libform.a libhistory.so.7 libmenu.so.6 libncursesw.so.6.1 libprotobuf.so.17 libreadline.so.7 libtinfo.so libz.a terminfo
    libatomic.so.1 libform.so libhistory.so.7.0 libmenu.so.6.1 libpanel.a libprotobuf.so.17.0.0 libreadline.so.7.0 libtinfo.so.6 libz.so thread2.8.2
    libatomic.so.1.2.0 libform.so.6 libitm.so libmenuw.a libpanel.so libprotoc.a libsqlite3.a libtinfo.so.6.1 libz.so.1 tk8.6
    libcrypto.a libform.so.6.1 libitm.so.1 libmenuw.so libpanel.so.6 libprotoc.so libsqlite3.so libtinfow.a libz.so.1.2.11 tkConfig.sh
    libcrypto.so libformw.a libitm.so.1.0.0 libmenuw.so.6 libpanel.so.6.1 libprotoc.so.17 libsqlite3.so.0 libtinfow.so pkgconfig
    libcrypto.so.1.1 libformw.so liblsan.so libmenuw.so.6.1 libpanelw.a libprotoc.so.17.0.0 libsqlite3.so.0.8.6 libtinfow.so.6 python3.7
    libedit.a libformw.so.6 liblsan.so.0 libncurses.a libpanelw.so libpython3.7m.a libssl.a libtinfow.so.6.1 sqlite3.21.0
    libedit.so libformw.so.6.1 liblsan.so.0.0.0 libncurses++.a libpanelw.so.6 libpython3.7m.so libssl.so libtk8.6.so tcl8
    libedit.so.0 libgcc_s.so liblzma.a libncurses.so libpanelw.so.6.1 libpython3.7m.so.1 libssl.so.1.1 libtkstub8.6.a tcl8.6
    libedit.so.0.0.59 libgcc_s.so.1 liblzma.la libncurses.so.6 libprotobuf.a libpython3.7m.so.1.0 libstdc++.so libtsan.so tclConfig.sh

  2. I put the customized C++ layers in the src/operator and contrib/ folders.

  3. I am compiling the source code using make -j1.

Let’s see what happens, thank you very much!

Thanks,

make -j1 will help you determine more easily what the error is in your case, but if you want to do a smoke test, be sure that your build works without any modifications (a purely clean branch). :slight_smile:

Vishaal

I get your point, I will do a smoke test.
Let’s see what happens this time.
Thank you very much.

I used ‘make -j1’, it shows the following error:
/bin/sh: /usr/local/cuda/bin/nvcc: No such file or directory
make: *** [build/src/operator/nn/cudnn/cudnn_batch_norm_gpu.o] Error 127

Before compiling, I load some modules using the following commands:
module add Anaconda3/python-3.6
module add CUDA/9.1.85
module add cuDNN/7.0.5-CUDA-9.1.85
module add OpenBLAS/0.2.19-GCC-5.4.0-2.26-LAPACK-3.7.0

For the config.mk:
USE_OPENCV = 0
USE_BLAS = openblas
USE_CUDA = 1
USE_CUDA_PATH = /usr/local/cuda
USE_CUDNN = 1
USE_NCCL = 0
USE_DIST_KVSTORE = 1

Any advice about that?
Thanks
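
One thing worth checking with a config.mk like the one above is whether each path it names exists on the build machine. A minimal sketch, assuming simple `KEY = value` lines; the helper names `parse_config_mk` and `missing_paths` are hypothetical, and the parser ignores make conditionals and variable expansion:

```python
import os


def parse_config_mk(text):
    """Parse simple `KEY = value` lines from config.mk text into a dict.

    Simplified on purpose: comments are stripped, and `+=`, `:=` and `?=`
    are treated like `=`. Conditionals and $(VAR) expansion are not handled.
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().rstrip("+:?").strip()] = value.strip()
    return settings


def missing_paths(settings):
    """Return the *_PATH settings whose value is not an existing directory."""
    return {
        key: value
        for key, value in settings.items()
        if key.endswith("_PATH")
        and value
        and value != "NONE"
        and not os.path.isdir(value)
    }
```

If `missing_paths(parse_config_mk(open("config.mk").read()))` reports USE_CUDA_PATH, the configured CUDA directory is not where config.mk says it is, which would be consistent with the "nvcc: No such file or directory" error.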

Great, now we have a real error: nvcc needs to be found :slight_smile:

nvcc should be included with CUDA, so I would recommend debugging your CUDA installation. You’ve installed 9.1, right? Is it installed in /usr/local/cuda or elsewhere? Are there files in /usr/local/cuda/bin/*? Is nvcc in there? Can you run it if you type /usr/local/cuda/bin/nvcc? Maybe the CUDA installation is in a different directory?

Vishaal
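
These checks can be scripted without sudo rights. A minimal sketch, assuming that loading the CUDA module puts nvcc on PATH (typical for environment modules, but not confirmed in this thread); the function names `locate_nvcc` and `suggested_cuda_root` are hypothetical:

```python
import os
import shutil


def locate_nvcc(configured_cuda_path):
    """Return (nvcc under the configured CUDA path, nvcc found on PATH).

    Either element is None when nvcc is not found in that place.
    """
    configured = os.path.join(configured_cuda_path, "bin", "nvcc")
    from_config = configured if os.path.isfile(configured) else None
    from_path = shutil.which("nvcc")
    return from_config, from_path


def suggested_cuda_root(nvcc_path):
    """Given <root>/bin/nvcc, return <root>, a candidate for USE_CUDA_PATH."""
    return os.path.dirname(os.path.dirname(nvcc_path))
```

If `locate_nvcc("/usr/local/cuda")` returns `(None, some_path)`, then `suggested_cuda_root(some_path)` is a candidate value for USE_CUDA_PATH in config.mk. The candidate list in the first post mentions a CUDA directory under /project/cacds/apps/easybuild/software/CUDA/9.1.85, so the module-provided CUDA may live there rather than in /usr/local/cuda.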

I am checking the CUDA and nvcc settings.
Actually, this cluster is managed by other people and I do not have sudo rights to check the CUDA installation, so let me email the administrator first.
I will wait for news. Thanks for your help.

My pleasure!

Vishaal

The problem has been solved, thank you very much!

Hey hdjsjyl,
How did you solve the problem? I’m having the same error here.
Thanks

Hi gabrielkoyama,
Please use “make -j1” to check what the problem is. I don’t remember the details of the problem clearly. Thanks

Thanks for your reply!

I’m trying to install mxnet to use with FCIS, so I’m following these instructions:

git clone --recursive https://github.com/dmlc/mxnet.git
git checkout 998378a
git submodule init
git submodule update

cp -r FCIS_ROOT/fcis/operator_cxx/channel_operator* MXNET_ROOT/src/operator/contrib/

And then,

make -j1 with

USE_OPENCV=1
USE_BLAS=openblas
USE_CUDA=1
USE_CUDA_PATH=/usr/local/cuda
USE_CUDNN=1

Output:

src/operator/./cudnn_rnn-inl.h(435): error: argument of type “cudnnRNNDescriptor_t” is incompatible with parameter of type “cudnnHandle_t”
detected during:
instantiation of “void mxnet::op::CuDNNRNNOp::Forward(const mxnet::OpContext &, const std::vector<mxnet::TBlob, std::allocator< mxnet::TBlob>> &, const std::vector<mxnet::OpReqType, std::allocator< mxnet::OpReqType>> &, const std::vector<mxnet::TBlob, std::allocator< mxnet::TBlob>> &, const std::vector<mxnet::TBlob, std::allocator< mxnet::TBlob>> &) [with DType=float]”
(54): here
instantiation of “mxnet::op::CuDNNRNNOp::CuDNNRNNOp(mxnet::op::RNNParam) [with DType=float]”
src/operator/rnn.cu(20): here

8 errors detected in the compilation of “/tmp/tmpxft_00003b43_00000000-11_rnn.compute_61.cpp1.ii”.
Makefile:274: recipe for target ‘build/src/operator/rnn_gpu.o’ failed
make: *** [build/src/operator/rnn_gpu.o] Error 1

I’m using CUDA 10.2 and cuDNN 7.6.5; could this be a version mismatch?

Thank you.
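
The compiler message is consistent with the RNN code at that older MXNet commit having been written against an earlier cuDNN API than 7.x, although nothing in this thread confirms that. A first step is to confirm which cuDNN version the compiler actually sees. A minimal sketch, assuming cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain #define lines (the case for cuDNN 7.x; later releases moved them to cudnn_version.h); the function name is hypothetical:

```python
import re


def cudnn_version_from_header(header_text):
    """Return (major, minor, patch) parsed from cuDNN header text, or None.

    Looks for `#define CUDNN_MAJOR <n>` style lines; returns None if any
    of the three macros is missing.
    """
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            r"^\s*#\s*define\s+" + name + r"\s+(\d+)",
            header_text,
            re.MULTILINE,
        )
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)
```

Passing in the contents of the cudnn.h under the include directory of the CUDA path used for the build shows whether the header matches the version you expect to be installed.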