Runtime errors during multi-label learning using Ubuntu, but not Mac


#1

I created a block that returns two output layers:

import mxnet as mx
from mxnet import gluon, nd, autograd

class DummyBlock(gluon.Block):
    def __init__(self, **kwargs):
        super(DummyBlock, self).__init__(**kwargs)
        with self.name_scope():
            self.fc1 = gluon.nn.Dense(5)
            self.fc2 = gluon.nn.Dense(5)

    def forward(self, x):
        z1 = self.fc1(x)
        z2 = self.fc2(x)
        return (z1, z2)

Then, I created a dummy dataset for multi-label classification:

X = nd.array([
    [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0]],
    [[0, 1, 0, 0, 0], [1, 0, 0, 0, 0]],
    [[0, 0, 1, 0, 0], [0, 0, 0, 0, 1]],
    [[0, 0, 0, 1, 0], [0, 0, 1, 0, 0]],
    [[0, 0, 0, 0, 1], [0, 0, 0, 1, 0]]
])

Y = nd.array([
    [0, 1],
    [1, 0],
    [2, 4],
    [3, 2],
    [4, 3]
])

Basically, I’m expecting self.fc1 and self.fc2 to predict the first and second columns of Y, respectively. I tested the model using the following code:

ctx = mx.cpu()
net = DummyBlock()
net.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

batch_size = 2
loss_func = gluon.loss.SoftmaxCrossEntropyLoss()
data = gluon.data.DataLoader(gluon.data.ArrayDataset(X, Y), batch_size=batch_size)

for e in range(11):
    for x, y in data:
        x = x.as_in_context(ctx)
        y = y.as_in_context(ctx)
        with autograd.record():
            output = net(x)
            for i, o in enumerate(output):
                l = nd.slice_axis(y, axis=1, begin=i, end=i+1)
                loss = loss_func(o, l)
                loss.backward()
        trainer.step(x.shape[0])

    acc = mx.metric.Accuracy()
    for x, y in data:
        x = x.as_in_context(ctx)
        y = y.as_in_context(ctx)
        output = net(x)
        for i, o in enumerate(output):
            l = nd.slice_axis(y, axis=1, begin=i, end=i+1)
            p = nd.argmax(o, axis=1)
            acc.update(preds=p, labels=l)
    print(acc.get()[1])

This ran fine on my Mac and showed the following accuracies:

0.5
0.6
0.7
0.8
0.8
0.8
0.9
0.9
0.9
0.9
1.0
1.0
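
A note on reading these numbers: as far as I can tell, the evaluation loop updates a single mx.metric.Accuracy instance once per output for every batch, so each printed value pools 5 samples x 2 outputs = 10 predictions. That would explain why the values move in steps of 0.1. Below is a plain-Python sketch of that pooling, not the MXNet implementation; the name pooled_accuracy and the example predictions are made up for illustration, and only Y comes from the code above.

```python
# Labels copied from the post: column 0 is the target for fc1, column 1 for fc2.
Y = [[0, 1], [1, 0], [2, 4], [3, 2], [4, 3]]


def pooled_accuracy(predictions_per_head, labels):
    """Fraction of correct predictions, counted over every head and every sample."""
    correct = 0
    total = 0
    for head, predictions in enumerate(predictions_per_head):
        for row, predicted in zip(labels, predictions):
            correct += int(predicted == row[head])
            total += 1
    return correct / total


# Hypothetical predictions (not taken from a real run):
# head 0 is right on all 5 samples, head 1 on 3 of 5.
head0 = [0, 1, 2, 3, 4]
head1 = [1, 0, 4, 0, 0]
print(pooled_accuracy([head0, head1], Y))  # 8 correct out of 10
```

So a value of 1.0 means both outputs were correct on all five samples.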

However, when I tested it on Ubuntu 16.04 using Python 3.5.2, I got the following errors:

[04:39:01] /home/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:308: [04:39:01] /home/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/./any.h:286: Check failed: type_ != nullptr The any container is empty requested=N5mxnet10Imperative6AGInfoE

Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/.local/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x272c4c) [0x7f2bd962dc4c]
[bt] (1) /home/ubuntu/.local/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x2146ebc) [0x7f2bdb501ebc]
[bt] (2) /home/ubuntu/.local/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x213f29e) [0x7f2bdb4fa29e]
[bt] (3) /home/ubuntu/.local/lib/python3.5/site-packages/mxnet/libmxnet.so(MXAutogradBackwardEx+0x778) [0x7f2bdb44e378]
[bt] (4) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call_unix64+0x4c) [0x7f2bf0ee9e20]
[bt] (5) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call+0x2eb) [0x7f2bf0ee988b]
[bt] (6) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(_ctypes_callproc+0x49a) [0x7f2bf0ee401a]
[bt] (7) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(+0x9fcb) [0x7f2bf0ed7fcb]
[bt] (8) python3(PyObject_Call+0x47) [0x5b5da7]
[bt] (9) python3(PyEval_EvalFrameEx+0x4eb6) [0x528956]

Traceback (most recent call last):
  File "elit-dev/elit/component/postag.py", line 370, in <module>
    loss.backward()
  File "/home/ubuntu/.local/lib/python3.5/site-packages/mxnet/ndarray/ndarray.py", line 1761, in backward
    ctypes.c_void_p(0)))
  File "/home/ubuntu/.local/lib/python3.5/site-packages/mxnet/base.py", line 146, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [04:39:01] /home/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/./any.h:286: Check failed: type_ != nullptr The any container is empty requested=N5mxnet10Imperative6AGInfoE

Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/.local/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x272c4c) [0x7f2bd962dc4c]
[bt] (1) /home/ubuntu/.local/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x2146ebc) [0x7f2bdb501ebc]
[bt] (2) /home/ubuntu/.local/lib/python3.5/site-packages/mxnet/libmxnet.so(+0x213f29e) [0x7f2bdb4fa29e]
[bt] (3) /home/ubuntu/.local/lib/python3.5/site-packages/mxnet/libmxnet.so(MXAutogradBackwardEx+0x778) [0x7f2bdb44e378]
[bt] (4) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call_unix64+0x4c) [0x7f2bf0ee9e20]
[bt] (5) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call+0x2eb) [0x7f2bf0ee988b]
[bt] (6) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(_ctypes_callproc+0x49a) [0x7f2bf0ee401a]
[bt] (7) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(+0x9fcb) [0x7f2bf0ed7fcb]
[bt] (8) python3(PyObject_Call+0x47) [0x5b5da7]
[bt] (9) python3(PyEval_EvalFrameEx+0x4eb6) [0x528956]

Could someone explain what configuration I’m missing on Ubuntu that would cause these errors? Thank you!


#2

After some investigation, I suspect this is caused by a different version of CPython, since the error appears to originate from _ctypes.cpython-35m-x86_64-linux-gnu.so. I would very much appreciate any hints for making this work on Ubuntu. Thank you!


#3

This is similar to an issue we addressed last week. What version of MXNet are you using? If you run pip install mxnet --pre, is the error still there?
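
If it helps, a small script along these lines could be run on both the Mac and the Ubuntu machine to collect the versions side by side. It is only a sketch: environment_report is a made-up helper name, and it reports None for MXNet when the package cannot be imported.

```python
import platform


def environment_report():
    """Collect the interpreter, OS, and MXNet versions for comparison across machines."""
    report = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
    }
    try:
        import mxnet
        report["mxnet"] = mxnet.__version__
    except ImportError:
        # MXNet is not installed (or not importable) in this environment.
        report["mxnet"] = None
    return report


for key, value in sorted(environment_report().items()):
    print("{}: {}".format(key, value))
```

Posting the output from both machines should make it easy to see whether the MXNet builds differ.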