Python CustomOp with auxiliary states


#1

Does anyone have any experience with implementing CustomOps in Python that have auxiliary states? I haven’t been able to find any examples or clear documentation on this feature.

I have been able to implement a CustomOp that computes a running statistic of its input and saves it in a parameter. It behaves correctly except that (a) it blows up in Symbol graphs (Gluon hybridize), probably due to bug 8312, and (b) I get random crashes when the Python shell shuts down.

Not sure whether I am doing something wrong or whether this is not yet a usable feature (or both).


#2

The random crashes appear to be the result of using more than one auxiliary state (bug 8640).


#3

Your init method should be changed to this:

    def __init__(self):
        super(BadAuxProp, self).__init__(need_top_grad=False)

I was not able to reproduce your issue after I changed it.


#4

Unfortunately, that doesn’t fix the problem. Note that sometimes you have to run the program several times before you see the crash. For instance, I ran it 13 times in a row without a crash and then it crashed on the 14th.

I see no crashes when I use only one auxiliary state.


#5

This is a known issue. I’m working on a fix here: https://github.com/apache/incubator-mxnet/pull/8637
You can try it.


#6

Thanks! I will check it out when I get some time…


#7

I built that pull request and it did stop the crash, but I see some strange behavior in NDArray with this build.

There appears to be a rounding difference between assigning from a NumPy float32 scalar and assigning from a float32 array:

>>> import numpy as np
>>> from mxnet import nd
>>> a = np.array([47.844944], dtype=np.float32)
>>> b = nd.zeros(1,dtype=np.float32)
>>> b[0] = a
>>> b

[ 47.844944]
<NDArray 1 @cpu(0)>
>>> b[0] = a[0]
>>> b

[ 47.84489822]
<NDArray 1 @cpu(0)>

I do not see this using my currently installed mxnet build (0.12.1b20171113).

This does not appear to be related to your fix, so I assume it is due either to some problem with my build or to a bug in the branch from which the pull request was started.
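For anyone who wants to rule out a pure display difference, here is a rough sketch that compares the two printed values as float32 bit patterns using only the standard library. The helper names `float32_bits` and `ulp_distance` are made up for illustration, and the two literals are simply copied from the session above, so they are only as precise as what the shell printed.

```python
import struct


def float32_bits(x):
    """Round x to the nearest float32 and return its raw 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def ulp_distance(x, y):
    """Count float32 steps between two positive values.

    For positive floats the bit patterns are ordered like the values,
    so the integer difference is the number of representable steps.
    """
    return abs(float32_bits(x) - float32_bits(y))


expected = 47.844944    # printed after `b[0] = a`
observed = 47.84489822  # printed after `b[0] = a[0]`

# A distance of 0 or 1 would suggest a printing artifact;
# anything larger means a different float32 value was stored.
print(ulp_distance(expected, observed))
```

If the distance comes out larger than one step, the scalar path is storing a different float32 value and not just printing it differently.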


#8

Hmm. Just installed 0.12.1b20171115 and see the same rounding problem there.