I wrote the following custom layer:
```python
class HybridGumbleSoftmax(gluon.HybridBlock):
    def __init__(self, temp=1):
        super(HybridGumbleSoftmax, self).__init__()
        self.temp = temp

    def hybrid_forward(self, F, x):
        noise = F.random_uniform()
        # G = μ − log(− log(U))
        noise = F.negative((noise.__add__(1e-10).log()))
        noise = F.negative((noise.__add__(1e-10).log()))
        gumble_trick_log_prob_samples = x + noise
        soft_samples = F.softmax(gumble_trick_log_prob_samples / self.temp, axis=1)
        return soft_samples
```
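For context, the sampling trick the layer implements can be sketched in plain NumPy, independent of MXNet (the function name and epsilon handling here are mine, mirroring the code above):

```python
import numpy as np

def gumbel_softmax_np(logits, temp=1.0, eps=1e-10):
    """Plain-NumPy sketch of the Gumbel-softmax trick: G = -log(-log(U))."""
    u = np.random.uniform(size=logits.shape)   # U ~ Uniform(0, 1)
    g = -np.log(-np.log(u + eps) + eps)        # Gumbel(0, 1) noise
    z = (logits + g) / temp                    # perturbed, tempered logits
    z = z - z.max(axis=1, keepdims=True)       # stabilize the softmax
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)    # each row sums to 1

samples = gumbel_softmax_np(np.zeros((4, 3)), temp=0.5)
```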
This does what I want if I call hybridize() before training. However, I am unable to modify it into a HybridBlock that can run in both symbolic and imperative mode. (I also have a version that works only in imperative mode using the ndarray API, but that's not what I want either.) If I just run the code above without hybridize, the result is
mxnet.base.MXNetError: [21:02:32] ../src/imperative/./imperative_utils.h:122: Check failed: infershape[attrs.op](attrs, &in_shapes, &out_shapes)
Now if I edit the call to pass the shape parameter to random_uniform and try to run it on the GPU, I get an error in the add function:
mxnet.base.MXNetError: [21:11:58] ../src/imperative/./imperative_utils.h:70: Check failed: inputs[i]->ctx().dev_mask() == ctx.dev_mask() (1 vs. 2) Operator broadcast_add require all inputs live on the same context. But the first argument is on gpu(0) while the 2-th argument is on cpu(0)
This really astonishes me. Can I not call __add__ with a hardcoded scalar? How else would I do that? I can't hardcode the creation of an nd.array or a symbol there, as I would lose the hybrid property again.
But I guess the first error (infer shape) is more relevant here, as I cannot call x.shape on symbols anyway.
I actually know the exact shape of the noise I need, but passing a tuple for the shape argument results in "Deferred initialization failed because shape cannot be inferred." if I hybridize, and again in the different-context error in the plus operator if I do not.
I am puzzled that MXNet can infer the shape with symbols but not if I use ndarrays. Can I somehow specify shapes for hybrid blocks (similar to a custom operator's property class)? Or is there something else wrong with the above code in imperative mode?
Please let me know if I did not provide necessary information like full stack traces. Any help is much appreciated.