Unable to perform inference on arbitrary-size input


#1

(Edited to emphasize that the issue is independent of MMS.)

I’m writing a simple NLP classifier using MXNet, and I’d like to be able to send an arbitrary number of vectors for prediction.

I created the model with Reshape layers having 0 as the leading dimension, in order to accommodate an unknown batch size.

However, at inference time, I have to bind the data_shapes like so, which is where it errors out:

import mxnet as mx

# sym, arg_params, aux_params: the symbol and parameters of the previously loaded model
mod = mx.mod.Module(symbol=sym, context=mx.gpu(0))
mod.bind(for_training=False, data_shapes=[('data', (0, 20))], label_shapes=[('softmax_label', (0, ))])
mod.set_params(arg_params, aux_params, allow_missing=True, allow_extra=True)

ValueError: Too many slices. Some splits are empty.

This works fine if I change the leading dimension to a non-zero value (and send in a correspondingly-sized tensor). But I don’t always know the size of the incoming inference request.

How do I accomplish serving an arbitrarily-sized prediction request?

Thank you.


#2

Hey Samir,
Just wanted to understand the problem a little better. You have this code in the “inference” method and not in “init()”, right? If so, why do you have to bind to “0”? When the inference method is called, you would already have a non-zero value for the batch size, right?


#3

@vamshidhardk – thank you for responding.

I’m invoking the Module constructor as well as bind() in some method like init() – which is invoked only once when the prediction server starts up. (I’m assuming that it is expensive to create and/or bind() a Module for every inference request – is that a correct assumption?)

Then in the predict() method, I invoke forward() with a DataBatch that wraps the incoming data. The problem is that the incoming data may have any number of vectors, and because I prepare the Module in init() only once (i.e., I call bind() only once), I have to specify a fixed batch size up front.

So what I’m looking for is the equivalent of None or -1 (in NumPy / TensorFlow) as a placeholder dimension so that I can supply any number of inputs to the prediction function.

Are you saying it is ok (and/or recommended) to invoke bind() with the dimensions of the incoming request every time?

Thank you.


#4

Hi Samir,
I just wanted to understand the problem a little better. I am not suggesting that calling bind() for every inference is the only option.

But I can see only two options here.

  1. Re-bind for every request. This might be less efficient depending on the model size, but for smaller models it might still be very efficient.
  2. Make a fixed-size batch and pad it if you don’t receive enough requests to fill the data batch.

These are the methods I am aware of. I would like to learn if there are other, better options as well :)
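
Option 2 could be sketched roughly as follows. This is only an illustration in plain NumPy, not MXNet-specific code: `predict_padded` is a hypothetical helper, and `forward_fn` stands in for whatever runs the already-bound module on exactly `batch_size` rows and returns one output row per input row. Zero-padding is an assumption; any filler value works as long as the padded outputs are discarded.

```python
import numpy as np


def predict_padded(forward_fn, vectors, batch_size):
    """Run forward_fn over `vectors` in fixed-size batches, padding the last one.

    forward_fn (hypothetical) must accept an array with exactly `batch_size`
    rows and return one output row per input row.
    """
    vectors = np.asarray(vectors, dtype=np.float32)
    outputs = []
    for start in range(0, len(vectors), batch_size):
        chunk = vectors[start:start + batch_size]
        n_valid = len(chunk)
        if n_valid < batch_size:
            # Fill the remainder of the batch with zero rows (assumed filler).
            pad = np.zeros((batch_size - n_valid,) + chunk.shape[1:], dtype=chunk.dtype)
            chunk = np.concatenate([chunk, pad], axis=0)
        out = np.asarray(forward_fn(chunk))
        # Keep only the outputs that correspond to real inputs.
        outputs.append(out[:n_valid])
    if not outputs:
        return np.empty((0,), dtype=np.float32)
    return np.concatenate(outputs, axis=0)
```

With this shape, the module only ever sees the batch size it was bound with, at the cost of some wasted computation on the padded rows of the last batch.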


#5

Thanks again @vamshidhardk – appreciate your input.

Unfortunately, when I try to re-bind(), it throws an error saying that the module is already bound. So the only way to re-bind() is to re-load the model (i.e., re-initialize the module), and that is expensive. (I also tried reshape(), but that didn’t work either.)
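
One thing that may be worth trying for the "already bound" behaviour: the documented signature of Module.bind() includes a `force_rebind` argument. A minimal sketch, assuming the installed MXNet version supports it, is below. The helper name `rebind_for_batch` and the default `feature_dim=20` are illustrative and not part of MXNet; whether set_params() must be called again after re-binding may depend on the version.

```python
def rebind_for_batch(mod, batch_size, feature_dim=20):
    """Re-bind an already-bound module to a new batch size (sketch).

    `mod` is expected to be an mx.mod.Module; the names 'data' and
    'softmax_label' are taken from the bind() call earlier in this thread.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    mod.bind(
        for_training=False,
        data_shapes=[("data", (batch_size, feature_dim))],
        label_shapes=[("softmax_label", (batch_size,))],
        force_rebind=True,  # assumption: asks bind() to discard the previous binding
    )
    return mod
```

In predict(), this would be called with the row count of the incoming request before forward(), and only when that count differs from the currently bound batch size.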

Your other suggestion (padding) is probably the only viable option that I know of.

I hope there is another way … because the available options are not satisfactory.

Thank you.