First time MXNet user here.
I am trying to deploy a Seq2Seq model using Sockeye, which is built on MXNet. At inference time the model is quite fast, except when it receives a piece of text longer than any it has seen before.
I found a work-around: at startup, I warm the model up with texts covering the range of lengths I expect at inference time, from shortest to longest.
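For context, here is a minimal sketch of the warm-up idea in plain Python. It assumes the slowdown comes from a one-time setup cost per input-length bucket (as with MXNet's bucketing), so running one dummy input per bucket at startup pre-pays that cost. `DummySeq2Seq`, `bucket_for`, and `warm_up` are all hypothetical names standing in for the real Sockeye API, which I have not shown here.

```python
def bucket_for(length, buckets=(10, 20, 40, 80)):
    """Map an input length to the smallest bucket that fits it."""
    for b in buckets:
        if length <= b:
            return b
    return buckets[-1]

class DummySeq2Seq:
    """Stand-in for the real model; `compiled` mimics the shape cache."""
    def __init__(self):
        self.compiled = set()

    def translate(self, tokens):
        bucket = bucket_for(len(tokens))
        if bucket not in self.compiled:
            # In the real model this is the slow path
            # (graph/executor setup for a new shape).
            self.compiled.add(bucket)
        return ["tok"] * len(tokens)

def warm_up(model, buckets=(10, 20, 40, 80)):
    """Run one dummy input per bucket, shortest to longest."""
    for b in buckets:
        model.translate(["x"] * b)

model = DummySeq2Seq()
warm_up(model)
```

After `warm_up`, every bucket is already in `model.compiled`, so a real input of any expected length hits the fast path. The question below is essentially whether that `compiled` state can be serialized instead of rebuilt on every startup.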
That did the job, but now the model initialization time is too long for my use case.
I am wondering how I can initialize the model with a custom set of text data, serialize the resulting cache, and ship it along with the model to deployment.
Then, at deployment, I would just need to (a) load the architecture, (b) load the weights, and (c) load the cache.
Has anybody tried anything like this? Or do you have an opinion on whether this would work (i.e., actually reduce the model initialization time)? Or any other ideas to achieve the same effect?
I am open to suggestions.
If you have tried something like this, or have come across cases where it has been tried, could you point me to them?