MXNet Forum

Where to write NLP pre-processing steps at inference time?


Hi, for NLP models in MXNet or Gluon that involve data pre-processing, where should the pre-processing code be written at inference time?


If the data processing is done per item in the dataset, the accepted pattern is to do it in a transform function called from the __getitem__ method of the Dataset. For an example, see _DownloadedDataset.
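A minimal sketch of that pattern, in case it helps. To keep it runnable without MXNet installed, the class below only follows the __getitem__/__len__ protocol; in a real Gluon script it would subclass mxnet.gluon.data.Dataset. The names TextDataset and lowercase_and_split are made up for illustration and are not part of MXNet.

```python
class TextDataset:
    """Toy dataset that applies an optional transform per item.

    Assumption: with MXNet available, this would subclass
    mxnet.gluon.data.Dataset instead of being a plain class.
    """

    def __init__(self, sentences, labels, transform=None):
        assert len(sentences) == len(labels)
        self._sentences = sentences
        self._labels = labels
        self._transform = transform

    def __getitem__(self, idx):
        sentence, label = self._sentences[idx], self._labels[idx]
        # Per-item pre-processing happens here, at access time.
        if self._transform is not None:
            return self._transform(sentence, label)
        return sentence, label

    def __len__(self):
        return len(self._sentences)


def lowercase_and_split(sentence, label):
    # Hypothetical transform: lowercase, then whitespace-tokenize.
    return sentence.lower().split(), label


dataset = TextDataset(
    ["Hello world", "MXNet is fun"], [0, 1], transform=lowercase_and_split
)
tokens, label = dataset[0]
```

Because the transform is an ordinary function, the same function can be reused outside the Dataset later on.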


Thanks for the reply! I think the __getitem__ option you mention refers to training?
My question was about inference time, when sentences arrive one by one and need to be passed into the model: who does the pre-processing?


Sorry, I misread your question. Just as in training, pre-processing at inference time has to be done outside of the model. In this case your script would simply define a function that processes your data (most likely some sort of numpy-based preprocessor) and call it before calling forward on the model.
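Here is a rough sketch of what such a script could look like for a single incoming sentence. Everything in it is an assumption for illustration: the whitespace tokenizer, the build_vocab, preprocess and predict helpers, the padding scheme, and the stand-in model. It uses plain Python lists so it runs without MXNet; with a real network you would wrap the batch in mx.nd.array before calling the model.

```python
PAD, UNK = "<pad>", "<unk>"


def build_vocab(sentences):
    # Hypothetical vocabulary: token -> integer id, built once at training time.
    vocab = {PAD: 0, UNK: 1}
    for sentence in sentences:
        for token in sentence.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def preprocess(sentence, vocab, max_len):
    # Tokenize, map tokens to ids, truncate, then pad to a fixed length.
    tokens = sentence.lower().split()[:max_len]
    ids = [vocab.get(token, vocab[UNK]) for token in tokens]
    return ids + [vocab[PAD]] * (max_len - len(ids))


def predict(model, sentence, vocab, max_len=8):
    # Pre-processing runs in the script, outside the model.
    batch = [preprocess(sentence, vocab, max_len)]  # batch of size 1
    # Assumption: with MXNet, convert here, e.g. mx.nd.array(batch).
    return model(batch)


vocab = build_vocab(["hello world"])


def fake_model(batch):
    # Stand-in for a trained network: counts non-padding ids per row.
    return [sum(1 for i in row if i != vocab[PAD]) for row in batch]


result = predict(fake_model, "Hello there world", vocab)
```

The main thing to keep consistent is that inference uses the same tokenization and vocabulary as training, so it is usually worth sharing one pre-processing function between the two.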