I am building a ranking model that will train on implicit interaction data. Each record is a user/item pair where an interaction occurred. I need to build an efficient gluon.data.DataLoader that can sample negatives during both training and evaluation.
My training/evaluation process will be based on the steps outlined in neural collaborative filtering (https://www.comp.nus.edu.sg/~xiangnan/papers/ncf.pdf). In short:
1. Load all interaction data
2. Take the latest n interactions per user as the test set
3. Reserve x negatives per user test interaction (in the paper x = 100)
4. Build a train dataloader that randomly samples X negatives per train interaction (in the paper X = 4) each time a batch is fed to the network. These negatives must not be any of those reserved in step 3.
5. Build a test dataloader that feeds the negatives from step 3 and the positives from step 2 into the network.
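To make steps 2 and 3 concrete, here is a minimal framework-agnostic sketch of the split I have in mind (the function name `split_and_reserve` and its arguments are my own, not from the paper or from gluon):

```python
import random
from collections import defaultdict

def split_and_reserve(interactions, all_items, n_test=1, n_reserved=100, seed=0):
    """Leave-latest-n-out split: the latest n_test interactions per user
    become the test set, and n_reserved unseen items per user are fixed
    as evaluation negatives (never to be drawn as train negatives)."""
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for user, item, ts in interactions:            # (user, item, timestamp)
        by_user[user].append((ts, item))

    train, test, reserved = [], [], {}
    for user, events in by_user.items():
        events.sort()                              # oldest -> newest
        items = [it for _, it in events]
        train += [(user, it) for it in items[:-n_test]]
        test += [(user, it) for it in items[-n_test:]]
        seen = set(items)
        candidates = [it for it in all_items if it not in seen]
        reserved[user] = rng.sample(candidates, min(n_reserved, len(candidates)))
    return train, test, reserved
```

The reserved dict is what the train sampler in step 4 would need to consult so that train negatives never collide with the held-out evaluation negatives.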
I've done the above with a custom MXNet iterator (https://github.com/opringle/collaborative_filtering/blob/master/libs/iterators.py).
I'm looking for guidance on how best to leverage the gluon data API to do this. I get the feeling it can be done in far less code by combining samplers, gluon datasets and MXNet sparse NDArrays (https://mxnet.incubator.apache.org/tutorials/sparse/csr.html).
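For instance, since gluon.data.Dataset only requires `__len__` and `__getitem__`, I imagine step 4 could look something like the plain-Python sketch below (the class name and constructor arguments are my own; in MXNet this class would simply subclass mxnet.gluon.data.Dataset and be wrapped in a DataLoader):

```python
import random

class NegativeSamplingDataset:
    """Sketch of a Dataset that pairs each positive (user, item) with
    num_neg freshly sampled negatives, excluding both the user's observed
    items and the negatives reserved for evaluation."""
    def __init__(self, positives, all_items, reserved, num_neg=4, seed=0):
        self.positives = positives                 # list of (user, item)
        self.all_items = list(all_items)
        self.num_neg = num_neg
        self.rng = random.Random(seed)
        # per-user items that must never be drawn as train negatives
        self.forbidden = {}
        for user, item in positives:
            self.forbidden.setdefault(user, set()).add(item)
        for user, items in reserved.items():
            self.forbidden.setdefault(user, set()).update(items)

    def __len__(self):
        return len(self.positives)

    def __getitem__(self, idx):
        user, pos_item = self.positives[idx]
        users, items, labels = [user], [pos_item], [1]
        forbidden = self.forbidden.get(user, set())
        # rejection sampling; assumes users haven't seen most of the catalogue
        while len(items) < 1 + self.num_neg:
            cand = self.rng.choice(self.all_items)
            if cand not in forbidden:
                users.append(user)
                items.append(cand)
                labels.append(0)
        return users, items, labels
```

Since the DataLoader calls `__getitem__` afresh each epoch, the negatives would be resampled every pass, which seems to match the paper's "sample X negatives each time" requirement. Is this the intended way to use the API, and is there a sampler-based way to do it in even less code?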