Sparse interaction dataloader


#1

I am building a ranking model that will train on implicit interaction data. Each record is a user/item pair where there was an interaction. I need to build an efficient gluon.data.DataLoader that can sample negatives during both training and evaluation.

My training/evaluation process will be based on the steps outlined in the neural collaborative filtering paper (https://www.comp.nus.edu.sg/~xiangnan/papers/ncf.pdf). In short:

  1. Load all interaction data
  2. Take the latest n interactions per user as the test set
  3. Reserve x negatives per user test interaction (in the paper x = 100)
  4. Build a train dataloader that randomly samples X negatives per train interaction (in the paper X=4) each time a batch is fed to the network. These negatives must not be those reserved in 3.
  5. Build a test dataloader that feeds the negatives from 3 and the positives from 2 into the network.
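
For reference, steps 1 to 3 can be sketched in plain Python roughly as follows. This is only an illustration, not the code from my iterator: the function name `split_and_reserve`, its parameters, and the assumption that each record carries a timestamp and that item ids run from 0 to `num_items - 1` are all made up for the example.

```python
import random
from collections import defaultdict

def split_and_reserve(interactions, num_items, n_test=1, x_neg=100, seed=0):
    """Hypothetical helper. interactions: iterable of (user, item, timestamp).

    Assumes n_test >= 1 and item ids in range(num_items).
    Returns (train_pairs, test_pairs, reserved_negatives_per_user).
    """
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for user, item, ts in interactions:
        by_user[user].append((ts, item))

    train, test, reserved = [], [], {}
    for user, events in by_user.items():
        events.sort()  # oldest interaction first
        items = [item for _, item in events]
        seen = set(items)
        # Step 2: the latest n_test interactions form the test set.
        held_out = items[-n_test:]
        train.extend((user, i) for i in items[:-n_test])
        test.extend((user, i) for i in held_out)
        # Step 3: reserve x_neg unseen items per held-out interaction
        # (fewer if the user has not enough unseen items left).
        candidates = [i for i in range(num_items) if i not in seen]
        k = min(x_neg * len(held_out), len(candidates))
        reserved[user] = set(rng.sample(candidates, k))
    return train, test, reserved
```

The `reserved` dictionary is what the train loader in step 4 has to avoid and what the test loader in step 5 has to feed alongside the held-out positives.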

I’ve done the above with an MXNet iterator (https://github.com/opringle/collaborative_filtering/blob/master/libs/iterators.py).

I’m looking for guidance on how best to leverage the Gluon data API to do this. I get the feeling it can be done in far less code by leveraging samplers, Gluon datasets & MXNet sparse ND arrays (https://mxnet.incubator.apache.org/tutorials/sparse/csr.html).


#2

Hi @opringle,

I think a combination of a custom dataset and sampler would do the trick for your problem.
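
As a rough sketch of the dataset half, something along these lines might work. It is plain Python and I have not run it against Gluon: the class name `NegativeSamplingDataset` and all of its arguments are hypothetical. The only thing it relies on is that a dataset passed to gluon.data.DataLoader needs `__len__` and `__getitem__`. Turning the rows into arrays and writing a matching batchify function are left out.

```python
import random

class NegativeSamplingDataset:
    """Hypothetical dataset: each index yields one positive plus fresh negatives.

    positives: list of (user, item) train pairs.
    reserved:  dict user -> set of items that must never be sampled
               (e.g. the negatives held back for evaluation).
    Assumes every user has at least num_neg items that are not excluded;
    otherwise the rejection loop below would not terminate.
    """

    def __init__(self, positives, num_items, reserved=None, num_neg=4, seed=None):
        self.positives = list(positives)
        self.num_items = num_items
        self.num_neg = num_neg
        self.rng = random.Random(seed)
        # Per user: items that are either train positives or reserved.
        self.excluded = {}
        for user, item in self.positives:
            self.excluded.setdefault(user, set()).add(item)
        for user, items in (reserved or {}).items():
            self.excluded.setdefault(user, set()).update(items)

    def __len__(self):
        return len(self.positives)

    def __getitem__(self, idx):
        user, item = self.positives[idx]
        rows = [(user, item, 1)]  # label 1 = observed interaction
        banned = self.excluded[user]
        # Rejection sampling: draw item ids until num_neg allowed ones are found.
        while len(rows) < 1 + self.num_neg:
            candidate = self.rng.randrange(self.num_items)
            if candidate not in banned:
                rows.append((user, candidate, 0))  # label 0 = sampled negative
        return rows
```

Because the negatives are drawn inside `__getitem__`, every epoch sees a different set of negatives for the same positive. Passing each user's test positives in `reserved` as well would keep them from being sampled as training negatives.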

Gluon-nlp has a bunch of complex sampling and batchification classes that you might find useful: