Sparse interaction dataloader

opringle · November 21, 2018, 5:57pm

I am building a ranking model that will train on implicit interaction data. Each record is a user/item pair where there was an interaction. I need to build an efficient gluon.data.DataLoader that can sample negatives during training and evaluation.

My training/evaluation process will be based on the steps outlined in the Neural Collaborative Filtering paper (https://www.comp.nus.edu.sg/~xiangnan/papers/ncf.pdf). In short:

1. Load all interaction data.
2. Take the latest n interactions per user as the test set.
3. Reserve x negatives per user test interaction (in the paper x = 100).
4. Build a train dataloader that randomly samples X negatives per train interaction (in the paper X = 4) each time a batch is fed to the network. These negatives must not be those reserved in step 3.
5. Build a test dataloader that feeds the negatives from step 3 and the positives from step 2 into the network.
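
For illustration, the split and the reserved negatives described above could be sketched in plain Python as follows. This is only a sketch under assumptions: the function name `split_and_reserve`, the `(user, item, timestamp)` record layout and the contiguous item ids `0..num_items-1` are hypothetical and not taken from the linked iterator code.

```python
import random
from collections import defaultdict


def split_and_reserve(interactions, num_items, n_test=1, x=100, seed=0):
    """Split interactions into train/test and reserve test negatives.

    interactions: iterable of (user, item, timestamp) tuples (assumed layout).
    Returns (train, test, reserved): train and test are lists of (user, item)
    pairs, reserved maps each test pair to a list of negative item ids.
    Assumes n_test >= 1.
    """
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for user, item, ts in interactions:
        by_user[user].append((ts, item))

    train, test, reserved = [], [], {}
    for user, events in by_user.items():
        events.sort()  # oldest interaction first
        items = [item for _, item in events]
        seen = set(items)
        # Negatives are items the user never interacted with.
        candidates = [i for i in range(num_items) if i not in seen]
        # The latest n_test interactions form the test set.
        train += [(user, i) for i in items[:-n_test]]
        for item in items[-n_test:]:
            test.append((user, item))
            # Reserve up to x negatives for this test interaction.
            k = min(x, len(candidates))
            reserved[(user, item)] = rng.sample(candidates, k)
    return train, test, reserved
```

The test dataloader would then only need to iterate over `test` together with `reserved`, since nothing is sampled at evaluation time.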

I’ve done the above with an MXNet iterator (https://github.com/opringle/collaborative_filtering/blob/master/libs/iterators.py).

I’m looking for guidance on how best to leverage the Gluon data API to do this. I get the feeling this can be done in far less code by leveraging samplers, Gluon datasets and MXNet sparse NDArrays (https://mxnet.incubator.apache.org/tutorials/sparse/csr.html).

ThomasDelteil · November 23, 2018, 6:44pm

Hi @opringle,

I think a combination of a custom dataset and sampler would do the trick for your problem.

Gluon-nlp has a bunch of complex sampling and batchification classes that you might find useful:

Sampled blocks for Noise Contrastive Estimation and Importance Sampling: https://github.com/dmlc/gluon-nlp/blob/master/src/gluonnlp/model/sampled_block.py
Examples of batchify functions and samplers: https://github.com/dmlc/gluon-nlp/tree/master/src/gluonnlp/data/batchify https://github.com/dmlc/gluon-nlp/blob/master/src/gluonnlp/data/sampler.py
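
As a rough sketch of the custom dataset idea, something like the class below could draw fresh negatives every time a training example is fetched. It is written in plain Python and only follows the `__getitem__`/`__len__` protocol that `gluon.data.Dataset` defines; the class name `NegativeSamplingDataset` and the `excluded` argument (a per-user set holding both the items the user interacted with and the negatives reserved for testing) are assumptions made for this example, not part of any library.

```python
import random


class NegativeSamplingDataset:
    """Returns one positive plus freshly sampled negatives per index.

    Hypothetical sketch: in an MXNet setup this class would presumably
    subclass mxnet.gluon.data.Dataset, which needs the same two methods.
    """

    def __init__(self, positives, num_items, excluded, num_negatives=4, seed=0):
        self.positives = positives          # list of (user, item) train pairs
        self.num_items = num_items          # item ids assumed to be 0..num_items-1
        self.excluded = excluded            # user -> set of item ids never drawn
        self.num_negatives = num_negatives
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.positives)

    def __getitem__(self, idx):
        user, item = self.positives[idx]
        banned = self.excluded.get(user, set())
        negatives = []
        # Rejection sampling with replacement; assumes each user has at
        # least one item outside the banned set, otherwise this never ends.
        while len(negatives) < self.num_negatives:
            candidate = self.rng.randrange(self.num_items)
            if candidate not in banned:
                negatives.append(candidate)
        users = [user] * (1 + self.num_negatives)
        items = [item] + negatives
        labels = [1] + [0] * self.num_negatives
        return users, items, labels
```

Because sampling happens inside `__getitem__`, each epoch sees different negatives without any custom iterator code. Shuffling and batching would be left to the DataLoader (for example with `shuffle=True`), although that part is untested here, and with several worker processes the random seeding would need some care.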
