Neural Collaborative Filtering for Personalized Ranking

https://d2l.ai/chapter_recommender-systems/neumf.html

Good example :)
Can we define a gluon Dataset like the one below, so that negative sampling happens inside the training iterator instead of calling this function on every iteration? I think this could simplify the training process and reduce memory usage, especially with larger datasets such as MovieLens 20M and others.

def __getitem__(self, idx):
    # Each positive pair is followed by nb_neg sampled negatives, so every
    # (nb_neg + 1)-th index maps back to an observed interaction.
    if idx % (self.nb_neg + 1) == 0:
        idx = idx // (self.nb_neg + 1)
        return self.data[idx][0], self.data[idx][1], np.ones(1, dtype=np.float32).item()
    else:
        idx = idx // (self.nb_neg + 1)
        u = self.data[idx][0]
        # Resample until (u, j) is not an observed interaction.
        j = mx.random.randint(0, self.nb_items).asnumpy().item()
        while (u, j) in self.mat:
            j = mx.random.randint(0, self.nb_items).asnumpy().item()
        return u, j, np.zeros(1, dtype=np.float32).item()
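
For reference, here is a minimal self-contained sketch of the same idea, written with plain Python and NumPy so that it runs without MXNet. The class name `NegSamplingDataset`, the `__len__` method, and the seeded generator are assumptions added for illustration; `data`, `mat`, `nb_items`, and `nb_neg` follow the snippet above.

```python
import numpy as np


class NegSamplingDataset:
    """Sketch: every positive pair is followed by nb_neg sampled negatives."""

    def __init__(self, data, nb_items, nb_neg, seed=0):
        self.data = data              # list of observed (user, item) pairs
        self.mat = set(data)          # set for fast membership tests
        self.nb_items = nb_items
        self.nb_neg = nb_neg
        self.rng = np.random.default_rng(seed)  # assumption: seeded for repeatability

    def __len__(self):
        # One positive plus nb_neg negatives per observed interaction.
        return len(self.data) * (self.nb_neg + 1)

    def __getitem__(self, idx):
        pos_idx, offset = divmod(idx, self.nb_neg + 1)
        u, i = self.data[pos_idx]
        if offset == 0:
            return u, i, 1.0          # the observed (positive) pair
        # Resample until the item is one the user has not interacted with.
        j = int(self.rng.integers(0, self.nb_items))
        while (u, j) in self.mat:
            j = int(self.rng.integers(0, self.nb_items))
        return u, j, 0.0


# Toy usage: 3 positives, 2 negatives each, 5 items in total.
ds = NegSamplingDataset([(0, 1), (0, 2), (1, 0)], nb_items=5, nb_neg=2)
samples = [ds[k] for k in range(len(ds))]
```

Because negatives are drawn at access time, each epoch sees different negatives and no negative list has to be held in memory. Note that the resampling loop would never terminate for a user who has interacted with every item.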

Thank you very much for providing the code; I will give it a test! If it helps, we will revise it. :smiley:

Thank you so much. I will try it.

Hi, I think the 'num_users' in the evaluate_ranking function should be 'num_items'.

def evaluate_ranking(net, test_input, seq, candidates, num_users, num_items,
                 ctx):
    ranked_list, ranked_items, hit_rate, auc = {}, {}, [], []
    # all_items = set([i for i in range(num_users)])
    all_items = set([i for i in range(num_items)])

You have a typo - eastimating should be estimating. Your typo did however make me hungry :slight_smile:

I have a question about "negative sampling". The book says:

samples negative items randomly for each user from the candidate set of that user.

But the code does the opposite: negative samples are drawn from exactly the items that are NOT in that candidate set:
list(self.all - set(self.cand[int(self.users[idx])]))

The formula for computing AUC also uses I\S_u to exclude the candidate set.

How can I make sense of it?

Agreed.

According to

  1. the definition of the function load_data_ml100k
  2. this line: users_train, items_train, ratings_train, candidates = d2l.load_data_ml100k(train_data, num_users, num_items, feedback="implicit")

candidates should be the list of items each user has interacted with.
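
Under that reading, a negative sample for a user is any item outside that user's candidate list. A minimal sketch with hypothetical toy data (the `candidates` dictionary and the helper `sample_negative` are made up for illustration and are not the book's code):

```python
import random

num_items = 6
# Hypothetical toy data: candidates[u] holds the items user u has interacted with.
candidates = {0: [1, 3], 1: [0, 2, 5]}
all_items = set(range(num_items))


def sample_negative(user, rng):
    """Draw one item the user has NOT interacted with."""
    neg_pool = sorted(all_items - set(candidates[user]))
    return rng.choice(neg_pool)


rng = random.Random(0)
negatives = [sample_negative(0, rng) for _ in range(4)]
```

This mirrors the set difference in the quoted line, `self.all - set(self.cand[...])`: the pool for user 0 is every item except 1 and 3.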

Hi, thanks for sharing the detailed information. Could you provide an example of how to make predictions once the model is built? Maybe we could use the MovieLens data itself. It would mean a lot, or you could just point me to a reference.

Thanks in advance :slight_smile:
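
One possible shape for the prediction step, as a hedged sketch rather than the chapter's code: score every item the user has not seen, then keep the top k. Here `score_fn` is a placeholder for whatever the trained model returns for arrays of user and item ids, and `toy_score` is a random dot-product stand-in so the example runs on its own; both names are assumptions.

```python
import numpy as np


def recommend_top_k(score_fn, user, num_items, seen_items, k=5):
    """Rank the items the user has not interacted with and return the k best."""
    item_ids = np.array([i for i in range(num_items) if i not in seen_items])
    user_ids = np.full(len(item_ids), user)
    scores = np.asarray(score_fn(user_ids, item_ids))
    order = np.argsort(-scores)[:k]   # indices of the highest scores first
    return item_ids[order].tolist()


# Stand-in scorer (assumption): random embeddings instead of a trained model.
rng = np.random.default_rng(0)
P = rng.normal(size=(3, 4))    # 3 users
Q = rng.normal(size=(10, 4))   # 10 items


def toy_score(users, items):
    return (P[users] * Q[items]).sum(axis=1)


top = recommend_top_k(toy_score, user=0, num_items=10, seen_items={1, 2}, k=3)
```

With a trained model, `toy_score` would be replaced by a function that feeds the id arrays to the network and returns its scores as a NumPy array.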