Rank


Learning to rank (LTR)
Pointwise: each doc/query pair gets a score; ultimately we sort by score, and final performance is measured by NDCG.
e.g. linear regression takes the features and produces the score.
Pairwise
RankNet: learns to order each pair of documents.
LambdaRank: also orders pairs, but gives a smaller push to documents that are way down the list (the push is scaled by how much swapping the pair would change NDCG).
Listwise: takes the whole list of docs for a query at once.
LambdaMART: gradient-boosted decision trees (MART) trained with LambdaRank's gradients to rank all docs (see the sketch below).
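A minimal listwise sketch using LightGBM's LGBMRanker, which implements LambdaMART; the feature values, labels, and group sizes below are made up for illustration.

```python
import numpy as np
from lightgbm import LGBMRanker

# Toy data: 8 documents across 2 queries, 3 features each (made-up values).
X = np.random.rand(8, 3)
# Graded relevance labels; note LGBMRanker expects higher = more relevant.
y = np.array([3, 2, 0, 1, 0, 3, 1, 2])
group = [4, 4]  # number of docs per query, in row order

# LambdaMART = gradient-boosted trees trained with LambdaRank gradients.
model = LGBMRanker(objective="lambdarank", n_estimators=50)
model.fit(X, y, group=group)

# Rank the first query's documents by predicted score, descending.
scores = model.predict(X[:4])
ranking = np.argsort(-scores)
```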

In a typical dataset, each query/document pair has features and a relevance judgment, e.g. 1-4 (1: very relevant, 4: not relevant).

Why is it different from classification or regression?
- classification only separates relevant from non-relevant; it doesn't order documents within a class
- but if you rate the quality of each document for a query 0-10, you can use regression to rank: that is pointwise learning to rank (sketch below)
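A pointwise sketch with scikit-learn, assuming made-up features and 0-10 relevance judgments: regress the label, then sort candidates by predicted score.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy query-document features and 0-10 relevance judgments (made up).
X_train = np.random.rand(100, 5)
y_train = np.random.randint(0, 11, size=100)

reg = LinearRegression().fit(X_train, y_train)

# At query time: score each candidate document, sort descending.
X_candidates = np.random.rand(10, 5)
ranking = np.argsort(-reg.predict(X_candidates))
```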



Metrics
MAP - mean average precision
MRR - mean reciprocal rank
NDCG - normalized discounted cumulative gain
ERR - expected reciprocal rank
Kendall's tau (pairwise comparison)
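Rough sketches of two of these metrics, assuming 0/1 relevance for MRR and graded relevance with the common 2^rel - 1 gain for NDCG:

```python
import numpy as np

def dcg(rels, k):
    """DCG@k with the 2^rel - 1 gain and log2 position discount."""
    rels = np.asarray(rels, dtype=float)[:k]
    return np.sum((2**rels - 1) / np.log2(np.arange(2, len(rels) + 2)))

def ndcg(rels, k=10):
    """NDCG@k for one query; rels are graded relevances in ranked order."""
    ideal = dcg(sorted(rels, reverse=True), k)
    return dcg(rels, k) / ideal if ideal > 0 else 0.0

def mrr(ranked_relevance_per_query):
    """MRR over queries; each entry is a 0/1 relevance list in ranked order."""
    recip = []
    for rels in ranked_relevance_per_query:
        hits = np.flatnonzero(rels)
        recip.append(1.0 / (hits[0] + 1) if hits.size else 0.0)
    return float(np.mean(recip))
```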

Explicit feedback: hire human judges; expensive, and judges behave differently from real users.
Implicit feedback: user engagement; noisy, and it already carries position bias. Users are biased toward picking the first items shown, so if you train on this feedback it becomes an echo chamber with a self-fulfilling prophecy. Ways to counter position bias:
- show randomly ordered results to 1% of users to get unbiased feedback
- remove data the user probably never saw, e.g. drop items ranked below the last item that was clicked
- estimate a position-bias weight per position and use its inverse as a sample weight (sketch below)
- use position as a feature to the model (per Google's Rules of ML)
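A sketch of the inverse-propensity idea from the list above; the per-position examination probabilities here are made up (in practice they could be estimated from the randomized 1% traffic):

```python
import numpy as np

# Estimated probability that a user examines each position
# (made-up numbers, e.g. fitted from the randomized traffic).
propensity = np.array([0.68, 0.41, 0.25, 0.16, 0.10])

# Positions at which the logged training clicks occurred (0-indexed, made up).
click_positions = np.array([0, 0, 1, 3, 2])

# Inverse-propensity weights: clicks at rarely-examined positions count
# for more, counteracting the position bias in the logs.
sample_weights = 1.0 / propensity[click_positions]

# model.fit(X, y, sample_weight=sample_weights)  # most libraries accept this
```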


Don't use features that describe the current document only; add context-aware features, e.g. the average price of the other items in the candidate list (sketch below).
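For instance, with pandas one could compute a list-level aggregate like this; the column names and values are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "query_id": [1, 1, 1, 2, 2],
    "price":    [10.0, 20.0, 30.0, 5.0, 15.0],
})

# Context-aware feature: how this item's price compares to the rest of
# the candidate list for the same query.
df["mean_list_price"] = df.groupby("query_id")["price"].transform("mean")
df["price_vs_list"] = df["price"] / df["mean_list_price"]
```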

Lack of diversity
- penalize similar results: regularize based on a pairwise similarity / distance matrix
- re-ranking: rank by relevance first, then re-rank for diversity, subject to not changing the original ranking too much
- greedy optimization: start from the relevance ranking and keep adding the item that hurts the diversity of the result set the least (see the MMR sketch below)
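The greedy approach is essentially maximal marginal relevance (MMR): repeatedly pick the item with the best trade-off between its own relevance and its similarity to what is already selected. A sketch with made-up relevance scores and a made-up similarity matrix:

```python
import numpy as np

def mmr_rerank(relevance, similarity, k, lam=0.7):
    """Greedy re-ranking: at each step take the item maximizing
    lam * relevance - (1 - lam) * max similarity to items already picked."""
    selected, remaining = [], list(range(len(relevance)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            penalty = max(similarity[i][j] for j in selected) if selected else 0.0
            return lam * relevance[i] - (1 - lam) * penalty
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

relevance = np.array([0.9, 0.85, 0.8, 0.4])
similarity = np.array([[1.0, 0.95, 0.2, 0.1],
                       [0.95, 1.0, 0.3, 0.1],
                       [0.2, 0.3, 1.0, 0.2],
                       [0.1, 0.1, 0.2, 1.0]])
# Prints [0, 2, 1]: the near-duplicate item 1 gets demoted below item 2.
print(mmr_rerank(relevance, similarity, k=3))
```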



Facebook ML ranking:
Each user has 1500+ candidate items they could see.
For feed ranking of items in News Feed:
- build an ML model per engagement event (click, like, comment, share, hide, long video watch), similar to a click-through-rate (CTR) model
- assign each event some number of points
- for each event, predict the probability the user takes that action
- compute the expected value over events (sketch below)
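A sketch of that expected-value scoring; the point values and probabilities are illustrative, not Facebook's actual numbers:

```python
# Points assigned to each event type (made-up values).
points = {"like": 1.0, "comment": 4.0, "share": 8.0,
          "long_video_watch": 2.0, "hide": -10.0}

def expected_value(event_probs):
    """event_probs: per-event predicted probabilities for one (user, item)."""
    return sum(points[e] * p for e, p in event_probs.items())

# One candidate item's predicted probabilities from the per-event models.
candidate = {"like": 0.08, "comment": 0.01, "share": 0.005,
             "long_video_watch": 0.12, "hide": 0.002}
print(expected_value(candidate))  # rank the ~1500 candidates by this score
```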









