Friday, June 28, 2013

The paper is concerned with learning to rank, that is, constructing a model or a function for ranking objects.

There are advantages to taking the pairwise approach:
1. Existing methodologies on classification can be directly applied.
2. The training instances of document pairs can be easily obtained in certain scenarios.

There are also problems with the pairwise approach:
1. The objective of learning is formalized as minimizing errors in the classification of document pairs, rather than minimizing errors in the ranking of documents.
2. The training process is computationally costly, as the number of document pairs is very large.
3. The assumption that the document pairs are generated i.i.d. is too strong to hold in practice.
4. The number of generated document pairs varies largely from query to query, which will result in training a model biased toward queries with more document pairs.

This paper proposes employing what the authors call the listwise approach, in which lists of documents instead of document pairs are used as instances in learning.

The major contributions of this paper include
 (1) proposal of the listwise approach
 (2) formulation of the listwise loss function on the basis of probability models
 (3) development of the ListNet method
 (4) empirical verification of the effectiveness of the approach.


Listwise Approach

The objective of learning is formalized as minimization of the total losses with respect to the training data.
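Concretely, in the paper's setup each training query q^(i) comes with a list of document feature vectors x^(i) = (x^(i)_1, ..., x^(i)_n) and a list of ground-truth scores y^(i). A ranking function f maps the feature vectors to a score list z^(i) = (f(x^(i)_1), ..., f(x^(i)_n)), and learning minimizes the total loss

    Σ_{i=1}^m L(y^(i), z^(i))

over the m training queries, where L is a listwise loss function defined on pairs of score lists.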


Probability Models

Permutation Probability

The authors define the permutation probability so that it has desirable properties for representing the likelihood of a permutation (ranking list), given the ranking function.
Given two lists of scores, we can first calculate two permutation probability distributions from them, and then take the distance between the two distributions as the listwise loss function. Since the number of permutations is n!, however, the calculation becomes intractable for long lists.
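As a minimal sketch in plain Python (with φ taken to be the exponential function, one of the choices discussed in the paper), the permutation probability of a permutation π under score list s is P_s(π) = Π_{j=1}^n φ(s_{π(j)}) / Σ_{k=j}^n φ(s_{π(k)}). Summing it over all n! permutations confirms that it is a distribution, and also shows why computing it in full quickly becomes infeasible:

    import math
    from itertools import permutations

    def permutation_probability(s, pi, phi=math.exp):
        # P_s(pi) = prod_j phi(s[pi[j]]) / sum over the remaining positions k >= j
        prob = 1.0
        for j in range(len(pi)):
            denom = sum(phi(s[k]) for k in pi[j:])
            prob *= phi(s[pi[j]]) / denom
        return prob

    s = [3.0, 1.0, 2.0]  # hypothetical scores for three objects
    total = sum(permutation_probability(s, pi)
                for pi in permutations(range(len(s))))
    print(total)  # ~1.0: the n! permutation probabilities form a distribution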

Top One Probability

The top one probability of an object represents the probability of its being ranked on the top, given the scores of all the objects.
Theorem 6 shows that the top one probability has the closed form P_s(j) = φ(s_j) / Σ_{k=1}^n φ(s_k), so it can be calculated efficiently without enumerating the n! permutations.
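With φ = exp, this closed form is simply a softmax over the scores. A one-function sketch:

    import math

    def top_one_probability(s, phi=math.exp):
        # Theorem 6: P_s(j) = phi(s_j) / sum_k phi(s_k); O(n) instead of O(n!)
        total = sum(phi(v) for v in s)
        return [phi(v) / total for v in s]

    print(top_one_probability([3.0, 1.0, 2.0]))  # ~[0.665, 0.090, 0.245]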

With the use of top one probability, given two lists of scores, we can use any metric between the two corresponding top one probability distributions as the distance (listwise loss function) between the two score lists. For example, when we use Cross Entropy as the metric, the listwise loss function becomes

    L(y, z) = - Σ_{j=1}^n P_y(j) · log P_z(j)

where P_y and P_z are the top one probability distributions computed from the ground-truth score list y and the predicted score list z.
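For instance, the loss between two hypothetical score lists can be computed directly (a self-contained sketch; the score values are made up for illustration):

    import math

    def top_one(s):
        total = sum(math.exp(v) for v in s)
        return [math.exp(v) / total for v in s]

    y = [3.0, 1.0, 2.0]  # hypothetical ground-truth scores
    z = [2.5, 1.5, 1.0]  # hypothetical predicted scores
    loss = -sum(py * math.log(pz)
                for py, pz in zip(top_one(y), top_one(z)))
    print(loss)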

Learning Method: ListNet

The authors employ a new learning method, ListNet, for optimizing the listwise loss function based on top one probability, with a Neural Network as the model and Gradient Descent as the optimization algorithm.
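A minimal sketch of that training loop, assuming a linear scoring model f(x) = w · x (the paper's experiments also use a linear network; all names below are illustrative). Since the loss is a cross entropy between softmax distributions, its gradient with respect to the predicted scores is simply P_z - P_y, which is then chained through the linear model:

    import numpy as np

    def top_one(scores):
        e = np.exp(scores - scores.max())  # softmax, numerically stabilized
        return e / e.sum()

    def listnet_grad(w, X, y):
        # For one query: z = X @ w, L = -sum_j P_y(j) log P_z(j),
        # dL/dz = P_z - P_y, hence dL/dw = X^T (P_z - P_y).
        return X.T @ (top_one(X @ w) - top_one(y))

    def train(queries, n_features, lr=0.01, epochs=50):
        # queries: list of (X, y) pairs; X has shape (n_docs, n_features)
        w = np.zeros(n_features)
        for _ in range(epochs):
            for X, y in queries:
                w -= lr * listnet_grad(w, X, y)
        return w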



The key issue for the listwise approach is to define a listwise loss function.
The authors propose a probabilistic method to solve it. Specifically, two probability models, permutation probability and top one probability, are used to transform ranking scores into probability distributions.
Any metric between probability distributions (e.g., Cross Entropy) can then be viewed as the listwise loss function.
