There are advantages to taking the pairwise approach:
1. Existing methodologies on classification can be directly applied.
2. The training instances of document pairs can be easily obtained in certain scenarios.
There are also problems with the approach:
1. The objective of learning is formalized as minimizing errors in classification of document pairs, rather than minimizing errors in ranking of documents.
2. The training process is computationally costly, as the number of document pairs is very large.
3. The assumption that the document pairs are generated i.i.d. is too strong.
4. The number of generated document pairs varies largely from query to query, which will result in training a model biased toward queries with more document pairs.
This paper proposes employing what the authors call the listwise approach, in which document lists instead of document pairs are used as instances in learning.
The major contributions of this paper include
(1) proposal of the listwise approach
(2) formulation of the listwise loss function on the basis of probability models
(3) development of the ListNet method
(4) empirical verification of the effectiveness of the approach.
Listwise Approach
The objective of learning is formalized as minimization of the total losses with respect to the training data.
Probability Models
Permutation Probability
Given two lists of scores, we can first calculate two permutation probability distributions from them, and then calculate the distance between the two distributions as the listwise loss function. Since the number of permutations is n!, however, the calculation might be intractable.
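The permutation probability referred to above is the Luce model: the probability of a permutation is a product of stepwise softmax terms over the remaining objects. A minimal sketch (function names are my own, not from the paper):

```python
import math
from itertools import permutations

def permutation_probability(scores, perm):
    """Luce-model probability of a permutation given a score list:
    P(perm | s) = prod_j exp(s[perm[j]]) / sum_{k >= j} exp(s[perm[k]])."""
    prob = 1.0
    for j in range(len(perm)):
        numer = math.exp(scores[perm[j]])
        denom = sum(math.exp(scores[perm[k]]) for k in range(j, len(perm)))
        prob *= numer / denom
    return prob

# The probabilities over all n! permutations form a distribution (n = 3 here),
# which is what makes the n! cost an issue for larger lists.
scores = [3.0, 1.0, 2.0]
total = sum(permutation_probability(scores, list(p)) for p in permutations(range(3)))
```

Enumerating all permutations is only feasible for tiny lists, which is exactly the intractability the text points out and the motivation for top one probability below.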
Top One Probability
The top one probability of an object represents the probability of its being ranked on the top, given the scores of all the objects.
Theorem 6 shows that the top one probability can be computed efficiently.
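Concretely, the top one probability of object j reduces to a softmax over the score list, so it can be computed in linear time rather than by summing over permutations. A minimal sketch (the function name is my own):

```python
import math

def top_one_probability(scores):
    """Top one probability of each object:
    P(j) = exp(s_j) / sum_k exp(s_k), i.e. a softmax over the scores."""
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# The object with the highest score gets the highest top one probability.
p = top_one_probability([3.0, 1.0, 2.0])
```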
With the use of top one probability, given two lists of scores we can use any metric between probability distributions to represent the distance (listwise loss function) between the two score lists; for example, Cross Entropy.
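With Cross Entropy as the metric, the listwise loss compares the two top-one distributions directly. A minimal sketch, assuming the true relevance judgments are themselves given as a score list:

```python
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def listwise_cross_entropy(true_scores, pred_scores):
    """Cross entropy between the two top-one probability distributions:
    L(y, z) = -sum_j P_y(j) * log P_z(j)."""
    p_y = softmax(true_scores)
    p_z = softmax(pred_scores)
    return -sum(py * math.log(pz) for py, pz in zip(p_y, p_z))

# The loss is smallest when the predicted scores induce the same
# top-one distribution as the true scores.
loss_match = listwise_cross_entropy([3.0, 1.0, 2.0], [3.0, 1.0, 2.0])
loss_mismatch = listwise_cross_entropy([3.0, 1.0, 2.0], [1.0, 3.0, 2.0])
```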
Learning Method: ListNet
The authors employ a new learning method for optimizing the listwise loss function based on top one probability, with a Neural Network as the model and Gradient Descent as the optimization algorithm.
The key issue for the listwise approach is to define a listwise loss function.
The authors propose employing a probabilistic method to solve it. Specifically, two probability models, permutation probability and top one probability, are used to transform ranking scores into probability distributions.
We can then view any metric between probability distributions (e.g., Cross Entropy) as the listwise loss function.
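Putting the pieces together, ListNet training amounts to gradient descent on the cross-entropy listwise loss. A minimal sketch: the paper uses a Neural Network as the scoring function, but for brevity this uses a linear model (the simplest one-layer case); the training-data format and names are my own:

```python
import math

def softmax(scores):
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def train_listnet(queries, n_features, lr=0.1, epochs=100):
    """ListNet sketch with a linear scoring function f(x) = w . x.
    queries: list of (doc_feature_vectors, true_score_list) pairs,
    one pair per query."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for docs, true_scores in queries:
            z = [sum(wi * xi for wi, xi in zip(w, x)) for x in docs]
            p_y = softmax(true_scores)
            p_z = softmax(z)
            # Gradient of cross entropy w.r.t. the scores: dL/dz_j = P_z(j) - P_y(j),
            # backpropagated through the linear model.
            for j, x in enumerate(docs):
                g = p_z[j] - p_y[j]
                for i in range(n_features):
                    w[i] -= lr * g * x[i]
    return w

# One toy query: doc 0 is more relevant, so training should learn to score it higher.
queries = [([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])]
w = train_listnet(queries, n_features=2)
```

Because each query contributes a single list-level loss term (rather than one term per document pair), the per-query bias of the pairwise approach noted earlier does not arise.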