Wednesday, March 13, 2013

Aggregating local descriptors into a compact image representation



The paper addresses image search on a very large scale, where three constraints have to be considered jointly:
the accuracy of the search, its efficiency, and the memory usage of the representation.


The authors propose aggregating local image descriptors into a vector of limited dimension, which can be viewed as a simplification of the Fisher kernel representation.

The approach obtains significantly higher accuracy with, typically, a 20-byte representation. This is achieved by jointly optimizing:

  1. the representation, i.e., how to aggregate local image descriptors into a vector representation;
  2. the dimensionality reduction of these vectors;
  3. the indexing algorithm.

Two contributions of this work:


  1. proposing a representation that provides excellent search accuracy with a reasonable vector dimensionality;
  2. showing the advantage of jointly optimizing the trade-off between dimensionality reduction and the indexing algorithm.

VLAD: vector of locally aggregated descriptors


  1. Learn a codebook C = {c_1, ..., c_k} of k visual words with k-means. Each local descriptor x is assigned to its nearest visual word.
  2. The idea of the VLAD descriptor is to accumulate, for each visual word c_i, the differences x − c_i of the vectors x assigned to c_i. This characterizes the distribution of the vectors with respect to the center.
  3. A component of v is obtained as a sum over all the image descriptors assigned to c_i: v_{i,j} = sum over {x : NN(x) = c_i} of (x_j − c_{i,j}), where x_j and c_{i,j} denote the j-th components of x and c_i (Equation 1). The vector is subsequently L2-normalized. (See the sketch below.)
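
As a rough illustration of steps 1-3, here is a minimal NumPy sketch (my own, not the authors' code; the function names and the use of scikit-learn's KMeans are assumptions):

    import numpy as np
    from sklearn.cluster import KMeans

    def learn_codebook(train_descriptors, k=16):
        # Step 1: learn k visual words with k-means on training descriptors (n, d).
        return KMeans(n_clusters=k, n_init=4).fit(train_descriptors).cluster_centers_

    def vlad(descriptors, centroids):
        # Steps 2-3: accumulate the residuals x - c_i per visual word,
        # then flatten to a single k*d-dimensional vector.
        k, d = centroids.shape
        dists = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
        assignment = dists.argmin(axis=1)   # nearest visual word for each descriptor
        v = np.zeros((k, d))
        for i in range(k):
            residuals = descriptors[assignment == i] - centroids[i]
            v[i] = residuals.sum(axis=0)    # v_{i,j} = sum of (x_j - c_{i,j})
        v = v.ravel()
        norm = np.linalg.norm(v)
        return v / norm if norm > 0 else v  # L2-normalization, as in the paper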


Figure: images and corresponding VLAD descriptors, for k = 16 centroids (D = 16×128). The components of the descriptor are represented like SIFT, with negative components (see Equation 1) in red.




VLAD performance and dimensionality reduction 



Table: comparison of VLAD descriptors with BoF on the INRIA Holidays dataset (mAP, %).

Dimensionality is reduced from D to D’ dimensions with PCA (a sketch follows the observations below).


Observations:

  • VLAD better than BoF for a given descriptor size 
    • comparable to Fisher kernels for these operating points 
  • Choose a small D if the output dimension D’ is small
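
A minimal sketch of this reduction step, assuming scikit-learn's PCA and re-normalization after projection (the concrete calls are mine, not the paper's):

    import numpy as np
    from sklearn.decomposition import PCA

    def reduce_dimension(train_vlads, database_vlads, d_out):
        # Learn the D -> D' projection on a training set of VLAD vectors.
        pca = PCA(n_components=d_out).fit(train_vlads)
        reduced = pca.transform(database_vlads)
        # Re-normalize so Euclidean comparisons remain on the unit sphere.
        return reduced / np.linalg.norm(reduced, axis=1, keepdims=True)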


Indexing algorithm: searching with quantization 


  • Search/Indexing = distance approximation problem 
  • The distance between a query vector x and a database vector y is estimated by d(x, y) ≈ d(x, q(y)), where q(·) is a quantizer
    • a vector-to-code distance (a toy sketch follows this list)
  • The choice of the quantizer is critical 
    • needs many centroids 
    • regular k-means and approximate k-means cannot be used
      • we typically want k = 2^64 for 64-bit codes
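
To make the vector-to-code idea concrete, here is a toy sketch with a single flat quantizer (illustrative only; the names are mine):

    import numpy as np

    def encode(y, centroids):
        # Store only the index of the nearest centroid instead of y itself.
        return np.linalg.norm(centroids - y, axis=1).argmin()

    def approx_distance(x, code, centroids):
        # d(x, q(y)): distance from the raw query to the reconstructed vector.
        return np.linalg.norm(x - centroids[code])

In this flat scheme a 64-bit code would require 2^64 centroids, which is infeasible to learn or store; product quantization (next section) factorizes the quantizer to get around this.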


Product quantization for nearest neighbor search

  • The vector y is split into m subvectors: y = [y_1, ..., y_m]
  • The subvectors are quantized separately by quantizers q_1, ..., q_m: q(y) = [q_1(y_1), ..., q_m(y_m)], where each q_i is learned by k-means with a limited number of centroids
  • Example: a 128-dim vector y split into 8 subvectors of dimension 16; with, e.g., 256 centroids per subquantizer, the code takes 8 bytes and the effective number of centroids is 256^8 = 2^64 (see the sketch below)
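
A compact sketch of product quantization with the resulting asymmetric (vector-to-code) distance; the parameter names and scikit-learn calls are my assumptions, with m = 8 subquantizers of 256 centroids each giving 8-byte codes:

    import numpy as np
    from sklearn.cluster import KMeans

    def train_pq(train, m=8, ks=256):
        # One small codebook per subspace: m codebooks of shape (ks, D/m).
        sub = train.shape[1] // m
        return [KMeans(n_clusters=ks, n_init=4)
                .fit(train[:, i*sub:(i+1)*sub]).cluster_centers_
                for i in range(m)]

    def encode_pq(y, codebooks):
        # Quantize each subvector separately; the code is m small indices.
        sub = codebooks[0].shape[1]
        return np.array([np.linalg.norm(cb - y[i*sub:(i+1)*sub], axis=1).argmin()
                         for i, cb in enumerate(codebooks)], dtype=np.uint8)

    def adc_distance(x, code, codebooks):
        # Asymmetric distance: compare raw query subvectors with the centroids
        # selected by the stored code, summing squared distances per subspace.
        sub = codebooks[0].shape[1]
        sq = 0.0
        for i, cb in enumerate(codebooks):
            diff = x[i*sub:(i+1)*sub] - cb[code[i]]
            sq += diff @ diff
        return np.sqrt(sq)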