BOF (bag of features)
- Detect the local features in an image and represent each one as a descriptor (e.g. SIFT) in feature space.
- Cluster the descriptors into k clusters with centers $c_i$ (the visual vocabulary).
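The two BOF steps can be sketched in pure Python as follows. This is a minimal sketch, assuming plain Lloyd's k-means with random initialization for the clustering step and squared Euclidean distance for the assignment. `bof_histogram` shows the usual BOF image representation, a count of how many descriptors fall into each cluster. All function names here are hypothetical, and the descriptors are assumed to already be extracted (e.g. by a SIFT detector).

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest(x, centers):
    """Index i of the center c_i closest to descriptor x."""
    return min(range(len(centers)), key=lambda i: sq_dist(x, centers[i]))

def kmeans(descriptors, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice): returns k centers c_i."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(descriptors, k)]
    for _ in range(iters):
        # Assignment step: group descriptors by their nearest center.
        groups = [[] for _ in range(k)]
        for x in descriptors:
            groups[nearest(x, centers)].append(x)
        # Update step: move each center to the mean of its group.
        for i, g in enumerate(groups):
            if g:  # keep the old center if its cluster is empty
                centers[i] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def bof_histogram(descriptors, centers):
    """BOF representation: number of descriptors assigned to each center."""
    hist = [0] * len(centers)
    for x in descriptors:
        hist[nearest(x, centers)] += 1
    return hist

descs = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9], [0.0, 0.2]]
centers = kmeans(descs, k=2)
print(bof_histogram(descs, centers))
```

On this toy data the two well-separated groups end up in different clusters, so the histogram contains the counts 3 and 2 (their order depends on the initialization).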
VLAD: vector of locally aggregated descriptors [1]
- The centers $c_i$ are learned in the same way as in the BOF model (by k-means clustering).
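- Each local descriptor x is assigned to its nearest center $c_i = \mathrm{NN}(x)$. The idea of VLAD is to accumulate, for each center and starting from $v_i = 0$, the residuals $x - c_i$ of the descriptors assigned to it:
\[ v_i \leftarrow v_i + (x - c_i) \]
or, component-wise,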
\[
v_{i,j} = \sum_{x \,:\, \mathrm{NN}(x) = c_i} \left( x_j - c_{i,j} \right)
\]
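where $v_{i,j}$ is the $j$-th component of the vector $v_i$ accumulated for the $i$-th center, and $x_j$ and $c_{i,j}$ are the $j$-th components of x and $c_i$. The vector v thus characterizes the distribution of the descriptors with respect to the centers.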
- The concatenated vector $v = [v_1, \ldots, v_k]$ is then $L_2$-normalized: $v \leftarrow v / \|v\|_2$.
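The VLAD steps above (nearest-center assignment, residual accumulation, concatenation, $L_2$ normalization) can be sketched in pure Python. This is a minimal illustration assuming hard assignment by squared Euclidean distance and centers that have already been learned; the function names and the `normalize` flag are hypothetical, and returning an all-zero vector unchanged (instead of dividing by zero) is an assumed design choice, not something the post specifies.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest(x, centers):
    """Index i of the center c_i = NN(x) closest to descriptor x."""
    return min(range(len(centers)), key=lambda i: sq_dist(x, centers[i]))

def vlad(descriptors, centers, normalize=True):
    """VLAD vector of length k*d.

    v[i][j] = sum over x with NN(x) = c_i of (x[j] - c_i[j]).
    """
    k, d = len(centers), len(centers[0])
    v = [[0.0] * d for _ in range(k)]            # v_i = 0 for every center
    for x in descriptors:
        i = nearest(x, centers)                  # hard assignment to nearest center
        for j in range(d):
            v[i][j] += x[j] - centers[i][j]      # accumulate the residual x - c_i
    flat = [val for row in v for val in row]     # concatenate v_1, ..., v_k
    if not normalize:
        return flat
    norm = math.sqrt(sum(val * val for val in flat))
    # Assumption: an all-zero vector is returned as-is to avoid dividing by 0.
    return [val / norm for val in flat] if norm > 0 else flat

centers = [[0.0, 0.0], [10.0, 10.0]]
descs = [[1.0, 0.0], [0.0, 2.0], [11.0, 9.0]]
print(vlad(descs, centers, normalize=False))
print(vlad(descs, centers))
```

With centers (0, 0) and (10, 10), the first two descriptors fall into the first cluster and sum to the residual (1, 2), while (11, 9) gives the residual (1, -1). The unnormalized vector is therefore [1.0, 2.0, 1.0, -1.0], and normalization divides it by $\sqrt{7}$.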
The image representation vector has dimension $D = k \times d$, where d = 128 (the SIFT descriptor dimension) and k ranges from 16 to 256 in the experiments.
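To make the size concrete, a small sketch computing $D = k \times d$ for a few values of k in the stated range; `SIFT_DIM` and `vlad_dim` are illustrative names, not from the post.

```python
SIFT_DIM = 128  # d: SIFT descriptor dimension

def vlad_dim(k, d=SIFT_DIM):
    """Length D = k * d of the VLAD vector for k centers."""
    return k * d

for k in (16, 64, 256):
    print(k, vlad_dim(k))
```

With d = 128, k = 16 gives D = 2048, k = 64 gives D = 8192, and k = 256 gives D = 32768.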
References:
[1] Hervé Jégou, Matthijs Douze, Cordelia Schmid and Patrick Pérez. "Aggregating local descriptors into a compact image representation". Proc. IEEE CVPR'10, June,