Thứ Năm, 28 tháng 10, 2010

[Paper] Aggregating local descriptors into a compact image representation

BOF (bag of feature)
- Detect all local features in an image, and represent them in feature space.
- Clustering into k clusters with center $c_i$

VLAD: vector of locally aggregated descriptions
- Find out $c_i$ is the same as BOF model

- Each local descriptor
x is associated to one nearest cluster $c_i$. The idea of VLAD is to accumulate:
\[ c_i = c_i + (x_i - c_i) \]
or
\[
v_{i, j} = \sum_{\text{x such that NN(x) = }c_i} (x_j - c_{i,j})
\]
where, $v_{i, j}$ is $j^{th}$ component of $i^{th}$ cluster It characterizes the difference of vectors with respect to center.
- v is subsequently $L_2$ normalized by $v = v/||v||_2$
The Image representation vector dimension is $D = k \times d$ with d = 128 (SIFT descriptor dimension) and k = 16 to 256 by experiment.

References:
[1] Hervé Jégou, Matthijs Douze, Cordelia Schmid and Patrick Pérez. "Aggregating local descriptors into a compact image representation". Proc. IEEE CVPR'10, June,

Thứ Ba, 26 tháng 10, 2010

Probability Theory

1. Probability
Probability is measure of expressing the belief that an event will occur or has occurred.
Consider a discrete variable X taking values from definition set U and a is value on such set. Probability of that X takes a is expressed as:
\[
P(X = a) = \frac{\text{the number of }a}{\sum_{i \in U }\text{the number of } i}
\]
P(X = a) is real-value in integral [0, 1] and $\sum_{i \in U} P(X = i) = 1$
2. Expectation:
a) Expectation value (mean) of X is weighted average of possible values that X can take. Weight is exactly the probability of that value occurrence:
\[
E[X] = \sum p(X = x) \times x
\]
Expectation is the value that you expect the result of your experiment to be average.
Example:
The studying outcome (in 5-point scale) of a student A is given by :
Grade (X)Probability
53/20
49/20
34/20
23/20
11/20

In order to obtain the studying quality of a student A, we consider Expectation of grade X as representation value:
\[
\begin{align}
E[x] & = p(X = 1) \times 1 + p(X = 2) \times 2 + p(X = 3) \times 3 + p(X = 4) \times 4 + p(X=5) \times 5\\
& = 1/20 \times 1 + 3/20 \times 2 + 4/20 \times 3 + 9/20 \times 4 + 3/20 \times 5\\
& = 3.5
\end{align}
\]
b) Expectation value of a function
To find E[f(X)], where f(X) is a function of X, uses this formula:
\[
E[f(X)] = \sum p(X = x) f(x)
\]

Example:
Given a function $f(X) = X^2$ with:
f(X = 1) = 1 ; p(X = 1) = 1/6
f(X = 2) = 4 ; p(X = 2) = 3/6
f(X = 3) = 9 ; p(X = 3) = 2/6
The expectation value are obtained as:
\[
\begin{align}
E[f(X)] & = p(X=1) \times f(X=1) + p(X=2) \times f(X=2) + p(X=3) \times f(X=3)\\
& = 1/6 \times 1 + \3/6 \times 4 + 2/6 \times 9 \\
& = 31/6
\end{align}
\]

c) Expectation property
- Linear operator
\[
E[a X + b] = a E[X] + b
\]
where a, b are constants

3. Variance
a) Variance
Variance of X characterizes the spread of possible values of X and is written as:
\[
Var(X) = E[(X - E(X))^2]
\]
This can be also written as:
\[
Var(X) = E[X^2] - E^2(X)
\]
Proof:
\[
\begin{align}
Var(X) & = E[(X - E[X])^2] \\
& = \sum p(X = x) ( x - E[X])^2\\
& = \sum p(X = x) ( x^2 - 2xE[X]+E^2[X])\\
& = \sum p(X = x) x^2 - \sum p(X = x) 2xE[X]+\sum p(X = x)E^2[X])\\
& = E[X^2] - 2E[X]\sum p(X = x) x + E^2[X]\\
& = E[X^2] - 2E^2[X]+ E^2[X]\\
& = E[X^2] - E^2[X]
\end{align}
\]

b) Variance Property:
+ $Var(aX + b) = a^2 Var(X)$
where a, b are constants
Proof:
\[
\begin{align}
Var(aX +b) & = E[(aX+b)^2] - E^2[(aX+b)]\\
& = E[a^2X^2+2abX+ b^2] - (aE[X]+b)^2\\
& = a^2E[X^2]+2abE[X]+ b^2 - (a^2E^2[X]+2abE[X] + b^2)\\
& = a^2E[X^2] - a^2E^2[X]\\
& = a^2(E[X^2] - E^2[X])\\
& = a^2Var(X)
\end{align}
\]

4. Standard deviation
Standard deviation is square root of variance.

5. Probability function
Probability function covers both probability mass function (discrete probability function) and probability density function (continuous probability function)

$\circ$ Discrete probability function (probability mass function)
Suppose that U is a discrete set of possible values. Then $X: U \rightarrow R$ is a discrete variable. Probability mass function $f(x): R \rightarrow [0, 1]$ is defined as:
\[
f(X = x) = P(X = x) = P({s \in U| X(s) = x})
\]
Example:
Suppose the U is sample of simple toss of fair coin and X is random variaele on U by assigning 1 to "head" and 0 to "tail". Since coin is fair, so probability mass function of coin toss is:
\[
f(x) = \left\{ \begin{matrix} \frac{1}{2} \text{ , } x \in \{0, 1\}\\ 0 \text{ , } x\not\in \{0, 1\} &\end{matrix} \right
\]

$\circ$ Continuous probability function (probability density funcition - pdf, probability distribution fucntion or just density)
A function f(X) is considered as density function of continuous random variable X if it satisfies :
\[
P(a \leq X \leq b) = \int_{a}^{b} f(x) dx
\]
This definition leads to:
+ $0 \leq f(x) \leq 1$ with any x
+ $\int f(x) dx = 1$
Because continuous probability function is considered for the infinite number of points over continuous intervals, probability at a single point is always zero. That is why continuous probability function is defined over an interval rather than at one point.

6. Cumulative distribution function (Cumulative frequency function)
Cumulative distribution function is sum of probability values of probability function f(x) found at values less than or equal to x:
\[
F(x) = \int_{-\inf}^x f(x) dx
\]
Therefor, F(x) is an area under probability distribution curve from negative infinite to x.
For discrete probability function:
\[
F(x) = \int_{-\inf}^x f(x) dx = P(X \leq x)
\]

Note:

- The term "probability distribution function" is usually used to denote the "probability density function", but there isn't the standard among statisticians and probabilists. That leads to that the term "probability distribution function" are considered as "accumulative density function " or even "probability mass function" in some special cases.

7. Correlation Coefficient
http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_correlation_coefficient.htm
http://www.itl.nist.gov/div898/handbook/pmc/section5/pmc542.htm

Reference:
[1] Grading
[2] Expectation & Variance on Mathsrevision
[3] Distribution function on Wolfram Mathworld
[4] Probability funciton on Wolfram Mathworld
[5] Probability Distribution on Engineering Statistics handbook
[6] Probability density function on Wikipedia
[7] Cumulative distribution function on Wikipedia
[8] Probability mass function on Wikipedia

Thứ Ba, 12 tháng 10, 2010

[CV] Source code list

See this link

[CV][Descriptor] Local Self-Similarity & Global Self Similarity

1. Introduction
Based on the observation on self-similarity of object. In Figure 1, although the heart in images are variant but they are created by one repeated texture to make "heart" shape. The only difference is the texture to make the heart in each image. This idea pushed Eli Shechtman and Michal Irani to reach Local Self-Similarity Descriptor [1].

Figure 1: Object with variance in color, texture and edge.
In [1] this descriptor is used in context of Template Matching (Figure 2).
Input:
+ Object Image (template, usually small image and about 150-200 pixels in [1])
+ Large image (containing object in template image)
Output:
Location of template object in large image.

Figure 2: An example of template matching

2. Applied Area
- Template matching (with template image is a real image or hand sketched image)

3. Details

a) Descriptor
To extract local self-similarity $d_q$ at a point q, following this steps:
- Convert image in CIE L*a*b
- Correlate image patch (centered at q) 5x5 with a surrounding region (centered at q) 40 x40 using Sum of Square Difference (SSD) between patch colors, resulting in "distance surface" $SSD_q(x,y)$
- Normalize "distance surface" into "correlation surface" $S_a(x, y)$
(Figure 2a):

\[
S_q(x, y) = exp \left( - \frac{SSD_q(x,y)}{max(var_{noise}, var_{auto}(q)}} \right)
\]

where:
+ $var_{noise}$ is constant corresponding to variant photometric variations
+ $var_{auto}(q)$ is patch difference contrast in the meaning that sharp patches are more toleratable than smooth patches. In their implementation,
$var_{auto}(q)$ is the maximal variance of difference of patches in neighborhood of radius 1 related to centered patch.

- Correlation surface is transformed into binned log-polar representation with 80 bins (4 in log radius and 20 in angle)

- The maximum value in each bin is chosen --> make local affine transformation tolerance --> descriptor vector of size 80.

- Normalize this vector into range [0..1] by linearly stretching --> invariant to color and pattern distribution of surrounding patterns.

b) Descriptor for Video (Figure 2b)
- Correlate 5 x5x1 (patch without time axis) with 3D region 60 x 60 x 5 (2 previous and 2 next frames).
- Log polar space is changed to log- log-polar (log in time and space, polor in space) to take the descriptor vector of size 182.

c) Matching:
F: template image
G: input image
- Compute $d_q$ densely in both F and G (in multiple scales with Gaussian pyramid) .
- Remove meaningless descriptors (both F and G) including :
+ Centered patches are salient but no patch is similar to it in region (all un-normalized vector element are less than a threshold $t_{non\_info}$)
+ Descriptors with high self-similarity in everywhere in region (large homogeneous region). Maybe, there is a threshold $t_{high}$ here (didn't report).
- Essemble Matching - find the subset of descriptors in G close to all descriptor in F under geometric contrast. To do that, author modified essemble matching in [2]. Location with highest likelihood value considered as detected location.

d) Parameter:

NameMeaningValues
$var_{noise}$acceptable photometric variations(in illumination, color or noise)?
$t_{non\_info}$Threahold to remove non-informative descriptor?
$t_{high}$Homogeneous region threshold?

Figure 3: Extract $d_q$

e) Result:
- Outperform GLOH, Shape Context on template matching (Figure 4).

Figure 4: Result comparison with other descriptors.

f) Global Self-Similarity [3]
For each patch $t_q$ centered at q, correlate with whole H x W image I resulting in the correlation surface $C_q$ (here, region in Local Self-Similarity is actually image). In $C_q$, $C_q(q^{\prime})$ is correlation between $t_q$ and $t_q^{\prime}$.
Global Self-Similarity descriptor for image is given by:
\[
S_I(q, q^{\prime}) = C_q(q^{\prime})
\]

where, $S_I$ is quadratic in size of image I: H x W x H x W

4. Properties

Invariance capability:


Inner changeTransformationClutter Backgound
ShapeTextureIlluminationTransitionRotationScaleAffine
Local Self-Similarity
+++x

x
+++

x: have ability (
+: low,
++: medium,
+++: high)
- Local affine transformation invariance: find maximum in each bin of log polar
- Texture Invariance: find positions similar to local texture (local texture similarity)
- Illumination Invariance: due to 2 nomalization steps: normalize "distance surface" and normalize vector into [0..1]
- Scale Invariance: apply multiple scale
- Clutter background invariance: usually, if centered patch belongs to object then Local Self-Similarity finds neighbour patches like it, in another hand, only takes interest in object texture. Therefore, any background (not quite similar to object region) doesn't influence.

5.
Source code
- Source code byVarun Gulshan in Visual Geometry Group
- See also: Global Self-Sililarity Descriptor by Thomas Deselaers on his paper
- Local Self-Similarity by Rainer Lienhart in OpenCV

6. My conclusion
- Very exciting descriptor with novel idea.

- Good for object with characteristics:
+ Only influenced by scale
+ Texture changes on difference background (human, human-made object on real image)
+ May change but keep main form

- I haven't seen anything interst in Global Self-Similarity descriptor (at least now)

References:
[1] Eli ShechtMan,Michal Irani. "Matching Local Self-Similarities across Images and Videos". IEEE Conference on Computer Vision and Pattern Recognition, 2007.
[2] O. Boiman, M. Irani. "Detecting irregularities in images and videos". in ICCV, 2005.
[3] Thomas Deselaers and Vittorio Ferrari. "Global and Efficient Self-Similarity for Object Classification and Detection". CVPR, 2010.

Thứ Hai, 11 tháng 10, 2010

[CV][Detector][Contour] gPb

1. Introduction
- gPB is state of the art contour detector [2] proposed by Michael Maire et al. at CalTech.
- Bryan Catanzaro et al (Berkeley) re-implemented this detector on CUDA [1].
They contributed a parallel version with rotated integral image for histogram calculation and used Lanczos algorithms with Cullum-Willoughby test for eigensolver to speed-up about 100 times (4 mins down to 1.8 secs).
They claimed in [1] that their implementation gives the same result as in [2] - 0.7 on Berkeley dataset (Figure 6 and Figure 7 is results) and tested on several GPU kinds (Table 2).
2. Dataset
They tested on Benchmark Dataset in Countour detection [Berkeley Segmentation Dataset]:

3. Source code:
- gPb: Software and source code can be downloaded from Michael Maire's page
- Damascene: parallel gPb on CUDA by ParLab at Beckeley)

4. My conclusion:
- The result of gPb seems good and Catanzaro's implementation takes the advantages of speed-up ---> should try


References:
[1] Bryan Catanzaro, Narayanan Sundaram, Bor-Yiing Su, Yunsup Lee, Mark Murphy, Kurt Keutzer. "Efficient, High-Quality Image Contour Detection". In 2009 IEEE 12th International Conference on Computer Vision (September 2009)
[2] M. Maire, P. Arbelaez, C. Fowlkes, and J. Malik. "Using contours to detect and localize juncitons in natural images". CVPR, pages 1-8, June 2008.
[3] Contour detecton survey slide by Michael Maire.

Thứ Sáu, 8 tháng 10, 2010

[CV][Detector] Ferns

1. Authors:

Proposed by Mustafa Ozuysal et al in PAMI 2010 [1]. A detector based on machine learning, particularly, Naive-Bayes combination of classifiers using ferns

2. Details
a) There are two phases: training and evaluation

- Training:
Select some images as model images (to extract patches for training). Run Harris detector on each image to extract keypoints and take $K \times K$ patch at each keypoint. This is done by deforming the images many times, applying Harris detector. Select top-n patches with highest detection frequency on original image and its deformed ones (Figure 1) to make N classes .

Figure 1: Some classes for training (From [1])

Affine deformation to create variant images is given by:

\[
R_{\theta}R_{-\phi}\text{diag}(\lambda _1, \lambda _2) R_{\theta}
\]
where $R_{\theta}$,$R_{-\phi}$ are rotations of angle $\theta$,$\phi$ respectively. One way to make training set for each class is to randomly create 30 affine deformations per degree of rotation.

- Evaluation:
Run trained detector on new image (containing whole or a part of training image). The detected patch on new image will be assigned with appropriate class label associated with corresponding patch on training image.

b) Naive-Bayes combination of classifiers method:
$C_i$ is $i^{th}$ in N classes:
\[
\hat{c_i} = argmax_{c_i} P(C = c_i | f_1, f_2, ..., f_N)
\]
Apply Bayes' formula:
\[
P(C = c_i | f_1, f_2, ..., f_N) = \frac{P(f_1, f_2, ..., f_N|C = c_i) P(C = c_i)}{P(f_1, f_2, ..., f_N)}
\]
Assuming a uniform prior $P(C = c_i)$ and $P(f_1, f_2, ..., f_N)$ is the same for all classes, reduce to find:
\[
\hat{c_i} = argmax_{c_i} P(f_1, f_2, ..., f_N|C = c_i)
\]
In this formula, feature $f_j$ depends on two locations $d_{j,1}$ and $d_{j,2}$ on the patch I:


\[
f_j = \left\{\begin{matrix} 1 & \text{if } I_{j, 1} <> I_{j, 2} || 0 & \text{otherwise} \end{matrix} \right.\]
Group features into M groups of size $S = \frac{N}{M}$. This group is called fern.
\[
P(f_1, f_2, ..., f_N| C= c_i) = \prod_{k=1}^K P(F_k|C = c_i)
\]
where, $F_k$ is $k^{th}$ fern.


c) Parameters:
ParameterMeaningValue
S# features per fern11
M# groups (ferns)30-50
$K \times K$The path size for training and evaluation32x32
$\theta$angle of deformation$[0:2\pi]$
$\phi$angle of deformation$[0:2\pi]$
$\lambda _1, \lambda _2$$[0.6:1.5]$
Nthe number of classes(1500)

3. Conclusion

Novel idead !!!!!!!!!!!!!!!!!!!!!!!!!!!!

Advantages:
- No-descriptor, no-matching procedure (because class meaning match id is returned after detection)
- 1000 faster and more accurate than SIFT (Figure 2). Training time is also fast (5 minutes on MacBook Pro laptop, 200 classes). (Using Naive-Bayes)
- Invariance depends on training set (training set containing affine image, illumination image makes detector become affine and illumination invariance)
- Run on embedded system

Disadvantages:
- Only detect on image related to training image (includes or belongs to training image)
Figure 2: Fern gives more and more accuracy than SIFT (From [1])

4. Applied Areas:
SLAM, wide baseline matching, 3D objects, panorama & 3D annotation and on hand-held device
- Panorama annotation: give a panorama image with annotation, for a input image (patch of panorama), automatically annotate on input image (Figure 3).
- 3D annotation: give annotated 3D model of scene or object, for a input image, automatically annotate on input image (Figure 3).
Figure 3: panorama and 3D annotation application (From [1])

5. Source code
Fern demo on EPFL

References:
[1] Mustafa Ozuysal, Michael calonder, Vincent Lepetit and Pascal Fua. "Fast Keypoint Recognition Using Random Ferns". PAMI. 2010

Thứ Năm, 7 tháng 10, 2010

[CV][Detector] SFOP


1. Main points:
Forstner et al. proposed SFOP detector based on [2, 3] and automatic-scale selection of Lindeberg (Idea: improve available detector with automatic-scale selection theory).
Interesting things in their paper are:

+ They defined an property of interpretability for detector evaluation. For this consideration, they show that only Lowe, MSER and SFOP have powerful interpretation. Consideration is against to thought of anything that has hight repeatability is good interest point (Is it really good idea?).

+ SFOP performs light better than some scale-invariance state of the art detectors (Figure 2) and in poorly textured image (Figure 3).

+ SFOP gives more correspondences than other detectors (Figure 2: Row 3, first image).
Figure 1: Is interpretability important?


Figure 2: SFOP is light better scale-invariant detectors on Boat set (first row).


Figure 3: Good result of SFOP on poor textured image

2. My Conclusion:
- Should try SFOP for your application.
- SFOP may be your for image visualization (as well as MSER and Lowe)

References:
[1] Wolfgang Förstner, Timo Dickscheid, Falko Schindler. "Detecting Interpretable and Accurate Scale-Invariant keypoints". ICCV. 2009
[2] L. Parida, D. Geiger, and R. Hummel. Junctions: Detection, Classification and Reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998.
[3] J. Bigun. A Structure Feature for Some Image Processing Applications Based on Spiral Function. Computer Vision, Graphics and Image Processing, 1990.

[CV][Detector] Idea for detector

1. Detector thường chỉ thiên về 1 loại corner, junction, blob.
- Đã có detector trả về nhiều loại chưa?
- Detector trả về loại thì có ứng dụng gì so với trả về 1 loại?
+ Có thể biết được thông tin về tính chất của ảnh
+ Dùng descriptor tương ứng
2. Có những mô hình nào biểu diễn rotation invariance & scale invariance?
3. RANSAC matches những hình đối xứng có tốt ko? --> giải phảp có thể tìm ở Shape Context
4. Wide baseline matching in low-texture environments

Thứ Tư, 6 tháng 10, 2010

[CV][Descriptor] Ray Feature


1. Authors
Smith et al. proposed Ray feature in [1].

2. Successfully applied area:
- Cell detection

3.Main idea
A Ray feature set contains four kinds of features (Figure 1):
- Distance difference feature $f_i^{[ diff ]}$: provide the capability of scale invariance--> reduce the number of features in feature pool
- Distance feature $f_i^{[ dist ]}$: provide distance contraint on scale
- Orientation feature $f_i^{[ ori ]}$: provide relative gradient angle
- Norm feature $f_i^{[ norm ]}$: provide norm of gradient vector

Figure 1: Four kinds of Ray feature

Four feature templates are created in feature pool. Figure 2 shows that in nuclei, mitochondria and face detection, distance feature weakly contributes to detection while distance difference feature seems good.
- Precomputation like integral image is also proposed to help compute features quickly. According to authors, precomputation process is lower than Haar precomputation but in runtime, their features are faster (Figure 3).

Figure 2: The template contribution for problem of nuclei, mitochondria and face detection

Figure 3: Precomputation and run-time cost for Rays and Haar

4. My conclusion:
- Feature for narrow area.

- Good for deformable object (Figure 4) in simple background and good-result-for-edge-detection image, for example: nuclei and mitochondria in cell environment

- Adaboost/SVM is model for automatically choosing good features in a set of created features from some feature templates.

Figure 4: cell and its shape
[1] K. Smith, A. Carleton, and V. Lepetit. "Fast Ray Features for Learning Irregular Shapes". In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Kyoto, Japan, 2009.

Thứ Hai, 4 tháng 10, 2010

[CV][Conference] ICCV 2009

I. Principle:

Recognition, Detection and Matching (oral)

1. Compact Signatures for High-Speed Interest Point Description and Matching (PDF).
Michael Calonder, Vincent Lepetit, Pascal Fua, Kurt Konolige, James Bowman, Patrick Mihelich
Problem: Require to run on low memory device so neet to compact data from descriptor
Solution: Using Compressive Sensing theory
Detail: Impove a descriptor (called Signature [19]) by Compressive Sensing Theory make a new descriptor (called compact descriptor)
[6] M. Calonder, V.Lepetit, and P. Fua. Keypoint signatures for fast learning and recognition. In ECCV' 08
[19] M. Ozuysal, M. Calonder, V. Lepetit , and P. Fua. Fast Keypoint Recognition using Random Ferns. PAMI, 2009. Accepted for Publication

Learning and Recognition - 1 (poster)

2. Fast Ray Features for Learning Irregular Shapes (PDF). Kevin Smith, Alan Carleton, Vincent Lepetit

See note here


Low Level Vision and Others (poster)
3. Image Saliency by Isocentric Curvedness and Color.
Roberto Valenti, Nicu Sebe, Theo Gevers
Topic: Salient Object detection (don't care) - Finding the most interesting object in an image
Method: Weighted segmentation
4. Detecting Interpretable and Accurate Scale-Invariant keypoints (PDF).
Wolfgang Förstner, Timo Dickscheid, Falko Schindler
See note here


5. An Algebraic Model for fast Corner Detection. Andrew Willis, Yunfeng Sui

6. Contour Segment Network

II. Secondary

Recognition, Detection and Matching (oral)

1. A Shape-Based Object Class Model for Knowledge Transfer (PDF). Michael Stark, Michael Goesele, Bernt Schiele

Learning and Recognition - 2 (poster)
2. Consensus Set Maximization with Guaranteed Global Optimality for Robust Geometry Estimation.
Hongdong Li
Matching and Alignment (poster)
3. Robust Matching of Building Facades under Large Viewpoint Changes.
Jimmy A. Lee, Kin-Choong Yow, Alex Y. S. Chia

4. Feature Correspondence and Deformable Object Matching via Agglomerative Correspondence Clustering.
Minsu Cho, Jungmin Lee, Kyoung Mu Lee

5. Subspace matching: Unique solutions to point matching with geometric constraints (PDF).
Manuel Marques, Marko Stosic, Joao Costeira

6. Deformation Invariant Image Matching by Spectrally Controlled Diffeomorphic Alignment.
Christopher M. Yang, Sai Ravela

7. Wide-Baseline Image Matching Using Line Signatures (PDF, project, results).
Lu Wang, Ulrich Neumann, Suya You
Problem: Low-texture screne and non-planar screne for Wide baseline matching
Solution: Use line feature called Line Signature and propose a new matching algorithms for Line Signature matching


8. Matching as a Non-Cooperative Game.
Andrea Albarelli, Samuel Rota Bulò, Andrea Torsello, Marcello Pelillo

9. An Algebraic Approach to Affine Registration of Point Sets (PDF).
Jeffrey Ho, Adrian Peter, Anand Ranganranjan, Ming-Hsuan Yang
Low Level Vision and Others (poster)
10. Recovering Planar Homographies between 2D Shapes (PDF).
Jozsef Nemeth, Csaba Domokos, Zoltan Kato

11. GroupSAC: Efficient Consensus in the Presence of Groupings (PDF).
Kai Ni, Hailin Jin, Frank Dellaert

III Auxiliary:

Shading and Color (oral)

1. Estimating Natural Illumination from a Single Outdoor Image(PDF, project, supplementary material) - formely: Illumination Estimation from a Single Outdoor Image. Jean-François Lalonde, Alexei A. Efros, Srinivasa G. Narasimhan

Similarity Metrics and Nearest Neighbors (oral)

2. Similarity Metrics for Categorization: From Monolithic to Category Specific(PDF, abstract) - formely: Similarity Functions for Categorization: from Monolithic to Category SpecificBoris Babenko, Steve Branson, Serge Belongie
Low Level Vision and Others (poster)
21. Efficient, High-Quality Image Contour Detection (, project). Bryan Catanzaro, Bor-Yiing Su, Narayanan Sundaram, Yunsup Lee, Mark Murphy, Kurt Keutzer
3D: Shape, Geometry, and Stereo (poster)
15. Diagram Techniques for Multiple View Geometry (PDF).
Alberto Ruiz, Pedro E. Lopez-de-Teruel

Thứ Năm, 30 tháng 9, 2010

[CV][Descriptor] Shape Context

1. Authors
Shape Context is shape oriented descriptor proposed by Belongie & Malik in paper 2000 [3].

2. Implementation
Shape context for an arbitrary object (Figure 1.a) is computed by the following steps:
- Find edge of object (inside or on object boundary) - use Canny (Figure 1.c).
- Choose N sample points on object edge (regardless of inside or on boundary). They are any points on edge and not required to be interest point.
- For each sample point, calculate the distance to N-1 others (Figure 1.d)
- Normalize: compute median ($\lambda$ - Figure 1.d) of NxN distances above and divide each distance by this median
- Represent in log polar space for each sample point with N-1 normalized distances. In log-polar space, Histogram of one sample point shows that the vertical is $n_r$ bins in log of radius and horizontal is $n_{\theta}$ bins in angle related to tangent of sample point. The authors called the histogram as shape context. Figure 1.e is histogram with $n_r = 5$ and $n_{\theta} = 6$, so it contains 30 bins. Histogram is flattened by concatenating rows to get vector with $n_r \times n_{\theta}$ elements.

- Concatenate N histograms to obtain shape context representation of object (matrix with N rows and $n_r \times n_{\theta}$ columns.
Figure 1: Shape context computation and graph matching (a, b) Original image pair. (c) Edges and tangents of first letter with 50 sample points. (d) Vectors from a sample point (at left, middle) to all other points. The median distance $\lamda$ for all $N^2$ point pairs is shown at bottom for reference. (e) $\log{r, \theta}$ histogram of vectors in (d), with 5 and 6 bins, respectively. (Dark = large value). (f) Correspondences found using Hungarian method, with weights given by sum of two terms: histogram dissimilarity and tangent angle dissimilarity. (g, h) The "shape contexts" for the two letters, formed by flatterning and concatening the histograms for all points in each image; each shape context has 50 rows, one per sample point, and 30 columns, one for each histogram bin.

Parameters:

ParameterMeaningInitial implementation value
NThe number of sample points100
$n_{\theta}$The number of angular bins12 (360 degrees)
$n_r$The number of log(radius) covers from $0.125\lambda$ to $2\lambda$, values inside $0.125\lambda$ to first bin and outside $2\lambda$ to last bin5


3. Properties
- Implementation complexity: easy (and few parameters)
- The ability of transformation invariance

IlluminationTransitionRotationScaleAffine
Shape Context
xxx

+ Transition-invariance because of relative distance with respect to sample point
+ Rotation-invariance because angular bins are measured with respect to tangent at sample point
+ Scale - invariance because normalization by N x N distance median.

4. Matching
- Give 2 objects with histogram $g_i(k)$ and $h_j(k)$, here i and j are sample points.
- Compute the matching distance between point i in object 1 and point j in object 2 to make a N x N matrix:

\[
C_{ij} = (1-\beta)C_S_{ij} + \beta C_A_{ij}
\]
(ordinarily, $\beta = 0.3$)
We have $C_S_{ij}$ is shape dissimilarity (use chi-square distance) and $C_A_{ij}$ is local appearance.
\[
C_S_{ij} = \frac{1}{2}\sum_{k = 1}^{n_r \times n_{\theta}}\frac{[g_i(k) - h_j(k)]^2}{g_i(k) + h_j(k)}
\]

\[
C_A_{ij} = \frac{1}{2}\|(cos{\theta_i}-cos{\theta_j}, sin{\theta_i}-sin{\theta_j})\|
\]
($\theta_i$ and $\theta_j$ are tangent of points i and j)
- Our target is to minimize the total cost of matching with constraint that the matching to be one-one. That means we will find a permutation $\pi(i)$ so that $\sum_i C_{i\pi(i)}$ is a minimum.
--> Use Hungarian method with computation cost $O(N^3)$

5. Evaluation
- Good shape descriptor for clear object (especially synthesis object, non-noise image): handwritten, shihouettes, logos.
- Good idea for scale, rotation invariance
- Hard for real image (because of noise) (think so :D)

6. Source code

References:
[1] Shape Context on Wikipedia
[3] S. Belongie and J. Malik (2000). "Matching with Shape Contexts". IEEE Workshop on Content based Access of Image and Video Libraries (CBAIVL-2000).

Thứ Năm, 23 tháng 9, 2010

[CV][Detector] Harris Laplace alogrithm

In [2] there are 2 proposed implementation of Harris Laplace detectors as followings:
I. Standard Harris Laplace detector (without extension):
1. Build the scale-space representation with $\sigma _n = s^n \sigma_0$
2. At each scale level, detecting maximum point in 8 neighbours of that point and more than a threshold:
\[
det(\mu (x, \sigma _n)) - \alpha trace^2(\mu(x, \sigma_n))>threshold_H
\]
here, $\mu$ is second moment matrix (or auto-correlation matrix)

\[
\mu(\textbf{x}, \sigma_I, \sigma_D) = \begin{bmatrix}\mu_{11} & \mu_{12}\\ \mu_{21} & \mu_{22} \end{bmatrix} = \sigma_D^2g(\sigma_I) \star \begin{bmatrix} L_x^2(\boldsymbol{x}, \sigma_D) & L_xL_y(\boldsymbol{x}, \sigma_D)\\ L_xL_y(\boldsymbol{x}, \sigma_D)& L_y^2(\boldsymbol{x}, \sigma_D) \end{bmatrix}
\]

integration scale: $\sigma_I =\sigma_n$ and derivation scale $\sigma_D= k\sigma_I$
In details, given L is image at scale n (I is smoothed with gaussian $\sigma_n$).
- Compute three derivatives of L $(L_x, L_{xy}, L_y)$ using Gaussian-with-$\sigma_D$ derivative kernel.

-Compute $L_x^2 = L_x L_x, L_y^2 = L_y L_y$.
-Convolute $L_x^2, L_{xy}, L_y^2$ with $g(\sigma_I)$ (means that derivation value is weighted by Gaussian with $\sigma_I$) and multiplied by $\sigma_D^2$.

- Take cornerness:
\[
det(\mu (x, \sigma _n)) - \alpha trace^2(\mu(x, \sigma_n))>threshold_H
\]
- Find maximum point in 8-neighbours is greater than $threahold_H$ --> candidate point set
3 For each point in candidate set, compute Laplacian of Gaussian and take point with maximum over scale and greater than Laplacian threshold. Laplacian of Gaussian is given by:
\[
\sigma_n^2 \left| L_{xx}(\boldsymbol{x}, \sigma_n) + L_{yy}(\boldsymbol{x}, \sigma_n) \right| > threshold_L
\]
-Compute Laplacian of Gaussian with $\sigma_n$ at each candidate point and multiply this result by $\sigma_n^2$

Output: $(x, y, \sigma_n)$ - position and scale

Parameters:

Symbol
Meaning
Value
sscale factor1.2
$\sigma_0$ initial scale1 (inference from $\sigma_n = s^n$)
$\sigma_n = s^n$ sigma rangen = (1:17)
$threshold_H$ Harris function threshold1000
$threshold_L$ Laplacian of Gaussian threshold10
$\alpha$in Harris function0.06
k $\sigma_D/\sigma_I$0.6




References:
[1] K. Mikolajczyk, K. and C. Schmid. "Scale and affine invariant interest point detectors". International Journal of Computer Vision. 2004

Thứ Tư, 22 tháng 9, 2010

[Theory][Gaussian] 2D Gaussian function and derivatives





The 2D Gaussian function is given by:


\[
G = \frac{1}{2\pi\sigma^2}e^{-\frac{(x^2 + y^2)}{2\sigma^2}}
\]

The first derivatives:

\[
G_x = -\frac{x}{2\pi\sigma^4}e^{-\frac{(x^2 + y^2)}{2\sigma^2}}
\]


\[
G_y = -\frac{y}{2\pi\sigma^4}e^{-\frac{(x^2 + y^2)}{2\sigma^2}}
\]

The second derivatives:

\[
G_{x^2} = \frac{x^2-\sigma^2}{2\pi\sigma^6}e^{-\frac{(x^2 + y^2)}{2\sigma^2}}
\]

\[
G_{y^2} = \frac{y^2-\sigma^2}{2\pi\sigma^6}e^{-\frac{(x^2 + y^2)}{2\sigma^2}}
\]

\[
G_{xy} = \frac{xy}{2\pi\sigma^6}e^{-\frac{(x^2 + y^2)}{2\sigma^2}}
\]

The third derivatives:

\[
G_{x^3} = \frac{3\sigma^2x-x^3}{2\pi\sigma^8}e^{-\frac{(x^2 + y^2)}{2\sigma^2}}
\]

\[
G_{y^3} = \frac{3\sigma^2y-y^3}{2\pi\sigma^8}e^{-\frac{(x^2 + y^2)}{2\sigma^2}}
\]

\[
G_{xxy} = \frac{y(\sigma^2 - x^2)}{2\pi\sigma^8}e^{-\frac{(x^2 + y^2)}{2\sigma^2}}
\]

\[
G_{xyy} = \frac{x(\sigma^2 - y^2)}{2\pi\sigma^8}e^{-\frac{(x^2 + y^2)}{2\sigma^2}}
\]

The fourth derivatives:

\[
G_{x^4} = \frac{x^4 - 6\sigma^2x^2 + 3\sigma^4}{2\pi\sigma^{10}}e^{-\frac{(x^2 + y^2)}{2\sigma^2}}
\]

\[
G_{y^4} = \frac{y^4 - 6\sigma^2y^2 + 3\sigma^4}{2\pi\sigma^{10}}e^{-\frac{(x^2 + y^2)}{2\sigma^2}}
\]

\[
G_{x^3y} = \frac{xy(x^2-3\sigma^2)}{2\pi\sigma^{10}}e^{-\frac{(x^2 + y^2)}{2\sigma^2}}
\]

\[
G_{xy^3} = \frac{xy(y^2-3\sigma^2)}{2\pi\sigma^{10}}e^{-\frac{(x^2 + y^2)}{2\sigma^2}}
\]

\[
G_{x^2y^2} = \frac{(x^2-\sigma^2)(y^2-\sigma^2)}{2\pi\sigma^{10}}e^{-\frac{(x^2 + y^2)}{2\sigma^2}}
\]

Laplacian of Gaussian:

\[
LoG = L_{x^2}+L_{y^2} = \frac{x^2 + y^2 - 2\sigma^2}{2\pi\sigma^6}e^{-\frac{(x^2+y^2)}{2\sigma^2}}
\]