1. Authors
Shape Context is shape oriented descriptor proposed by Belongie & Malik in paper 2000 [3].
2. Implementation
Shape context for an arbitrary object (Figure 1.a) is computed by the following steps:
- Find edge of object (inside or on object boundary) - use Canny (Figure 1.c).
- Choose N sample points on object edge (regardless of inside or on boundary). They are any points on edge and not required to be interest point.
- For each sample point, calculate the distance to N-1 others (Figure 1.d)
- Normalize: compute median ($\lambda$ - Figure 1.d) of NxN distances above and divide each distance by this median
- Represent in log polar space for each sample point with N-1 normalized distances. In log-polar space, Histogram of one sample point shows that the vertical is $n_r$ bins in log of radius and horizontal is $n_{\theta}$ bins in angle related to tangent of sample point. The authors called the histogram as shape context. Figure 1.e is histogram with $n_r = 5$ and $n_{\theta} = 6$, so it contains 30 bins. Histogram is flattened by concatenating rows to get vector with $n_r \times n_{\theta}$ elements.
- Concatenate N histograms to obtain shape context representation of object (matrix with N rows and $n_r \times n_{\theta}$ columns.
Figure 1: Shape context computation and graph matching (a, b) Original image pair. (c) Edges and tangents of first letter with 50 sample points. (d) Vectors from a sample point (at left, middle) to all other points. The median distance $\lamda$ for all $N^2$ point pairs is shown at bottom for reference. (e) $\log{r, \theta}$ histogram of vectors in (d), with 5 and 6 bins, respectively. (Dark = large value). (f) Correspondences found using Hungarian method, with weights given by sum of two terms: histogram dissimilarity and tangent angle dissimilarity. (g, h) The "shape contexts" for the two letters, formed by flatterning and concatening the histograms for all points in each image; each shape context has 50 rows, one per sample point, and 30 columns, one for each histogram bin. Parameters:
| Parameter | Meaning | Initial implementation value |
| N | The number of sample points | 100 |
| $n_{\theta}$ | The number of angular bins | 12 (360 degrees) |
| $n_r$ | The number of log(radius) covers from $0.125\lambda$ to $2\lambda$, values inside $0.125\lambda$ to first bin and outside $2\lambda$ to last bin | 5 |
3. Properties
- Implementation complexity: easy (and few parameters)
- The ability of transformation invariance
| Illumination | Transition | Rotation | Scale | Affine | |
| Shape Context | x | x | x |
+ Transition-invariance because of relative distance with respect to sample point
+ Rotation-invariance because angular bins are measured with respect to tangent at sample point
+ Scale - invariance because normalization by N x N distance median.
4. Matching
- Give 2 objects with histogram $g_i(k)$ and $h_j(k)$, here i and j are sample points.
- Compute the matching distance between point i in object 1 and point j in object 2 to make a N x N matrix:
\[
C_{ij} = (1-\beta)C_S_{ij} + \beta C_A_{ij}
\]
(ordinarily, $\beta = 0.3$)
We have $C_S_{ij}$ is shape dissimilarity (use chi-square distance) and $C_A_{ij}$ is local appearance.
\[
C_S_{ij} = \frac{1}{2}\sum_{k = 1}^{n_r \times n_{\theta}}\frac{[g_i(k) - h_j(k)]^2}{g_i(k) + h_j(k)}
\]
\[
C_A_{ij} = \frac{1}{2}\|(cos{\theta_i}-cos{\theta_j}, sin{\theta_i}-sin{\theta_j})\|
\]
($\theta_i$ and $\theta_j$ are tangent of points i and j)
- Our target is to minimize the total cost of matching with constraint that the matching to be one-one. That means we will find a permutation $\pi(i)$ so that $\sum_i C_{i\pi(i)}$ is a minimum.
We have $C_S_{ij}$ is shape dissimilarity (use chi-square distance) and $C_A_{ij}$ is local appearance.
\[
C_S_{ij} = \frac{1}{2}\sum_{k = 1}^{n_r \times n_{\theta}}\frac{[g_i(k) - h_j(k)]^2}{g_i(k) + h_j(k)}
\]
\[
C_A_{ij} = \frac{1}{2}\|(cos{\theta_i}-cos{\theta_j}, sin{\theta_i}-sin{\theta_j})\|
\]
($\theta_i$ and $\theta_j$ are tangent of points i and j)
- Our target is to minimize the total cost of matching with constraint that the matching to be one-one. That means we will find a permutation $\pi(i)$ so that $\sum_i C_{i\pi(i)}$ is a minimum.
--> Use Hungarian method with computation cost $O(N^3)$
References:
[1] Shape Context on Wikipedia
5. Evaluation
- Good shape descriptor for clear object (especially synthesis object, non-noise image): handwritten, shihouettes, logos.
- Good idea for scale, rotation invariance
- Hard for real image (because of noise) (think so :D)
6. Source code
References:
[1] Shape Context on Wikipedia
[2] Log-polar space
[3] S. Belongie and J. Malik (2000). "Matching with Shape Contexts". IEEE Workshop on Content based Access of Image and Video Libraries (CBAIVL-2000).








































