Learning AND-OR Templates for Object Recognition by Information Projection

Zhangzhang Si
Ph.D., 2011
Advisor: Song-Chun Zhu
Finding statistical models for the bewildering varieties of visual patterns in natural scenes, such as object patterns and texture patterns, is at the core of understanding the mystery of vision. Generative image models, which are automatically learned from observed image examples, help us understand the underlying structure of the high-dimensional image space. They also provide powerful schemes for machine vision tasks such as object recognition and detection. In this work, I mainly focus on learning probabilistic generative image models as hierarchical AND-OR Templates (AOT).
More specifically, the proposed AND-OR Templates have the following characteristics, which are advantageous in representing visual objects:
(1) Hierarchical composition (AND). An object is usually composed of several constituent parts (e.g., a person is composed of head, body, arms and feet) that are relatively independent of each other. The parts can be further decomposed into smaller parts.
(2) Hierarchical coarse-to-fine deformation (Continuous/geometric OR). For example, a person can form a complicated pose, and the articulation can be represented as movements of larger body parts at a coarse level, together with movements of sub-parts within each body part, and so on.
(3) Multiple ways of composition (Discrete/structural OR). For example, a person may have small eyes or large eyes.
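The characteristics listed so far can be illustrated with a small data-structure sketch. This is not the learned probabilistic model of the thesis; it only shows how AND nodes, structural OR nodes, and a bounded coarse-to-fine shift might fit together in one tree. All class names, the max_shift parameter, the pixel ranges, and the uniform sampling are assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Terminal:
    """A leaf of the template (hypothetical stand-in for an image primitive)."""
    name: str


@dataclass
class AndNode:
    """Composition (AND): all children are present.

    max_shift bounds the geometric OR: the whole part may move by up to
    max_shift pixels in x and y relative to its parent (assumed range).
    """
    name: str
    children: List["Node"]
    max_shift: int = 0


@dataclass
class OrNode:
    """Structural OR: exactly one alternative is selected."""
    name: str
    alternatives: List["Node"]


Node = Union[Terminal, AndNode, OrNode]
Placement = Tuple[str, Tuple[int, int]]


def count_structures(node: Node) -> int:
    """Count distinct structural configurations, ignoring geometric shifts."""
    if isinstance(node, Terminal):
        return 1
    if isinstance(node, OrNode):
        # Alternatives are mutually exclusive, so their counts add.
        return sum(count_structures(a) for a in node.alternatives)
    # Children of an AND node vary independently, so their counts multiply.
    total = 1
    for child in node.children:
        total *= count_structures(child)
    return total


def sample(node: Node, rng: random.Random,
           offset: Tuple[int, int] = (0, 0)) -> List[Placement]:
    """Draw one configuration as a list of (terminal name, accumulated shift)."""
    if isinstance(node, Terminal):
        return [(node.name, offset)]
    if isinstance(node, OrNode):
        # Uniform choice is an assumption; a learned model would weight branches.
        return sample(rng.choice(node.alternatives), rng, offset)
    # Shifts accumulate from coarse (parent) to fine (child).
    dx = rng.randint(-node.max_shift, node.max_shift)
    dy = rng.randint(-node.max_shift, node.max_shift)
    moved = (offset[0] + dx, offset[1] + dy)
    placements: List[Placement] = []
    for child in node.children:
        placements.extend(sample(child, rng, moved))
    return placements


# Toy template built from the examples in the text: a person made of head,
# body, arms and feet, where the eyes may be small or large.
eyes = OrNode("eyes", [Terminal("small_eyes"), Terminal("large_eyes")])
head = AndNode("head", [eyes], max_shift=1)
person = AndNode(
    "person",
    [head, Terminal("body"), Terminal("arms"), Terminal("feet")],
    max_shift=4,
)

configuration = sample(person, random.Random(0))
```

In this toy template, count_structures(person) evaluates to 2, one configuration per eye type. The head moves with the person and may shift by at most one additional pixel in each direction, which is the coarse-to-fine idea in miniature.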