Integrating 3D and 2D Representations for View Invariant Object Recognition

Wenze Hu
Ph.D., 2012
Advisor: Song-Chun Zhu
This thesis presents representations, and corresponding algorithms, for learning models that recognize object images taken over the full continuous view space. In particular, we propose to integrate 3D object-centered representations with 2D viewer-centered representations, filling the representational gap between sparse, simple 3D shapes and their view-variant appearances observed as image pixels. Toward this goal, this thesis studies the following models and corresponding algorithms:
1. A mixed model and a pursuit algorithm which integrate 3D object primitives and 2D image primitives according to their information contributions, measured by information gain. This measure is used consistently in the subsequent models, and also provides a numerical answer to the debate between object-centered and viewer-centered representations.
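As a rough illustration of this idea, the pursuit step can be thought of as greedily adding whichever primitive, 3D or 2D, currently offers the largest information gain. The sketch below is hypothetical; the primitive names and gain values are illustrative and not taken from the thesis.

```python
# Hypothetical sketch: greedily select primitives (3D object-centered or
# 2D viewer-centered) by their information gain, matching-pursuit style.
# Gains here are illustrative placeholder numbers in bits.

def select_primitives(candidates, budget):
    """Greedily pick the candidates with the largest information gain.

    candidates: dict mapping primitive name -> information gain
    budget: number of primitives to keep in the mixed model
    """
    chosen = []
    remaining = dict(candidates)
    for _ in range(budget):
        if not remaining:
            break
        best = max(remaining, key=remaining.get)  # highest remaining gain
        chosen.append((best, remaining.pop(best)))
    return chosen

candidates = {
    "3d_edge_primitive": 2.1,    # object-centered primitive
    "2d_gabor_primitive": 1.4,   # viewer-centered primitive
    "2d_corner_primitive": 0.9,
    "3d_plane_primitive": 0.3,
}
model = select_primitives(candidates, budget=2)
print(model)  # the two highest-gain primitives, mixing 3D and 2D
```

Under this view, whether the model ends up dominated by 3D or 2D primitives is decided numerically by the gains themselves rather than by fiat, which is the sense in which the measure speaks to the object-centered versus viewer-centered debate.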
2. A 2D compositional image model and a sum-max data structure which group 2D image primitives to represent middle-level image structures such as line segments, curves, and corners. This middle-level image model can be used to find sparse representations of natural images, and it connects low-level 2D image representations to 3D object representations.
3. A 3D hierarchical compositional object model and an AND-OR tree structure which represent a huge number of possible 3D object templates using a limited number of nodes. The AND-OR tree hierarchically quantizes the infinite, continuous space of object geometry and appearance, and decomposes the 3D object representation into 3D panels, whose appearances in images are further decomposed into active curves and 2D primitives. Despite the multiple levels of hierarchy, learning and inference can be done efficiently by dynamic programming, which is essentially composed of layers of sum and max operations.
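The layered sum and max operations mentioned above can be sketched as recursive scoring over an AND-OR tree: AND nodes compose parts by summing their children's scores, while OR nodes select the best alternative by taking the max. The tree below is a hypothetical toy instance, not a structure from the thesis.

```python
# Hypothetical sketch of inference on an AND-OR tree by layered sum and
# max operations. AND nodes sum child scores (composing parts into a
# template); OR nodes take the max (choosing among template variants).
# Terminal scores stand in for local match scores of 2D primitives.

def score(node):
    kind = node["type"]
    if kind == "terminal":           # leaf: local match score of a primitive
        return node["score"]
    child_scores = [score(c) for c in node["children"]]
    if kind == "and":                # composition: sum of part scores
        return sum(child_scores)
    if kind == "or":                 # alternatives: best-scoring branch
        return max(child_scores)
    raise ValueError(f"unknown node type: {kind}")

tree = {
    "type": "or",                    # two alternative object templates
    "children": [
        {"type": "and", "children": [   # template A: two panels
            {"type": "terminal", "score": 1.0},
            {"type": "terminal", "score": 0.5},
        ]},
        {"type": "and", "children": [   # template B: two panels
            {"type": "terminal", "score": 0.8},
            {"type": "terminal", "score": 0.9},
        ]},
    ],
}
print(score(tree))  # 1.7: template B's summed score wins the max
```

Because each node's score depends only on its children, this recursion is a dynamic program: scores can be computed bottom-up layer by layer, and the winning OR branches can be traced back to recover the selected template.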