A Hierarchical Compositional Model for Representation and Sketching of High-Resolution Human Images

Zijian Xu
Ph.D., 2007
Advisor: Song-Chun Zhu

This dissertation presents a composite template model, named the And-Or graph, for representing objects with large structural variability. Intuitively, an And-node represents a decomposition of a graphical structure and expands to a set of Or-nodes with associated relations; an Or-node serves as a switch variable pointing to alternative And-nodes. A traversal from the root node of the And-Or graph, called a parse graph, produces a configuration of the terminal nodes (sub-templates) under the (soft and hard) relations inherited from their ancestor nodes. The And-Or graph representation can generate a large set of constrained configurations with a relatively small number of graph nodes, and thus accounts for great structural variation. The And-Or graph model is tested on tasks such as modeling and sketching human faces and clothes.
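The following is a minimal sketch of the And-Or graph idea, assuming a pure tree and omitting the soft and hard relations entirely. The names `Node`, `count_configurations`, and `sample_parse_graph` are hypothetical, not taken from the dissertation. It shows why a small graph yields many configurations: counts multiply at And-nodes and add at Or-nodes, and a parse graph is obtained by choosing one branch at every Or-node.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    kind: str  # "and", "or", or "terminal" (hypothetical encoding)
    children: List["Node"] = field(default_factory=list)


def count_configurations(node: Node) -> int:
    """Number of distinct terminal configurations the node can generate."""
    if node.kind == "terminal":
        return 1
    if node.kind == "and":
        # And-node: every child is expanded, so the choices multiply.
        total = 1
        for child in node.children:
            total *= count_configurations(child)
        return total
    # Or-node: alternatives are mutually exclusive, so the counts add up.
    return sum(count_configurations(child) for child in node.children)


def sample_parse_graph(node: Node, rng: random.Random) -> List[str]:
    """Traverse from `node`, picking one branch at each Or-node; return terminal names."""
    if node.kind == "terminal":
        return [node.name]
    if node.kind == "and":
        leaves: List[str] = []
        for child in node.children:
            leaves.extend(sample_parse_graph(child, rng))
        return leaves
    return sample_parse_graph(rng.choice(node.children), rng)


def or_of(name: str, k: int) -> Node:
    """Hypothetical helper: an Or-node with k terminal alternatives."""
    return Node(name, "or", [Node(f"{name}_{i}", "terminal") for i in range(k)])


# Toy face: 1 And-node + 3 Or-nodes + 9 terminals = 13 nodes.
face = Node("face", "and", [or_of("eyes", 3), or_of("nose", 3), or_of("mouth", 3)])
print(count_configurations(face))  # 27 configurations from 13 nodes
print(sample_parse_graph(face, random.Random(0)))
```

With deeper graphs the gap widens quickly, since each extra And-level multiplies the number of configurations while adding only a handful of nodes.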

A hierarchical compositional model of human faces is built as a three-layer And-Or graph. Faces are represented hierarchically: the first layer treats each face as a whole; the second layer refines the local facial parts jointly as a set of individual templates; the third layer further divides the face into 16 zones and models detailed facial features such as eye corners, marks, and wrinkles. Transitions between the layers are decided by the minimum description length (MDL) criterion, according to the complexity of the input face image. Diverse face representations are formed by drawing from dictionaries of global faces, parts, and skin detail features. A sketch captures the most informative part of a face in a much more concise and potentially more robust representation. However, generating good facial sketches is extremely challenging because of the rich facial details and large structural variations, especially in high-resolution images. The representational power of our generative model is demonstrated by reconstructing high-resolution face images and generating cartoon facial sketches. Our model is useful for a wide variety of applications, including recognition, non-photorealistic rendering, super-resolution, and low-bit-rate face coding.
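To illustrate how an MDL criterion can switch between layers, here is a sketch that uses a generic two-part code length (Gaussian residual cost plus (k/2) log2 n bits for k parameters). This approximation, the function names, and all residual and parameter numbers are assumptions for illustration, not the dissertation's exact criterion or results. A richer layer is chosen only when its reduction in residual outweighs its extra model cost.

```python
import math
from typing import Dict, Tuple


def description_length(rss: float, num_pixels: int, num_params: int) -> float:
    """Two-part MDL approximation in bits (assumed form, Gaussian residuals).

    data_bits: cost of encoding the reconstruction residual.
    model_bits: cost of encoding the layer's parameters.
    Only differences between layers matter, not absolute values.
    """
    data_bits = 0.5 * num_pixels * math.log2(rss / num_pixels)
    model_bits = 0.5 * num_params * math.log2(num_pixels)
    return data_bits + model_bits


def select_layer(fits: Dict[str, Tuple[float, int]], num_pixels: int) -> str:
    """Return the layer whose (rss, num_params) fit gives the shortest code."""
    return min(
        fits,
        key=lambda layer: description_length(fits[layer][0], num_pixels, fits[layer][1]),
    )


# Hypothetical fits on a 100x100 face: layer -> (residual sum of squares, #params).
smooth_face = {"global": (500_000.0, 50), "parts": (200_000.0, 300), "detail": (180_000.0, 2000)}
wrinkled_face = {"global": (500_000.0, 50), "parts": (200_000.0, 300), "detail": (20_000.0, 2000)}

print(select_layer(smooth_face, 10_000))    # detail layer barely helps -> "parts"
print(select_layer(wrinkled_face, 10_000))  # detail layer pays for itself -> "detail"
```

The design point is that the same dictionary hierarchy serves both images; only the input's complexity, through the residual term, decides how deep the representation goes.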

Cloth modeling and recognition is an important and challenging problem in both vision and graphics, with applications such as dressed-human recognition and tracking, and human sketching and portraiture. We built an And-Or graph model to represent different clothing configurations, such as T-shirts and jackets. In a supervised learning phase, we ask an artist to draw sketches on a set of images of dressed people, and we decompose the sketches into categories of cloth and body components: collars, shoulders, cuffs, hands, pants, shoes, etc. Each component has a number of distinct sub-templates (sub-graphs). An algorithm that integrates bottom-up proposals with top-down information is proposed to infer the composite clothing template efficiently from the image.
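As a simplified illustration of combining bottom-up and top-down information, the sketch below runs max-sum dynamic programming over a tree-structured And-Or graph. Bottom-up detector scores are attached to terminal sub-templates, and the top-down graph decides which alternatives may be combined. This is not the dissertation's inference algorithm: relations between components (for example collar-shoulder alignment) are omitted, and the component names, scores, and the function `best_parse` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    name: str
    kind: str  # "and", "or", or "terminal" (hypothetical encoding)
    children: List["Node"] = field(default_factory=list)


def best_parse(node: Node, proposal_scores: Dict[str, float]) -> Tuple[float, List[str]]:
    """Return (score, chosen terminals) maximizing the summed bottom-up scores.

    Terminals with no bottom-up proposal get -inf, so they are chosen only
    when no alternative was proposed at all.
    """
    if node.kind == "terminal":
        return proposal_scores.get(node.name, float("-inf")), [node.name]
    results = [best_parse(child, proposal_scores) for child in node.children]
    if node.kind == "and":
        # And-node: all components must be present, so their scores add up.
        total = sum(score for score, _ in results)
        leaves = [leaf for _, chosen in results for leaf in chosen]
        return total, leaves
    # Or-node: keep only the highest-scoring alternative sub-template.
    return max(results, key=lambda r: r[0])


def terminal(name: str) -> Node:
    return Node(name, "terminal")


collar = Node("collar", "or", [terminal("v_neck"), terminal("round_neck"), terminal("shirt_collar")])
cuff = Node("cuff", "or", [terminal("short_cuff"), terminal("long_cuff")])
upper_body = Node("upper_body", "and", [collar, cuff])

# Hypothetical bottom-up detector scores (e.g. log-likelihoods) per sub-template.
scores = {"v_neck": 0.2, "round_neck": 1.5, "shirt_collar": 0.9, "short_cuff": 1.1, "long_cuff": 0.4}
print(best_parse(upper_body, scores))
```

Because the graph here is a tree, one recursive pass suffices; once relations couple sibling components, exact inference is no longer this simple, which is one reason bottom-up proposals are useful for pruning the search.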