Bottom-Up/Top-Down Image Parsing with Attribute Grammar

Feng Han, Song-Chun Zhu
This paper studies a simple attribute graph grammar as a generative image representation for image parsing in a Bayesian framework. This grammar has one class of primitives as its terminal nodes – 3D planar rectangles projected on images, and six production rules for the spatial layout of the rectangular objects. All the terminal and non-terminal nodes in the grammar are described by attributes for their geometric properties and image appearance. Each production rule either instantiates an non-terminal node as a rectangular primitive or expands a node into its components. A production rule is associated with a number of equations that constrain the attributes of a parent node and those of its children. This grammar, albeit simple, can produce a combinatorial number of configurations and objects in made-made scenes, such as building, hallways, kitchens, and living rooms etc. A configuration refers to a planar graphic representation produced by a series of grammar rules in a parsing graph (augmented from a parsing tree by including horizontal constraints). A given image ins then generated from a configuration by a primal sketch model [5] which is formulated as the likelihood in the Bayesian framework. The paper will be focused on designing an effective inference algorithm which computes (or construct) a hierarchical parsing graph from an input image in the process of maximizing a Bayesian posterior probability or equivalently minimizing a description length (MDL). It is worth clarifying that the parsing graph here is only a generic interpretation for the object layout and it is beyond the scope of this paper to deal with explicit scene and object recognition. The inference algorithm integrates bottom-up rectangle detection as weighted candidates (particles) which activate the grammar rules for top-down predictions of occluded or missing components. Intuitively each grammar rule maintains a list of particles as in an “”assembly line””. The top-down process chooses the most promising particle (with heaviest weight) at each step. The acceptance of a grammar rule means a recognition of a certain sub-configuration so that the description length is decreased in a greedy way , and it also activates a number of actions: (i) creating new “”top-down”” particles and insert them into the lists; (ii) reweighing some particles in the lists; (iii) passing attributes between a node and its parent through the constraint equations associated with this production rule. When an attribute is passed from a child to a node to a parent node, it is called bottom-up; and the opposite is called top-down. The whole procedure is, in spirit, similar to the data-drive Markov chain Monte Carlo paradigm [20], [16], except that a greedy algorithm is adopted for simplicity.
2005-09-01