Integration and Goal-Guided Scheduling of Bottom-Up and Top-Down Computing Processes in Hierarchical Models

Tianfu Wu
Ph.D., 2011
Advisor: Song-Chun Zhu

Accuracy performance and computational efficiency are two of the most important issues in object detection and parsing in computer vision, and the trade-off between them is usually guided by vision goals. In the literature, hierarchical models have been widely and successfully used to improve accuracy performance, but often at the expense of a substantially larger computational burden. This explosion of computing costs can, in practice, prevent a computer vision system from scaling to hierarchical models that consist of a large number of nodes. Meanwhile, hierarchical models are usually studied with a zero-one loss for nodes (i.e., in a loss cost-insensitive setting).

The goal of this thesis is to present a framework for integrating and scheduling bottom-up (BU) and top-down (TD) computing processes in a recursively defined hierarchical And-Or graph (AoG) to address, with a numerical study, the vision-goal-guided trade-off between accuracy performance and computational efficiency in both loss cost-insensitive and cost-sensitive situations. The BU/TD computing processes consist of three types of processes identified for each node A in an AoG: (i) the α(A) process detects node A directly based on image features; (ii) the β(A) process computes node A by binding its child node(s) bottom-up; and (iii) the γ(A) process predicts node A top-down from its parent node(s). To evaluate their individual information contributions, the three processes are isolated and then trained separately. The learning of the three processes is formulated under the maximum likelihood estimation (MLE) framework. A numerical study of the information contributions is presented with both computer and human experiments. The experimental results show that the three processes contribute to computing node A from images in complementary ways in terms of scale and occlusion conditions.

To improve accuracy performance, the α-β-γ computing processes are integrated by breadth-first search (BFS) in object parsing with an AoG formulated under the Bayesian framework. The three processes are explicitly connected to Bayesian inference and its dynamic programming (DP) implementation. In experiments on human face parsing and hierarchical image structure parsing, the results show performance improvements consistent with the evaluated information contributions of the three processes.

Next, to improve the computational efficiency of the learnt computing processes given allowable bounds on accuracy performance in cost-sensitive object detection, near-optimal decision policies are learnt for the computing processes of terminal nodes and And-nodes by minimizing the corresponding risk function, which explicitly takes into account the computational cost and the false negative (FN) and false positive (FP) loss costs. Finally, a theoretical study is presented on scheduling all the α-β-γ computing processes in an AoG under the best-first heuristic search framework, so as to adapt the computing order of nodes in an AoG to different vision tasks and image datasets.
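
As a rough illustration of the cost-sensitive idea described above, the sketch below picks an early-reject threshold for a single computing process by minimizing an empirical risk that combines a computational cost with FN and FP loss costs. It is not the thesis's formulation: the function names, the single-threshold policy, and the simplification that every negative passing the threshold counts as a false positive are all assumptions made for this example.

```python
def empirical_risk(scores, labels, threshold, c_fn, c_fp, c_compute):
    """Average cost per example of the policy 'reject if score < threshold'.

    Hypothetical setup: labels are 1 (object) or 0 (background); examples
    that pass the threshold incur a further computation cost c_compute.
    """
    total = 0.0
    for score, label in zip(scores, labels):
        if score < threshold:
            # Rejected early: no further computation, but a rejected
            # object is a false negative.
            total += c_fn if label == 1 else 0.0
        else:
            # Passed on: pay for further computation; a background
            # example that passes is treated as a false positive here.
            total += c_compute + (c_fp if label == 0 else 0.0)
    return total / len(scores)


def best_threshold(scores, labels, c_fn, c_fp, c_compute):
    """Exhaustively search candidate thresholds for the lowest empirical risk."""
    # Each observed score is a candidate; infinity means 'reject everything'.
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: empirical_risk(scores, labels, t, c_fn, c_fp, c_compute),
    )


# Toy data (made up for illustration only).
scores = [0.1, 0.2, 0.8, 0.9]
labels = [0, 0, 1, 1]
threshold = best_threshold(scores, labels, c_fn=10.0, c_fp=1.0, c_compute=0.1)
```

Raising `c_fn` relative to `c_fp` and `c_compute` pushes the selected threshold lower (fewer early rejections), which is one simple way to see how loss costs and computing cost trade off against each other in such a policy.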