Learning and Inferring Causality from Video

Amy Fire, Song-Chun Zhu

We present a novel probabilistic and graphical representation for inferring causal events in video, such as determining which action or event caused a door to open. To acquire causal knowledge, a joint spatio-temporal-causal model is learned in an unsupervised manner via information projection pursuit, augmenting a spatio-temporal description of events with one causal relation at each iteration. Analytic formulas are provided for the pursuit process, both for the model parameters and for the information gain at each iteration. After the joint model is built, the learned causal relations are compiled into a novel representation for causality, the Causal And-Or Graph (C-AOG). Bayesian inference on the C-AOG answers “why” questions, explaining why objects changed status. For inference of “why not” questions, the dual C-AOG, akin to logical negation, is proposed. In experiments, the methods correctly learn causal relations, attributing object status changes to their causing actions amid confounding actions. Robustness is examined by adding incorrect action detections, varying the durations used to compute candidate causal relations, and varying the number of examples used. The methods presented outperform χ2 by considering hierarchical action selection and outperform the causal effect by discounting coincidental relationships.
2013-05-08
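
The greedy pursuit step described in the abstract can be illustrated with a minimal sketch: candidate causal relations are scored by how much their observed co-occurrence frequency deviates from the current model's prediction, and the highest-gain relation is added at each iteration until the remaining gains fall below a threshold. The relation names, frequencies, threshold, and function names below are hypothetical illustrations under a simplified Bernoulli assumption, not the paper's actual spatio-temporal-causal model or its analytic formulas.

```python
import math

def info_gain(p_obs, p_model):
    """KL-divergence gain (Bernoulli case) from making the model reproduce
    the observed frequency of a single candidate causal relation."""
    eps = 1e-9
    p = min(max(p_obs, eps), 1 - eps)
    q = min(max(p_model, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def pursue_causal_relations(observed_freq, base_freq, threshold=0.05, max_iter=10):
    """Greedy pursuit sketch: at each iteration, select the candidate relation
    whose observed co-occurrence frequency deviates most (in KL terms) from the
    current model, add it, and update the model to match that frequency."""
    model_freq = dict(base_freq)   # model starts at background frequencies
    selected = []
    for _ in range(max_iter):
        gains = {r: info_gain(observed_freq[r], model_freq[r]) for r in observed_freq}
        best = max(gains, key=gains.get)
        if gains[best] < threshold:
            break                  # remaining relations look coincidental
        selected.append((best, gains[best]))
        model_freq[best] = observed_freq[best]  # model now explains this relation
    return selected

# Hypothetical candidate relations: (action, fluent change) -> observed frequency
observed = {("push_door", "door_opens"): 0.90,
            ("walk_by", "door_opens"): 0.12,
            ("flip_switch", "light_on"): 0.85}
background = {r: 0.10 for r in observed}   # assumed background co-occurrence rate
print(pursue_causal_relations(observed, background))
```

In this toy example the two strong relations (pushing the door, flipping the switch) are selected, while the near-background co-occurrence of walking by and the door opening falls under the threshold and is discarded as coincidental, mirroring in spirit how the pursuit discounts coincidental relationships.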