Social Scene Understanding: Group Activity Parsing, Human-Robot Interactions, and Perception of Animacy

Tianmin Shu
PhD, 2019
Zhu, Song-chun
This dissertation proposes new computational frameworks to address three core challenges in social scene understanding: group activity parsing, human-robot interactions, and perception of animacy. The goal of these frameworks is to represent the underlying structure of social scenes and to unify the perception and concept learning of both physics and social behaviors. To this end, we first develop a joint parsing method for group activities that yields hierarchical representations of groups, events, and human roles, providing a holistic view of a social scene. In follow-up work, the idea of joint parsing is also shown to be effective for boosting the performance of deep neural networks on group activity recognition. Second, we formulate social affordances as a hierarchical representation of human interactions, which can be learned from a handful of RGB-D videos of human interactions. Based on the symbolic plans derived from the learned knowledge, we further design a real-time motion inference method to enable motion transfer from human interactions to human-robot interactions, which generalizes well to unseen social scenarios. Finally, we study human perception of animacy by designing new approaches to generate Heider-Simmel animations and by developing new computational models to account for human physical and social perception. In particular, we propose a unified framework for modeling physics and social behaviors through i) a joint physical-social simulation engine, ii) joint physical and social concept learning, formulated as the pursuit of generalized coordinates and their potential energy functions, and iii) a unified psychological space that integrates intuitive physics and intuitive psychology.