Learning Fluents for Task Representation

Yang Liu
PhD, 2019
Zhu, Songchun
This dissertation addresses a challenge that has received little attention in recent years: fluent change. Fluents are time-varying attributes of an entity or a group of entities. They describe state changes of humans, objects, and the environment, and so provide an interpretable representation for video analysis and task planning. Popular methods such as 3D convolutional descriptors lack this property and remain black boxes. The dissertation has two parts. The first part discusses the nature of fluents, which can be divided into appearance, geometry, and topology. We develop a generative model with an encoder-decoder mechanism that maps fluents between image space and latent space. To disentangle fluents such as appearance and geometry, we design separate encoder-decoder networks, so that an object in an image is mapped to separate appearance and geometric fluent vectors. In the geometric fluent space, we also attempt to learn intuitive physics from synthetic datasets that include object interactions such as collision and gravity. The second part proposes a framework that represents a task in terms of fluent changes, in contrast to traditional approaches. A task is a series of actions that accomplishes a certain goal, and it can be represented by a group of fluent changes. We collected datasets in both real and VR scenes, and experiments on both demonstrate the strength of our methods.
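As an illustration of the idea that a task can be represented by a group of fluent changes, the following minimal sketch treats a state as a mapping from (entity, attribute) pairs to values and checks whether a goal set of changes occurred between two states. All names here (`FluentChange`, `fluent_changes`, `task_achieved`, and the cup and kettle example) are hypothetical and chosen for illustration; they are not taken from the dissertation, which learns fluents from images rather than from hand-written symbols.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical types for illustration only; not from the dissertation.
FluentKey = Tuple[str, str]   # (entity, attribute), e.g. ("cup", "fill")
State = Dict[FluentKey, str]  # fluent values observed at one time step


@dataclass(frozen=True)
class FluentChange:
    """One fluent moving from a `before` value to an `after` value."""
    key: FluentKey
    before: str
    after: str


def fluent_changes(start: State, end: State) -> List[FluentChange]:
    """Return the fluents whose value differs between two states."""
    changes = []
    for key in sorted(start.keys() & end.keys()):
        if start[key] != end[key]:
            changes.append(FluentChange(key, start[key], end[key]))
    return changes


def task_achieved(goal: List[FluentChange], start: State, end: State) -> bool:
    """A task counts as achieved if every goal change was observed."""
    observed = set(fluent_changes(start, end))
    return all(change in observed for change in goal)


# Toy example: the task "fill the cup" as a single fluent change.
start = {("cup", "fill"): "empty", ("kettle", "power"): "off"}
end = {("cup", "fill"): "full", ("kettle", "power"): "off"}
fill_cup = [FluentChange(("cup", "fill"), "empty", "full")]

print(task_achieved(fill_cup, start, end))
```

In this symbolic toy version the task is defined by what changes in the world rather than by which actions were performed, which is the contrast with action-centred task descriptions that the abstract draws.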
2019