Robot Learning from Interactions with Physics-realistic Environment: Constructing Big Task Platform for Training AI Agents

Xie Xu
PhD, 2021
Zhu, Song-Chun
Robot learning from interactions is a crucial topic at the intersection of computer vision, robotics, and machine learning. Interactions are ubiquitous in daily life; concrete instances include object-object, robot-object, and robot-robot interactions. Learning from interactions is important for an intelligent robot system because it helps the robot develop a sense of physics while planning and acting reasonably. To achieve this purpose, one primary challenge that remains in the community is the absence of datasets that can be leveraged to study the diverse categories of interactions. To create such datasets, the interaction data should be realistic enough to reflect the underlying physical processes. Further, we argue that learning interactions through simulation is a promising approach to synthesizing and scaling up diverse forms of interaction. This dissertation focuses on robot learning from interactions in Mixed Reality (MR) and on leveraging state-of-the-art physics simulation to construct virtual environments that afford Big Tasks. There are four major contributions along this pathway:

1. Robot learning object manipulation skills from human demonstrations. Instead of directly learning from a robot-object manipulation dataset, which is hard to generalize from, we create a human-object manipulation dataset and let the robot learn from the demonstrations. We claim that the key to building such a dataset is realistic hand-object interaction, which requires a setup that can faithfully capture fine-grained raw motion signals. This leads us to develop a tactile glove system and collect informative spatio-temporal sensory data during hand manipulations. We then propose an event parsing pipeline over the hand interactions whose output transfers to the robot's end so it can learn the manipulation skill (a minimal sketch of such event segmentation appears after this list).

2. A virtual testbed for constructing rich interactive tasks. The major limitations of collecting real-world interaction data are threefold: i) a specific setup is needed to trace each form of interaction, ii) substantial effort must be spent on data cleaning and labeling, and iii) a single dataset cannot capture different modalities of interaction at the same time. To overcome these issues, we propose and develop a virtual testbed, the VRGym platform, for realistic human-robot interactive tasks (Big Tasks). In VRGym, the pipelines we developed synthesize diverse photo-realistic 3D scenes that incorporate various forms of interaction through physics-based simulation. Given these rich interactions, we expect to grow a general-purpose agent from the interactive tasks and to advance research in robotics, machine learning, and cognitive science.

3. Robot learning from imperfect demonstrations: small data. In learning from demonstration for object interaction, one essential element is the creation of expert demonstrations. However, collecting such demonstrations takes non-trivial effort, and a large portion of them contain failure cases. We develop a demonstration setup for learning object grasping skills on the VRGym platform with VR human interfaces. Human performers interact with the virtual scene by teleoperating a virtual robot arm. At the same time, each demonstration is evaluated through physics simulation, so even a perfect task plan may fail during execution. Given the sparsity of demonstrations, we argue that the failed ones are valuable in addition to the perfect ones. This motivates us to exploit the implicit characteristics of small data in the presence of imperfect demonstrations (the second sketch after this list shows one simple way to keep failed demonstrations in the loop).

4. A game platform for large-scale social interactions. Social interactions are another important branch that goes beyond purely physical interactions. To be general-purpose, an agent has to properly infer other agents' motions or intentions and apply socially acceptable behaviors when interacting in the scene. Inspired by these facts, we leverage a popular computer game platform, Grand Theft Auto (GTA), to automatically construct rich, realistic social interactions in simulated urban scenarios. The city transportation system, including vehicles and pedestrians, can be fully controlled by our modding scripts. The GTA platform complements VRGym by extending robot learning from interactions to a larger scale. We utilize it to synthesize multi-vehicle driving scenarios and study the problem of trajectory prediction as the basis of intention inference. We highlight the safety aspect by predicting collision-free trajectories that accord with social norms for vehicle driving (the third sketch after this list illustrates a collision-aware baseline).
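
Three minimal Python sketches follow, illustrating contributions 1, 3, and 4 respectively. The first concerns event parsing over tactile glove data: it segments a spatio-temporal force sequence into contact events by thresholding the aggregate force per frame. This is an illustration under assumed conventions (a T x N array of per-taxel forces, a hypothetical threshold), not the actual pipeline of the dissertation.

    # Sketch of event parsing over tactile glove data (illustrative only;
    # the thesis' actual pipeline is richer). Assumes each frame is an
    # array of per-taxel forces; the threshold below is hypothetical.
    import numpy as np

    def parse_events(forces: np.ndarray, threshold: float = 0.5):
        """Segment a (T, N) force sequence into contact events.

        forces: T frames x N taxels of raw force readings.
        Returns a list of (start, end) frame indices where total force
        stays above `threshold`, i.e. candidate manipulation events.
        """
        total = forces.sum(axis=1)          # aggregate force per frame
        active = total > threshold          # frames with significant contact
        events, start = [], None
        for t, on in enumerate(active):
            if on and start is None:
                start = t                   # event begins
            elif not on and start is not None:
                events.append((start, t))   # event ends
                start = None
        if start is not None:
            events.append((start, len(active)))
        return events

    # Toy usage: 100 frames, 16 taxels, contact between frames 30 and 60.
    forces = np.zeros((100, 16))
    forces[30:60, :4] = 0.3
    print(parse_events(forces))             # -> [(30, 60)]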
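
The second sketch illustrates the idea behind contribution 3: rather than discarding failed demonstrations, down-weight them in a behavioral-cloning objective. The linear policy, the weighting scheme, and all names here are hypothetical choices; the dissertation's method for exploiting imperfect small data is not reproduced.

    # Sketch of learning from imperfect demonstrations: keep failed demos
    # but down-weight them in a weighted least-squares fit of a linear
    # policy a = W s. The weights and the policy class are hypothetical.
    import numpy as np

    def fit_weighted_policy(states, actions, successes, fail_weight=0.2):
        """states: (N, d_s), actions: (N, d_a), successes: (N,) bool.

        Failed samples keep a small weight instead of being thrown away.
        """
        w = np.where(successes, 1.0, fail_weight)   # per-sample weights
        sw = np.sqrt(w)[:, None]
        # Solve min_W || sqrt(w) * (states @ W - actions) ||^2
        W, *_ = np.linalg.lstsq(states * sw, actions * sw, rcond=None)
        return W

    # Toy usage: 2-D states, 1-D actions, roughly 30% failed samples.
    rng = np.random.default_rng(0)
    S = rng.normal(size=(50, 2))
    A = S @ np.array([[1.0], [-2.0]]) + 0.01 * rng.normal(size=(50, 1))
    ok = rng.random(50) > 0.3
    print(fit_weighted_policy(S, A, ok))            # ~ [[1.], [-2.]]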
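
The third sketch relates to contribution 4: a constant-velocity rollout per vehicle followed by a pairwise collision check, the simplest baseline for collision-aware trajectory prediction. The safety radius and function names are assumptions; the dissertation studies learned, socially compliant predictors rather than this baseline.

    # Sketch of collision-aware trajectory prediction: constant-velocity
    # forecast per vehicle plus a pairwise collision check. A baseline
    # illustration only; the radius below is a hypothetical safety margin.
    import numpy as np

    def predict(positions, velocities, horizon=10, dt=0.1):
        """Constant-velocity forecast: (V, 2) -> (V, horizon, 2)."""
        steps = dt * np.arange(1, horizon + 1)[:, None]   # (horizon, 1)
        return positions[:, None, :] + velocities[:, None, :] * steps[None]

    def first_collision(trajs, radius=2.0):
        """Return (t, i, j) of the first pair closer than `radius`, else None."""
        V, H, _ = trajs.shape
        for t in range(H):
            for i in range(V):
                for j in range(i + 1, V):
                    if np.linalg.norm(trajs[i, t] - trajs[j, t]) < radius:
                        return t, i, j
        return None

    # Toy usage: two vehicles heading toward each other in one lane.
    pos = np.array([[0.0, 0.0], [5.0, 0.0]])
    vel = np.array([[10.0, 0.0], [-10.0, 0.0]])
    print(first_collision(predict(pos, vel)))             # -> (1, 0, 1)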
