Robot Learning from Interactions with Physics-Realistic Environments: Constructing a Big Task Platform for Training AI Agents
1. Robot learning object manipulation skills from human demonstrations. Instead of learning directly from a robot-object manipulation dataset, which is hard to generalize, we seek an alternative: create a human-object manipulation dataset and let the robot learn from the demonstrations. We argue that the key to building such a dataset is realistic hand-object interaction, which requires a setup that can faithfully capture the fine-grained raw motion signals. This leads us to develop a tactile glove system and collect informative spatio-temporal sensory data during hand manipulations. On top of these recordings, we propose an event parsing pipeline whose parsed hand interactions are transferable to the robot's end for learning the manipulation skill; a minimal sketch of such a pipeline is given below.
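As a toy illustration of event parsing over tactile recordings, the sketch below segments a stream of glove force readings into grasp/release events by thresholding contact. The signal layout, threshold value, and event labels are illustrative assumptions, not the actual glove system.

```python
"""Minimal sketch: parse grasp/release events from tactile glove data.
All constants and the data layout are hypothetical."""
import numpy as np

GRASP_THRESHOLD = 0.5  # assumed normalized force level marking contact

def parse_events(forces: np.ndarray, timestamps: np.ndarray):
    """Segment a (T, n_taxels) force recording into grasp/release events.

    A frame counts as 'in contact' when any taxel exceeds the threshold;
    events are emitted at contact onsets and offsets.
    """
    in_contact = (forces > GRASP_THRESHOLD).any(axis=1)
    events = []
    for t in range(1, len(in_contact)):
        if in_contact[t] and not in_contact[t - 1]:
            events.append(("grasp", timestamps[t]))
        elif not in_contact[t] and in_contact[t - 1]:
            events.append(("release", timestamps[t]))
    return events

# Toy usage: 100 frames, 26 taxels, a synthetic grasp between t=30 and t=70.
forces = np.zeros((100, 26))
forces[30:70, 5] = 0.9
print(parse_events(forces, np.arange(100) * 0.01))
# -> [('grasp', 0.3), ('release', 0.7)]
```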
2. A virtual testbed to construct rich interactive tasks. The major limitations of collecting real-world interaction data are threefold: i) a dedicated setup is needed to trace each form of interaction, ii) considerable effort must be spent on data cleaning and labeling, and iii) a single dataset cannot capture different modalities of interaction at the same time. To overcome these issues, we propose and develop a virtual testbed, the VRGym platform, for realistic human-robot interactive tasks (Big Tasks). In VRGym, the pipelines we developed synthesize diverse photo-realistic 3D scenes that incorporate various forms of interaction through physics-based simulation; a hypothetical interaction loop over such a testbed is sketched below. Given these rich interactions, we expect to grow a general-purpose agent from the interactive tasks and advance research in robotics, machine learning, and cognitive science.
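The sketch below shows the kind of agent-environment loop one would run against such a physics-backed testbed. The `VRGymEnv` class, its observation fields, and its methods are hypothetical stand-ins for illustration, not the actual VRGym API.

```python
"""Minimal sketch of an interaction loop against a VRGym-style testbed.
The environment class below is a hypothetical stand-in."""
import random

class VRGymEnv:
    """Hypothetical physics-backed environment with a gym-like interface."""

    def reset(self):
        self.t = 0
        # Multi-modal observation: rendered images plus robot proprioception.
        return {"rgb": None, "depth": None, "proprio": [0.0] * 7}

    def step(self, action):
        self.t += 1
        obs = {"rgb": None, "depth": None, "proprio": action}
        reward = 0.0          # task outcome from physics-based evaluation
        done = self.t >= 100  # fixed episode horizon for this sketch
        return obs, reward, done

env = VRGymEnv()
obs = env.reset()
done = False
while not done:
    action = [random.uniform(-1, 1) for _ in range(7)]  # placeholder policy
    obs, reward, done = env.step(action)
```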
3. Robot learning from imperfect demonstrations: small data. In learning from demonstration for object interaction, one essential ingredient is the creation of expert demonstrations. However, collecting such demonstrations takes non-trivial effort, and a large portion of them contains failure cases. We build a demonstration setup for learning object grasping skills on top of the VRGym platform with VR human interfaces: human performers interact with the virtual scene by teleoperating a virtual robot arm. At the same time, each demonstration is evaluated through physics simulation, so even a perfect task plan may fail during execution. Given the sparsity of demonstrations, we consider the failed ones valuable in addition to the perfect ones. This motivates us to exploit the implicit characteristics of small data in the presence of imperfect demonstrations; a toy illustration follows.
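Below is one toy way to exploit failed demonstrations alongside successful ones: behavior cloning with signed per-demonstration weights, so failures repel the policy instead of being discarded. The data format, weighting scheme, and linear policy are illustrative assumptions, not the actual method.

```python
"""Toy sketch: weighted behavior cloning over a small, imperfect demo set.
Successes attract the policy; failures mildly repel it. All values are
synthetic and the setup is hypothetical."""
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: (state, action, success flag) triples from demonstrations.
states = rng.normal(size=(64, 4))
actions = rng.normal(size=(64, 2))
success = rng.random(64) > 0.3
weights = np.where(success, 1.0, -0.2)  # assumed repulsion strength

# Gradient descent on a signed-weighted squared loss for a linear policy
# a = s @ W; negative weights push predictions away from failed actions.
W = np.zeros((4, 2))
lr = 0.01
for _ in range(200):
    pred = states @ W
    grad = states.T @ ((pred - actions) * weights[:, None]) / len(states)
    W -= lr * grad
```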
4. A game platform for large-scale social interactions. Social interactions are another important branch that goes beyond purely physical interactions. To become general-purpose, an agent has to properly infer other agents' motions or intentions and apply socially acceptable behaviors when interacting in the scene. Motivated by this, we leverage a popular computer game platform, Grand Theft Auto (GTA), to automatically construct rich, realistic social interactions in simulated urban scenarios. The city transportation system, including vehicles and pedestrians, can be fully controlled by our modding scripts. The GTA platform supplements VRGym by extending robot learning from interactions to a larger scale. We use it to synthesize multi-vehicle driving scenarios and study trajectory prediction as the basis of intention inference. We highlight the safety aspect by predicting collision-free trajectories that accord with social norms for vehicle driving; a toy collision-aware selection scheme is sketched below.
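The sketch below illustrates collision-aware trajectory selection in its simplest form: candidate constant-velocity rollouts are scored by goal progress plus a hinge penalty on close encounters with other vehicles' predicted paths. The cost design, safety distance, and all constants are illustrative assumptions, not the actual prediction model.

```python
"""Toy sketch: pick the safest goal-directed trajectory among candidate
rollouts. Cost terms and constants are hypothetical."""
import numpy as np

def rollout(pos, vel, steps=10, dt=0.1):
    """Constant-velocity rollout: a (steps, 2) array of future positions."""
    return pos + vel * dt * np.arange(1, steps + 1)[:, None]

def trajectory_cost(traj, others, goal, safe_dist=2.0):
    """Goal-progress cost plus a hinge penalty on close encounters."""
    goal_cost = np.linalg.norm(traj[-1] - goal)
    collision = 0.0
    for other in others:  # each: a (steps, 2) predicted trajectory
        dists = np.linalg.norm(traj - other, axis=1)
        collision += np.maximum(0.0, safe_dist - dists).sum()
    return goal_cost + 10.0 * collision  # collisions dominate the cost

ego = np.array([0.0, 0.0])
goal = np.array([10.0, 0.0])
other = rollout(np.array([5.0, -1.0]), np.array([0.0, 2.0]))  # crossing car

# Evaluate a few candidate headings and keep the safest goal-directed one.
candidates = [rollout(ego, 10.0 * np.array([np.cos(a), np.sin(a)]))
              for a in np.linspace(-0.5, 0.5, 11)]
best = min(candidates, key=lambda t: trajectory_cost(t, [other], goal))
```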