November 01, 2021
Complex physical tasks entail a sequence of object interactions, each with its own preconditions -- which can be difficult for robotic agents to learn efficiently solely through their own experience. We introduce an approach to discover activitycontext priors from in-the-wild egocentric video captured with human worn cameras. For a given object, an activity-context prior represents the set of other compatible objects that are required for activities to succeed (e.g., a knife and cutting board brought together with a tomato are conducive to cutting). We encode our video-based prior as an auxiliary reward function that encourages an agent to bring compatible objects together before attempting an interaction. In this way, our model translates everyday human experience into embodied agent skills. We demonstrate our idea using egocentric EPIC-Kitchens video of people performing unscripted kitchen activities to benefit virtual household robotic agents performing various complex tasks in AI2-iTHOR, significantly accelerating agent learning.
Publisher
NeurIPS
April 30, 2024
Mikayel Samvelyan, Minqi Jiang, Davide Paglieri, Jack Parker-Holder, Tim Rocktäschel
April 30, 2024
January 06, 2024
Geng Ji, Wentao Jiang, Jiang Li, Fahmid Morshed Fahid, Zhengxing Chen, Yinghua Li, Jun Xiao, Chongxi Bao, Zheqing (Bill) Zhu
January 06, 2024
December 11, 2023
Dishank Bansal, Ricky Chen, Mustafa Mukadam, Brandon Amos
December 11, 2023
December 10, 2023
December 10, 2023
Product experiences
Foundational models
Product experiences
Latest news
Foundational models