

Poster

Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts

Yanting Yang · Minghao Chen · Qibo Qiu · Jiahao WU · Wenxiao Wang · Binbin Lin · Ziyu Guan · Xiaofei He

Strong Double Blind: This paper was not made available on public preprint services during the review process.
Fri 4 Oct 1:30 a.m. PDT — 3:30 a.m. PDT

Abstract:

For a general-purpose robot to operate in the real world, it must execute a broad range of instructions across diverse environments. Central to reinforcement learning and planning for such robotic agents is a generalizable reward function. Recent vision-language models such as CLIP have shown remarkable generalization, paving the way for open-domain visual recognition. However, collecting data of robots executing varied language instructions across multiple environments remains challenging. This paper transfers the robust generalization of video-language models into a generalizable language-conditioned reward function, using robot video data from only a small number of tasks in a single environment. Unlike the robotic datasets commonly used to train reward functions, human video-language datasets rarely contain trivial failure videos. To strengthen the model's ability to distinguish successful from failed robot executions, we cluster the failure data so that the model can identify patterns across failure videos. For each cluster, we train a new failure prompt for the text encoder to improve its ability to separate failure from success in robot task executions. Our language-conditioned reward function generalizes well to new environments and new instructions for robot planning and reinforcement learning.
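The reward described above can be sketched as a contrast between a video's similarity to the success instruction and its similarity to a set of learned failure prompts. The following is a minimal illustrative sketch, not the paper's implementation: it assumes the video, instruction, and failure-prompt embeddings have already been produced by a CLIP-style encoder (here they are plain vectors), and scores the video by the softmax probability assigned to the success instruction over all prompts. The function names, the temperature value, and the softmax formulation are illustrative assumptions.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def reward(video_emb, instruction_emb, failure_prompt_embs, temperature=0.07):
    """Hypothetical language-conditioned reward: the probability that the
    video matches the success instruction rather than any failure prompt,
    computed as a temperature-scaled softmax over cosine similarities."""
    sims = [cosine(video_emb, instruction_emb)]
    sims += [cosine(video_emb, f) for f in failure_prompt_embs]
    # Numerically stable softmax; return the mass on the success instruction.
    m = max(s / temperature for s in sims)
    exps = [math.exp(s / temperature - m) for s in sims]
    return exps[0] / sum(exps)
```

In this toy setup, a video embedding aligned with the instruction yields a reward near 1, while one aligned with a failure prompt yields a reward near 0, which is the discriminative behavior the failure prompts are meant to provide.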
