Poster
Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation
Yuanchen Ju · Kaizhe Hu · Guowei Zhang · Gu Zhang · Mingrun Jiang · Huazhe Xu
# 209
Enabling robotic manipulation that generalizes to out-of-distribution scenes is a crucial step toward open-world embodied intelligence. For human beings, this ability is rooted in the understanding of semantic correspondence among objects, which lets them naturally transfer their interaction experience with familiar objects to novel ones. Although robots lack such a reservoir of interaction experience, the vast supply of human videos on the Internet can serve as a resource from which we extract an affordance memory of contact points. Inspired by the natural way humans think, we propose Robo-ABC: when confronted with an unfamiliar object that requires generalization, the robot acquires affordance by retrieving objects that share visual and semantic similarities from the memory, then mapping the contact points of the retrieved objects to the new object. While establishing such correspondence may seem formidable at first glance, recent research finds that it arises naturally from pre-trained diffusion models, enabling affordance mapping even across disparate categories. With the Robo-ABC framework, robots can generalize to manipulate out-of-category objects in a zero-shot manner without any manual annotation, additional training, part segmentation, pre-coded knowledge, or viewpoint restrictions. Quantitatively, Robo-ABC improves the accuracy of visual affordance inference by a large margin of 28.7% compared to state-of-the-art (SOTA) end-to-end affordance models. We also conduct real-world experiments on cross-category object-grasping tasks and achieve a success rate of 85.7%, demonstrating Robo-ABC's capacity for real-world tasks.
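The retrieve-then-map idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `retrieve_most_similar` and `transfer_contact_point` are hypothetical, the embeddings and dense feature maps are assumed to be precomputed (for example, by an image encoder and a pre-trained diffusion model), and cosine similarity with nearest-neighbor matching is an assumed stand-in for whatever retrieval and correspondence procedure the paper actually uses.

```python
import numpy as np


def retrieve_most_similar(query_emb, memory_embs):
    """Return the index of the memory entry most similar to the query.

    Assumption: similarity is cosine similarity between precomputed
    embedding vectors; memory_embs has shape (N, D), query_emb shape (D,).
    """
    q = query_emb / np.linalg.norm(query_emb)
    m = memory_embs / np.linalg.norm(memory_embs, axis=1, keepdims=True)
    return int(np.argmax(m @ q))


def transfer_contact_point(src_feats, tgt_feats, src_point):
    """Map a contact point from a source feature map to a target feature map.

    Assumption: src_feats and tgt_feats are dense per-location features of
    shape (H, W, C). The contact point (row, col) is mapped to the target
    location whose feature has the highest cosine similarity with the
    source feature at that point.
    """
    r, c = src_point
    f = src_feats[r, c]
    f = f / np.linalg.norm(f)
    t = tgt_feats / np.linalg.norm(tgt_feats, axis=-1, keepdims=True)
    sim = t @ f  # similarity map of shape (H, W)
    row, col = np.unravel_index(np.argmax(sim), sim.shape)
    return int(row), int(col)


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Toy affordance memory: three stored objects with 3-D embeddings.
    memory_embs = np.eye(3)
    query_emb = np.array([0.1, 0.9, 0.2])
    idx = retrieve_most_similar(query_emb, memory_embs)

    # Toy feature maps: the target is the source shifted by (1, 2),
    # so a contact point at (1, 1) should land at (2, 3).
    src_feats = rng.normal(size=(4, 5, 8))
    tgt_feats = np.roll(src_feats, shift=(1, 2), axis=(0, 1))
    print(idx, transfer_contact_point(src_feats, tgt_feats, (1, 1)))
```

In this toy setup the target features are just a shifted copy of the source, so the match is exact; with real features the similarity map would be noisy and the chosen location only approximate.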