

Oral

Harnessing Text-to-Image Diffusion Models for Category-Agnostic Pose Estimation

Duo Peng · Zhengbo Zhang · Ping Hu · Qiuhong Ke · David Yau · Jun Liu

Oral 4C: Humans: Biometrics, Pose And Motion
Wed 2 Oct 5 a.m. — 5:10 a.m. PDT

Abstract:

Category-Agnostic Pose Estimation (CAPE) aims to detect keypoints of an arbitrary unseen category in images, based on several provided examples of that category. This is a challenging task, as the limited data of unseen categories makes it difficult for models to generalize effectively. To address this challenge, previous methods typically train models on a set of predefined base categories with extensive annotations. In this work, we propose to harness the rich knowledge in an off-the-shelf text-to-image diffusion model to effectively address CAPE, without training on carefully prepared base categories. To this end, we propose a Prompt Pose Matching (PPM) framework, which learns pseudo prompts corresponding to the keypoints in the provided few-shot examples via the text-to-image diffusion model. These learned pseudo prompts capture semantic information of keypoints, which can then be used to locate the same type of keypoints in images. We also design a Category-shared Prompt Training (CPT) scheme to further boost the performance of PPM. Extensive experiments demonstrate the efficacy of our approach.
