

Poster

PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects

Junyi Li · Junfeng Wu · Weizhi Zhao · Song Bai · Xiang Bai

Strong blind review: this paper was not made available on public preprint services during the review process.
Thu 3 Oct 1:30 a.m. PDT — 3:30 a.m. PDT

Abstract:

We present PartGLEE, a part-level foundation model for locating and identifying both objects and parts in images. Through a unified framework, PartGLEE accomplishes detection, segmentation, and grounding of instances at any granularity in the open-world scenario. Specifically, we propose a Q-Former to construct the hierarchical relationship between objects and parts, parsing every object into its corresponding semantic parts. By incorporating a large amount of object-level data, these hierarchical relationships can be extended, enabling PartGLEE to recognize a rich variety of parts. We conduct comprehensive empirical studies to validate the effectiveness of our method; PartGLEE achieves state-of-the-art performance across various part-level tasks while maintaining comparable results on object-level tasks. Our further analysis indicates that the hierarchical cognitive ability of PartGLEE facilitates detailed image comprehension for mLLMs. Code will be released.
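The abstract describes a Q-Former that maps each detected object to a set of semantic parts. The sketch below illustrates the general idea with a single cross-attention step in NumPy: a set of learnable part queries attends to one object's features and yields one embedding per candidate part. The function name, query count, dimensions, and single-layer setup are illustrative assumptions; the abstract does not specify PartGLEE's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def part_qformer_step(object_feats, part_queries, W_q, W_k, W_v):
    """One cross-attention step of a Q-Former-style decoder (illustrative).

    object_feats : (N, d) features of one detected object (e.g. pooled tokens)
    part_queries : (P, d) learnable queries, one per candidate part
    W_q, W_k, W_v: (d, d) projection matrices
    returns      : (P, d) part embeddings, one per candidate part
    """
    d = part_queries.shape[1]
    Q = part_queries @ W_q                          # (P, d)
    K = object_feats @ W_k                          # (N, d)
    V = object_feats @ W_v                          # (N, d)
    attn = softmax(Q @ K.T / np.sqrt(d), axis=-1)   # (P, N) attention weights
    return attn @ V                                 # (P, d)

# Toy example with hypothetical sizes (not taken from the paper).
rng = np.random.default_rng(0)
d, N, P = 32, 16, 8      # feature dim, object tokens, part queries
object_feats = rng.standard_normal((N, d))
part_queries = rng.standard_normal((P, d))
W_q, W_k, W_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
part_embeddings = part_qformer_step(object_feats, part_queries, W_q, W_k, W_v)
print(part_embeddings.shape)  # one embedding per candidate part
```

Each resulting part embedding would then be decoded into a part box, mask, or label, giving the object-to-part hierarchy the abstract refers to.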
