Poster
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
Junyi Li · Junfeng Wu · Weizhi Zhao · Song Bai · Xiang Bai
# 142
We present PartGLEE, a part-level foundation model for locating and identifying both objects and parts in images. Within a unified framework, PartGLEE performs detection, segmentation, and grounding of instances at any granularity in open-world scenarios. Specifically, we propose a Q-Former that constructs the hierarchical relationship between objects and parts, parsing every object into its corresponding semantic parts. By incorporating a large amount of object-level data, these hierarchical relationships can be extended, enabling PartGLEE to recognize a rich variety of parts. We conduct comprehensive empirical studies to validate the effectiveness of our method: PartGLEE achieves state-of-the-art performance across various part-level tasks while maintaining comparable results on object-level tasks. Further analysis indicates that the hierarchical cognitive ability of PartGLEE can facilitate detailed image comprehension for multimodal large language models (mLLMs). Code will be released.
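To make the object-to-part step more concrete, here is a minimal sketch. It is not PartGLEE's released code and it does not reproduce its Q-Former. Hypothetical learnable "part seeds" cross-attend to a set of tokens describing one object using single-head scaled dot-product attention with no learned projections. Each resulting part query is then labeled by its highest-scoring part-name embedding. All function names, the part names, and the toy 2-D vectors are illustrative assumptions. A real Q-Former typically stacks self-attention, cross-attention, and feed-forward layers with learned projections.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: List[Vector], keys: List[Vector],
                 values: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention, without learned projections."""
    d = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        # Each output is a convex combination of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def parse_object_into_parts(object_tokens: List[Vector],
                            part_seeds: List[Vector],
                            part_names: List[str],
                            part_name_embeddings: List[Vector]) -> List[str]:
    """Hypothetical object-to-part step: condition part seeds on one object,
    then label each part query with its best-matching part-name embedding."""
    part_queries = cross_attend(part_seeds, object_tokens, object_tokens)
    labels = []
    for pq in part_queries:
        scores = [dot(pq, e) for e in part_name_embeddings]
        labels.append(part_names[scores.index(max(scores))])
    return labels


# Toy usage: one object described by two tokens, two part seeds, two part names.
object_tokens = [[1.0, 0.0], [0.0, 1.0]]
part_seeds = [[10.0, 0.0], [0.0, 10.0]]
part_names = ["head", "leg"]
part_name_embeddings = [[1.0, 0.0], [0.0, 1.0]]
print(parse_object_into_parts(object_tokens, part_seeds,
                              part_names, part_name_embeddings))  # ['head', 'leg']
```

The part queries are derived from the object's own tokens instead of being shared globally. This is one simple way to express the object-to-part hierarchy the abstract describes. The same part seeds would yield different part queries for different objects.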