

Poster

ComFusion: Enhancing Personalized Generation by Instance-Scene Compositing and Fusion

Yan Hong · Yuxuan Duan · Bo Zhang · Haoxing Chen · Jun Lan · Huijia Zhu · Weiqiang Wang · Jianfu Zhang

Strong Double Blind: this paper was not made available on public preprint services during the review process.
Wed 2 Oct 1:30 a.m. PDT — 3:30 a.m. PDT

Abstract:

Recent advances in personalizing text-to-image (T2I) diffusion models have shown that such models can generate images grounded in personalized visual concepts from just a few user-provided examples. However, these models often struggle to preserve high visual fidelity, especially when adapting scenes to textual descriptions. To tackle this issue, we present ComFusion, a strategy that leverages pretrained models to compose user-supplied subject images with predefined text scenes. ComFusion incorporates a class-scene prior preservation regularization that composites subject-class and scene-specific knowledge from pretrained models to boost generation fidelity. Moreover, ComFusion employs coarse-generated images to ensure they harmonize with both the instance images and the scene texts. Consequently, ComFusion maintains a delicate balance between capturing the subject's essence and ensuring scene fidelity. Extensive evaluations of ComFusion against various T2I personalization baselines demonstrate its qualitative and quantitative superiority.
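The abstract does not specify the training objective, but the description of a class-scene prior preservation regularization alongside an instance term suggests a DreamBooth-style weighted sum of denoising losses. The sketch below is purely illustrative: the function name, the three loss terms, and the weights `lambda_prior` and `lambda_fuse` are all assumptions, not the paper's actual formulation.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays (stand-in for a denoising loss)."""
    return float(np.mean((a - b) ** 2))

def comfusion_style_loss(pred_inst, target_inst,
                         pred_prior, target_prior,
                         pred_fuse, target_fuse,
                         lambda_prior=1.0, lambda_fuse=0.5):
    """Hypothetical combined objective (names and weights are assumptions):
    - instance term: fit the user-supplied subject images,
    - class-scene prior term: preserve pretrained class/scene knowledge,
    - fusion term: keep coarse-generated images consistent with both
      the instance images and the scene texts."""
    l_inst = mse(pred_inst, target_inst)
    l_prior = mse(pred_prior, target_prior)
    l_fuse = mse(pred_fuse, target_fuse)
    return l_inst + lambda_prior * l_prior + lambda_fuse * l_fuse
```

In practice each term would be a diffusion denoising loss on noise predictions for the corresponding image batch; here plain MSE on arrays stands in for that.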
