Poster

HybridBooth: Hybrid Prompt Inversion for Efficient Subject-Driven Generation

Shanyan Guan · Yanhao Ge · Ying Tai · Jian Yang · Wei Li · Mingyu You

Strong blind review: This paper was not made available on public preprint services during the review process


2024 Poster


Abstract

Subject-driven generation for text-to-image diffusion models aims to encode and invert specific textual prompts in order to generate personalized images with particular content. Previous studies have achieved this goal using either optimization-based textual inversion or direct-regression-based concept encoding. However, it remains challenging to realize fast and effective prompt inversion while preserving the generalization ability of the original diffusion model. Motivated by the advantages of both optimization-based and direct-regression-based methods, we propose a novel hybrid prompt inversion framework called HybridBooth to achieve efficient subject-driven generation with text-to-image diffusion models. Specifically, we address the limitations of current optimization-based and direct-regression-based methods by combining the hybrid prompt inversion framework with a mask-guided multi-word text encoding module, which enables fast and robust prompt inversion. Additionally, we introduce a hybrid textual feature fusion module to enhance the representation of the textual feature during learning. As a result, our framework can invert arbitrary visual concepts into a pre-trained diffusion model effectively and quickly, even when learning from a single image, while maintaining the general generation ability of the original model. Extensive experiments demonstrate the effectiveness of our method.
