

Poster

The Fabrication of Reality and Fantasy: Scene Generation with LLM-Assisted Prompt Interpretation

Yi Yao · Chan-Feng Hsu · Jhe-Hao Lin · Hongxia Xie · Terence Lin · Yi-Ning Huang · Hong-Han Shuai · Wen-Huang Cheng

Strong Double Blind: This paper was not made available on public preprint services during the review process.
Wed 2 Oct 1:30 a.m. PDT — 3:30 a.m. PDT

Abstract:

Despite recent advances in text-to-image generation, current models still struggle with complex, imaginative text prompts. Because their training sets offer limited exposure to diverse and complex data, text-to-image models often fail to grasp the semantics of such difficult prompts and generate irrelevant images. This work explores how diffusion models can process prompts that require artistic creativity or specialized knowledge and generate images from them. Since no dedicated evaluation framework exists for such tasks, we introduce a new benchmark, the Realistic-Fantasy Benchmark (RFBench), which blends scenarios from both realistic and fantastical realms. For reality and fantasy scene generation, we further propose a training-free approach, the Realistic-Fantasy Network (RFNet), which integrates diffusion models with LLMs. Extensive human evaluations on RFBench, together with GPT-based compositional assessments, show that our approach outperforms other state-of-the-art methods.
