Poster
PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance
Aoming Liu · Zhong Li · Zhang Chen · Nannan Li · Yi Xu · Bryan Plummer
# 225
Strong Double Blind
Immersive scene generation, notably panorama creation, benefits significantly from adapting large pre-trained text-to-image (T2I) models for multi-view image generation. Because multi-view images are costly to acquire, tuning-free generation is preferred. However, existing methods are either limited to simple correspondences or require extensive fine-tuning to handle complex ones. We present PanoFree, a novel method for tuning-free multi-view image generation that supports an extensive array of correspondences. PanoFree generates multi-view images sequentially using iterative warping and inpainting, and addresses the key issues of inconsistency and artifacts caused by error accumulation without the need for fine-tuning. It mitigates error accumulation by enhancing cross-view awareness and by refining the warping and inpainting processes through cross-view guidance, risky area estimation and erasing, and symmetric bidirectional guided generation for loop closure, alongside guidance-based semantic and density control for scene structure preservation. Evaluated on various panorama types (Planar, 360°, and Full Spherical Panoramas), PanoFree demonstrates significant error reduction and improved global consistency and image quality across different scenarios without extra fine-tuning. Compared to existing methods, PanoFree is up to 5x more efficient in time and 3x more efficient in GPU memory usage, and maintains superior diversity of results (66.7% vs. 33.3% in our user study). PanoFree offers a viable alternative to costly fine-tuning or the use of additional pre-trained models.
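To make the sequential warping-and-inpainting idea concrete, the following is a minimal sketch for the planar case only, not the authors' implementation. It assumes the warp between neighbouring views is a plain horizontal shift, and it substitutes a random-fill placeholder (`toy_inpaint`) for the T2I inpainting model. All function names and parameters are hypothetical, and the cross-view guidance, risky area erasing, and bidirectional loop closure described in the abstract are omitted.

```python
import numpy as np


def warp_planar(prev_view, shift):
    """Shift the previous view left by `shift` pixels (assumed planar warp).

    Returns the warped image and a boolean mask that is True where no
    content is known yet and inpainting is needed.
    """
    h, w, _ = prev_view.shape
    warped = np.zeros_like(prev_view)
    mask = np.ones((h, w), dtype=bool)
    warped[:, : w - shift] = prev_view[:, shift:]
    mask[:, : w - shift] = False
    return warped, mask


def toy_inpaint(image, mask, rng):
    """Placeholder for a T2I inpainting model: fills masked pixels with noise."""
    out = image.copy()
    out[mask] = rng.random((int(mask.sum()), image.shape[2]))
    return out


def generate_planar_panorama(first_view, num_views, shift, inpaint=toy_inpaint, seed=0):
    """Generate views one after another, each conditioned on the warped previous view."""
    rng = np.random.default_rng(seed)
    w = first_view.shape[1]
    views = [first_view]
    for _ in range(num_views - 1):
        warped, mask = warp_planar(views[-1], shift)
        views.append(inpaint(warped, mask, rng))
    # Stitch: the first view plus the newly generated columns of each later view.
    new_columns = [v[:, w - shift:] for v in views[1:]]
    panorama = np.concatenate([views[0]] + new_columns, axis=1)
    return panorama, views
```

Because each view is built only from its predecessor, any defect in one inpainted region is carried into every later view; this is the error accumulation that the abstract says PanoFree targets. In practice the `inpaint` argument would be swapped for a call to a real inpainting model.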