

Oral

FMBoost: Boosting Latent Diffusion with Flow Matching

Johannes Schusterbauer-Fischer · Ming Gui · Pingchuan Ma · Nick Stracke · Stefan Andreas Baumann · Tao Hu · Bjorn Ommer

Oral 6A: Generative Models II
Thu 3 Oct 4:50 a.m. — 5 a.m. PDT

Abstract: Visual synthesis has recently seen significant leaps in performance, in particular due to breakthroughs in generative models. Diffusion models have been a key enabler as they excel in image diversity. This, however, comes at the price of slow training and synthesis, which is only partially alleviated by latent diffusion. To this end, flow matching is an appealing approach due to its complementary characteristics of faster training and inference but less diverse synthesis. We demonstrate that introducing flow matching between a frozen diffusion model and a convolutional decoder enables high-resolution image synthesis at reduced computational cost and model size. A small diffusion model can then provide the necessary visual diversity effectively, while flow matching efficiently enhances resolution and details by mapping the small latent space to a high-dimensional one. These latents are then projected to high-resolution images by the subsequent convolutional decoder of the diffusion approach. Combining the diversity of diffusion models, the efficiency of flow matching, and the effectiveness of convolutional decoders achieves state-of-the-art high-resolution image synthesis at $1024^2$ pixels with minimal computational cost. Cascading our model optionally boosts this further to $2048^2$ pixels. Importantly, our approach is orthogonal to recent approximation and speed-up strategies for the underlying model, making it easily integrable into the various diffusion model frameworks.
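
The latent-to-latent mapping described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the tensor sizes, the nearest-neighbour upsampling, the straight-line path, and the helper names `upsample_nearest`, `euler_flow`, and `oracle_velocity` are all assumptions made for illustration. A trained network would predict the velocity from the current state and time; here an oracle velocity stands in so that the example is self-contained.

```python
import numpy as np

def upsample_nearest(z, factor):
    """Nearest-neighbour upsampling of a (C, H, W) latent by an integer factor."""
    return z.repeat(factor, axis=1).repeat(factor, axis=2)

def euler_flow(x0, velocity, num_steps=10):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with explicit Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity(x, t)
    return x

# Toy stand-ins (hypothetical sizes): a low-resolution latent, as a small
# diffusion model might produce, and a high-resolution target latent.
rng = np.random.default_rng(0)
z_low = rng.standard_normal((4, 8, 8))
z_high_target = rng.standard_normal((4, 32, 32))

# Starting point of the flow: the low-resolution latent brought to target size.
x0 = upsample_nearest(z_low, 4)

def oracle_velocity(x, t):
    """Velocity of the straight path x_t = (1 - t) * x0 + t * x1, which is x1 - x0.
    Assumption: a learned model would approximate this from (x, t) alone."""
    return z_high_target - x0

z_high = euler_flow(x0, oracle_velocity, num_steps=10)
```

In a full pipeline, `z_high` would then be passed to the convolutional decoder to obtain the image; with a learned velocity in place of the oracle, the number of Euler steps trades speed against accuracy.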
