

Poster

Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models

Yixuan Ren · Yang Zhou · Jimei Yang · Jing Shi · Difan Liu · Feng Liu · Mingi Kwon · Abhinav Shrivastava

Thu 3 Oct 7:30 a.m. PDT — 9:30 a.m. PDT

Abstract:

Image customization has been extensively studied in text-to-image (T2I) diffusion models, leading to impressive outcomes and applications. With the emergence of text-to-video (T2V) diffusion models, however, their temporal counterpart, motion customization, has not yet been well investigated. To address the challenge of one-shot motion customization, we propose Customize-A-Video, which models the motion from a single reference video and adapts it to new subjects and scenes with both spatial and temporal variety. It leverages low-rank adaptation (LoRA) on temporal attention layers to tailor the pre-trained T2V diffusion model for motion modeling from the reference video. To disentangle spatial and temporal information during training, we introduce a novel concept of appearance absorbers that detach the original appearance from the single reference video prior to motion learning. The proposed modules are trained in a staged pipeline and inferred in a plug-and-play fashion, enabling easy extension of our method to various downstream tasks such as custom video generation and editing, video appearance customization, and multiple motion combination.
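The core mechanism described above, low-rank adaptation applied only to the temporal attention layers of a pre-trained T2V diffusion model, can be illustrated with a minimal PyTorch sketch. This is not the authors' released code: the module-name filter `temp_attn`, the projection names `to_q`/`to_k`/`to_v`, and the helper `inject_temporal_lora` are assumptions chosen for illustration, and the rank and scaling follow the standard LoRA formulation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank residual:
    y = W x + (alpha / r) * B (A x). Standard LoRA; names are illustrative."""
    def __init__(self, base: nn.Linear, r: int = 16, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # pre-trained T2V weights stay frozen
        self.down = nn.Linear(base.in_features, r, bias=False)    # A: d -> r
        self.up = nn.Linear(r, base.out_features, bias=False)     # B: r -> d
        nn.init.normal_(self.down.weight, std=1.0 / r)
        nn.init.zeros_(self.up.weight)  # adapter starts as a no-op
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.up(self.down(x))

def inject_temporal_lora(unet: nn.Module, r: int = 16) -> list[torch.Tensor]:
    """Hypothetical helper: wrap the q/k/v projections of every temporal
    attention block with LoRA and return the trainable tensors. The
    'temp_attn' substring is an assumed naming convention for the
    temporal attention modules of the T2V U-Net."""
    trainable = []
    for name, module in unet.named_modules():
        if "temp_attn" not in name:
            continue  # spatial layers are left untouched
        for proj in ("to_q", "to_k", "to_v"):
            if hasattr(module, proj) and isinstance(getattr(module, proj), nn.Linear):
                lora = LoRALinear(getattr(module, proj), r=r)
                setattr(module, proj, lora)
                trainable += [lora.down.weight, lora.up.weight]
    return trainable
```

Because only the low-rank residuals are trained, the resulting motion LoRA is a small, self-contained set of weights that can be loaded or removed at inference time, which is what makes the plug-and-play composition with other modules (e.g., appearance absorbers) straightforward.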
