

Poster

M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models

Seunggeun Chi · Hyung-gun Chi · Hengbo Ma · Nakul Agarwal · Faizan Siddiqui · Karthik Ramani · Kwonjoon Lee

Strong Double Blind: This paper was not made available on public preprint services during the review process.

Wed 2 Oct 1:30 a.m. PDT — 3:30 a.m. PDT

Abstract:

We introduce the Multi-Motion Discrete Diffusion Model (M2D2M), a novel approach to generating human motion from action descriptions that leverages the strengths of discrete diffusion models. M2D2M addresses the challenge of generating multi-motion sequences, ensuring seamless transitions between motions and coherence across a series of actions. Its key strength is a dynamic transition probability within the discrete diffusion model, which adapts transition probabilities based on the proximity between motion tokens, enabling nuanced and context-sensitive human motion generation. Complemented by a two-phase sampling strategy consisting of independent and joint denoising steps, M2D2M generates long-term, smooth, and contextually coherent human motion sequences using a model initially trained for single-motion generation. Extensive experiments show that M2D2M surpasses current state-of-the-art benchmarks on text-to-motion generation tasks, demonstrating its ability to interpret language semantics and generate dynamic, realistic motions.
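The abstract names two technical ingredients: a proximity-aware (dynamic) transition probability for the discrete diffusion process over motion tokens, and a two-phase (independent, then joint) denoising schedule. The sketch below is a minimal illustration of the first idea only, not the paper's actual formulation: the VQ-style codebook of motion-token embeddings, the per-step noise level `beta_t`, the `bandwidth` parameter, and the softmax weighting over embedding distances are all assumptions introduced here for illustration.

```python
import numpy as np

def dynamic_transition_matrix(codebook, beta_t, bandwidth=1.0):
    """Sketch of a proximity-aware transition matrix for one discrete-diffusion step.

    codebook : (K, D) array of motion-token embeddings (assumed).
    beta_t   : probability of leaving the current token at step t (assumed schedule).
    Returns a (K, K) row-stochastic matrix Q_t.
    """
    K = codebook.shape[0]
    # Pairwise squared distances between token embeddings.
    diff = codebook[:, None, :] - codebook[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    # Off-diagonal jump probabilities: nearby tokens receive more mass.
    logits = -d2 / bandwidth
    np.fill_diagonal(logits, -np.inf)  # "stay" probability is handled separately
    off_diag = np.exp(logits - logits.max(axis=1, keepdims=True))
    off_diag /= off_diag.sum(axis=1, keepdims=True)
    # Mix: keep the current token with probability (1 - beta_t), otherwise jump.
    Q_t = (1.0 - beta_t) * np.eye(K) + beta_t * off_diag
    return Q_t

# Toy usage: one forward corruption step for a single motion token.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))          # hypothetical codebook
Q = dynamic_transition_matrix(codebook, beta_t=0.1)
x_prev = 42
x_t = rng.choice(len(codebook), p=Q[x_prev])   # x_t ~ Categorical(Q_t[x_{t-1}])
```

A forward corruption step then samples x_t from the row Q_t[x_{t-1}], so corrupted tokens tend to stay near the original motion in embedding space. The two-phase sampling described in the abstract would denoise each action segment independently in early steps and jointly in later steps, but its exact schedule is not specified on this page.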
