

Poster

Plan, Posture and Go: Towards Open-vocabulary Text-to-Motion Generation

Jinpeng Liu · Wenxun Dai · Chunyu Wang · Yiji Cheng · Yansong Tang · Xin Tong

Wed 2 Oct 7:30 a.m. PDT — 9:30 a.m. PDT

Abstract:

Conventional text-to-motion generation methods are usually trained on limited text-motion pairs, which makes it hard for them to generalize to open-vocabulary scenarios. Some works use the CLIP model to align the motion space with the text space, aiming to enable motion generation from natural-language motion descriptions. However, they are still constrained to generating limited and unrealistic in-place motions. To address these issues, we present a divide-and-conquer framework named PRO-Motion (Plan, postuRe and gO for text-to-Motion generation), which consists of three modules: a motion planner, a posture-diffuser and a go-diffuser. The motion planner instructs Large Language Models (LLMs) to generate a sequence of scripts describing the key postures in the target motion. Unlike natural language, the scripts can describe all possible postures using very simple text templates. This significantly reduces the complexity of the posture-diffuser, which transforms a script into a posture, paving the way for open-vocabulary text-to-motion generation. Finally, the go-diffuser, implemented as another diffusion model, not only increases the number of motion frames but also estimates the whole-body translations and rotations for all postures, resulting in more dynamic motions. Experimental results show the superiority of our method over its counterparts and demonstrate its capability to generate diverse and realistic motions from complex open-vocabulary prompts such as “Experiencing a profound sense of joy”.
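The three-stage flow described above (plan with an LLM, generate a key posture per script, then densify and add global motion) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: every function name, the posture representation, and the frame counts are assumptions, and the LLM and the two diffusion models are replaced with stubs.

```python
# Hypothetical sketch of the PRO-Motion pipeline from the abstract.
# All names and data shapes are illustrative assumptions, not the real API.

def motion_planner(prompt):
    # In the paper, an LLM expands the prompt into templated posture
    # scripts. Here we stub it with fixed example scripts.
    return [
        "arms raised above head, torso upright",
        "arms spread wide, head tilted back",
        "light jump, knees slightly bent",
    ]

def posture_diffuser(script):
    # A diffusion model would map each script to a static key posture
    # (e.g., per-joint rotations); stubbed as a placeholder of 24 angles.
    return {"script": script, "pose": [0.0] * 24}

def go_diffuser(postures, frames_per_transition=8):
    # A second diffusion model would densify the key postures into a
    # full motion and estimate whole-body translation/rotation.
    # Stubbed as simple frame repetition with a dummy root translation.
    motion = []
    for i, p in enumerate(postures):
        for t in range(frames_per_transition):
            motion.append({
                "pose": p["pose"],
                "root_xyz": [0.1 * (i + t / frames_per_transition), 0.0, 0.0],
            })
    return motion

def pro_motion(prompt):
    scripts = motion_planner(prompt)                    # Plan
    postures = [posture_diffuser(s) for s in scripts]   # Posture
    return go_diffuser(postures)                        # Go

motion = pro_motion("Experiencing a profound sense of joy")
print(len(motion))  # 3 key postures x 8 frames each = 24 frames
```

The point of the decomposition is that each stage solves a simpler problem than end-to-end text-to-motion: the planner handles open-vocabulary language, while the two diffusion models only need to cover templated scripts and posture-to-motion interpolation.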
