

Poster

All You Need is Your Voice: Emotional Face Representation with Audio Perspective for Emotional Talking Face Generation

Seongho Kim · Byung Cheol Song

Strong Double Blind: This paper was not made available on public preprint services during the review process.
Wed 2 Oct 1:30 a.m. PDT — 3:30 a.m. PDT

Abstract:

With the rise of generative models, multi-modal video generation has gained significant attention, particularly audio-driven emotional talking face synthesis. This paper addresses two key challenges in this domain: input bias and intensity saturation. To counter input bias, a neutralization scheme is proposed, which yields impressive results in generating neutral talking faces from emotionally expressive ones. Furthermore, regression learning based on 2D continuous emotion labels effectively generates varying emotional intensities through frame-wise comparisons. A user study quantifies subjective interpretations of strong emotions and naturalness, showing up to 78.09% higher emotion accuracy and up to 3.41 higher naturalness compared to the lowest-ranked method.
