

Poster

RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models

Bowen Zhang · Yiji Cheng · Chunyu Wang · Ting Zhang · Jiaolong Yang · Yansong Tang · Feng Zhao · Dong Chen · Baining Guo

Strong Double Blind: this paper was not made available on public preprint services during the review process.
Tue 1 Oct 7:30 a.m. PDT — 9:30 a.m. PDT

Abstract: We address the task of generating high-fidelity 3D avatars, represented as triplanes, from a single frontal portrait image. Existing methods struggle to capture intricate details such as cloth textures and hairstyles, which we tackle in this paper. Specifically, we first identify an overlooked problem of catastrophic forgetting that arises when fitting triplanes sequentially on a large number of avatars, caused by the MLP decoder sharing scheme. To overcome this issue, we introduce a novel data scheduling strategy called task replay and a weight consolidation regularization term, which together improve the decoder's ability to render sharp details and unleash the full power of triplanes for high-fidelity generation. Additionally, we maximize the guiding effect of the conditional portrait image by computing a finer-grained hierarchical representation that captures rich 2D texture cues and injecting it into the 3D diffusion model at multiple layers via cross-attention. When trained on 46K avatars with a noise schedule optimized for triplanes, the resulting model generates 3D avatars with notably better details than previous methods and generalizes to in-the-wild portrait inputs. See the teaser figure for examples.
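The abstract does not spell out the task-replay schedule or the exact form of the weight-consolidation term. As a rough illustration of the general idea only, the PyTorch sketch below mixes previously fitted avatars back into each batch and adds an EWC-style penalty that keeps the shared MLP decoder close to a reference snapshot. All names here (`task_replay_batch`, `weight_consolidation_loss`, `importance`, `lambda_wc`) are hypothetical, not the paper's API.

```python
import random
import torch
import torch.nn as nn

def task_replay_batch(current_ids, seen_ids, batch_size=8, replay_ratio=0.25):
    """Hypothetical replay schedule: mix a fraction of already-fitted
    avatars back into each batch so the shared MLP decoder does not
    drift away from (i.e., forget) earlier subjects."""
    n_replay = min(int(batch_size * replay_ratio), len(seen_ids))
    replay = random.sample(seen_ids, n_replay)
    current = random.sample(current_ids, batch_size - n_replay)
    return current + replay

def weight_consolidation_loss(decoder: nn.Module,
                              ref_params: dict,
                              importance: dict,
                              lambda_wc: float = 1.0) -> torch.Tensor:
    """Assumed EWC-style regularizer: penalize movement of the shared
    decoder weights away from a snapshot taken after earlier fits,
    weighted by a per-parameter importance estimate."""
    penalty = torch.zeros((), device=next(decoder.parameters()).device)
    for name, p in decoder.named_parameters():
        penalty = penalty + (importance[name] * (p - ref_params[name]) ** 2).sum()
    return lambda_wc * penalty
```

Likewise, the multi-layer cross-attention injection of portrait features could look like the minimal block below, where triplane tokens attend to hierarchical 2D texture features at a given layer of the diffusion backbone; again, this is a sketch under assumed shapes, not the released architecture.

```python
class PortraitCrossAttention(nn.Module):
    """Minimal cross-attention block: triplane tokens (queries) attend
    to hierarchical portrait features (keys/values)."""
    def __init__(self, dim: int, ctx_dim: int, heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, kdim=ctx_dim,
                                          vdim=ctx_dim, batch_first=True)

    def forward(self, x: torch.Tensor, ctx: torch.Tensor) -> torch.Tensor:
        # x: (B, N, dim) triplane tokens; ctx: (B, M, ctx_dim) 2D texture cues
        h, _ = self.attn(self.norm(x), ctx, ctx, need_weights=False)
        return x + h  # residual injection at this layer
```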
