

Poster

VersatileGaussian: Real-time Neural Rendering for Versatile Tasks using Gaussian Splatting

Renjie Li · Zhiwen Fan · Bohua Wang · Peihao Wang · Zhangyang Wang · Xi Wu

# 330
Strong blind review: This paper was not made available on public preprint services during the review process.
Thu 3 Oct 1:30 a.m. PDT — 3:30 a.m. PDT

Abstract:

The acquisition of multi-task (MT) labels in 3D scenes is crucial for a wide range of real-world applications. Traditional methods generally employ an analysis-by-synthesis approach, generating 2D label maps on novel synthesized views, or use a Neural Radiance Field (NeRF) that concurrently represents label maps. Yet these approaches often struggle to balance inference efficiency with MT label quality. Specifically, they face limitations such as (a) rendering speeds constrained by NeRF pipelines, and (b) an implicit representation of MT fields that can result in continuity artifacts during rendering. Recently, 3D Gaussian Splatting has shown promise in achieving real-time rendering speeds without compromising rendering quality. In our research, we address the challenge of enabling 3D Gaussian Splatting to represent versatile MT labels. Simply attaching MT attributes to explicit Gaussians compromises rendering quality due to the lack of cross-task information flow during optimization. We introduce architectural and rasterizer designs to overcome this issue. Our VersatileGaussian model associates Gaussians with shared MT features and incorporates a feature map rasterizer. The cornerstone of this versatile rasterization is the Task Correlation Attention module, which fosters cross-task correlations through a soft weighting mechanism that disseminates task-specific knowledge. Experiments on the ScanNet and Replica datasets show that VersatileGaussian not only sets a new benchmark in MT accuracy but also maintains real-time rendering speeds (35 FPS). Importantly, this model design facilitates mutual benefits across tasks, leading to improved quality in novel view synthesis.
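
The abstract does not spell out how the Task Correlation Attention module is built, so the following is only a minimal sketch of what a soft weighting mechanism across tasks could look like, not the authors' implementation. It assumes each task already has a rendered feature map of the same shape, and it uses plain scaled dot-product attention over the task axis at every pixel. The function name `cross_task_soft_weighting`, the tensor layout, and the absence of learned projections are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_task_soft_weighting(task_feats):
    """Hypothetical cross-task soft weighting (illustrative only).

    task_feats: array of shape (T, H, W, C), one feature map per task.
    Returns the fused features (T, H, W, C) and the per-pixel
    task-to-task weights (H, W, T, T).
    """
    num_tasks, height, width, channels = task_feats.shape
    # Put the task axis next to the channel axis: (H, W, T, C).
    f = np.transpose(task_feats, (1, 2, 0, 3))
    # Per-pixel similarity between every pair of tasks: (H, W, T, T).
    scores = f @ np.swapaxes(f, -1, -2) / np.sqrt(channels)
    # Soft weights: each task distributes attention over all tasks.
    weights = softmax(scores, axis=-1)
    # Each task's output is a weighted mix of all tasks' features.
    fused = weights @ f
    return np.transpose(fused, (2, 0, 1, 3)), weights


# Example with 3 hypothetical tasks on a 4x5 feature map with 8 channels.
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4, 5, 8))
fused, weights = cross_task_soft_weighting(feats)
```

In this sketch every task's output feature is a convex combination of all tasks' features at the same pixel, which is one simple way information could flow between tasks during optimization. A real module would likely add learned query, key, and value projections.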
