

Poster

Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts

Jianhao Li · Tianyu Sun · Zhongdao Wang · Enze Xie · Bailan Feng · Hongbo Zhang · Ze Yuan · Ke Xu · Jiaheng Liu · Ping Luo

# 204
Strong Double Blind: This paper was not made available on public preprint services during the review process.
Fri 4 Oct 1:30 a.m. PDT — 3:30 a.m. PDT

Abstract:

This paper proposes an algorithm for automatically labeling 3D objects from 2D point or box prompts, with a focus on applications in autonomous driving. Unlike prior work, our auto-labeler predicts 3D shapes instead of bounding boxes and does not require training on a specific dataset. We propose a Segment, Lift, and Fit (SLF) paradigm to achieve this goal. First, we segment high-quality instance masks from the prompts using the Segment Anything Model (SAM), which reduces the remaining problem to predicting 3D shapes from given 2D masks. This problem is ill-posed, because multiple 3D shapes can project to an identical mask. To tackle this issue, we then lift the 2D masks to 3D forms and employ gradient descent to adjust their poses and shapes until the projections fit the masks and the surfaces conform to the surrounding LiDAR points. Notably, since we do not train on a specific dataset, the SLF auto-labeler does not overfit to biased annotation patterns in a training set as other methods do, which improves its generalization across datasets. Experimental results on the KITTI dataset demonstrate that the SLF auto-labeler produces high-quality bounding box annotations, achieving an AP at 0.5 IoU of nearly 90%. Detectors trained with the generated pseudo-labels perform nearly as well as those trained with actual ground-truth annotations. Furthermore, the SLF auto-labeler shows promising results in detailed shape prediction, providing a potential alternative for the occupancy annotation of dynamic objects.
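The "fit" step described above can be illustrated with a toy example. The sketch below is not the authors' implementation: the abstract does not specify the shape representation or the loss terms, so a sphere stands in for the 3D shape (an assumption), only a point-to-surface term is minimized (the mask projection term is omitted), and the function names `sample_visible_surface` and `fit_sphere` are hypothetical. It shows plain gradient descent adjusting a position and a size parameter until the surface passes through simulated LiDAR points that cover only the sensor-facing side of the object.

```python
import numpy as np


def sample_visible_surface(center, radius, n_points=400, seed=0):
    """Simulate LiDAR returns on the sensor-facing half of a sphere."""
    rng = np.random.default_rng(seed)
    normals = rng.normal(size=(n_points, 3))
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    # Keep only the hemisphere facing a sensor located in the -x direction.
    normals[:, 0] = -np.abs(normals[:, 0])
    return center + radius * normals


def fit_sphere(points, lr=0.2, steps=2000):
    """Fit a sphere's center and radius to points by gradient descent.

    Minimizes mean((||p - c|| - r)^2), a stand-in for a
    "surface conforms to LiDAR points" objective.
    """
    center = points.mean(axis=0)  # crude initial position
    radius = 1.0                  # arbitrary initial size
    for _ in range(steps):
        offsets = points - center
        dists = np.linalg.norm(offsets, axis=1) + 1e-9  # avoid division by zero
        residuals = dists - radius
        # Analytic gradients of the mean squared point-to-surface distance.
        grad_center = np.mean(
            -2.0 * residuals[:, None] * offsets / dists[:, None], axis=0
        )
        grad_radius = np.mean(-2.0 * residuals)
        center = center - lr * grad_center
        radius = radius - lr * grad_radius
    return center, radius


if __name__ == "__main__":
    true_center = np.array([10.0, 2.0, 0.5])
    points = sample_visible_surface(true_center, radius=1.5)
    est_center, est_radius = fit_sphere(points)
    print("estimated center:", est_center)
    print("estimated radius:", est_radius)
```

Because the points cover only one side of the object, their mean is a biased initial guess for the center, and the optimization has to recover both position and size jointly. A fuller objective in the spirit of the abstract would add a term comparing the projected shape against the 2D mask.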
