

Poster

Revisit Self-supervision with Local Structure-from-Motion

Shengjie Zhu · Xiaoming Liu

# 93
Strong Double Blind review: This paper was not made available on public preprint services during the review process.
Thu 3 Oct 7:30 a.m. PDT — 9:30 a.m. PDT

Abstract: Both self-supervised depth estimation and Structure-from-Motion (SfM) recover scene depth from RGB videos. Despite sharing a similar objective, the two approaches are disconnected. Prior self-supervised methods backpropagate losses defined over immediately neighboring frames. Instead of learning through a loss, this work proposes an alternative scheme that performs local SfM. First, given calibrated RGB or RGB-D images, we employ a depth and correspondence estimator to infer depth maps and pairwise correspondence maps. Then, a novel bundle-RANSAC-adjustment algorithm jointly optimizes the camera poses and one depth adjustment for each depth map. Finally, we fix the camera poses and employ a NeRF, albeit one without a neural network, for dense triangulation and geometric verification. Our outputs are the poses, the depth adjustments, and the triangulated sparse depths. For the first time, we show that self-supervision within 5 frames already benefits state-of-the-art (SoTA) supervised depth and correspondence models. Despite relying only on self-supervision, our method outperforms COLMAP in pose accuracy and robustness. Finally, our method enables NeRF over arbitrarily short videos. Code and models will be released.
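
The idea of fitting "one depth adjustment for each depth map" robustly can be illustrated with a small sketch. This is not the paper's bundle-RANSAC-adjustment algorithm, which also optimizes camera poses jointly. It is only a minimal single-view RANSAC, assuming the adjustment is a single scalar scale and that some sparse reference depths (for example from triangulation) are available. The function name `ransac_depth_scale` and all of its parameters are hypothetical.

```python
import numpy as np

def ransac_depth_scale(pred_depth, ref_depth, n_iters=200, inlier_tol=0.05, seed=0):
    """Robustly estimate a scalar s such that s * pred_depth ~ ref_depth.

    Hypothetical helper for illustration only; the scalar-scale model and
    the relative-error inlier test are assumptions, not the paper's method.
    """
    rng = np.random.default_rng(seed)
    pred = np.asarray(pred_depth, dtype=float)
    ref = np.asarray(ref_depth, dtype=float)

    best_scale = 1.0
    best_inliers = np.zeros(pred.shape, dtype=bool)
    for _ in range(n_iters):
        # Minimal sample: a single point pair determines a candidate scale.
        i = rng.integers(len(pred))
        s = ref[i] / pred[i]
        # A point is an inlier if its relative depth error is small.
        inliers = np.abs(s * pred - ref) / ref < inlier_tol
        if inliers.sum() > best_inliers.sum():
            best_scale, best_inliers = s, inliers

    # Refine with a least-squares scale over the inlier set.
    if best_inliers.any():
        p, r = pred[best_inliers], ref[best_inliers]
        best_scale = float(p @ r / (p @ p))
    return best_scale, best_inliers


# Usage on synthetic data: true scale 2.5, with 20 of 100 points corrupted.
rng = np.random.default_rng(1)
pred = rng.uniform(1.0, 10.0, size=100)
ref = 2.5 * pred
ref[:20] *= 3.0  # gross outliers
scale, inliers = ransac_depth_scale(pred, ref)
```

Sampling a single correspondence per iteration keeps the minimal set as small as possible, so a modest number of iterations is very likely to hit at least one uncorrupted point even when many points are outliers.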
