Poster

Relation DETR: Exploring Explicit Position Relation Prior for Object Detection

Xiuquan Hou · Meiqin Liu · Senlin Zhang · Ping Wei · Badong Chen · Xuguang Lan

Strong Double Blind review: This paper was not made available on public preprint services during the review process.
Tue 1 Oct 7:30 a.m. PDT — 9:30 a.m. PDT

Abstract:

This paper presents a general scheme for enhancing the convergence and performance of DETR (DEtection TRansformer). We investigate the slow convergence problem in Transformers from a new perspective, suggesting that it arises from self-attention, which introduces no structural bias over its inputs. To address this issue, we explore incorporating a position relation prior as an attention bias to augment object detection, after verifying its statistical significance with a proposed quantitative macroscopic correlation (MC) metric. Our approach, termed Relation-DETR, introduces an encoder that constructs position relation embeddings for progressive attention refinement, and further extends the traditional streaming pipeline of DETR into a contrastive relation pipeline to address the conflict between non-duplicate predictions and positive supervision. Extensive experiments on both generic and task-specific datasets demonstrate the effectiveness of our approach. Under the same configurations, Relation-DETR achieves a significant improvement (+2.0% AP) on COCO 2017 over DINO, the previous highly optimized DETR detector, and achieves state-of-the-art performance (51.7% AP at 12 epochs and 52.1% AP at 24 epochs with a ResNet-50 backbone). Moreover, Relation-DETR converges remarkably fast, achieving over 40% AP with only 2 training epochs on COCO 2017 using the basic ResNet-50 backbone, surpassing existing DETR detectors under the same settings. Furthermore, the proposed relation encoder serves as a universal plug-and-play component, bringing clear improvements to, in principle, any DETR-like method. The code will be available.
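The idea of using a position relation as an attention bias can be illustrated with a minimal NumPy sketch. This is not the authors' relation encoder: the pairwise features, the fixed weight vector `w_rel`, and the function names are all assumptions chosen for illustration. It only shows the general mechanism of adding a box-geometry term to the attention logits before the softmax.

```python
import numpy as np

def pairwise_box_relation(boxes):
    """Hypothetical pairwise geometry features for boxes given as (cx, cy, w, h).

    Returns an (N, N, 4) array: scaled center offsets and log size ratios.
    The exact feature choice is an assumption, not taken from the abstract.
    """
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    dx = np.log(np.abs(cx[:, None] - cx[None, :]) / w[:, None] + 1.0)
    dy = np.log(np.abs(cy[:, None] - cy[None, :]) / h[:, None] + 1.0)
    dw = np.abs(np.log(w[:, None] / w[None, :]))
    dh = np.abs(np.log(h[:, None] / h[None, :]))
    return np.stack([dx, dy, dw, dh], axis=-1)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def biased_self_attention(queries, keys, values, bias):
    """Scaled dot-product attention with an additive (N, N) bias on the logits."""
    d = queries.shape[-1]
    logits = queries @ keys.T / np.sqrt(d)
    weights = softmax(logits + bias, axis=-1)
    return weights @ values, weights

rng = np.random.default_rng(0)
# Three toy boxes: two close neighbours and one far away.
boxes = np.array([[0.20, 0.20, 0.10, 0.10],
                  [0.25, 0.20, 0.10, 0.10],
                  [0.80, 0.80, 0.30, 0.30]])
rel = pairwise_box_relation(boxes)           # (3, 3, 4)
w_rel = np.array([-1.0, -1.0, -0.5, -0.5])   # hypothetical; would be learned in practice
bias = rel @ w_rel                           # (3, 3) scalar bias per query-key pair
x = rng.normal(size=(3, 8))                  # toy query embeddings
out, attn = biased_self_attention(x, x, x, bias)
```

With negative weights, geometrically distant or dissimilar boxes receive a lower logit, so the bias injects a structural preference that plain self-attention lacks. In a trained model the mapping from relation features to bias would be learned rather than fixed.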
