

Poster

DyFADet: Dynamic Feature Aggregation for Temporal Action Detection

Le Yang · Ziwei Zheng · Yizeng Han · Hao Cheng · Shiji Song · Gao Huang · Fan Li

Strong Double Blind: this paper was not made available on public preprint services during the review process.
Fri 4 Oct 1:30 a.m. PDT — 3:30 a.m. PDT

Abstract:

Recent neural network-based Temporal Action Detection (TAD) models are inherently limited in extracting discriminative representations and modeling action instances of various lengths from complex scenes, because they rely on shared-weight detection heads. Inspired by recent successes in dynamic neural networks, in this paper we build a novel Dynamic Feature Aggregation (DFA) module that can simultaneously adapt kernel weights and receptive fields at different timestamps. Based on DFA, the proposed dynamic encoder layer aggregates the temporal features within the action timestamps and guarantees the discriminability of the extracted representations. Moreover, DFA helps to develop a Dynamic TAD head (DyHead), which adaptively aggregates multi-scale features with adjusted parameters and learned receptive fields to better detect action instances of diverse ranges in complex scenes. With the proposed encoder layers and DyHead, the new dynamic TAD model, DyFADet, achieves promising performance on a series of challenging TAD benchmarks, including HACS-Segment, THUMOS14, ActivityNet-1.3, EPIC-Kitchens 100, Ego4D Moment Queries v1.0, and FineAction. The code is released at \url{https://Anonymous}.
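To make the idea of timestamp-dependent aggregation concrete, below is a minimal PyTorch sketch of a dynamic temporal aggregation layer in the spirit of the DFA module described in the abstract: per-timestamp gates reweight the aggregated output (dynamic kernel weights) and a per-timestamp distribution over dilated branches softly selects the receptive field. All names (DynamicTemporalAggregation, branch_gen, weight_gen, the choice of dilations) are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class DynamicTemporalAggregation(nn.Module):
    """Sketch: aggregate temporal features with timestamp-dependent weights
    and a soft selection over branches with different receptive fields."""

    def __init__(self, channels: int, kernel_size: int = 3, dilations=(1, 2, 4)):
        super().__init__()
        # One depth-wise conv branch per dilation -> different receptive fields.
        self.branches = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size,
                      padding=d * (kernel_size // 2), dilation=d, groups=channels)
            for d in dilations
        )
        # Per-timestamp channel-wise reweighting (dynamic weights) and
        # per-timestamp distribution over branches (dynamic receptive field).
        self.weight_gen = nn.Conv1d(channels, channels, 1)
        self.branch_gen = nn.Conv1d(channels, len(dilations), 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        branch_outs = torch.stack([b(x) for b in self.branches], dim=0)   # (B, N, C, T)
        branch_probs = self.branch_gen(x).softmax(dim=1)                  # (N, B, T)
        branch_probs = branch_probs.permute(1, 0, 2).unsqueeze(2)         # (B, N, 1, T)
        aggregated = (branch_outs * branch_probs).sum(dim=0)              # (N, C, T)
        dyn_weight = torch.sigmoid(self.weight_gen(x))                    # per-timestamp gates
        return x + dyn_weight * aggregated                                # residual aggregation


if __name__ == "__main__":
    feats = torch.randn(2, 256, 128)            # (batch, channels, timestamps)
    layer = DynamicTemporalAggregation(256)
    print(layer(feats).shape)                   # torch.Size([2, 256, 128])
```

In this sketch the residual connection and sigmoid gating are design choices made for stability of the toy example; the paper's actual DFA module, encoder layer, and DyHead may differ in how the dynamic weights and receptive fields are parameterized.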
