

Poster

HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization

Sakib Reza · Yuexi Zhang · Mohsen Moghaddam · Octavia Camps

Strong Double Blind: This paper was not made available on public preprint services during the review process.
Thu 3 Oct 7:30 a.m. PDT — 9:30 a.m. PDT

Abstract:

Online video understanding often relies on processing individual frames, leading to frame-by-frame predictions. Recent advancements extend this approach to instance-level predictions, such as Online Temporal Action Localization (On-TAL). However, existing methods mainly focus on short-term context and neglect historical information. To address this, we introduce the History-Augmented Anchor Transformer (HAT) framework for On-TAL. By integrating historical context, our framework enhances the synergy between long-term and short-term information, improving the quality of the anchor features crucial for classification and localization. We evaluate our model on both procedural egocentric (PREGO) datasets (EGTEA and EPIC) and standard non-PREGO On-TAL datasets (THUMOS and MUSES). Results show that our model significantly outperforms state-of-the-art approaches on PREGO datasets and achieves comparable or slightly superior performance on non-PREGO datasets, underscoring the importance of leveraging long-term history, especially in procedural and egocentric action scenarios. Code has been made available anonymously at: https://github.com/kfsjf/HAT-OnTAL.
