

The Second Perception Test Challenge

Viorica Patraucean

Sun 29 Sep, midnight PDT

Keywords: Multimodal

Following the successful 2023 edition, we organise the second Perception Test Challenge to benchmark multimodal perception models on the Perception Test (blog, GitHub), a diagnostic benchmark created by Google DeepMind to comprehensively probe the abilities of multimodal models across:
* video, audio, and text modalities
* four skill areas: Memory, Abstraction, Physics, Semantics
* four types of reasoning: Descriptive, Explanatory, Predictive, Counterfactual
* six computational tasks: multiple-choice video-QA, grounded video-QA, object tracking, point tracking, action localisation, sound localisation
