Workshops
Workshop
Deblina Bhattacharjee

[ Amber 2 ]

Abstract
Workshop
Valerio Giuffrida

[ Panorama Lounge ]

Abstract
Workshop
Alexander Krull

[ Amber 5 ]

Abstract
Workshop
Kai Wang

[ Brown 2 ]

Abstract
Workshop
Giuseppe Serra

[ Amber 1 ]

Abstract
Workshop
Matej Kristan

[ Amber 4 ]

Abstract
Workshop
Tomas Hodan

[ Amber 7 + 8 ]

Abstract
Workshop
Niclas Zeller

[ Brown 1 ]

Abstract
Workshop
Miaomiao Liu

[ Amber 6 ]

Abstract
Workshop
Leena Mathur

[ Suite 2 ]

Abstract
Workshop
Viorica Patraucean

[ Suite 7 ]

Abstract

Following the successful 2023 edition, we organise the second Perception Test Challenge to benchmark multimodal perception models on the Perception Test (blog, github) - a diagnostic benchmark created by Google DeepMind to comprehensively probe the abilities of multimodal models across:
* video, audio, and text modalities
* four skill areas: Memory, Abstraction, Physics, Semantics
* four types of reasoning: Descriptive, Explanatory, Predictive, Counterfactual
* six computational tasks: multiple-choice video-QA, grounded video-QA, object tracking, point tracking, action localisation, sound localisation
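
As a concrete illustration of the first task, here is a minimal sketch of how top-1 accuracy for multiple-choice video-QA could be scored. The dictionary field names are illustrative assumptions, not the benchmark's actual schema; see the linked github for the official tooling.

```python
# Minimal sketch of scoring the multiple-choice video-QA task.
# The field names below are illustrative assumptions, not the
# actual Perception Test schema (see the official github).

def mc_vqa_accuracy(predictions, annotations):
    """Top-1 accuracy: fraction of questions whose predicted option
    index matches the annotated correct option."""
    correct = 0
    for question_id, predicted_option in predictions.items():
        if predicted_option == annotations[question_id]["answer_id"]:
            correct += 1
    return correct / len(predictions)

# Usage: predictions maps question ids to a chosen option index.
preds = {"q0": 2, "q1": 0}
annos = {"q0": {"answer_id": 2}, "q1": {"answer_id": 1}}
print(mc_vqa_accuracy(preds, annos))  # 0.5
```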

Workshop
Francesca Palermo

[ Suite 9 ]

Abstract

As Smart Eyewear devices become increasingly prevalent, optimizing their functionality and user experience through sophisticated computer vision applications is crucial. These devices must not only process real-time data effectively but also operate under power and computational constraints while upholding user privacy and ethical standards.

The "Eyes of the Future: Integrating Computer Vision in Smart Eyewear (ICVSE)" workshop, at ECCV 2024, aims to advance the field of Smart Eyewear by integrating cutting-edge computer vision technologies. The workshop addresses the need to bridge theoretical research and practical implementations in Smart Eyewear, a technology that will transform user interactions in everyday life through enhanced perception and augmented reality experiences.
The need for this workshop stems from the rapid advancements in both computer vision and wearable technology sectors, necessitating a dedicated forum where interdisciplinary insights and experiences can be shared to accelerate practical applications. Thus, ICVSE not only aims to showcase novel research but also to inspire a roadmap for future developments in Smart Eyewear technology.

Workshop
Michael Dorkenwald

[ Space 2 ]

Abstract

From GPT to DINO to diffusion models, the past years have seen major advances in self-supervised learning, with many new methods reaching astounding performance on standard benchmarks. Still, the field of SSL is rapidly evolving, with new learning paradigms emerging at unprecedented speed. At the same time, work on coupled data, such as image-text pairs, has shown great potential for producing even stronger models that are capable of zero-shot tasks and benefit from the methodology developed in SSL. Despite this progress, major challenges remain unresolved, and it is not clear what the next step is going to be. In this workshop, we want to highlight and provide a forum to discuss potential research directions, from radically new self-supervision tasks, data sources, and paradigms to surprising counter-intuitive results. Through invited speakers and oral paper talks, our goal is to provide a forum where both the leaders in this field and the new, younger generation can contribute equally to discussing its future.
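
For readers newer to the area, the sketch below shows one classic self-supervision objective, an InfoNCE-style contrastive loss of the kind used by SimCLR-type methods. It is a minimal illustration, not any specific paper's implementation; real pipelines add augmentations, projection heads, and large batches.

```python
# A minimal InfoNCE / NT-Xent contrastive loss, the objective behind
# SimCLR-style self-supervised methods. Illustrative sketch only.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (N, D) embeddings of two augmented views of the same
    N images. Each row of z1 should be most similar to the matching
    row of z2 and dissimilar to all other rows."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature  # (N, N) similarity matrix
    targets = torch.arange(z1.size(0))  # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
```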

Workshop
Vivek Sharma

[ Suite 4 ]

Abstract

This workshop brings together researchers from industry and academia who work on distributed and privacy-preserving machine learning for vision and imaging. These topics are of increasing commercial and policy interest. It is therefore important to build a community for this research area, in which collaborating researchers share insights, code, data, benchmarks, training pipelines, etc., and together aim to improve the state of privacy in computer vision.

Workshop
Radu Timofte

[ Amber 1 ]

Abstract
Workshop
Jan-Nico Zaech

[ Suite 2 ]

Abstract
Workshop
Roberto Pierdicca

[ Suite 6 ]

Abstract
Workshop
Francis Engelmann · Zuria Bauer

[ Amber 4 ]

Abstract

The ability to perceive, understand, and interact with arbitrary 3D environments is a long-standing research goal with applications in AR/VR, health, robotics, and more. Current 3D scene understanding models are largely limited to low-level recognition tasks such as object detection or semantic segmentation, and do not generalize well beyond a pre-defined set of training labels. More recently, large vision-language models (VLMs), such as CLIP, have demonstrated impressive capabilities when trained solely on internet-scale image-language pairs. Initial works have shown that these models have the potential to extend 3D scene understanding not only to open-set recognition, but also to additional applications such as affordances, materials, activities, and properties of unseen environments. The goal of this workshop is to bundle these efforts and to discuss and establish clear task definitions, evaluation metrics, and benchmark datasets.
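
As a minimal illustration of the open-set recognition direction, the sketch below scores a rendered view of a 3D segment against free-form text labels using the publicly available CLIP model via the Hugging Face transformers API. The file name and label set are placeholders; extending this to full 3D scenes (per-point features, multi-view fusion) is precisely the workshop's topic.

```python
# Zero-shot, open-vocabulary scoring of a (hypothetical) rendered view
# of a 3D segment with CLIP, via the Hugging Face transformers API.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a chair", "a sofa", "a potted plant"]  # arbitrary, not a fixed label set
view = Image.open("segment_render.png")           # placeholder rendered view

inputs = processor(text=labels, images=view, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```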

Workshop
Robin Hesse

[ Brown 1 ]

Abstract

Deep neural networks (DNNs) are an essential component in the field of computer vision and achieve state-of-the-art results in almost all of its sub-disciplines. While DNNs excel at predictive performance, they are often too complex to be understood by humans, which is why they are often referred to as "black-box models". This is of particular concern when DNNs are applied in safety-critical domains such as autonomous driving or medical applications. With this problem in mind, explainable artificial intelligence (XAI) aims to gain a better understanding of DNNs, ultimately leading to more robust, fair, and interpretable models. To this end, a variety of approaches, such as attribution maps, intrinsically explainable models, and mechanistic interpretability methods, have been developed. While this important field of research is gaining more and more traction, there is also justified criticism of the way in which the research is conducted. For example, the term "explainability" is itself not properly defined and is highly dependent on the end user and the task, leading to ill-defined research questions and no standardized evaluation practices. The goals of this workshop are thus two-fold:

1. Discussion and dissemination of ideas at the cutting edge of XAI research ("Where are we?") …
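
As a small, concrete example of one approach named above, the following sketch computes a plain gradient-based attribution (saliency) map. It is a minimal illustration, not a recommended or standardized XAI method.

```python
# A minimal gradient-based attribution (saliency) map: the gradient of
# the top predicted class score w.r.t. the input highlights pixels that
# most influence the prediction.
import torch

def saliency_map(model, image):
    """image: (1, C, H, W). Returns (H, W) attribution magnitudes."""
    image = image.clone().requires_grad_(True)
    score = model(image).max(dim=1).values.sum()  # top predicted class score
    score.backward()
    return image.grad.abs().squeeze(0).max(dim=0).values  # max over channels

# Usage with any torchvision classifier, e.g.:
# from torchvision.models import resnet18
# sal = saliency_map(resnet18(weights="DEFAULT").eval(),
#                    torch.randn(1, 3, 224, 224))
```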

Workshop
Jean-Baptiste Weibel

[ Suite 8 ]

Abstract
Workshop
Yan Wang

[ Brown 2 ]

Abstract
Workshop
Andrea Fusiello

[ Space 2 ]

Abstract
Workshop
Lucia Schiatti

[ Tower Lounge ]

Abstract
Workshop
Iuri Frosio

[ Amber 5 ]

Abstract

Our scope is to bring together people working in Computer Vision (CV) and, more broadly, Artificial Intelligence (AI), to talk about the adoption of CV/AI methods for videogames, which represent a large capital market within the creative industries and, at the same time, a crucial domain for AI research. Our workshop will cover various aspects of videogame development and consumption, ranging from game creation, game servicing, and player experience management to bot creation, cheat detection, and human-computer interaction mediated by large language models. We believe that focusing on CV for videogames will bring together cohesively related works with foreseeable and practical impact on today's market. We will therefore give priority to submissions specifically devoted to the application of state-of-the-art CV/AI methods FOR videogames, and lower priority to submissions on the adoption of videogames as test beds for the creation and testing of CV/AI methods. We also plan to favour the presentation of novel datasets that can spark further research in this field.

The committee and keynote speakers include multiple genders and researchers with origins in different geographical areas (USA, EU, Asia), from both industry (NVIDIA, Activision, Blockade Labs, Microsoft, Snap) and academia (Universities of …

Workshop
Andrea Fusiello

[ Amber 2 ]

Abstract
Workshop
Federico Becattini

[ Panorama Lounge ]

Abstract
Workshop
Diego Garcia-Olano

[ Suite 3 ]

Abstract
Workshop
Hataya Ryuichiro

[ Suite 4 ]

Abstract
Workshop
Shiqi Yang

[ Suite 9 ]

Abstract

In recent years, we have witnessed significant advancements in the field of visual generation, which have molded the research landscape presented in computer vision conferences such as ECCV, ICCV, and CVPR. However, in a world where information is conveyed through a rich tapestry of sensory experiences, the fusion of audio and visual modalities has become essential for understanding and replicating the intricacies of human perception and for diverse real-world applications. Indeed, the integration of audio and visual information has emerged as a critical area of research in computer vision and machine learning, with numerous applications across various domains, from immersive gaming environments and lifelike simulations for medical training to multimedia analysis, virtual reality, advertising, and cinematic applications.

Despite these strong motivations, little attention has been given to research focusing on understanding and generating audio-visual modalities compared to traditional, vision-only approaches and applications. Given the recent prominence of multi-modal foundation models, embracing the fusion of audio and visual data is expected to further advance current research efforts and practical applications within the computer vision community, which makes this workshop an encouraging addition to ECCV that will catalyze advancements in this burgeoning field.

In this workshop, …

Workshop
Hongxu Yin

[ Brown 3 ]

Abstract
Workshop
Stuart James · Peter Bell

[ Amber 1 ]

Abstract
Workshop
Marco Cotogni

[ Suite 5 ]

Abstract

In an era of rapid advancements in Artificial Intelligence, the imperative to foster Trustworthy AI has never been more critical. The first "Trust What You learN (TWYN)" workshop seeks to create a dynamic forum for researchers, practitioners, and industry experts to explore and advance the intersection of Trustworthy AI and DeepFake Analysis within the realm of Computer Vision. The workshop aims to delve into the multifaceted dimensions of building AI systems that are not only technically proficient but also ethical, transparent, and accountable. The dual focus on Trustworthy AI and DeepFake Analysis reflects the workshop's commitment to addressing the challenges posed by the proliferation of deepfake technologies while simultaneously promoting responsible AI practices.

Workshop
Yiming Li

[ Suite 9 ]

Abstract
Workshop
Wei-Chiu Ma

[ Brown 3 ]

Abstract
Workshop
Giuseppe Fiameni

[ Amber 2 ]

Abstract
Workshop
Martin R Oswald

[ Amber 6 ]

Abstract
Workshop
Andrea Pilzer

[ Brown 2 ]

Abstract

This UNcertainty quantification for Computer Vision (UNCV) Workshop aims to raise awareness and generate discussion regarding how predictive uncertainty can, and should, be effectively incorporated into models within the vision community. The workshop will bring together experts from machine learning and computer vision to create a new generation of well-calibrated and effective methods that 'know when they do not know'.
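
As one concrete notion of calibration, the sketch below computes the Expected Calibration Error (ECE), a standard summary of how well a model's confidence matches its accuracy. The binning choices are illustrative, not the workshop's prescribed protocol.

```python
# Minimal Expected Calibration Error (ECE): confidence is binned and
# compared with accuracy inside each bin; gaps are weighted by how
# many samples fall in the bin.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """confidences: (N,) max softmax probabilities; correct: (N,) bool."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in bin
    return ece

conf = np.array([0.9, 0.8, 0.95, 0.6])
corr = np.array([True, False, True, True])
print(expected_calibration_error(conf, corr))
```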

Workshop
Hao Yan

[ Tower Lounge ]

Abstract
Workshop
Vasileios Belagiannis

[ Suite 2 ]

Abstract
Workshop
Mohamed Elhoseiny

[ Suite 8 ]

Abstract
Workshop
Tzofi Klinghoffer

[ Panorama Lounge ]

Abstract

Neural fields have been widely adopted for learning novel view synthesis and 3D reconstruction from RGB images by modeling the transport of light in the visible spectrum. This workshop focuses on neural fields beyond conventional cameras, including (1) learning neural fields from data captured by different sensors across the electromagnetic spectrum and beyond, such as lidar, cryo-electron microscopy (cryoEM), thermal, event cameras, acoustics, and more, and (2) modeling the associated physics-based differentiable forward models and/or the physics of more complex light transport (reflections, shadows, polarization, diffraction limits, optics, scattering in fog or water, etc.). Our goal is to bring together a diverse group of researchers using neural fields across sensor domains to foster learning and discussion in this growing area.
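
For context, the sketch below shows the common coordinate-network backbone (a NeRF-style MLP with sinusoidal positional encoding) that such sensor-specific forward models are typically wrapped around. The architecture details are illustrative, not from any particular paper.

```python
# A minimal coordinate network: sinusoidal positional encoding feeding
# a small MLP that maps 3D points to a scalar field value.
import torch
import torch.nn as nn

def positional_encoding(x, n_freqs=6):
    """x: (..., D) coordinates -> (..., D * 2 * n_freqs) features."""
    feats = [fn(x * (2.0 ** k)) for k in range(n_freqs)
             for fn in (torch.sin, torch.cos)]
    return torch.cat(feats, dim=-1)

class NeuralField(nn.Module):
    def __init__(self, in_dim=3, out_dim=1, n_freqs=6, hidden=128):
        super().__init__()
        self.n_freqs = n_freqs
        self.mlp = nn.Sequential(
            nn.Linear(in_dim * 2 * n_freqs, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim))

    def forward(self, coords):
        return self.mlp(positional_encoding(coords, self.n_freqs))

field = NeuralField()
values = field(torch.rand(1024, 3))  # query the field at 1024 3D points
```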

Workshop
Andre Araujo

[ Amber 5 ]

Abstract
Workshop
Aron Monszpart

[ Suite 6 ]

Abstract

The Map-free Visual Relocalization workshop investigates topics related to metric visual relocalization relative to a single reference image instead of relative to a map. This problem is of major importance to many higher-level applications, such as Augmented/Mixed Reality, SLAM, and 3D reconstruction. It is important now because both industry and academia are debating whether and how to build HD maps of the world for those tasks; our community is working to reduce the need for such maps in the first place.

We host the first Map-free Visual Relocalization Challenge 2024, a competition with two tracks: map-free metric relative pose from a single query image to a single reference image (proposed by Arnold et al. in ECCV 2022), and from a query sequence to a single reference image (new). While the former is the more challenging and thus more interesting research topic, the latter represents a more realistic relocalization scenario, where the system making the queries may fuse information from query images and tracking poses over a short span of time and baseline. We invite papers to be submitted to the workshop.
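
For orientation, the sketch below computes the two standard relative-pose error measures used in this setting: rotation error in degrees and metric translation error. The challenge's exact thresholds and protocol are defined by its rules and not encoded here.

```python
# Minimal relative-pose error measures for map-free relocalization:
# geodesic rotation error (degrees) and Euclidean translation error
# (metric, since map-free poses are estimated with scale).
import numpy as np

def rotation_error_deg(R_est, R_gt):
    """Geodesic angle between two 3x3 rotation matrices, in degrees."""
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def translation_error_m(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth translations."""
    return float(np.linalg.norm(t_est - t_gt))

R = np.eye(3)
print(rotation_error_deg(R, R), translation_error_m(np.zeros(3), np.ones(3)))
```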

Workshop
Shangzhe Wu

[ Suite 4 ]

Abstract
Workshop
Zane Durante

[ Amber 7 + 8 ]

Abstract
Workshop
Ashkan Khakzar

[ Amber 5 ]

Abstract
Workshop
Lucia Cascone

[ Space 2 ]

Abstract
Workshop
Ahmad Sajedi

[ Amber 2 ]

Abstract
Workshop
Yiming Wang

[ Suite 9 ]

Abstract
Workshop
Yichen Li

[ Brown 3 ]

Abstract
Workshop
Henghui Ding

[ Suite 6 ]

Abstract
Workshop
Despoina Paschalidou

[ Brown 1 ]

Abstract
Workshop
Dimitrios Kollias

[ Suite 5 ]

Abstract
Workshop
Antonio Alliegro

[ Suite 2 ]

Abstract
Workshop
Mamatha Thota

[ Panorama Lounge ]

Abstract
Workshop
Lu Fang

[ Suite 3 ]

Abstract
Workshop
Anand Bhattad

[ Brown 2 ]

Abstract
Workshop
Yao Feng

[ Tower Lounge ]

Abstract
Workshop
Linlin Yang

[ Suite 8 ]

Abstract

Our HANDS workshop will gather vision researchers working on perceiving hands performing actions, including 2D & 3D hand detection, segmentation, pose/shape estimation, tracking, etc. We will also cover related applications, including gesture recognition, hand-object manipulation analysis, hand activity understanding, and interactive interfaces.

The eighth edition of this workshop will emphasize the use of large foundation models (e.g., CLIP, Point-E, Segment Anything, Latent Diffusion Models) for hand-related tasks. These models have revolutionized AI and demonstrated groundbreaking capabilities in multimodal understanding, zero-shot learning, and transfer learning. However, their potential for hand-related tasks remains largely untapped. Our official website is https://hands-workshop.org.