Skip to yearly menu bar Skip to main content


Poster

Multi-modal Crowd Counting via a Broker Modality

Haoliang Meng · Xiaopeng Hong · Chenhao Wang · Miao Shang · Wangmeng Zuo

# 127
Strong blind review: This paper was not made available on public preprint services during the review process Strong Double Blind
[ ] [ Project Page ] [ Paper PDF ]
Tue 1 Oct 7:30 a.m. PDT — 9:30 a.m. PDT

Abstract:

Multi-modal crowd counting involves estimating crowd density from both visual and thermal/depth images. This task is challenging due to the significant gap between these distinct modalities. In this paper, we propose a novel approach by introducing an auxiliary broker modality and on this basis frame the task as a triple-modal learning problem. We devise a fusion-based method to generate this broker modality, leveraging a non-diffusion, lightweight counterpart of modern denoising diffusion-based fusion models. Additionally, we identify and address the ghosting effect caused by direct cross-modal image fusion in multi-modal crowd counting. Through extensive experimental evaluations on popular multi-modal crowd counting datasets, we demonstrate the effectiveness of our method, which introduces only 4 million additional parameters, yet achieves promising results. We will release the source code upon the acceptance of the paper.

Chat is not available.