Poster

Brain Netflix: Scaling Data to Reconstruct Videos from Brain Signals

Camilo L Fosco · Benjamin Lahner · Bowen Pan · Alex Andonian · Emilie L Josephs · Alex Lascelles · Aude Oliva

Strong Double Blind: This paper was not made available on public preprint services during the review process.
Wed 2 Oct 7:30 a.m. PDT — 9:30 a.m. PDT

Abstract:

The field of brain-to-stimuli reconstruction has seen significant progress in the last few years, but techniques remain subject-specific and are usually tested on a single dataset. In this work, we present a novel technique for reconstructing videos from functional Magnetic Resonance Imaging (fMRI) signals, designed to perform well across datasets and across human participants. By leveraging multi-dataset and multi-subject training, our pipeline accurately generates 2- and 3-second video clips from the brain activity of distinct participants in different datasets. This training scheme lets us regress key latent and conditioning vectors for pretrained text-to-video and video-to-video models, reconstructing videos that match the original stimuli observed by each participant. Key to our pipeline is a 3-stage approach that first aligns fMRI signals to semantic embeddings, then regresses the important vectors, and finally generates videos from those estimates. Our method demonstrates state-of-the-art reconstruction capability, verified by qualitative and quantitative analyses including a crowd-sourced human evaluation. We showcase performance improvements across two datasets as well as in multi-subject setups. Our ablation studies shed light on how different alignment strategies and data-scaling decisions impact reconstruction performance, and we hint at a future of zero-shot reconstruction by analyzing how performance evolves as data from more subjects is leveraged.
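To make the 3-stage structure concrete, here is a minimal PyTorch sketch of how such a pipeline could be wired together. All module names, dimensions, and the choice of alignment target are illustrative assumptions, not the authors' actual code: the abstract only specifies alignment to "semantic embeddings" and regression of latent/conditioning vectors for a pretrained video model.

```python
# Hypothetical sketch of the 3-stage pipeline described in the abstract.
# Stage 1: align fMRI signals to a semantic embedding space.
# Stage 2: regress latent and conditioning vectors from those embeddings.
# Stage 3: feed the estimated vectors to a pretrained text-to-video /
#          video-to-video generator (not shown here).
import torch
import torch.nn as nn


class FMRIEncoder(nn.Module):
    """Stage 1: map a flattened fMRI response to a semantic embedding.

    The target space (e.g., CLIP caption embeddings) is an assumption;
    the paper only says fMRI is aligned to "semantic embeddings".
    """
    def __init__(self, n_voxels: int, embed_dim: int = 768):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(n_voxels, 2048),
            nn.GELU(),
            nn.Linear(2048, embed_dim),
        )

    def forward(self, fmri: torch.Tensor) -> torch.Tensor:
        return self.proj(fmri)


class VectorRegressor(nn.Module):
    """Stage 2: regress the latent and conditioning vectors that a
    pretrained video model expects, from the aligned embeddings."""
    def __init__(self, embed_dim: int, latent_dim: int, cond_dim: int):
        super().__init__()
        self.to_latent = nn.Linear(embed_dim, latent_dim)
        self.to_cond = nn.Linear(embed_dim, cond_dim)

    def forward(self, z: torch.Tensor):
        return self.to_latent(z), self.to_cond(z)


# Illustrative dimensions only; voxel counts vary per subject/dataset.
encoder = FMRIEncoder(n_voxels=15000)
regressor = VectorRegressor(embed_dim=768, latent_dim=4096, cond_dim=1024)

fmri = torch.randn(8, 15000)            # a batch of fMRI responses
semantic = encoder(fmri)                # Stage 1: aligned embeddings
latents, cond = regressor(semantic)     # Stage 2: vectors for the generator

# Stage 3 would pass `latents` and `cond` into the pretrained
# text-to-video / video-to-video model to reconstruct the clip.
```

In a multi-subject, multi-dataset setup of the kind the abstract describes, one plausible design is a per-subject input layer (since voxel counts differ) feeding a shared encoder trunk, so that additional subjects and datasets scale the shared training signal.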
