

Poster

Spike-Temporal Latent Representation for Energy-Efficient Event-to-Video Reconstruction

Jianxiong Tang · Jian-Huang Lai · Lingxiao Yang · Xiaohua Xie

Strong Double Blind: This paper was not made available on public preprint services during the review process.
Wed 2 Oct 7:30 a.m. PDT — 9:30 a.m. PDT

Abstract:

Event-to-Video (E2V) reconstruction aims to recover grayscale video from neuromorphic event streams, and Spiking Neural Networks (SNNs) are promising energy-efficient models for solving the reconstruction problem. Event voxels are an efficient representation for compressing event streams for E2V reconstruction, but their temporal latent representation is rarely considered in SNN-based reconstruction. In this paper, we propose a spike-temporal latent representation (STLR) model for SNN-based E2V reconstruction. The STLR solves the temporal latent coding of event voxels for video frame reconstruction. It is composed of two cascaded SNNs: a) a Spike-based Voxel Temporal Encoder (SVT) and b) a U-shape SNN decoder. The SVT is a spike-driven spatial unfolding network with a specially designed coding dynamic: it encodes the event voxel into layer-wise spiking features for latent coding, approximating the fixed point of the Iterative Shrinkage-Thresholding Algorithm (ISTA). The U-shape SNN decoder then reconstructs the video from the encoded spikes. Experimental results show that the STLR achieves performance comparable to popular SNNs on IJRR, HQF, and MVSEC while significantly improving energy efficiency. For example, the STLR matches the performance of PA-EVSNN with only 13.20% of its parameters and 3.33%-5.03% of its energy cost.
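For context, the Iterative Shrinkage-Thresholding Algorithm (ISTA) referenced above is the standard proximal-gradient method for the LASSO sparse coding problem. The formulation below is the textbook version and its fixed-point condition, given only to clarify what the SVT is said to approximate; the symbols $D$, $x$, $z$, and $\lambda$ are generic and not notation from the paper.

$\min_z \; \tfrac{1}{2}\|x - Dz\|_2^2 + \lambda\|z\|_1$

$z^{(k+1)} = S_{\lambda/L}\big(z^{(k)} + \tfrac{1}{L} D^\top (x - D z^{(k)})\big), \qquad S_\theta(v) = \mathrm{sign}(v)\,\max(|v| - \theta,\, 0)$

Here $L$ is a Lipschitz constant of the data-fidelity gradient (e.g., the largest eigenvalue of $D^\top D$). A fixed point $z^\star = S_{\lambda/L}\big(z^\star + \tfrac{1}{L} D^\top (x - D z^\star)\big)$ is a minimizer of the sparse coding objective; the SVT's layer-wise spiking features are described as approximating such a fixed point.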
