Poster
A Unified Image Compression Method for Human Perception and Multiple Vision Tasks
Sha Guo · Sui Lin · Chen-Lin Zhang · Zhuo Chen · Wenhan Yang · Lingyu Duan
#90
Strong Double Blind
Recent advances in end-to-end image compression show the potential to surpass traditional codecs in rate-distortion performance. However, current methods either prioritize human perceptual quality or optimize for only one or a few predetermined downstream tasks. They neglect a more common scenario in which the compressed representation must serve a variety of unforeseen machine vision tasks. In this paper, we propose a Diffusion-Based Multiple-Task Unified Image Compression framework that expands the boundary of traditional image compression by incorporating human perception and multiple vision tasks in open-set scenarios. The framework comprises a Multi-task Collaborative Embedding module and a Diffusion-based Invariant Knowledge Learning module. The former produces a collaborative embedding for multiple tasks, while the latter distills invariant knowledge from seen tasks so that the method generalizes to unseen machine vision tasks. Experiments and visualizations show that the proposed method extracts compact and versatile embeddings for collaborative human and machine vision compression, resulting in superior performance. Specifically, on MS-COCO, our method outperforms the state of the art with BD-rate reductions of 52.25% (mAP) for instance detection, 51.68% (mAP) for instance segmentation, 48.87% (aAcc) for stuff segmentation, 48.07% (PQ-all) for panoptic segmentation, and 6.29% (accuracy) for video question answering.
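The reported gains are BD-rate (Bjøntegaard delta rate) reductions: the average percentage change in bitrate needed to reach the same task score, with a negative value meaning savings. The abstract does not state the exact fitting protocol. The sketch below assumes the common formulation: fit log-rate as a cubic polynomial of the quality metric (here a task score such as mAP instead of PSNR), average the gap between the two fitted curves over the overlapping quality range, and convert back with `exp`. The function names and the rate/score numbers are hypothetical and serve only to illustrate the calculation. They are not results from the paper.

```python
import math
from typing import List, Sequence


def _polyfit(ts: Sequence[float], ys: Sequence[float], degree: int) -> List[float]:
    """Least-squares polynomial fit; returns coefficients c[k] for t**k."""
    n = degree + 1
    # Normal equations (V^T V) c = V^T y, where V is the Vandermonde matrix of ts.
    a = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    b = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = b[i] - sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = acc / a[i][i]
    return coeffs


def _mean_on_unit_interval(coeffs: Sequence[float]) -> float:
    """Average of sum_k c_k * t**k over t in [-1, 1]."""
    # (1/2) * integral_{-1}^{1} t^k dt equals 1/(k+1) for even k and 0 for odd k.
    return sum(c / (k + 1) for k, c in enumerate(coeffs) if k % 2 == 0)


def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test, degree=3):
    """Hypothetical helper: BD-rate in percent (negative = test needs fewer bits)."""
    lo = max(min(quality_anchor), min(quality_test))
    hi = min(max(quality_anchor), max(quality_test))
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")
    center, half = (lo + hi) / 2.0, (hi - lo) / 2.0

    def fit(rates, quality):
        # Map quality so the shared range [lo, hi] becomes [-1, 1] (better
        # conditioning), then fit log(rate) as a polynomial of that variable.
        ts = [(q - center) / half for q in quality]
        return _polyfit(ts, [math.log(r) for r in rates], degree)

    mean_anchor = _mean_on_unit_interval(fit(rate_anchor, quality_anchor))
    mean_test = _mean_on_unit_interval(fit(rate_test, quality_test))
    # Average log-rate gap over the shared quality range -> relative rate change.
    return (math.exp(mean_test - mean_anchor) - 1.0) * 100.0


if __name__ == "__main__":
    # Illustrative numbers only: the test codec reaches the same scores at half the bits.
    anchor_bpp = [0.10, 0.20, 0.40, 0.80]
    anchor_map = [30.0, 33.0, 35.0, 36.0]
    test_bpp = [r * 0.5 for r in anchor_bpp]
    print(round(bd_rate(anchor_bpp, anchor_map, test_bpp, anchor_map), 2))  # -50.0
```

Fitting in the log-rate domain makes the averaged gap a ratio of bitrates, so halving the rate at every score yields a BD-rate of about -50%. Mapping the overlap range to [-1, 1] keeps the polynomial fit numerically stable for score ranges such as mAP between 30 and 36.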