Poster
Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation
Yunhao Gou · Kai Chen · Zhili LIU · Lanqing Hong · Hang Xu · ZHENGUO LI · Dit-Yan Yeung · James Kwok · Yu Zhang
# 78
Multimodal large language models (MLLMs) have shown impressive reasoning abilities. However, they are also more vulnerable to jailbreak attacks than their LLM predecessors. Although still capable of detecting unsafe responses, we observe that safety mechanisms of the pre-aligned LLMs in MLLMs can be easily bypassed due to the introduction of image features. To construct safe MLLMs, we propose ECSO (Eyes Closed, Safety On), a novel training-free protecting approach that exploits the inherent safety awareness of MLLMs, and generates safer responses via adaptively transforming unsafe images into texts to activate the intrinsic safety mechanism of the pre-aligned LLMs in MLLMs. Experiments with five state-of-the-art (SOTA) MLLMs demonstrate that ECSO significantly enhances model safety (e.g., a 37.6% improvement on MM-SafetyBench (SD+OCR), and 71.3% on VLSafe for LLaVA-1.5-7B), while consistently maintaining utility results on common MLLM benchmarks. Furthermore, we demonstrate that ECSO can be used as a data engine to generate supervised-finetuning (SFT) data for the alignment of MLLMs without extra human intervention.