

Oral Session

Oral 7B: Adversarial Learning And Privacy

Auditorium

Moderators: Andrés Bruhn · Venkatesh Babu Radhakrishnan

Thu 3 Oct 11:30 p.m. PDT — 1:30 a.m. PDT

Thu 3 Oct. 23:30 - 23:40 PDT

Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks

Hunmin Yang · Jongoh Jeong · Kuk-Jin Yoon

Recent vision-language foundation models, such as CLIP, have demonstrated superior capabilities in learning representations that transfer across a diverse range of downstream tasks and domains. With the emergence of such powerful models, it has become crucial to effectively leverage their capabilities in tackling challenging vision tasks. On the other hand, only a few works have focused on devising adversarial examples that transfer well to both unknown domains and model architectures. In this paper, we propose a novel transfer attack method called PDCL-Attack, which leverages CLIP to enhance the transferability of adversarial perturbations generated within a generative model-based attack framework. Specifically, we exploit the joint vision-language space to formulate an effective prompt-driven feature guidance by harnessing the semantic representation power of text, particularly from the ground-truth class of the input. To the best of our knowledge, we are the first to introduce prompt learning to enhance transferable generative attacks. Extensive experiments conducted across various cross-domain and cross-model settings empirically validate our approach, demonstrating its superiority over state-of-the-art methods.
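
The prompt-driven feature guidance described above can be pictured as a contrastive objective in CLIP's joint vision-language space: the attacker's loss scores how strongly a perturbed image still aligns with the text prompt of its ground-truth class relative to the other class prompts. A minimal sketch, assuming PyTorch; `image_encoder`, `text_encoder`, and `prompt_guidance_loss` are hypothetical stand-ins, not the paper's generator, prompt-learning module, or exact loss.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins for a frozen CLIP-style image/text encoder pair.
image_encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 512))
text_encoder = torch.nn.Embedding(10, 512)   # one "prompt" embedding per class

def prompt_guidance_loss(adv_images, labels, temperature=0.07):
    """Contrastive prompt-driven guidance (sketch): reward pushing adversarial
    image features away from the ground-truth class prompt relative to the
    other class prompts."""
    img_feat = F.normalize(image_encoder(adv_images), dim=-1)   # (B, D)
    txt_feat = F.normalize(text_encoder.weight, dim=-1)         # (C, D)
    logits = img_feat @ txt_feat.t() / temperature              # (B, C)
    # Cross-entropy aligns each image with its ground-truth prompt;
    # negating it yields an objective the attacker can minimize.
    return -F.cross_entropy(logits, labels)

adv_images = torch.rand(4, 3, 32, 32, requires_grad=True)
loss = prompt_guidance_loss(adv_images, torch.tensor([0, 1, 2, 3]))
loss.backward()   # in the full framework, such gradients would train a perturbation generator
```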

Thu 3 Oct. 23:40 - 23:50 PDT

Adversarial Robustification via Text-to-Image Diffusion Models

Daewon Choi · Jongheon Jeong · Huiwon Jang · Jinwoo Shin

Adversarial robustness has conventionally been believed to be a challenging property to encode into neural networks, requiring plenty of training data. In the recent paradigm of adopting off-the-shelf models, however, access to their training data is often infeasible or impractical, and most such models are not originally trained with adversarial robustness in mind. In this paper, we develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data. Our intuition is to view recent text-to-image diffusion models as ``adaptable'' denoisers that can be optimized for specific target tasks. Based on this, we propose: (a) to initiate a denoise-and-classify pipeline that offers provable guarantees against adversarial attacks, and (b) to leverage a few synthetic reference images generated from the text-to-image model to enable novel adaptation schemes. Our experiments show that our data-free scheme applied to pre-trained CLIP improves the (provable) adversarial robustness of its diverse zero-shot classification derivatives (while maintaining their accuracy), significantly surpassing prior approaches that utilize the full training data. Beyond CLIP, we also demonstrate that our framework can efficiently robustify other visual classifiers.
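
The denoise-and-classify pipeline with provable guarantees follows the spirit of randomized smoothing: add Gaussian noise to the input, denoise it, classify the result, and aggregate votes over many samples. A minimal sketch, assuming PyTorch; `denoiser` and `classifier` are hypothetical placeholders for the adapted text-to-image diffusion model and the zero-shot classifier, and the certified-radius computation is omitted.

```python
import torch

# Hypothetical placeholders for the adapted diffusion denoiser and the classifier.
denoiser = torch.nn.Identity()   # maps a noisy image to a clean estimate
classifier = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))

@torch.no_grad()
def smoothed_predict(x, sigma=0.25, n_samples=100, num_classes=10):
    """Denoise-and-classify under Gaussian noise, returning the majority vote."""
    votes = torch.zeros(num_classes, dtype=torch.long)
    for _ in range(n_samples):
        noisy = x + sigma * torch.randn_like(x)
        pred = classifier(denoiser(noisy)).argmax(dim=-1)
        votes[pred.item()] += 1
    return votes.argmax().item()

x = torch.rand(1, 3, 32, 32)
print(smoothed_predict(x))
```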

Thu 3 Oct. 23:50 - 0:00 PDT

Flatness-aware Sequential Learning Generates Resilient Backdoors

Hoang Pham · The-Anh Ta · Anh Tran · Khoa Doan

Backdoor attacks have become an emerging threat to the security of machine learning models. From the adversary's perspective, the implanted backdoors should be resistant to defensive algorithms, yet some recently proposed fine-tuning defenses can remove these backdoors with notable efficacy. This is mainly due to the catastrophic forgetting (CF) property of deep neural networks. This paper counters the CF of backdoors by leveraging continual learning (CL) techniques. We begin by investigating the connectivity between a backdoored model and its fine-tuned counterpart in the loss landscape. Our analysis confirms that fine-tuning defenses, especially the more advanced ones, can easily push a poisoned model out of the backdoor regions, making it forget the backdoors. Based on this finding, we re-formulate backdoor training through the lens of CL and propose a novel framework, named \textbf{S}equential \textbf{B}ackdoor \textbf{L}earning (\textbf{SBL}), that can generate resilient backdoors. This framework separates the backdoor poisoning process into two tasks: the first task learns a backdoored model, while the second task, based on CL principles, moves it to a backdoored region resistant to fine-tuning. We additionally propose to seek flatter backdoor regions via a sharpness-aware minimizer in the framework, further strengthening the durability of the implanted backdoor. Finally, we demonstrate the effectiveness of our method through extensive empirical experiments on several benchmark datasets in the backdoor domain.
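
The flatness-seeking component mentioned above is a sharpness-aware minimization (SAM) style update: perturb the weights toward higher loss, measure the gradient there, then descend with that gradient. A generic sketch of one such step, assuming PyTorch and a toy placeholder model; this illustrates the optimizer mechanics only, not the paper's two-task backdoor pipeline.

```python
import torch

model = torch.nn.Linear(10, 2)          # toy placeholder model
loss_fn = torch.nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def sam_step(x, y, rho=0.05):
    """One sharpness-aware minimization step: ascend to a nearby weight
    perturbation, then descend using the gradient measured there."""
    # First pass: gradient at the current weights.
    loss_fn(model(x), y).backward()
    grads = [p.grad.clone() for p in model.parameters()]
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
    # Perturb weights toward higher loss (the "sharpness" direction).
    eps = [rho * g / grad_norm for g in grads]
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.add_(e)
    # Second pass: gradient at the perturbed weights.
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.sub_(e)                   # restore the original weights
    opt.step()                          # descend with the perturbed-point gradient
    opt.zero_grad()

sam_step(torch.randn(8, 10), torch.randint(0, 2, (8,)))
```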

Fri 4 Oct. 0:00 - 0:10 PDT

A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks

Yixiang Qiu · Hao Fang · Hongyao Yu · Bin Chen · Meikang Qiu · Shu-Tao Xia

Model Inversion (MI) attacks aim to reconstruct privacy-sensitive training data from released models by utilizing output information, raising extensive concerns about the security of Deep Neural Networks (DNNs). Recent advances in generative adversarial networks (GANs) have contributed significantly to the improved performance of MI attacks due to their powerful ability to generate realistic images with high fidelity and appropriate semantics. However, previous MI attacks have only searched for private information in the latent space of GAN priors, limiting their semantic extraction and transferability across multiple target models and datasets. To address this challenge, we propose a novel method, \textbf{I}ntermediate \textbf{F}eatures enhanced \textbf{G}enerative \textbf{M}odel \textbf{I}nversion (IF-GMI), which disassembles the GAN structure and exploits features between intermediate blocks. This allows us to extend the optimization space from the latent code to intermediate features with enhanced expressive capabilities. To prevent the GAN prior from generating unrealistic images, we apply an $l_1$ ball constraint to the optimization process. Experiments on multiple benchmarks demonstrate that our method significantly outperforms previous approaches and achieves state-of-the-art results under various settings, especially in the out-of-distribution (OOD) scenario.
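
The $l_1$ ball constraint keeps each optimized intermediate feature close to the one produced by the original latent code, so the remaining generator blocks still receive realistic inputs. A sketch of the standard sorting-based projection onto an $l_1$ ball that such a constraint relies on, assuming PyTorch; the split GAN blocks and the attack objective are not reproduced, and the tensor shapes are illustrative.

```python
import torch

def project_l1_ball(delta, radius):
    """Project an offset onto the l1 ball of the given radius (classic
    sorting-based algorithm). `delta` is the deviation of an optimized
    intermediate feature from its initial value."""
    v = delta.flatten()
    if v.abs().sum() <= radius:
        return delta
    u, _ = torch.sort(v.abs(), descending=True)
    css = torch.cumsum(u, dim=0)
    k = torch.arange(1, u.numel() + 1, device=v.device, dtype=v.dtype)
    rho = torch.nonzero(u * k > (css - radius), as_tuple=False).max()
    theta = (css[rho] - radius) / (rho + 1).to(v.dtype)
    projected = torch.sign(v) * torch.clamp(v.abs() - theta, min=0)
    return projected.view_as(delta)

feat_init = torch.randn(1, 512)          # e.g. a block's input feature
offset = torch.randn(1, 512)             # candidate update proposed by the optimizer
feat = feat_init + project_l1_ball(offset, radius=5.0)
print(feat.shape, (feat - feat_init).abs().sum())
```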

Fri 4 Oct. 0:10 - 0:20 PDT

Learning a Dynamic Privacy-preserving Camera Robust to Inversion Attacks

Jiacheng Cheng · Xiang Dai · Jia Wan · Nick Antipa · Nuno Vasconcelos

The problem of designing a privacy-preserving camera (PPC) is considered. Previous designs rely on a static point spread function (PSF), optimized to prevent the detection of private visual information, such as recognizable facial features. However, the PSF can be easily recovered by measuring the camera response to a point light source, making these cameras vulnerable to PSF inversion attacks. A new dynamic privacy-preserving (DyPP) camera design is proposed to prevent such attacks. DyPP cameras rely on dynamic optical elements, such as spatial light modulators, to implement a time-varying PSF that changes from picture to picture. PSFs are drawn randomly from a learned manifold embedding, trained adversarially to simultaneously meet user-specified targets for privacy, such as face recognition accuracy, and task utility. Empirical evaluations on multiple privacy-preserving vision tasks demonstrate that the DyPP design is significantly more robust to PSF inversion attacks than previous PPCs. Furthermore, the hardware feasibility of the approach is validated with a proof-of-concept camera.
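
The imaging model behind such a camera is a convolution of the scene with a PSF that is resampled for every capture, so a single point-source measurement no longer characterizes the device. A toy sketch of the forward model, assuming PyTorch; `psf_decoder` is a hypothetical stand-in for the learned manifold embedding, and the adversarial privacy/utility training is not shown.

```python
import torch
import torch.nn.functional as F

# Hypothetical decoder standing in for the learned PSF manifold embedding.
psf_decoder = torch.nn.Sequential(torch.nn.Linear(16, 15 * 15), torch.nn.Softplus())

def capture(scene):
    """Simulate one exposure: sample a random code, decode a PSF,
    normalize it, and convolve the scene with it per color channel."""
    code = torch.randn(1, 16)                            # fresh random code per picture
    psf = psf_decoder(code).view(1, 1, 15, 15)
    psf = psf / psf.sum()                                # a PSF conserves energy
    psf = psf.repeat(scene.shape[1], 1, 1, 1)            # same PSF applied to each channel
    return F.conv2d(scene, psf, padding=7, groups=scene.shape[1])

scene = torch.rand(1, 3, 64, 64)
print(capture(scene).shape)   # torch.Size([1, 3, 64, 64])
```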

Fri 4 Oct. 0:20 - 0:30 PDT

R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model

Changhoon Kim · Kyle Min · Yezhou Yang

In the evolving landscape of text-to-image (T2I) diffusion models, the remarkable capability to generate high-quality images from textual descriptions is accompanied by the risk of misuse to reproduce sensitive content. To address this critical issue, we introduce \textbf{R}obust \textbf{A}dversarial \textbf{C}oncept \textbf{E}rase (RACE), a novel approach designed to mitigate these risks by enhancing the robustness of concept erasure methods for T2I models. RACE utilizes a sophisticated adversarial training framework to identify and mitigate adversarial text embeddings, significantly reducing the Attack Success Rate (ASR). Impressively, RACE achieves a 30\% reduction in ASR for the ``nudity'' concept against the leading white-box attack method. Our extensive evaluations demonstrate RACE's effectiveness in defending against both white-box and black-box attacks, marking a significant advancement in protecting T2I diffusion models from generating inappropriate or misleading imagery. This work underlines the essential need for proactive defense measures in the face of rapidly advancing adversarial challenges.
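
Adversarial training for concept erasure alternates an inner search for a small text-embedding perturbation that re-elicits the target concept with an outer model update that suppresses it. The following is a toy sketch of that inner/outer structure, assuming PyTorch; `model`, `concept_score`, and all hyperparameters are hypothetical placeholders rather than RACE's actual diffusion model or objective.

```python
import torch

# Toy placeholders: `model` maps a text embedding to some output, and
# `concept_score` measures how strongly the erased concept is expressed.
model = torch.nn.Linear(768, 768)
concept_direction = torch.randn(768)

def concept_score(out):
    return (out * concept_direction).sum(dim=-1).mean()

opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def adversarial_erasure_step(text_emb, eps=0.1, inner_steps=5, inner_lr=0.02):
    # Inner loop: find a bounded embedding perturbation that maximizes the concept score.
    delta = torch.zeros_like(text_emb, requires_grad=True)
    for _ in range(inner_steps):
        score = concept_score(model(text_emb + delta))
        grad, = torch.autograd.grad(score, delta)
        with torch.no_grad():
            delta += inner_lr * grad.sign()
            delta.clamp_(-eps, eps)
    # Outer step: update the model so the perturbed embedding no longer
    # expresses the concept.
    opt.zero_grad()
    concept_score(model(text_emb + delta.detach())).backward()
    opt.step()

adversarial_erasure_step(torch.randn(8, 768))
```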

Fri 4 Oct. 0:30 - 0:40 PDT

Privacy-Preserving Adaptive Re-Identification without Image Transfer

Hamza Rami · Jhony H. Giraldo · Nicolas Winckler · Stéphane Lathuiliere

Re-Identification systems (Re-ID) are crucial for public safety but face the challenge of having to adapt to environments that differ from their training distribution. Furthermore, rigorous privacy protocols in public places are being enforced as apprehensions regarding individual freedom rise, adding layers of complexity to the deployment of accurate Re-ID systems in new environments. For example, in the European Union, the principles of "Data Minimization" and "Purpose Limitation" restrict the retention and processing of images to what is strictly necessary. These regulations pose a challenge to the conventional Re-ID training schemes that rely on centralizing data on servers. In this work, we present a novel setting for privacy-preserving Distributed Unsupervised Domain Adaptation for person Re-ID (DUDA-Rid) to address the problem of domain shift without requiring any image transfer outside the camera devices. To address this setting, we introduce Fed-Protoid, a novel solution that adapts person Re-ID models directly within the edge devices. Our proposed solution employs prototypes derived from the source domain to align feature statistics within edge devices. Those source prototypes are distributed across the edge devices to minimize a distributed Maximum Mean Discrepancy (MMD) loss tailored for the DUDA-Rid setting. Our experiments provide compelling evidence that Fed-Protoid outperforms all evaluated methods in terms of both accuracy and communication efficiency, all while maintaining data privacy.
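
The alignment step compares features computed on the camera with the source prototypes received from the server through a Maximum Mean Discrepancy loss, so only prototypes and model updates ever leave the device. A sketch of an MMD estimate with a Gaussian kernel, assuming PyTorch; the kernel choice, feature dimensions, and surrounding federated optimization loop are illustrative assumptions rather than Fed-Protoid's exact formulation.

```python
import torch

def rbf_mmd(x, y, sigma=1.0):
    """Biased squared Maximum Mean Discrepancy with a Gaussian kernel between
    local edge features `x` (N, D) and source prototypes `y` (M, D)."""
    def kernel(a, b):
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

edge_features = torch.randn(32, 256)      # features extracted on the camera
source_prototypes = torch.randn(10, 256)  # prototypes broadcast from the server
loss = rbf_mmd(edge_features, source_prototypes)
print(loss.item())
```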

Fri 4 Oct. 0:40 - 0:50 PDT

Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models

Yifan Li · Hangyu Guo · Kun Zhou · Wayne Xin Zhao · Ji-Rong Wen

In this paper, we study the harmlessness alignment problem of multimodal large language models (MLLMs). We conduct a systematic empirical analysis of the harmlessness performance of representative MLLMs and reveal that image inputs are a key alignment vulnerability of MLLMs. Inspired by this, we propose a novel jailbreak method named HADES, which hides and amplifies the harmfulness of the malicious intent within the text input using meticulously crafted images. Experimental results show that HADES can effectively jailbreak existing MLLMs, achieving an average Attack Success Rate (ASR) of 90.26% for LLaVA-1.5 and 71.60% for Gemini Pro Vision. Our code and data will be publicly released.

Fri 4 Oct. 0:50 - 1:00 PDT

Best Paper Honorable Mention
Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models

Vitali Petsiuk · Kate Saenko

Motivated by ethical and legal concerns, the scientific community is actively developing methods to limit the misuse of text-to-image diffusion models for reproducing copyrighted, violent, explicit, or personal information in generated images. Simultaneously, researchers put these newly developed safety measures to the test by assuming the role of an adversary to find vulnerabilities and backdoors in them. We use the compositional property of diffusion models, which allows multiple prompts to be leveraged in a single image generation. This property allows us to combine other concepts that should not have been affected by the inhibition in order to reconstruct the vector responsible for generating the target concept, even though direct computation of this vector is no longer accessible. We provide theoretical and empirical evidence for why the proposed attacks are possible and discuss the implications of these findings for safe model deployment. We argue that it is essential to consider all possible approaches to image generation with diffusion models that can be employed by an adversary. Our work opens up the discussion about the implications of concept arithmetics and compositional inference for safety mechanisms in diffusion models.
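
Compositional inference in diffusion models, which the abstract builds on, amounts to linearly combining the noise (score) predictions of several prompts around the unconditional prediction, in the style of classifier-free guidance. A generic form of this arithmetic (a sketch of the compositional property, not the paper's specific construction) is

\[ \tilde{\epsilon}_\theta(x_t, t) \;=\; \epsilon_\theta(x_t, t, \varnothing) \;+\; \sum_{i=1}^{k} w_i \left( \epsilon_\theta(x_t, t, c_i) - \epsilon_\theta(x_t, t, \varnothing) \right), \]

where the $c_i$ are prompt conditionings and the $w_i$ their weights; the abstract's observation is that a suitable weighted combination of concepts unaffected by the inhibition can still approximate the direction of a concept the model was prevented from computing directly.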