[2411.15244] Adversarial Prompt Distillation for Vision-Language Models


Authors: Lin Luo and 5 other authors


Abstract: Large pre-trained Vision-Language Models (VLMs) such as Contrastive Language-Image Pre-training (CLIP) have been shown to be susceptible to adversarial attacks, raising concerns about their deployment in safety-critical applications like autonomous driving and medical diagnosis. One promising approach for robustifying pre-trained VLMs is Adversarial Prompt Tuning (APT), which applies adversarial training during the process of prompt tuning. However, existing APT methods are mostly single-modal, designing prompt(s) for only the visual or textual modality and thus limiting their effectiveness in either robustness or clean accuracy. In this work, we propose Adversarial Prompt Distillation (APD), a bimodal knowledge distillation framework that enhances APT by integrating it with multi-modal knowledge transfer. APD optimizes prompts for both the visual and textual modalities while distilling knowledge from a clean pre-trained teacher CLIP model. Extensive experiments on multiple benchmark datasets demonstrate the superiority of APD over the current state-of-the-art APT methods in terms of both adversarial robustness and clean accuracy. The effectiveness of APD also validates the possibility of using a non-robust teacher to improve the generalization and robustness of fine-tuned VLMs.
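
To make the training recipe above concrete, the following is a minimal sketch of one APD-style update in PyTorch. It is a hedged reconstruction from the abstract alone, not the authors' implementation: the PGD attack settings, the temperature-scaled KL distillation term, the choice to feed the teacher clean images, and every identifier (pgd_attack, apd_step, tau, lam) are assumptions made for illustration.

# A minimal, hypothetical sketch of one APD-style training step in PyTorch.
# Assumptions (not taken from the paper's code): the attack is standard PGD,
# the teacher sees clean images, and the loss is adversarial cross-entropy
# plus temperature-scaled KL distillation toward the teacher.
import torch
import torch.nn.functional as F

def pgd_attack(model, images, labels, eps=4/255, alpha=1/255, steps=3):
    # Untargeted PGD on the student's classification loss.
    adv = images.clone().detach()
    adv = (adv + torch.empty_like(adv).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        grad = torch.autograd.grad(loss, adv)[0]
        adv = adv.detach() + alpha * grad.sign()
        adv = (images + (adv - images).clamp(-eps, eps)).clamp(0, 1)
    return adv.detach()

def apd_step(student, teacher, images, labels, optimizer, tau=1.0, lam=1.0):
    # The student is attacked and trained on adversarial inputs; the frozen,
    # non-robust teacher provides soft targets from the clean inputs.
    adv_images = pgd_attack(student, images, labels)
    student_logits = student(adv_images)
    with torch.no_grad():
        teacher_logits = teacher(images)
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau ** 2
    loss = ce + lam * kl
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with stand-in linear classifiers. In the real method the models
# are CLIP encoders and only the learnable visual and textual prompt
# parameters would be passed to the optimizer.
if __name__ == "__main__":
    torch.manual_seed(0)
    student = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
    teacher = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
    opt = torch.optim.SGD(student.parameters(), lr=1e-2)
    x, y = torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))
    print(apd_step(student, teacher, x, y, opt))

The distillation term is what lets a non-robust teacher help: the cross-entropy is computed on adversarial inputs, while the teacher's clean-input predictions pull the student back toward clean-data behavior.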

Submission history

From: Lin Luo
[v1] Fri, 22 Nov 2024 03:02:13 UTC (588 KB)
[v2] Tue, 15 Apr 2025 01:57:20 UTC (634 KB)


