Poster
Vision-Language Dual-Pattern Matching for Out-of-Distribution Detection
Zihan Zhang · Zhuo Xu · Xiang Xiang
# 47
Strong Double Blind
Recent vision-language models (VLMs) such as CLIP have shown promise in out-of-distribution (OOD) detection through their generalizable multimodal representations. Existing CLIP-based OOD detection methods utilize only a single modality of in-distribution (ID) information (e.g., textual cues). However, we find that ID visual information helps to leverage CLIP's full potential for OOD detection. In this paper, we pursue a different approach and explore how to leverage both the visual and textual ID information. Specifically, we propose Dual-Pattern Matching (DPM), which efficiently adapts CLIP for OOD detection by leveraging both textual and visual ID patterns. DPM stores ID class-wise text features as the textual pattern and the aggregated ID visual information as the visual pattern. At test time, the similarity to both patterns is computed to detect OOD inputs. We further extend DPM with lightweight adaptation for enhanced OOD detection. Experiments demonstrate DPM's advantages: it outperforms existing methods on common benchmarks. The dual-pattern approach provides a simple yet effective way to exploit multi-modality for OOD detection with vision-language representations.
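
The dual-pattern idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions that the abstract does not specify: the visual pattern is taken to be the per-class mean of normalized ID image features, the score is a weighted sum of the maximum cosine similarity to each pattern, and the names `build_visual_pattern`, `dual_pattern_score`, and `weight` are hypothetical. Synthetic vectors stand in for CLIP text and image embeddings.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def build_visual_pattern(id_feats, id_labels, num_classes):
    """Aggregate ID image features into one prototype per class.

    Assumption: aggregation is a per-class mean; the paper may aggregate
    differently. Every class must have at least one sample.
    """
    feats = l2_normalize(id_feats)
    protos = np.stack(
        [feats[id_labels == c].mean(axis=0) for c in range(num_classes)]
    )
    return l2_normalize(protos)


def dual_pattern_score(test_feats, text_pattern, visual_pattern, weight=0.5):
    """Return an ID-ness score per test feature (higher means more ID-like).

    Assumption: the two similarities are fused by a convex combination
    controlled by the hypothetical parameter `weight`.
    """
    x = l2_normalize(test_feats)
    text_sim = x @ l2_normalize(text_pattern).T    # shape (N, num_classes)
    visual_sim = x @ l2_normalize(visual_pattern).T  # shape (N, num_classes)
    return (1.0 - weight) * text_sim.max(axis=1) + weight * visual_sim.max(axis=1)


# Synthetic stand-ins for CLIP embeddings: 3 ID classes in an 8-dim space.
rng = np.random.default_rng(0)
text_pattern = np.eye(3, 8)                      # one text feature per class
id_labels = np.repeat(np.arange(3), 5)           # 5 ID images per class
id_feats = text_pattern[id_labels] + 0.05 * rng.normal(size=(15, 8))
visual_pattern = build_visual_pattern(id_feats, id_labels, num_classes=3)

# One ID-like input (aligned with class 0) and one input far from all classes.
test_feats = np.stack([text_pattern[0], np.eye(8)[7]])
scores = dual_pattern_score(test_feats, text_pattern, visual_pattern)
is_ood = scores < 0.5                            # threshold chosen for this toy data only
```

In practice the threshold would be chosen on held-out ID data, for example at a fixed true-positive rate, rather than hard-coded as in this toy example.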