Oral

Model Stock: All we need is just a few fine-tuned models

Dong-Hwan Jang · Sangdoo Yun · Dongyoon Han

Oral 7C: Optimization And Theory
Fri 4 Oct 1 a.m. — 1:10 a.m. PDT

Abstract:

This paper introduces a novel, efficient fine-tuning method for large pre-trained models that delivers strong performance at lower cost. Breaking away from the traditional practice of averaging many fine-tuned models to improve accuracy, our approach uses significantly fewer models to obtain the final weights while achieving superior accuracy. Building on key observations about the dynamics of fine-tuned models in weight space, our layer-wise averaging technique can surpass state-of-the-art model averaging methods such as Model Soup using only two fine-tuned models. We call this strategy Model Stock, reflecting its reliance on a very small number of models to produce a better-optimized averaged model. We demonstrate the efficacy of Model Stock with fine-tuned models based on pre-trained CLIP architectures, achieving remarkable performance on both in-distribution (ID) and out-of-distribution (OOD) tasks on standard benchmarks, while adding barely any computational cost. Our code and pre-trained models will be made publicly available.
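
To make the idea of layer-wise averaging with two fine-tuned models concrete, here is a minimal sketch, not the authors' released code. It assumes each checkpoint is a plain dictionary mapping layer names to NumPy arrays, and it assumes the per-layer interpolation ratio takes the form t = 2cos(θ) / (1 + cos(θ)), where θ is the angle between the two fine-tuned weights measured from the pre-trained weights; that formula is not stated in the abstract and should be checked against the paper. The function name `layerwise_merge` and the clamping of negative cosines are illustrative choices.

```python
import numpy as np


def layerwise_merge(pretrained, finetuned_a, finetuned_b, eps=1e-12):
    """Merge two fine-tuned checkpoints layer by layer (illustrative sketch).

    All three arguments are dicts mapping a layer name to a NumPy array.
    Assumed ratio (verify against the paper): t = 2*cos / (1 + cos).
    """
    merged = {}
    for name, w0 in pretrained.items():
        # Displacement of each fine-tuned layer from the pre-trained layer.
        d1 = finetuned_a[name] - w0
        d2 = finetuned_b[name] - w0

        # Cosine of the angle between the two displacements.
        denom = np.linalg.norm(d1) * np.linalg.norm(d2)
        cos = float(np.sum(d1 * d2) / denom) if denom > eps else 0.0
        cos = max(cos, 0.0)  # illustrative choice: keep t within [0, 1]

        # Interpolate between the plain average and the pre-trained weights.
        t = 2.0 * cos / (1.0 + cos)
        avg = 0.5 * (finetuned_a[name] + finetuned_b[name])
        merged[name] = t * avg + (1.0 - t) * w0
    return merged
```

Under these assumptions, two fine-tuned layers that moved in the same direction (cos = 1) are simply averaged, while layers that moved in orthogonal directions (cos = 0) fall back to the pre-trained weights.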
