

Poster

A Framework for Efficient Model Evaluation through Stratification, Sampling, and Estimation

Riccardo Fogliato · Pratik Patil · Mathew Monfort · Pietro Perona

# 137
Fri 4 Oct, 1:30 a.m. to 3:30 a.m. PDT

Abstract: Model performance evaluation is a critical and expensive task in machine learning and computer vision. Without clear guidelines, practitioners often rely on a one-time random selection of data for model evaluation. However, by selecting the data strategically, costs can be reduced and estimation accuracy can be improved. In this paper, we propose a statistical framework for efficient model evaluation that includes stratification, sampling design, and estimation components. We examine the statistical properties of each component and evaluate their optimality. One key result of our work is that stratification via $k$-means clustering on accurate predictions of model performance leads to highly efficient estimators. Our experiments on computer vision datasets demonstrate that accuracy estimates obtained via stratified sampling designs consistently and significantly outperform those obtained through simple random sampling, with gains of up to 10x. Furthermore, we find that model-assisted estimators, which leverage predictions of model performance, are often more efficient than the commonly used naive empirical average of the errors.
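To make the comparison in the abstract concrete, here is a minimal simulation sketch, not the authors' implementation. It assumes a hypothetical proxy `predicted` (a predicted probability that the model is correct on each item), stratifies on that proxy with a simple one-dimensional $k$-means, and compares three accuracy estimators: simple random sampling, stratified sampling with proportional allocation, and a textbook difference estimator standing in for the model-assisted family. The population, the sizes `N`, `n`, `k`, and all function names are illustrative assumptions.

```python
import random
import statistics


def kmeans_1d(values, k, iters=50):
    """Lloyd's algorithm in one dimension; returns a stratum label per value."""
    lo, hi = min(values), max(values)
    # Initialise centres evenly across the observed range.
    centres = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(v - centres[j])) for v in values]
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:  # keep the old centre if a cluster is empty
                centres[j] = sum(members) / len(members)
    return labels


def srs_estimate(correct, n, rng):
    """Naive empirical average over a simple random sample of size n."""
    sample = rng.sample(range(len(correct)), n)
    return sum(correct[i] for i in sample) / n


def stratified_estimate(correct, labels, n, rng):
    """Stratified mean with proportional allocation.

    Rounding the per-stratum sizes means the total may differ slightly from n.
    """
    N = len(correct)
    strata = {}
    for i, lab in enumerate(labels):
        strata.setdefault(lab, []).append(i)
    estimate = 0.0
    for members in strata.values():
        n_h = max(1, min(len(members), round(n * len(members) / N)))
        sample = rng.sample(members, n_h)
        estimate += len(members) / N * sum(correct[i] for i in sample) / n_h
    return estimate


def difference_estimate(correct, predicted, n, rng):
    """Mean of the proxy over all items plus the sampled mean residual."""
    sample = rng.sample(range(len(correct)), n)
    residual = sum(correct[i] - predicted[i] for i in sample) / n
    return sum(predicted) / len(predicted) + residual


rng = random.Random(0)
N, n, k, reps = 2000, 100, 4, 500  # illustrative sizes, not from the paper

# Synthetic population: the proxy is well calibrated by construction.
predicted = [rng.betavariate(0.5, 0.5) for _ in range(N)]
correct = [1 if rng.random() < p else 0 for p in predicted]
labels = kmeans_1d(predicted, k)
truth = sum(correct) / N

var_srs = statistics.pvariance(
    [srs_estimate(correct, n, rng) for _ in range(reps)])
var_strat = statistics.pvariance(
    [stratified_estimate(correct, labels, n, rng) for _ in range(reps)])
var_diff = statistics.pvariance(
    [difference_estimate(correct, predicted, n, rng) for _ in range(reps)])

print(f"true accuracy: {truth:.3f}")
print(f"variance  SRS: {var_srs:.5f}  stratified: {var_strat:.5f}  "
      f"difference: {var_diff:.5f}")
```

In this synthetic setting the proxy is informative by design, so both the stratified and the difference estimator should show lower variance than simple random sampling at the same labeling budget. The size of the gain depends entirely on how well the proxy tracks the true per-item correctness, so the numbers printed here say nothing about the gains reported in the paper.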
