Oral
MobileNetV4: Universal Models for the Mobile Ecosystem
Danfeng Qin · Chas Leichner · Manolis Delakis · Marco Fornoni · Shixin Luo · Fan Yang · Weijun Wang · Colby Banbury · Chengxi Ye · Berkin Akin · Vaibhav Aggarwal · Tenghui Zhu · Daniele Moro · Andrew Howard
We present the latest generation of MobileNets, known as MobileNetV4 (MNv4), featuring universally efficient architecture designs for mobile devices. At its core, we introduce the Universal Inverted Bottleneck (UIB) search block, a unified and flexible structure that merges Inverted Bottleneck (IB), ConvNext, Feed Forward Network (FFN), and a novel Extra Depthwise (ExtraDW) variant. Alongside UIB, we present Mobile MQA, an attention block tailored for mobile accelerators, delivering a significant 39% speedup. An optimized neural architecture search (NAS) recipe was also crafted to improve MNv4 search effectiveness. The integration of UIB, Mobile MQA and the refined NAS recipe results in a new suite of MNv4 models that are mostly Pareto optimal across mobile CPUs, DSPs, GPUs, as well as specialized accelerators like Apple Neural Engine and Google Pixel EdgeTPU - a characteristic not found in any other models tested. Our approach emphasizes simplicity, utilizing standard components and a straightforward attention mechanism to ensure broad hardware compatibility. To further boost efficiency, we finally introduce a novel distillation technique. Enhanced by this technique, our MNv4-Hybrid-Large model delivers impressive 87% ImageNet-1K accuracy, with a Pixel 8 EdgeTPU runtime of just 3.8ms.