

Poster

Harmonizing knowledge Transfer in Neural Network with Unified Distillation

Yaomin Huang · Faming Fang · Zaoming Yan · Chaomin Shen · Guixu Zhang

Strong Double Blind: this paper was not made available on public preprint services during the review process.
Wed 2 Oct 1:30 a.m. PDT — 3:30 a.m. PDT

Abstract:

Knowledge distillation (KD), known for its ability to transfer knowledge from a cumbersome network (teacher) to a lightweight one (student) without altering the architecture, has been garnering increasing attention. Two primary categories emerge within KD methods: feature-based, focusing on intermediate layers' features, and logits-based, targeting the final layer's logits. This paper introduces a novel perspective by leveraging diverse knowledge sources within a unified KD framework. Specifically, we aggregate features from intermediate layers into a comprehensive representation, efficiently capturing essential knowledge without redundancy. We then predict distribution parameters from this representation. These steps transform knowledge from the intermediate layers into corresponding distributional forms, which are then conveyed through a unified distillation framework. Extensive experiments validate the effectiveness of the proposed method. Remarkably, the distilled student network not only significantly outperforms its original counterpart but also, in many cases, surpasses the teacher network.
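
The sketch below illustrates the pipeline the abstract describes: intermediate features are aggregated into a single representation, distribution parameters are predicted from it, and both this distributional knowledge and the final logits are transferred with a KL-based objective. It is a minimal PyTorch sketch under stated assumptions, not the authors' implementation; the module names (FeatureAggregator, DistributionHead), the diagonal-Gaussian parameterization, and the loss weights are illustrative choices.

```python
# Minimal sketch of a unified distillation objective (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureAggregator(nn.Module):
    """Pools intermediate feature maps and projects them to a shared width."""

    def __init__(self, channel_dims, embed_dim=128):
        super().__init__()
        self.projs = nn.ModuleList(nn.Linear(c, embed_dim) for c in channel_dims)

    def forward(self, feats):
        # feats: list of (B, C_i, H_i, W_i) tensors from intermediate layers.
        pooled = [f.mean(dim=(2, 3)) for f in feats]           # (B, C_i) each
        embedded = [p(x) for p, x in zip(self.projs, pooled)]  # (B, D) each
        return torch.stack(embedded, dim=1).mean(dim=1)        # (B, D)


class DistributionHead(nn.Module):
    """Predicts diagonal-Gaussian parameters from the aggregated representation."""

    def __init__(self, embed_dim=128, latent_dim=64):
        super().__init__()
        self.mu = nn.Linear(embed_dim, latent_dim)
        self.log_var = nn.Linear(embed_dim, latent_dim)

    def forward(self, z):
        return self.mu(z), self.log_var(z)


def gaussian_kl(mu_s, log_var_s, mu_t, log_var_t):
    """KL( N(mu_s, var_s) || N(mu_t, var_t) ), summed over dims, averaged over batch."""
    var_s, var_t = log_var_s.exp(), log_var_t.exp()
    kl = 0.5 * (log_var_t - log_var_s + (var_s + (mu_s - mu_t) ** 2) / var_t - 1.0)
    return kl.sum(dim=1).mean()


def unified_kd_loss(student_out, teacher_out, temperature=4.0, alpha=1.0, beta=1.0):
    """Combines distributional (feature) KL and temperature-scaled logits KL."""
    mu_s, lv_s, logits_s = student_out
    mu_t, lv_t, logits_t = teacher_out
    feat_loss = gaussian_kl(mu_s, lv_s, mu_t.detach(), lv_t.detach())
    logit_loss = F.kl_div(
        F.log_softmax(logits_s / temperature, dim=1),
        F.softmax(logits_t.detach() / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * feat_loss + beta * logit_loss
```

In this reading, a single divergence measure handles both knowledge sources: the intermediate-layer knowledge is compared in its predicted distributional form rather than by matching raw feature maps, and the logits term is the standard temperature-scaled KD loss.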
