Poster
OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks
Jingyang Xiang · Zuohui Chen · Siqi Li · Qing Wu · Yong Liu
# 166
Strong Double Blind
Binary Neural Networks (BNNs) have proven highly effective for deploying deep learning models on mobile and embedded platforms. Most existing works focus on designing a gradient approximation to alleviate gradient mismatch in BNNs, minimizing the quantization error, or improving representation ability, while leaving weight flipping, a critical factor for achieving powerful BNNs, untouched. In this paper, we investigate the update efficiency of weight signs. We observe that, for vanilla BNNs, over 50% of the weights keep their signs unchanged during training, and these weights are unevenly distributed throughout the network. We refer to these weights as "silent weights", which slow down convergence and lead to significant accuracy degradation. Theoretically, we show that this is because the gradients of the BNN are independent of the latent weight distribution. To this end, we propose Overcome Silent Weights (OvSW) to address the issue. OvSW first employs Adaptive Gradient Scaling (AGS) to establish a relationship between the gradient and the latent weight distribution, thus improving the update efficiency of signs for the weights overall. Then, we design Silence Awareness Decaying (SAD) to automatically detect "silent weights" and apply an additional penalty to facilitate their flipping. By efficiently updating weight signs, our method achieves faster convergence and state-of-the-art performance on CIFAR10 and ImageNet1K with various architectures. OvSW obtains 61.4% top-1 accuracy on ImageNet1K using a binarized ResNet18 architecture, exceeding the state-of-the-art by over 0.4%. Code is anonymously available at https://anonymous.4open.science/r/OvSW-1696.
Keywords: Binary Neural Network · Silent Weights · Adaptive Gradient Scaling · Silence Awareness Decaying
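
To make the notion of a "silent weight" concrete, here is a minimal sketch of how one might measure the fraction of weights whose sign never changes across recorded training steps. It is not the authors' implementation and does not reproduce AGS or SAD; the function name `silent_fraction`, the snapshot format, and the convention sign(0) = +1 are all assumptions made for illustration.

```python
def silent_fraction(snapshots):
    """Return the fraction of weights whose sign is the same in every snapshot.

    snapshots: list of equal-length lists of latent (real-valued) weights,
    one list per recorded training step. This bookkeeping format is a
    hypothetical choice for the example, not something the abstract specifies.
    """
    if not snapshots or not snapshots[0]:
        raise ValueError("need at least one non-empty snapshot")

    def sign(w):
        # Assumed binarization convention: zero maps to +1.
        return 1 if w >= 0 else -1

    first = [sign(w) for w in snapshots[0]]
    silent = [True] * len(first)

    for snap in snapshots[1:]:
        if len(snap) != len(first):
            raise ValueError("all snapshots must have the same length")
        for i, w in enumerate(snap):
            # A weight stops being silent once its sign differs from step 0.
            if sign(w) != first[i]:
                silent[i] = False

    return sum(silent) / len(first)


# Toy data: four latent weights recorded at three steps.
snapshots = [
    [0.5, -0.2, 0.1, -0.9],
    [0.4, 0.1, 0.2, -0.8],
    [0.3, -0.1, 0.3, -0.7],
]
print(silent_fraction(snapshots))  # 0.75: only the second weight ever flips
```

In this toy data three of the four weights never change sign, so 75% count as silent under the assumed definition. A weight that flips and later flips back is treated as not silent here; whether that matches the paper's criterion is not stated in the abstract.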