The objective of data-free model extraction (DFME) is to clone a pre-trained black-box model solely through query access, without any knowledge of the victim model's training data. Defending against DFME is a formidable challenge because neither the distribution of the attack queries nor the attacker's strategy is known to the defender in advance. Moreover, existing defense methods either (1) are computationally and memory inefficient, or (2) can only provide evidence of model theft after the model has already been stolen. To address these limitations, we propose the Attack-Aware and Uncertainty-Guided (AAUG) defense, a defensive training method against DFME. AAUG is designed to thwart DFME attacks effectively while also improving deployment efficiency. The key strategy is to add random perturbations to the victim model's weights at prediction time for different inputs. During defensive training, the weight perturbations are maximized on simulated out-of-distribution (OOD) data to make model theft harder, and minimized on in-distribution (ID) training data to preserve model utility. Additionally, we formulate an attack-aware defensive training objective that reinforces the model's resistance to theft attempts. Extensive experiments on defending against both soft-label and hard-label DFME attacks demonstrate the effectiveness of AAUG. In particular, AAUG significantly reduces the accuracy of the clone model and is substantially more efficient than existing defense methods.
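The perturbation strategy described above can be illustrated with a toy sketch. This is not the AAUG implementation: the abstract does not specify the model, the perturbation distribution, or the exact objective, so everything here is an assumption made for illustration. In particular, the linear classifier `W`, the Gaussian noise, the softplus noise scale parameterized by a hypothetical vector `V`, the weighting `lam`, and the function names are all invented, and the attack-aware term of the objective is omitted.

```python
import numpy as np


def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def perturbation_scale(x, V):
    # Hypothetical input-dependent noise scale (assumption, not from the paper):
    # softplus of a linear score, giving one positive scalar per input.
    return np.logaddexp(0.0, x @ V)  # shape (n,)


def perturbed_predict(x, W, V, rng):
    # Each input gets its own randomly perturbed copy of the weights W.
    scale = perturbation_scale(x, V)                       # (n,)
    noise = rng.standard_normal((x.shape[0],) + W.shape)   # (n, d, k)
    W_pert = W[None, :, :] + scale[:, None, None] * noise  # (n, d, k)
    logits = np.einsum("nd,ndk->nk", x, W_pert)            # (n, k)
    return softmax(logits)


def defensive_loss(x_id, y_id, x_ood, W, V, rng, lam=1.0):
    # Toy objective (assumed form): cross-entropy on ID data under perturbed
    # weights, plus a term that is small when the perturbation scale is low
    # on ID inputs and high on simulated OOD inputs.
    probs = perturbed_predict(x_id, W, V, rng)
    ce = -np.log(probs[np.arange(len(y_id)), y_id] + 1e-12).mean()
    gap = perturbation_scale(x_id, V).mean() - perturbation_scale(x_ood, V).mean()
    return ce + lam * gap
```

Minimizing this toy loss over `W` and `V` would push the noise scale down on ID inputs and up on OOD inputs, which mirrors the minimize-on-ID, maximize-on-OOD idea only at a schematic level.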