We present a new network architecture that applies image denoising at the feature level to improve the state of the art in adversarial robustness. Although adversarial perturbations are small in pixel space, we found that they lead to substantial “noise” in the network's feature maps. We therefore propose networks containing blocks that denoise the features using nonlocal means or other filters, such as bilateral filters.
When combined with large-scale adversarial training, our feature denoising networks improve the state of the art in adversarial robustness. To give a sense of the gain, our method achieves 55.7 percent accuracy under white-box attacks on ImageNet, whereas the previous state of the art was 27.9 percent. Additionally, our submission convincingly won the Competition on Adversarial Attacks and Defenses 2018, achieving 50.6 percent accuracy and surpassing the runner-up approach by approximately 10 percentage points.
We add denoising blocks at intermediate layers of convolutional networks. The blocks are designed to reduce feature noise, which appears only when classifying adversarial images. We experiment with four different operations in our denoising blocks: nonlocal means, bilateral filter, mean filter, and median filter. Visualizations of the feature maps show that all four reduce noise.
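To make the idea of a denoising block concrete, here is a minimal NumPy sketch of two of the operations named above (nonlocal means and a mean filter) applied to a single feature map. This is an illustration under stated assumptions, not the released implementation: the function names, the softmax weighting inside `nonlocal_means`, and the residual connection with a linear `projection` in `denoising_block` are choices made for this example and are not specified in this text.

```python
import numpy as np

def nonlocal_means(features):
    """Replace each position with a weighted average over ALL positions.

    features: array of shape (H, W, C).
    Weights are a softmax over scaled dot-product similarity
    (an assumed weighting; other similarity functions are possible).
    """
    h, w, c = features.shape
    x = features.reshape(h * w, c)
    logits = x @ x.T / np.sqrt(c)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return (weights @ x).reshape(h, w, c)

def mean_filter(features, k=3):
    """Average each position over its k x k spatial neighborhood."""
    pad = k // 2
    padded = np.pad(features, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    h, w, _ = features.shape
    out = np.zeros(features.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w, :]
    return out / (k * k)

def denoising_block(features, projection, denoise=nonlocal_means):
    """Hypothetical block: denoise, project channels, add back to the input.

    projection: (C, C) matrix standing in for a learned 1x1 convolution.
    The residual form is an assumption made for this sketch.
    """
    return features + denoise(features) @ projection

# Usage on a random feature map with hypothetical sizes.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 8, 16))
projection = rng.normal(scale=0.1, size=(16, 16))
print(denoising_block(features, projection).shape)
print(denoising_block(features, projection, denoise=mean_filter).shape)
```

Because the block preserves the shape of its input, a block like this could in principle be placed between existing layers without changing the rest of the network.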
The denoising blocks are trained jointly with all other layers of the network, end to end, using adversarial training. Because adversarial training is computationally expensive, a large-scale adversarial training system is also important to the method's success. We train and evaluate our methods on the ImageNet classification dataset, which has roughly 1.28 million images in 1,000 classes.
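As a rough illustration of what adversarial training involves, the sketch below trains a tiny logistic-regression classifier on adversarially perturbed inputs. It is a toy under stated assumptions, not the training system described here: the attack is a projected-gradient (PGD-style) attack under an L-infinity bound, which this text does not specify, and the model, data, function names, and hyperparameters (`eps`, `step`, `iters`, `lr`, `epochs`) are all hypothetical. It also shows where the computation cost comes from: every training step first runs several attack iterations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gradient(w, b, x, y):
    """Gradient of the logistic loss with respect to the inputs x."""
    p = sigmoid(x @ w + b)
    return (p - y)[:, None] * w[None, :]

def pgd_attack(w, b, x, y, eps=0.1, step=0.025, iters=10):
    """Take signed gradient steps, staying within eps of x per coordinate."""
    x_adv = x.copy()
    for _ in range(iters):
        grad = input_gradient(w, b, x_adv, y)
        x_adv = x_adv + step * np.sign(grad)        # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)    # project back into the bound
    return x_adv

def adversarial_train(x, y, eps=0.1, lr=0.5, epochs=200):
    """Gradient descent where each step uses freshly attacked inputs."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        x_adv = pgd_attack(w, b, x, y, eps=eps)
        p = sigmoid(x_adv @ w + b)
        w = w - lr * x_adv.T @ (p - y) / len(y)
        b = b - lr * np.mean(p - y)
    return w, b

def accuracy(w, b, x, y):
    return float(np.mean((x @ w + b > 0).astype(float) == y))

# Usage on two synthetic, well-separated classes (hypothetical data).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-1.0, 0.2, (50, 2)),
                    rng.normal(1.0, 0.2, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])
w, b = adversarial_train(x, y)
print("clean accuracy:", accuracy(w, b, x, y))
print("accuracy under attack:", accuracy(w, b, pgd_attack(w, b, x, y), y))
```

In this toy the attack has direct access to the model's gradients, which corresponds to a white-box setting.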
Adversarial attacks on image classification systems are small perturbations to images that can lead these systems to make incorrect predictions. These perturbations are either imperceptible or perceived by people as unrecognizable noise. Increasing adversarial robustness during training is an important step in defending against vulnerabilities in real-world applications of convolutional networks. One example is people altering images that violate our policies so that they slip past our detection tools. Our feature-level denoising model improves the accuracy of image classification in both white-box and black-box settings. We have released both the training system and the robust models on GitHub to facilitate future research.