Edges to Shapes to Concepts: Adversarial Augmentation for Robust Vision
Aditay Tripathi
Rishubh Singh
Anirban Chakraborty
Pradeep Shenoy
Comparison of models on robustness and shape bias. The shape factor gives the fraction of feature dimensions that encode shape cues. Backbone(T) denotes texture-shape debiased (TSD) models; in comparison, ELEAS, denoted Backbone(E), is more shape-biased and performs better on the ImageNet-C and ImageNet-A datasets.


Recent work has shown that deep vision models tend to be overly dependent on low-level, texture-based features, leading to poor generalization. Various data augmentation strategies have been proposed to overcome this so-called texture bias in DNNs. We propose a simple, lightweight \textit{adversarial augmentation} technique that explicitly incentivizes the network to learn holistic shapes for accurate prediction in an object classification setting. Our augmentations superpose the edgemap of one image onto another image with shuffled patches, using a randomly sampled mixing proportion, and assign the result the label of the edgemap image. To classify these augmented images, the model must not only detect and focus on edges but also distinguish relevant from spurious edges. We show that our augmentations significantly improve classification accuracy and robustness measures across a range of datasets and neural architectures. For example, we obtain absolute classification accuracy gains of up to 6% for ViT-S. We also obtain gains of up to 28% and 8.5% on natural-adversarial and out-of-distribution datasets such as ImageNet-A (for ViT-B) and ImageNet-R (for ViT-S), respectively. Analysis using a range of probe datasets shows substantially increased shape sensitivity in our trained models, explaining the observed improvements in robustness and classification accuracy.
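The augmentation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the edge extractor here is a crude gradient-magnitude filter (the paper's exact edge detector, patch size, and mixing-proportion distribution are assumptions), and images are single-channel for brevity.

```python
import numpy as np

def edge_map(img):
    """Crude gradient-magnitude edge map (stand-in for the paper's edge extractor)."""
    img = img.astype(np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # horizontal central differences
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # vertical central differences
    e = np.sqrt(gx ** 2 + gy ** 2)
    return e / (e.max() + 1e-8)              # normalize to [0, 1]

def shuffle_patches(img, patch=8, rng=None):
    """Split the image into non-overlapping patches and shuffle their positions."""
    rng = rng or np.random.default_rng()
    h, w = img.shape
    tiles = [img[i:i + patch, j:j + patch]
             for i in range(0, h, patch) for j in range(0, w, patch)]
    rng.shuffle(tiles)
    out = np.empty_like(img)
    k = 0
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            out[i:i + patch, j:j + patch] = tiles[k]
            k += 1
    return out

def adversarial_augment(edge_src, distractor, edge_src_label, rng=None):
    """Superpose the edgemap of one image onto a patch-shuffled distractor.

    The augmented sample keeps the label of the edgemap image, so the
    network must trace the coherent object outline and ignore the
    spurious edges contributed by the shuffled distractor.
    """
    rng = rng or np.random.default_rng()
    alpha = rng.uniform(0.3, 0.7)  # assumed range; the paper samples the mix randomly
    mixed = alpha * edge_map(edge_src) + (1 - alpha) * shuffle_patches(distractor, rng=rng)
    return mixed, edge_src_label
```

In a training loop, `distractor` would typically be another image drawn from the same batch, so each augmented sample pairs one image's shape cues against another image's scrambled texture.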



Paper and Supplementary Material

A. Tripathi, R. Singh, A. Chakraborty, P. Shenoy.
Edges to Shapes to Concepts: Adversarial Augmentation for Robust Vision
In CVPR, 2023.
(hosted on ArXiv)


This template was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project; the code can be found here.