Back to the Source:
Diffusion-Driven Adaptation to Test-Time Corruption

Jin Gao*1     Jialing Zhang*1     Xihui Liu3     Trevor Darrell4     Evan Shelhamer†5     Dequan Wang†1,2
1Shanghai Jiao Tong University  2Shanghai Artificial Intelligence Laboratory  3The University of Hong Kong  
4University of California, Berkeley  5DeepMind
*indicates equal contribution, † indicates corresponding author
CVPR 2023

[Paper]      [Code]      [Demo]      [Bibtex]

Abstract: Test-time adaptation harnesses test inputs to improve the accuracy of a model trained on source data when tested on shifted target data. Most methods update the source model by (re-)training on each target domain. While re-training can help, it is sensitive to the amount and order of the data and the hyperparameters for optimization. We update the target data instead, and project all test inputs toward the source domain with a generative diffusion model. Our diffusion-driven adaptation (DDA) method shares its models for classification and generation across all domains, training both on source and then freezing them for all targets, to avoid expensive domain-wise re-training. We augment diffusion with image guidance and classifier self-ensembling to automatically decide how much to adapt. Input adaptation by DDA is more robust than model adaptation across a variety of corruptions, models, and data regimes on the ImageNet-C benchmark. With its input-wise updates, DDA succeeds where model adaptation degrades on too little data (small batches), on dependent data (correlated orders), or on mixed data (multiple corruptions).


One diffusion model can adapt inputs from new and multiple targets during testing. Our adaptation method, DDA, projects inputs from all target domains to the source domain with a generative diffusion model. Because the source diffusion model for generation and the source classification model for recognition are trained on source data alone and then frozen, DDA needs no updating at test time and scales to multiple target domains without expensive and sensitive re-training.
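To make the projection step concrete, here is a minimal 1D toy sketch of diffusion-driven input adaptation: the corrupted input is diffused forward part-way, then denoised while being guided toward the input's low-frequency content so the adapted image stays faithful to the original. The schedule, the guidance weight `w`, the box-filter `low_pass`, and the `denoise_fn` interface are illustrative stand-ins, not the paper's actual models or hyperparameters.

```python
import numpy as np

def ddpm_schedule(T=50, beta_start=1e-4, beta_end=0.02):
    # standard linear DDPM noise schedule (illustrative values)
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    abar = np.cumprod(alphas)
    return betas, alphas, abar

def low_pass(x, k=4):
    # box filter: stands in for the low-frequency content
    # that image guidance preserves from the target input
    kernel = np.ones(k) / k
    return np.convolve(x, kernel, mode="same")

def dda_adapt(x_target, denoise_fn, T=50, t_star=25, w=6.0, rng=None):
    """Toy diffusion-driven adaptation of one input signal.

    denoise_fn(x, t) should return a noise estimate; a real
    implementation would use a source-trained diffusion model.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    betas, alphas, abar = ddpm_schedule(T)
    # forward: diffuse the target input part-way (t_star < T keeps structure)
    eps = rng.standard_normal(x_target.shape)
    x = np.sqrt(abar[t_star]) * x_target + np.sqrt(1 - abar[t_star]) * eps
    # reverse: denoise with guidance toward the input's low frequencies
    for t in range(t_star, 0, -1):
        eps_hat = denoise_fn(x, t)
        x0_hat = (x - np.sqrt(1 - abar[t]) * eps_hat) / np.sqrt(abar[t])
        # image guidance: pull the clean estimate toward low_pass(x_target)
        x0_hat = x0_hat - w * betas[t] * (low_pass(x0_hat) - low_pass(x_target))
        # deterministic (DDIM-style) step back to t-1
        x = np.sqrt(abar[t - 1]) * x0_hat + np.sqrt(1 - abar[t - 1]) * eps_hat
    return x
```

For a quick sanity check one can plug in an oracle denoiser that knows the clean source signal; the adapted output then lands much closer to the source than the corrupted input did, which is the behavior the real source-trained diffusion model is meant to approximate.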


DDA is more robust in the episodic setting. Episodic inference is independent across inputs, and includes the source-only model without adaptation, model updates by MEMO, and input updates by DiffPure and DDA (ours). We evaluate accuracy on standard ImageNet and the corruptions of ImageNet-C.

DDA reliably improves robustness across corruption types. We compare DDA with the source-only model, state-of-the-art diffusion for adversarial defense (DiffPure), and a simple ablation of DDA (DDA w/o Self-Ensembling (SE)). DDA is the best on average, strictly improves on DiffPure, and improves on simple diffusion in most cases. Our self-ensembling prevents catastrophic drops (on fog or contrast, for example).
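The self-ensembling idea can be sketched as a confidence-based choice between the classifier's predictions on the original and the diffused input. This is a simplified stand-in: the function names and the exact aggregation rule are illustrative, not the paper's implementation.

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax over class logits
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def self_ensemble_predict(logits_original, logits_adapted):
    """Keep the prediction with the higher softmax confidence.

    Illustrative sketch: when diffusion hurts (e.g. on fog or
    contrast), the original input's prediction tends to be more
    confident, so the ensemble falls back to it instead of
    trusting the adapted input.
    """
    p_orig = softmax(logits_original)
    p_adapt = softmax(logits_adapted)
    chosen = p_orig if p_orig.max() >= p_adapt.max() else p_adapt
    return int(np.argmax(chosen))
```

Because the choice is made per input from the classifier's own confidence, no extra model or threshold tuning is needed to decide how much to adapt.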

DDA is invariant to batch size and data order while Tent is extremely sensitive. To analyze sensitivity to the amount and order of the data, we measure the average robustness of independent adaptation across corruption types. DDA does not depend on these factors and consistently improves on MEMO. Tent fails on class-ordered data without shuffling and degrades at small batch sizes.

Visual Results



Bibtex

@inproceedings{gao2023back,
  title={Back to the Source: Diffusion-Driven Adaptation to Test-Time Corruption},
  author={Gao, Jin and Zhang, Jialing and Liu, Xihui and Darrell, Trevor and Shelhamer, Evan and Wang, Dequan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}