Back to the Source:
Diffusion-Driven Adaptation to Test-Time Corruption

Jin Gao*1     Jialing Zhang*1     Xihui Liu3     Trevor Darrell4     Evan Shelhamer†5     Dequan Wang†1,2
1Shanghai Jiao Tong University  2Shanghai Artificial Intelligence Laboratory  3The University of Hong Kong  
4University of California, Berkeley  5DeepMind
*indicates equal contribution, † indicates corresponding author
CVPR 2023

[Paper]      [Code]      [Demo]      [Bibtex]

Abstract: Test-time adaptation harnesses test inputs to improve the accuracy of a model trained on source data when tested on shifted target data. Most methods update the source model by (re-)training on each target domain. While re-training can help, it is sensitive to the amount and order of the data and the hyperparameters for optimization. We update the target data instead, and project all test inputs toward the source domain with a generative diffusion model. Our diffusion-driven adaptation (DDA) method shares its models for classification and generation across all domains, training both on source and then freezing them for all targets, to avoid expensive domain-wise re-training. We augment diffusion with image guidance and classifier self-ensembling to automatically decide how much to adapt. Input adaptation by DDA is more robust than model adaptation across a variety of corruptions, models, and data regimes on the ImageNet-C benchmark. With its input-wise updates, DDA succeeds where model adaptation degrades on too little data (small batches), on dependent data (correlated orders), or on mixed data (multiple corruptions).


One diffusion model can adapt inputs from new and multiple targets during testing. Our adaptation method, DDA, projects inputs from all target domains to the source domain with a generative diffusion model. Because the source diffusion model for generation and the source classification model for recognition are trained on source data alone and then frozen, DDA needs no updating at test time and scales to multiple target domains without expensive and sensitive re-training.
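To make the projection step concrete, here is a minimal 1D toy sketch of diffusion-driven input adaptation: the corrupted input is diffused forward part-way, then denoised while being guided toward the input's low-frequency content so the adapted image stays faithful to the original. The schedule, the guidance weight `w`, the box-filter `low_pass`, and the `denoise_fn` interface are illustrative stand-ins, not the paper's actual models or hyperparameters.

```python
import numpy as np

def ddpm_schedule(T=50, beta_start=1e-4, beta_end=0.02):
    # standard linear DDPM noise schedule (illustrative values)
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    abar = np.cumprod(alphas)
    return betas, alphas, abar

def low_pass(x, k=4):
    # box filter: stands in for the low-frequency content
    # that image guidance preserves from the target input
    kernel = np.ones(k) / k
    return np.convolve(x, kernel, mode="same")

def dda_adapt(x_target, denoise_fn, T=50, t_star=25, w=6.0, rng=None):
    """Toy diffusion-driven adaptation of one input signal.

    denoise_fn(x, t) should return a noise estimate; a real
    implementation would use a source-trained diffusion model.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    betas, alphas, abar = ddpm_schedule(T)
    # forward: diffuse the target input part-way (t_star < T keeps structure)
    eps = rng.standard_normal(x_target.shape)
    x = np.sqrt(abar[t_star]) * x_target + np.sqrt(1 - abar[t_star]) * eps
    # reverse: denoise with guidance toward the input's low frequencies
    for t in range(t_star, 0, -1):
        eps_hat = denoise_fn(x, t)
        x0_hat = (x - np.sqrt(1 - abar[t]) * eps_hat) / np.sqrt(abar[t])
        # image guidance: pull the clean estimate toward low_pass(x_target)
        x0_hat = x0_hat - w * betas[t] * (low_pass(x0_hat) - low_pass(x_target))
        # deterministic (DDIM-style) step back to t-1
        x = np.sqrt(abar[t - 1]) * x0_hat + np.sqrt(1 - abar[t - 1]) * eps_hat
    return x
```

For a quick sanity check one can plug in an oracle denoiser that knows the clean source signal; the adapted output then lands much closer to the source than the corrupted input did, which is the behavior the real source-trained diffusion model is meant to approximate.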


DDA is more robust in the episodic setting. Episodic inference is independent across inputs, and includes the source-only model without adaptation, model updates by MEMO, and input updates by DiffPure and DDA (ours). We evaluate accuracy on standard ImageNet and the corruptions of ImageNet-C.

DDA reliably improves robustness across corruption types. We compare DDA with the source-only model, state-of-the-art diffusion for adversarial defense (DiffPure), and a simple ablation of DDA (DDA w/o Self-Ensembling (SE)). DDA is the best on average, strictly improves on DiffPure, and improves on simple diffusion in most cases. Our self-ensembling prevents catastrophic drops (on fog or contrast, for example).
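The self-ensembling idea can be sketched as a confidence-based choice between the classifier's predictions on the original and the diffused input. This is a simplified stand-in: the function names and the exact aggregation rule are illustrative, not the paper's implementation.

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax over class logits
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def self_ensemble_predict(logits_original, logits_adapted):
    """Keep the prediction with the higher softmax confidence.

    Illustrative sketch: when diffusion hurts (e.g. on fog or
    contrast), the original input's prediction tends to be more
    confident, so the ensemble falls back to it instead of
    trusting the adapted input.
    """
    p_orig = softmax(logits_original)
    p_adapt = softmax(logits_adapted)
    chosen = p_orig if p_orig.max() >= p_adapt.max() else p_adapt
    return int(np.argmax(chosen))
```

Because the choice is made per input from the classifier's own confidence, no extra model or threshold tuning is needed to decide how much to adapt.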

DDA is invariant to batch size and data order while Tent is extremely sensitive. To analyze sensitivity to the amount and order of the data, we measure the average robustness of independent adaptation across corruption types. DDA does not depend on these factors and consistently improves on MEMO. Tent fails on class-ordered data without shuffling and degrades at small batch sizes.

Visual Results



Bibtex

@inproceedings{gao2023back,
  title={Back to the Source: Diffusion-Driven Adaptation to Test-Time Corruption},
  author={Gao, Jin and Zhang, Jialing and Liu, Xihui and Darrell, Trevor and Shelhamer, Evan and Wang, Dequan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}