arxiv:2601.17586

Stylizing ViT: Anatomy-Preserving Instance Style Transfer for Domain Generalization

Published on Jan 24

Abstract

AI-generated summary: A novel Vision Transformer encoder called Stylizing ViT is proposed for domain generalization in medical image analysis, utilizing weight-shared attention blocks for anatomical consistency and style transfer, demonstrating improved robustness and performance in histopathology and dermatology tasks.

Deep learning models in medical image analysis often struggle to generalize across domains and demographic groups due to data heterogeneity and scarcity. Traditional augmentation improves robustness but fails under substantial domain shifts. Recent advances in stylistic augmentation enhance domain generalization by varying image styles, but they either fall short in style diversity or introduce artifacts into the generated images. To address these limitations, we propose Stylizing ViT, a novel Vision Transformer encoder that utilizes weight-shared attention blocks for both self- and cross-attention. This design allows the same attention block to maintain anatomical consistency through self-attention while performing style transfer via cross-attention. We assess the effectiveness of our method for domain generalization by employing it for data augmentation on three distinct image classification tasks in histopathology and dermatology. Results demonstrate improved robustness (up to +13% accuracy) over the state of the art while generating perceptually convincing images without artifacts. Additionally, we show that Stylizing ViT is effective beyond training, achieving a 17% performance improvement during inference when used for test-time augmentation. The source code is available at https://github.com/sdoerrich97/stylizing-vit.
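
The central mechanism described in the abstract is a single attention block whose weights are reused for two passes: a self-attention pass in which content tokens attend to themselves (preserving anatomy) and a cross-attention pass in which content tokens attend to style tokens (transferring style). The sketch below illustrates that idea only; it is not the authors' released implementation, and the class name, dimensions, and block layout are assumptions.

```python
# Minimal sketch (assumed structure, not the authors' released code) of a
# weight-shared attention block: the same attention weights serve both
# self-attention (content tokens attend to themselves -> anatomy preserved)
# and cross-attention (content tokens attend to style tokens -> style transfer).
from typing import Optional

import torch
import torch.nn as nn


class SharedAttentionBlock(nn.Module):
    def __init__(self, dim: int = 768, num_heads: int = 12):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        # One attention module reused for both modes, so its weights are shared.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_mlp = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, content: torch.Tensor, style: Optional[torch.Tensor] = None) -> torch.Tensor:
        # Self-attention when no style tokens are given; cross-attention otherwise.
        kv = content if style is None else style
        attended, _ = self.attn(self.norm_q(content), self.norm_kv(kv), self.norm_kv(kv))
        x = content + attended
        return x + self.mlp(self.norm_mlp(x))


# Usage: the identical block (identical weights) runs in both modes.
block = SharedAttentionBlock()
content_tokens = torch.randn(2, 196, 768)  # patch tokens of the image to stylize
style_tokens = torch.randn(2, 196, 768)    # patch tokens of a style reference image
identity_pass = block(content_tokens)                # self-attention: anatomical consistency
stylized_pass = block(content_tokens, style_tokens)  # cross-attention: style transfer
```

Sharing one set of attention weights keeps the content and style pathways in the same representation space, which is presumably what allows a single encoder to both preserve structure and inject style.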
