Abstract
Diffusion Transformer models have significantly advanced image editing by encoding conditional images and integrating them into transformer layers. However, most edits modify only small regions, while current methods uniformly process and denoise all tokens at every timestep, causing redundant computation and potentially degrading unchanged areas. This raises a fundamental question: is it truly necessary to regenerate every region during editing? To address this, we propose SpotEdit, a training-free diffusion editing framework that selectively updates only the modified regions. SpotEdit comprises two key components: SpotSelector, which identifies stable regions via perceptual similarity and skips their computation by reusing conditional image features, and SpotFusion, which adaptively blends these features with edited tokens through a dynamic fusion mechanism, preserving contextual coherence and editing quality. By reducing unnecessary computation and maintaining high fidelity in unmodified areas, SpotEdit achieves efficient and precise image editing.
Community
🎯 SpotEdit: Edit Only What Needs to Be Edited
Why regenerate the entire background just to add a scarf to the dog in your photo?
This is a frustrating limitation of many current AI image editing models. Existing methods typically regenerate the entire image even for small changes. This not only wastes a massive amount of compute but often causes distortion or loss of detail in the background: areas that didn't need to be touched in the first place.
SpotEdit is here to solve this problem. We refuse to "overhaul" the entire image, sticking instead to a simple yet powerful principle: Edit only what needs to be edited.
🚀 What is SpotEdit?
SpotEdit is a training-free universal framework designed specifically for Diffusion Transformer (DiT) models. It automatically identifies which regions require editing and which should remain untouched, eliminating the need for you to manually paint complex masks.
✨ The Core "Magic"
Automated Detection (SpotSelector)
Acting like a pair of "sharp eyes," this mechanism uses perceptual similarity to automatically distinguish stable backgrounds from regions that need to change. It intelligently skips the heavy computation for the background, concentrating processing power strictly on the regions where changes are actually happening.
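To make the idea concrete, here is a minimal PyTorch sketch of what such a selector could look like. Token-wise cosine similarity and the threshold `tau` are hypothetical stand-ins for the paper's perceptual-similarity metric, not the authors' actual implementation:

```python
import torch
import torch.nn.functional as F

def spot_selector(cond_tokens: torch.Tensor,
                  edit_tokens: torch.Tensor,
                  tau: float = 0.95) -> torch.Tensor:
    """Flag tokens that look unchanged, so their computation can be skipped.

    cond_tokens / edit_tokens: (B, N, D) DiT token features from the
    conditional (original) image and the current denoising state.
    Returns a (B, N) boolean mask; True = stable, reuse cond features.
    """
    # Token-wise cosine similarity is a hypothetical stand-in for the
    # paper's perceptual-similarity metric.
    sim = F.cosine_similarity(cond_tokens, edit_tokens, dim=-1)  # (B, N)
    return sim > tau  # tokens above the threshold are treated as stable
```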
Seamless Fusion (SpotFusion)
Rather than regenerating the background that doesn't need changing, SpotEdit directly reuses feature information from the original image. At the same time, a dynamic fusion mechanism ensures that newly generated objects (like that added scarf) blend perfectly with the original background's lighting and texture, producing a result with no visual inconsistencies.
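Again purely as an illustration, a soft per-token blend might look like the sketch below; the sigmoid weighting and its `center`/`sharpness` parameters are our own assumptions standing in for the paper's dynamic fusion mechanism:

```python
import torch

def spot_fusion(cond_tokens: torch.Tensor,
                edit_tokens: torch.Tensor,
                sim: torch.Tensor,
                center: float = 0.9,
                sharpness: float = 10.0) -> torch.Tensor:
    """Softly blend reused conditional features with freshly edited tokens.

    sim: (B, N) per-token similarity (e.g., from spot_selector's metric).
    The sigmoid weighting below is a hypothetical choice: tokens that
    closely match the original lean toward the reused features, so the
    seam between reused and regenerated regions stays smooth.
    """
    w = torch.sigmoid(sharpness * (sim - center)).unsqueeze(-1)  # (B, N, 1)
    return w * cond_tokens + (1.0 - w) * edit_tokens
```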
💡 Why Choose SpotEdit?
- ⚡️ Blazing Fast: By skipping massive amounts of unnecessary background computation, inference speed is boosted by nearly 2×.
- 🖼️ Zero Background Loss: It achieves true "local editing," perfectly preserving every detail of the original background, so you no longer have to worry about the background "collapsing" or distorting.
- 🛠️ Training-Free: A plug-and-play solution that directly enhances the editing experience of existing models.
SpotEdit returns image editing to its essence—precise, efficient, and respectful of the original image.
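For readers who like to see the moving parts, here is how the two sketches above could, in spirit, fit into one denoising step. Everything here (the `dit` call signature, batch size 1, the gather/scatter indexing) is an assumption for illustration, not the authors' code:

```python
import torch

@torch.no_grad()
def spotedit_step(dit, tokens, cond_tokens, t, tau=0.95):
    """One denoising step, reusing the spot_selector/spot_fusion sketches.

    dit(tokens, t) is a stand-in for the transformer forward pass; a real
    DiT would still need global context for the edited tokens, e.g. by
    feeding the reused conditional features as keys/values.
    """
    stable = spot_selector(cond_tokens, tokens, tau)   # (B, N) bool
    idx = (~stable[0]).nonzero(as_tuple=True)[0]       # edited positions (B=1)

    edited = dit(tokens[:, idx], t)                    # denoise only the edits
    sim = torch.nn.functional.cosine_similarity(
        cond_tokens[:, idx], edited, dim=-1)
    fused = spot_fusion(cond_tokens[:, idx], edited, sim)

    out = cond_tokens.clone()                          # reuse stable features
    out[:, idx] = fused                                # write back edited tokens
    return out
```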
Homepage: https://biangbiang0321.github.io/SpotEdit.github.io
As always, an amazing paper coming from Singapore.