---
license: apache-2.0
base_model:
- Qwen/Qwen3-VL-8B-Instruct
- Wan-AI/Wan2.1-T2V-1.3B
pipeline_tag: image-text-to-image
tags:
- image-to-image
- image-editing
- diffusion
- computer-vision
- spatial-editing
- vision-language
library_name: transformers
---
# SpatialEdit-16B
SpatialEdit-16B is a research model for fine-grained image spatial editing. It is designed to follow spatial instructions such as object moving, object rotation, and camera-centric editing while preserving scene realism and subject identity as much as possible.
This model is released as part of the **SpatialEdit** project:
- Paper: [SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing](https://arxiv.org/pdf/2604.04911)
- Code: [SpatialEdit GitHub Repository](https://github.com/EasonXiao-888/SpatialEdit)
- Training Data: [SpatialEdit-500K](https://huggingface.co/datasets/EasonXiao-888/SpatialEdit-500K)
- Benchmark: [SpatialEdit-Bench](https://huggingface.co/datasets/EasonXiao-888/SpatialEdit-Bench)
https://cdn-uploads.huggingface.co/production/uploads/656a12a3d848a6683a6dfb9e/uMD0fka9fN5iBfSNgmDsj.mp4
## Highlights
- Fine-grained spatial editing from an input image and instruction
- Supports object-centric and camera-centric manipulations
- Trained on SpatialEdit-500K, a dataset produced by a synthetic data engine
- Evaluated with SpatialEdit-Bench for both plausibility and geometric faithfulness
## Overview
SpatialEdit focuses on **spatially grounded image editing**. Instead of only changing appearance or style, the model aims to edit geometric attributes of a scene, including:
- object movement
- object rotation
- camera trajectory-related editing
### Task Definition
Figure: Task definition of fine-grained image spatial editing.
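To make the task definition concrete, here is a minimal Python sketch of an editing request as this card describes it: one input image plus a natural-language spatial instruction. The `EditType` labels, `SpatialEditRequest`, `build_request`, and the instruction templates are hypothetical illustrations, not the prompt format used in SpatialEdit-500K or the official code.

```python
from dataclasses import dataclass
from enum import Enum


class EditType(Enum):
    # The three edit families named in the overview (labels are hypothetical).
    OBJECT_MOVE = "object_move"
    OBJECT_ROTATE = "object_rotate"
    CAMERA = "camera"


@dataclass(frozen=True)
class SpatialEditRequest:
    """One editing request: an input image plus a spatial instruction."""
    image_path: str
    edit_type: EditType
    instruction: str


def build_request(image_path: str, edit_type: EditType, **params: object) -> SpatialEditRequest:
    """Fill an illustrative instruction template for the given edit type."""
    # Hypothetical templates; real SpatialEdit instructions may be phrased differently.
    templates = {
        EditType.OBJECT_MOVE: "Move the {obj} to the {direction}.",
        EditType.OBJECT_ROTATE: "Rotate the {obj} {degrees} degrees {direction}.",
        EditType.CAMERA: "Orbit the camera {degrees} degrees to the {direction}.",
    }
    # str.format raises KeyError if a template field is not supplied.
    instruction = templates[edit_type].format(**params)
    return SpatialEditRequest(image_path, edit_type, instruction)


req = build_request("kitchen.jpg", EditType.OBJECT_ROTATE, obj="mug", degrees=90, direction="clockwise")
print(req.instruction)  # Rotate the mug 90 degrees clockwise.
```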
## Application Gallery
### 3D Point Control
The first and third examples show sparse-view point observations. The second and fourth examples illustrate how SpatialEdit can synthesize richer spatial observations from limited inputs.
### Camera Trajectory Editing
Left: input image. Middle: edited target view generated by SpatialEdit. Right: a camera-transition video synthesized from the spatially edited endpoint.
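A camera-transition video needs a sequence of viewpoints between the input camera and the edited endpoint. The sketch below shows one common way to parameterize such a path: evenly spaced azimuths on a circular orbit around the scene center. This is illustration only; the orbit parameterization, the y-up axis convention, and the `orbit_positions` helper are assumptions and do not describe how SpatialEdit synthesizes its videos.

```python
import math


def orbit_positions(radius: float, start_az_deg: float, end_az_deg: float,
                    elevation_deg: float, num_frames: int) -> list[tuple[float, float, float]]:
    """Camera centers on a circular orbit around the origin, evenly spaced in azimuth.

    Frame 0 is the input view and the last frame is the edited endpoint.
    Assumed convention: y is up, azimuth is measured in the x-z plane from +z.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames (start and end)")
    el = math.radians(elevation_deg)
    positions = []
    for i in range(num_frames):
        t = i / (num_frames - 1)  # 0.0 at the input view, 1.0 at the edited view
        az = math.radians(start_az_deg + t * (end_az_deg - start_az_deg))
        x = radius * math.cos(el) * math.sin(az)
        y = radius * math.sin(el)
        z = radius * math.cos(el) * math.cos(az)
        positions.append((x, y, z))
    return positions


# Example: a 30-degree orbit to the side, sampled at 16 frames.
path = orbit_positions(radius=2.0, start_az_deg=0.0, end_az_deg=30.0, elevation_deg=10.0, num_frames=16)
```

Every sampled center stays at distance `radius` from the origin, so the camera keeps a constant distance to the subject while its viewing direction changes.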
### Object Translation
Left: input image. Middle: translated target result generated by SpatialEdit. Right: an interpolated motion sequence built from the edited endpoint.
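An object translation can likewise be summarized by the object's position in the input image and in the edited endpoint. The hypothetical helper below produces intermediate object centers in normalized image coordinates, using smoothstep easing so the motion starts and stops gently. It only illustrates the idea of interpolating toward an edited endpoint; it is not the procedure used to build the sequences shown here.

```python
def smoothstep(t: float) -> float:
    """Ease-in/ease-out curve on [0, 1] with zero slope at both ends."""
    return t * t * (3.0 - 2.0 * t)


def interpolate_centers(start: tuple[float, float], end: tuple[float, float],
                        num_frames: int) -> list[tuple[float, float]]:
    """Object centers (normalized x, y in [0, 1]) from the input position to the edited endpoint."""
    if num_frames < 2:
        raise ValueError("need at least two frames (start and end)")
    centers = []
    for i in range(num_frames):
        s = smoothstep(i / (num_frames - 1))  # eased progress in [0, 1]
        centers.append((start[0] + s * (end[0] - start[0]),
                        start[1] + s * (end[1] - start[1])))
    return centers


# Example: slide an object from the left third to the right third of the frame.
track = interpolate_centers((0.2, 0.5), (0.8, 0.5), num_frames=5)
```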
### Object Rotation
Left: input image. Middle: rotated target result generated by SpatialEdit. Right: a smooth transition sequence derived from the edited result.
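For rotations, naive linear interpolation of angles takes the long way around when an edit crosses 0°/360° (for example, 350° to 10°). A minimal sketch, assuming the rotation is summarized by a single yaw angle in degrees, interpolates along the shorter signed arc instead. `shortest_angle_delta` and `interpolate_yaw` are illustrative names, not part of the SpatialEdit codebase.

```python
def shortest_angle_delta(start_deg: float, end_deg: float) -> float:
    """Signed difference end - start, wrapped into [-180, 180)."""
    return (end_deg - start_deg + 180.0) % 360.0 - 180.0


def interpolate_yaw(start_deg: float, end_deg: float, num_frames: int) -> list[float]:
    """Yaw angles in [0, 360) that rotate from start to end along the shorter arc."""
    if num_frames < 2:
        raise ValueError("need at least two frames (start and end)")
    delta = shortest_angle_delta(start_deg, end_deg)
    return [(start_deg + delta * i / (num_frames - 1)) % 360.0 for i in range(num_frames)]
```

For example, `interpolate_yaw(350, 10, 3)` yields `[350.0, 0.0, 10.0]` rather than sweeping backward through 180°.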
## Required External Checkpoints
Before running inference, please download the following dependencies:
- [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct)
- [Wan2.1-T2V-1.3B](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B), including `Wan2.1_VAE.pth`
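One way to fetch both dependencies is `snapshot_download` from the `huggingface_hub` library. The sketch below assumes you want them under `your_base_path/model/`, matching the recommended directory structure in this card; `download_plan` and `download_all` are hypothetical helper names. Downloading the full `Wan-AI/Wan2.1-T2V-1.3B` repository also retrieves `Wan2.1_VAE.pth`.

```python
from pathlib import Path

# Repo IDs come from the links above; local folder names follow the
# recommended directory structure in this card (an assumption you can change).
EXTERNAL_CHECKPOINTS = {
    "Qwen/Qwen3-VL-8B-Instruct": "Qwen3-VL-8B-Instruct",
    "Wan-AI/Wan2.1-T2V-1.3B": "Wan2.1-T2V-1.3B",
}


def download_plan(base_path: str) -> dict[str, Path]:
    """Map each Hub repo ID to its target directory under <base_path>/model/."""
    model_dir = Path(base_path) / "model"
    return {repo_id: model_dir / name for repo_id, name in EXTERNAL_CHECKPOINTS.items()}


def download_all(base_path: str) -> None:
    """Fetch every repo in the plan (requires `pip install huggingface_hub`)."""
    # Imported lazily so download_plan() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id, target in download_plan(base_path).items():
        target.mkdir(parents=True, exist_ok=True)
        snapshot_download(repo_id=repo_id, local_dir=str(target))
```

Calling `download_all("/path/to/your_base_path")` downloads both repositories; inspect `download_plan(...)` first if you only want to confirm the target folders.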
## Repository Contents
This model repository hosts the checkpoints used by the official codebase. A typical layout is:
```bash
SpatialEdit_CKPT/
├── CKPT_PT.pth
└── CKPT_CT_lora/
```
- `CKPT_PT.pth`: full DiT checkpoint
- `CKPT_CT_lora/`: LoRA checkpoint used for spatial editing
If your local filenames differ, update the paths in the provided scripts accordingly.
A recommended local directory structure is:
```bash
your_base_path/
├── SpatialEdit_CKPT/
│   ├── CKPT_PT.pth
│   └── CKPT_CT_lora/
└── model/
    ├── Qwen3-VL-8B-Instruct/
    └── Wan2.1-T2V-1.3B/
        └── Wan2.1_VAE.pth
```
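Before launching the demo, a quick check can catch a misplaced checkpoint. The following sketch walks the recommended directory structure above and reports any missing entries. `EXPECTED_LAYOUT` and `missing_paths` are hypothetical helpers, so edit the dictionary if your filenames differ from `CKPT_PT.pth` and `CKPT_CT_lora/`.

```python
from pathlib import Path

# Relative paths from the recommended structure above, each tagged as a file or directory.
EXPECTED_LAYOUT = {
    "SpatialEdit_CKPT/CKPT_PT.pth": "file",
    "SpatialEdit_CKPT/CKPT_CT_lora": "dir",
    "model/Qwen3-VL-8B-Instruct": "dir",
    "model/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth": "file",
}


def missing_paths(base_path: str, layout: dict[str, str] = EXPECTED_LAYOUT) -> list[str]:
    """Return the relative paths that are absent or of the wrong kind (file vs. dir)."""
    base = Path(base_path)
    missing = []
    for rel, kind in layout.items():
        path = base / rel
        present = path.is_file() if kind == "file" else path.is_dir()
        if not present:
            missing.append(rel)
    return missing
```

`missing_paths("/path/to/your_base_path")` returns an empty list when every expected checkpoint is in place.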
## Quick Start
The [SpatialEdit GitHub Repository](https://github.com/EasonXiao-888/SpatialEdit) provides a simple local demo script.
## Citation
If you find this project useful, please cite the SpatialEdit paper.
```bibtex
@misc{spatialedit,
  title={SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing},
  author={Yicheng Xiao and Wenhu Zhang and Lin Song and Yukang Chen and Wenbo Li and Nan Jiang and Tianhe Ren and Haokun Lin and Wei Huang and Haoyang Huang and Xiu Li and Nan Duan and Xiaojuan Qi},
  year={2026},
  eprint={2604.04911},
  archivePrefix={arXiv}
}
```
## Acknowledgement
This project builds upon several excellent open-source efforts. We sincerely thank:
- [ReCamMaster](https://github.com/KlingAIResearch/ReCamMaster)
- [TexVerse](https://github.com/yiboz2001/TexVerse)
- [JoyAI-Image](https://github.com/jd-opensource/JoyAI-Image)
We also thank the contributors and collaborators who supported the development of SpatialEdit.