SpatialEdit-16B
SpatialEdit-16B is a research model for fine-grained image spatial editing. It is designed to follow spatial instructions such as object moving, object rotation, and camera-centric editing while preserving scene realism and subject identity as much as possible.
This model is released as part of the SpatialEdit project:
- Paper: SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing
- Code: SpatialEdit GitHub Repository
- Training Data: SpatialEdit-500K
- Benchmark: SpatialEdit-Bench
Highlights
- Fine-grained spatial editing from an input image and instruction
- Supports object-centric and camera-centric manipulations
- Trained on SpatialEdit-500K, a dataset built with a synthetic data engine
- Evaluated on SpatialEdit-Bench for both plausibility and geometric faithfulness
Overview
SpatialEdit focuses on spatially grounded image editing. Instead of only changing appearance or style, the model aims to edit geometric attributes of a scene, including:
- object movement
- object rotation
- camera trajectory-related editing
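To make the three edit families concrete, the sketch below models a spatial edit request as a small Python data structure and renders it as an instruction string. The `EditType` and `SpatialEdit` names, their fields, and the instruction templates are hypothetical illustrations. They are not the SpatialEdit-500K annotation schema or the prompts the model was trained on.

```python
from dataclasses import dataclass
from enum import Enum


class EditType(Enum):
    """The three spatial edit families named in the overview."""
    OBJECT_MOVE = "object_move"
    OBJECT_ROTATE = "object_rotate"
    CAMERA = "camera"


@dataclass(frozen=True)
class SpatialEdit:
    """Hypothetical structured edit request (not the SpatialEdit-500K schema)."""
    edit_type: EditType
    target: str = ""      # object name; unused for camera edits
    direction: str = ""   # e.g. "left", "clockwise", "right"
    amount: float = 0.0   # degrees for rotations and camera orbits

    def to_instruction(self) -> str:
        """Render the request as a plain-English instruction string."""
        if self.edit_type is EditType.OBJECT_MOVE:
            return f"Move the {self.target} to the {self.direction}."
        if self.edit_type is EditType.OBJECT_ROTATE:
            return f"Rotate the {self.target} {self.amount:g} degrees {self.direction}."
        # Camera-centric edit: the whole viewpoint changes, not a single object.
        return f"Orbit the camera {self.amount:g} degrees to the {self.direction}."
```

For example, `SpatialEdit(EditType.OBJECT_ROTATE, "chair", "clockwise", 90).to_instruction()` yields `"Rotate the chair 90 degrees clockwise."`.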
Task Definition
Figure: Task definition of fine-grained image spatial editing.
Application Gallery
3D Point Control
The first and third examples show sparse-view point observations. The second and fourth examples illustrate how SpatialEdit can synthesize richer spatial observations from limited inputs.
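The card does not specify how the point observations are produced. Assuming a standard pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`), a sparse set of camera-space 3D points maps to pixel locations as sketched below. `project_points` is a hypothetical helper. It returns only the set of pixels that some point lands on, which is what makes such an observation sparse.

```python
def project_points(points, fx, fy, cx, cy, width, height):
    """Project camera-space 3D points (x, y, z) to the set of integer pixels they hit.

    Uses the pinhole model u = fx * x / z + cx, v = fy * y / z + cy.
    """
    pixels = set()
    for x, y, z in points:
        if z <= 0:  # behind the camera: not observable
            continue
        u = round(fx * x / z + cx)
        v = round(fy * y / z + cy)
        if 0 <= u < width and 0 <= v < height:  # keep only points inside the image
            pixels.add((u, v))
    return pixels
```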
Camera Trajectory Editing
Left: input image. Middle: edited target view generated by SpatialEdit. Right: a camera-transition video synthesized from the spatially edited endpoint.
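The card does not say how intermediate viewpoints for the transition video are chosen. One common way to define a path between the input view and the edited endpoint, assuming both camera poses are known, is to interpolate position linearly and orientation with quaternion slerp. The helper names below are hypothetical, and the sketch covers only pose interpolation, not the video synthesis itself.

```python
import math


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # q and -q are the same rotation; flip to take the shorter arc
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:  # nearly identical: linear interpolation is numerically safer
        q = tuple(a + t * (b - a) for a, b in zip(q0, q1))
    else:
        theta = math.acos(dot)
        s0 = math.sin((1 - t) * theta) / math.sin(theta)
        s1 = math.sin(t * theta) / math.sin(theta)
        q = tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)


def camera_path(pos0, rot0, pos1, rot1, num_frames):
    """Return num_frames (position, quaternion) poses from the input to the edited view."""
    poses = []
    for i in range(num_frames):
        t = i / (num_frames - 1) if num_frames > 1 else 0.0
        pos = tuple(a + t * (b - a) for a, b in zip(pos0, pos1))
        poses.append((pos, slerp(rot0, rot1, t)))
    return poses
```

Slerp keeps the angular speed constant, so a 90° orbit rendered over three frames passes through exactly 45° at the midpoint.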
Object Translation
Left: input image. Middle: translated target result generated by SpatialEdit. Right: an interpolated motion sequence built from the edited endpoint.
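A motion sequence for translation needs intermediate object positions. A minimal sketch, assuming the object's bounding box is known in both the input and the edited image, interpolates the box coordinates with smoothstep easing so the motion starts and stops gently. `interpolate_boxes` and `ease_in_out` are hypothetical helpers, not part of the SpatialEdit codebase.

```python
def ease_in_out(t):
    """Smoothstep easing on [0, 1]: zero velocity at both endpoints."""
    return t * t * (3 - 2 * t)


def interpolate_boxes(box0, box1, num_frames):
    """Interpolate (x_min, y_min, x_max, y_max) boxes from the input to the edited position."""
    frames = []
    for i in range(num_frames):
        t = ease_in_out(i / (num_frames - 1)) if num_frames > 1 else 0.0
        frames.append(tuple(a + t * (b - a) for a, b in zip(box0, box1)))
    return frames
```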
Object Rotation
Left: input image. Middle: rotated target result generated by SpatialEdit. Right: a smooth transition sequence derived from the edited result.
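Rotation transitions add one subtlety: angles wrap around at 360°, so naive interpolation from 350° to 10° would sweep 340° the wrong way. The sketch below, assuming the object's yaw is known at both endpoints, steps along the shorter arc instead. The function names are hypothetical.

```python
def shortest_angle_delta(a0, a1):
    """Signed difference a1 - a0 in degrees, wrapped to [-180, 180)."""
    return (a1 - a0 + 180.0) % 360.0 - 180.0


def interpolate_yaw(a0, a1, num_frames):
    """Yaw angles in [0, 360) stepping from a0 to a1 along the shorter arc."""
    delta = shortest_angle_delta(a0, a1)
    steps = max(num_frames - 1, 1)
    return [(a0 + delta * i / steps) % 360.0 for i in range(num_frames)]
```

For example, `interpolate_yaw(350, 10, 3)` passes through 0° rather than 180°.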
Required External Checkpoints
Before running inference, please download the following dependencies:
- Qwen3-VL-8B-Instruct
- Wan2.1-T2V-1.3B, including Wan2.1_VAE.pth
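One way to fetch the two external dependencies is the `snapshot_download` function from the `huggingface_hub` library. The sketch below assumes the Hugging Face repo IDs `Qwen/Qwen3-VL-8B-Instruct` (listed as the base model on this card) and `Wan-AI/Wan2.1-T2V-1.3B` (assumed to be the official Wan2.1 release, expected to contain `Wan2.1_VAE.pth`), and places both under a `model/` folder as in the recommended layout. `download_plan` and `download_all` are hypothetical helper names.

```python
from pathlib import Path

# Assumed Hugging Face repo IDs mapped to paths relative to your base directory.
EXTERNAL_CHECKPOINTS = {
    "Qwen/Qwen3-VL-8B-Instruct": Path("model") / "Qwen3-VL-8B-Instruct",
    "Wan-AI/Wan2.1-T2V-1.3B": Path("model") / "Wan2.1-T2V-1.3B",  # assumed to include Wan2.1_VAE.pth
}


def download_plan(base_path):
    """Map each repo ID to the local directory it should be downloaded into."""
    base = Path(base_path)
    return {repo_id: base / rel for repo_id, rel in EXTERNAL_CHECKPOINTS.items()}


def download_all(base_path):
    """Download every external checkpoint (requires `pip install huggingface_hub`)."""
    # Imported here so download_plan stays usable without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in download_plan(base_path).items():
        local_dir.mkdir(parents=True, exist_ok=True)
        snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
```

Calling `download_all("your_base_path")` downloads both repositories. `download_plan` alone can be used to inspect the target directories without network access.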
Repository Contents
This model repository is expected to store the checkpoints used by the official codebase. A typical layout is:
SpatialEdit_CKPT/
├── CKPT_PT.pth
└── CKPT_CT_lora/
- CKPT_PT.pth: full DiT checkpoint
- CKPT_CT_lora/: LoRA checkpoint used for spatial editing
If your filenames differ, update the paths in the provided scripts accordingly.
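The card describes CKPT_CT_lora/ only as a LoRA checkpoint. Assuming it follows the standard LoRA formulation and adapts the DiT weights loaded from CKPT_PT.pth, each adapted weight matrix is updated with a scaled low-rank product. The rank $r$ and scale $\alpha$ of this checkpoint are not stated on the card.

```latex
W' = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
```

Here $W_0 \in \mathbb{R}^{d \times k}$ is the frozen base weight. For example, with $d = k = 1536$ and $r = 64$ (illustrative values, not taken from the checkpoint), the update stores $r(d + k) = 196{,}608$ parameters versus $dk = 2{,}359{,}296$ for the full matrix, which is what keeps a LoRA adapter compact.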
A recommended local directory structure is:
your_base_path/
├── SpatialEdit_CKPT/
│   ├── CKPT_PT.pth
│   └── CKPT_CT_lora/
└── model/
    ├── Qwen3-VL-8B-Instruct/
    └── Wan2.1-T2V-1.3B/
        └── Wan2.1_VAE.pth
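Before running the demo, it can help to confirm that the recommended layout is in place. The sketch below checks each expected path under a base directory. The path list mirrors the recommended structure and assumes Wan2.1_VAE.pth sits inside the Wan2.1-T2V-1.3B/ folder; `missing_paths` is a hypothetical helper, not a script shipped with the codebase.

```python
from pathlib import Path

# Relative paths from the recommended directory structure (VAE location assumed).
EXPECTED = [
    "SpatialEdit_CKPT/CKPT_PT.pth",
    "SpatialEdit_CKPT/CKPT_CT_lora",
    "model/Qwen3-VL-8B-Instruct",
    "model/Wan2.1-T2V-1.3B",
    "model/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth",
]


def missing_paths(base_path):
    """Return the expected entries that do not exist under base_path."""
    base = Path(base_path)
    return [rel for rel in EXPECTED if not (base / rel).exists()]
```

For example, `print(missing_paths("your_base_path"))` prints an empty list once everything is in place.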
Quick Start
The SpatialEdit GitHub Repository provides a simple local demo script.
Citation
If you find this project useful, please cite the SpatialEdit paper.
@misc{spatialedit,
title={SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing},
author={Yicheng Xiao and Wenhu Zhang and Lin Song and Yukang Chen and Wenbo Li and Nan Jiang and Tianhe Ren and Haokun Lin and Wei Huang and Haoyang Huang and Xiu Li and Nan Duan and Xiaojuan Qi},
year={2026}
}
Please replace the BibTeX entry above with the final official citation if needed.
Acknowledgement
This project builds upon several excellent open-source efforts. We sincerely thank:
We also thank the contributors and collaborators who supported the development of SpatialEdit.
Model tree for EasonXiao-888/SpatialEdit-16B
Base model: Qwen/Qwen3-VL-8B-Instruct