---
license: apache-2.0
base_model:
- Qwen/Qwen3-VL-8B-Instruct
- Wan-AI/Wan2.1-T2V-1.3B
pipeline_tag: image-text-to-image
tags:
- image-to-image
- image-editing
- diffusion
- computer-vision
- spatial-editing
- vision-language
library_name: transformers
---

# SpatialEdit-16B

SpatialEdit-16B is a research model for fine-grained image spatial editing. It is designed to follow spatial instructions such as object movement, object rotation, and camera-centric editing while preserving scene realism and subject identity as much as possible.

This model is released as part of the **SpatialEdit** project:

- Paper: [SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing](https://arxiv.org/pdf/2604.04911)
- Code: [SpatialEdit GitHub Repository](https://github.com/EasonXiao-888/SpatialEdit)
- Training Data: [SpatialEdit-500K](https://huggingface.co/datasets/EasonXiao-888/SpatialEdit-500K)
- Benchmark: [SpatialEdit-Bench](https://huggingface.co/datasets/EasonXiao-888/SpatialEdit-Bench)

https://cdn-uploads.huggingface.co/production/uploads/656a12a3d848a6683a6dfb9e/uMD0fka9fN5iBfSNgmDsj.mp4

## Highlights

- Fine-grained spatial editing from an input image and instruction
- Supports object-centric and camera-centric manipulations
- Trained with the SpatialEdit-500K synthetic data engine
- Evaluated with SpatialEdit-Bench for both plausibility and geometric faithfulness

## Overview

SpatialEdit focuses on **spatially grounded image editing**. Instead of only changing appearance or style, the model aims to edit geometric attributes of a scene, including:

- object movement
- object rotation
- camera-trajectory editing

### Task Definition

Figure: Task definition of fine-grained image spatial editing.

## Application Gallery

### 3D Point Control

Figure: four 3D point control examples.

The first and third examples show sparse-view point observations. The second and fourth examples illustrate how SpatialEdit can synthesize richer spatial observations from limited inputs.

### Camera Trajectory Editing

Figure: camera editing input, output, and transition video.

Left: input image. Middle: edited target view generated by SpatialEdit. Right: a camera-transition video synthesized from the spatially edited endpoint.

### Object Translation

Figure: object translation input, output, and transition video.

Left: input image. Middle: translated target result generated by SpatialEdit. Right: an interpolated motion sequence built from the edited endpoint.

### Object Rotation

Figure: object rotation input, output, and transition video.

Left: input image. Middle: rotated target result generated by SpatialEdit. Right: a smooth transition sequence derived from the edited result.

## Required External Checkpoints

Before running inference, please download the following dependencies:

- [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct)
- [Wan2.1-T2V-1.3B](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B), including `Wan2.1_VAE.pth`

## Repository Contents

This model repository is expected to store the checkpoints used by the official codebase. A typical layout is:

```bash
SpatialEdit_CKPT/
├── CKPT_PT.pth
└── CKPT_CT_lora/
```

- `CKPT_PT.pth`: full DiT checkpoint
- `CKPT_CT_lora/`: LoRA checkpoint used for spatial editing

If your uploaded filenames differ, simply update the paths in the provided scripts.

A recommended local directory structure is:

```bash
your_base_path/
├── SpatialEdit_CKPT/
│   ├── CKPT_PT.pth
│   └── CKPT_CT_lora/
└── model/
    ├── Qwen3-VL-8B-Instruct/
    └── Wan2.1-T2V-1.3B/
        └── Wan2.1_VAE.pth
```

## Quick Start

The [SpatialEdit GitHub Repository](https://github.com/EasonXiao-888/SpatialEdit) provides a simple local demo script.

## Citation

If you find this project useful, please cite the SpatialEdit paper.

```bibtex
@misc{spatialedit,
  title={SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing},
  author={Yicheng Xiao and Wenhu Zhang and Lin Song and Yukang Chen and Wenbo Li and Nan Jiang and Tianhe Ren and Haokun Lin and Wei Huang and Haoyang Huang and Xiu Li and Nan Duan and Xiaojuan Qi},
  year={2026}
}
```

Please replace the BibTeX entry above with the final official citation if needed.

## Acknowledgement

This project builds upon several excellent open-source efforts. We sincerely thank:

- [ReCamMaster](https://github.com/KlingAIResearch/ReCamMaster)
- [TexVerse](https://github.com/yiboz2001/TexVerse)
- [JoyAI-Image](https://github.com/jd-opensource/JoyAI-Image)

We also thank the contributors and collaborators who supported the development of SpatialEdit.
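
## Appendix: Checking the Local Layout

The recommended local directory structure under Repository Contents can be verified before launching the demo. The snippet below is a minimal sketch that is not part of the official codebase: the helper name `missing_entries` and the `EXPECTED` list are illustrative assumptions, and the listed paths are copied from the layout described in this card. If your filenames differ, adjust the list accordingly.

```python
from pathlib import Path

# Expected entries, copied from the recommended layout in this model card.
# Entries ending in "/" are directories; all others are files.
# (Illustrative list, not defined by the official scripts.)
EXPECTED = [
    "SpatialEdit_CKPT/CKPT_PT.pth",
    "SpatialEdit_CKPT/CKPT_CT_lora/",
    "model/Qwen3-VL-8B-Instruct/",
    "model/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth",
]


def missing_entries(base_path):
    """Return the expected entries that are absent under base_path."""
    base = Path(base_path)
    missing = []
    for entry in EXPECTED:
        target = base / entry
        # Directories and files are checked with the matching predicate.
        present = target.is_dir() if entry.endswith("/") else target.is_file()
        if not present:
            missing.append(entry)
    return missing
```

Calling `missing_entries("your_base_path")` returns an empty list when every expected file and directory is in place, and otherwise lists the relative paths that still need to be downloaded or moved.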