SpatialEdit-16B

SpatialEdit-16B is a research model for fine-grained image spatial editing. It is designed to follow spatial instructions such as object movement, object rotation, and camera-centric editing while preserving scene realism and subject identity as much as possible.

This model is released as part of the SpatialEdit project. A demo video is available at:

https://cdn-uploads.huggingface.co/production/uploads/656a12a3d848a6683a6dfb9e/uMD0fka9fN5iBfSNgmDsj.mp4

Highlights

  • Fine-grained spatial editing from an input image and instruction
  • Supports object-centric and camera-centric manipulations
  • Trained with the SpatialEdit-500K synthetic data engine
  • Evaluated with SpatialEdit-Bench for both plausibility and geometric faithfulness

Overview

SpatialEdit focuses on spatially grounded image editing. Instead of only changing appearance or style, the model aims to edit geometric attributes of a scene, including:

  • object movement
  • object rotation
  • camera trajectory-related editing

Task Definition

Figure: task definition of fine-grained image spatial editing.

Application Gallery

3D Point Control

Figures: four 3D point control examples.

The first and third examples show sparse-view point observations. The second and fourth examples illustrate how SpatialEdit can synthesize richer spatial observations from limited inputs.

Camera Trajectory Editing

Figures: camera editing input, edited output, and transition video.

Left: input image. Middle: edited target view generated by SpatialEdit. Right: a camera-transition video synthesized from the spatially edited endpoint.

Object Translation

Figures: object translation input, edited output, and transition video.

Left: input image. Middle: translated target result generated by SpatialEdit. Right: an interpolated motion sequence built from the edited endpoint.

Object Rotation

Figures: object rotation input, edited output, and transition video.

Left: input image. Middle: rotated target result generated by SpatialEdit. Right: a smooth transition sequence derived from the edited result.

Required External Checkpoints

Before running inference, download the following dependencies (they appear under model/ in the recommended directory structure below):

  • Qwen3-VL-8B-Instruct
  • Wan2.1-T2V-1.3B (including Wan2.1_VAE.pth)

Repository Contents

This model repository is expected to store the checkpoints used by the official codebase. A typical layout is:

SpatialEdit_CKPT/
β”œβ”€β”€ CKPT_PT.pth
└── CKPT_CT_lora/

  • CKPT_PT.pth: the full DiT checkpoint
  • CKPT_CT_lora/: the LoRA checkpoint used for spatial editing

If your uploaded filenames differ, update the paths in the provided scripts accordingly.
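As a sketch of how a script might resolve these two checkpoints (the helper name and return structure are illustrative, not part of the official codebase):

```python
from pathlib import Path

def spatialedit_ckpt_paths(ckpt_root: str) -> dict[str, Path]:
    """Resolve the two checkpoints under SpatialEdit_CKPT/.

    Filenames follow the layout shown above; change them here if
    your uploaded files are named differently.
    """
    root = Path(ckpt_root)
    return {
        "dit_full": root / "CKPT_PT.pth",       # full DiT checkpoint
        "spatial_lora": root / "CKPT_CT_lora",  # LoRA checkpoint directory
    }

paths = spatialedit_ckpt_paths("SpatialEdit_CKPT")
```

Keeping the paths in one place like this makes it easy to point the scripts at a differently named upload.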

A recommended local directory structure is:

your_base_path/
β”œβ”€β”€ SpatialEdit_CKPT/
β”‚   β”œβ”€β”€ CKPT_PT.pth
β”‚   └── CKPT_CT_lora/
└── model/
    β”œβ”€β”€ Qwen3-VL-8B-Instruct/
    └── Wan2.1-T2V-1.3B/
        └── Wan2.1_VAE.pth
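A small preflight check can confirm this layout is in place before running the scripts (the function name and the check itself are ours, not part of the official codebase):

```python
from pathlib import Path

# Relative paths the recommended layout expects under your_base_path/.
REQUIRED_ASSETS = [
    "SpatialEdit_CKPT/CKPT_PT.pth",
    "SpatialEdit_CKPT/CKPT_CT_lora",
    "model/Qwen3-VL-8B-Instruct",
    "model/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth",
]

def missing_assets(base_path: str) -> list[str]:
    """Return the expected files/directories that are absent."""
    base = Path(base_path)
    return [rel for rel in REQUIRED_ASSETS if not (base / rel).exists()]

missing = missing_assets("your_base_path")
if missing:
    print("Missing:", *missing, sep="\n  ")
```

Running this before inference surfaces a missing VAE or LoRA directory immediately, rather than midway through model loading.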

Quick Start

A simple local demo script is provided in the SpatialEdit GitHub repository.

Citation

If you find this project useful, please cite the SpatialEdit paper.

@misc{spatialedit,
  title={SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing},
  author={Yicheng Xiao and Wenhu Zhang and Lin Song and Yukang Chen and Wenbo Li and Nan Jiang and Tianhe Ren and Haokun Lin and Wei Huang and Haoyang Huang and Xiu Li and Nan Duan and Xiaojuan Qi},
  year={2026}
}

Please replace the BibTeX entry above with the final official citation if needed.

Acknowledgement

This project builds upon several excellent open-source efforts, including Wan2.1 and Qwen3-VL. We also thank the contributors and collaborators who supported the development of SpatialEdit.

Model tree for EasonXiao-888/SpatialEdit-16B