---
license: apache-2.0
base_model: Qwen/Qwen3-VL-8B-Instruct
tags:
  - reward-model
  - image-editing
  - reinforcement-learning
  - spatial-reasoning
  - vision-language-model
  - icml2026
datasets:
  - SpatialReward/SpatialReward-Train
pipeline_tag: image-text-to-text
language:
  - en
---

<p align="center">
  <img src="https://huggingface.co/SpatialReward/SpatialReward-8B/resolve/main/assets/logo.png" width="65%">
</p>

<p align="center">
  <a href="https://lorangan-ddup.github.io/SpatialReward/"><img src="https://img.shields.io/badge/Project%20Page-SpatialReward-yellow" alt="project page"></a>
  <a href="https://arxiv.org/abs/2602.07458"><img src="https://img.shields.io/badge/arXiv-2602.07458-b31b1b.svg" alt="arxiv"></a>
  <a href="https://github.com/lorangan-ddup/SpatialReward"><img src="https://img.shields.io/badge/GitHub-Code-black?logo=github" alt="github"></a>
  <a href="https://huggingface.co/datasets/SpatialReward/MER-Bench"><img src="https://img.shields.io/badge/MER--Bench-🤗-yellow" alt="dataset"></a>
  <a href="https://huggingface.co/datasets/SpatialReward/SpatialReward-Train"><img src="https://img.shields.io/badge/Training--Data-🤗-yellow" alt="dataset"></a>
</p>

<h4 align="center">
    <p>
        <a href=#-news>News</a> |
        <a href=#-introduction>Introduction</a> |
        <a href=#-quick-start>Quick Start</a> |
        <a href=#-benchmark-evaluation>Benchmark Evaluation</a> |
        <a href=#️-citing-us>Citation</a>
    </p>
</h4>

**SpatialReward** is a reward model for instruction-guided image editing that addresses the "Attention Collapse" problem through explicit spatial reasoning. It predicts bounding boxes for the edited regions and anchors its semantic judgments to them, which makes it usable both as an evaluator and as an RL training signal. It reports state-of-the-art results on MER-Bench, outperforming GPT-4.1 and GPT-5.

<p align="center">
  <img src="https://huggingface.co/SpatialReward/SpatialReward-8B/resolve/main/assets/attention_visualization.png" width="95%">
  <br>
  <em>Visualizing the Attention Collapse problem vs. SpatialReward's spatial grounding.</em>
</p>

## 🔥 News

- **2026-05-05**: 🎉 We have open-sourced the **SpatialReward-8B** model weights, the **[MER-Bench](https://huggingface.co/datasets/SpatialReward/MER-Bench)** benchmark, and **[SpatialReward-Train](https://huggingface.co/datasets/SpatialReward/SpatialReward-Train)** (260k spatially aware training samples)!
- **2026-05-01**: 🎉 **SpatialReward** has been accepted to **ICML 2026**!
- **2026-02-12**: We have released the **inference code**, **reward server**, and **training configurations**!
- **2026-02-07**: The paper is available on [arXiv](https://arxiv.org/abs/2602.07458).

## 📌 Introduction

Online Reinforcement Learning (RL) holds immense potential for advancing instruction-guided image editing, but its progress has been severely hindered by a critical perception gap we term **"Attention Collapse"**. Existing reward models frequently neglect cross-image comparisons and fail to capture fine-grained editing details, leading to inaccurate evaluations and unstable RL training.

To overcome this, we propose **SpatialReward**, which:
- **Introduces MER-Bench**: A new benchmark featuring multi-edit scenarios and expert human annotations for measuring reward model quality.
- **Enforces spatial reasoning**: Predicts bounding boxes for edit regions and anchors semantic judgments to pixel-level evidence.
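
To make the notion of an edit-region bounding box concrete, here is a toy sketch that derives a box from pixel differences between a source and an edited image. It is only an illustration under stated assumptions: SpatialReward predicts its boxes with the vision-language model rather than by pixel differencing, and `changed_bbox`, the nested-list image format, and the `threshold` parameter are hypothetical names introduced for this example.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]           # toy grayscale image: rows of pixel intensities
BBox = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def changed_bbox(src: Image, edited: Image, threshold: int = 0) -> Optional[BBox]:
    """Return the tightest box around pixels that differ by more than `threshold`.

    Hypothetical helper for illustration; not part of the SpatialReward code base.
    """
    if len(src) != len(edited) or any(len(a) != len(b) for a, b in zip(src, edited)):
        raise ValueError("images must have the same shape")

    xs: List[int] = []
    ys: List[int] = []
    for y, (row_a, row_b) in enumerate(zip(src, edited)):
        for x, (a, b) in enumerate(zip(row_a, row_b)):
            if abs(a - b) > threshold:
                xs.append(x)
                ys.append(y)

    if not xs:
        return None  # nothing changed, so there is no edit region
    return (min(xs), min(ys), max(xs), max(ys))


# A 5x4 black image with two pixels modified by the "edit".
src = [[0] * 5 for _ in range(4)]
edited = [row[:] for row in src]
edited[1][2] = 255
edited[2][3] = 255

print(changed_bbox(src, edited))  # (2, 1, 3, 2)
```

A box like this gives a judgment something local to refer to: an instruction such as "Remove the dog" can be checked against the region that actually changed instead of against the whole image.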

<p align="center">
  <img src="https://huggingface.co/SpatialReward/SpatialReward-8B/resolve/main/assets/performance_table.png" width="95%">
  <br>
  <em>Comprehensive benchmark results. SpatialReward achieves SOTA performance, outperforming GPT-4.1 and GPT-5 on MER-Bench.</em>
</p>

<p align="center">
  <img src="https://huggingface.co/SpatialReward/SpatialReward-8B/resolve/main/assets/merbench_category_breakdown.png" width="70%">
  <br>
  <em>MER-Bench performance breakdown by editing category.</em>
</p>

## 🚀 Quick Start

### Installation

```bash
git clone https://github.com/Kwai-Keye/SpatialReward.git
cd SpatialReward

conda create -n spatialreward python=3.11 -y
conda activate spatialreward

pip install torch==2.8.0 torchvision --extra-index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
```

### Reward Server

```bash
# Start the reward servers and the proxy
cd example/reward/server
bash start_servers.sh
bash start_proxy.sh
```

```python
# Query the reward server from a Python client
from example.reward.client.reward_client_edit import RewardClient

client = RewardClient(proxy_host="127.0.0.1", proxy_port=23456)

# input_img and output_img are placeholders for the source and edited images,
# which must be loaded before this call.
scores, rewards, reasoning, meta_data = client.evaluate(
    input_images=[input_img],
    output_image=[output_img],
    meta_datas=[{"instruction": "Remove the dog"}],
)
```

## 📊 Benchmark Evaluation

By default, the model and the benchmark data are loaded directly from the Hugging Face Hub.

```bash
# MER-Bench
bash eval/MERBench/run.sh

# MMRB2
bash eval/MMRB2/run.sh

# EditReward-Bench
bash eval/EditReward-Bench/run.sh
```
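
Reward-model benchmarks are often scored by how frequently the model ranks the human-preferred edit above the rejected one. The sketch below shows that pairwise-accuracy computation, assuming each benchmark item yields one score per candidate edit. The evaluation scripts above may use a different metric or data layout, and `pairwise_accuracy` is a hypothetical helper, not a function from the repository.

```python
from typing import List, Tuple


def pairwise_accuracy(pairs: List[Tuple[float, float]]) -> float:
    """Fraction of pairs where the human-preferred edit gets the higher score.

    Each pair is (score_of_preferred_edit, score_of_rejected_edit).
    Ties count as incorrect in this sketch (an assumption).
    """
    if not pairs:
        raise ValueError("at least one pair is required")
    correct = sum(1 for preferred, rejected in pairs if preferred > rejected)
    return correct / len(pairs)


# Illustrative scores only, not real benchmark results.
example_pairs = [(0.9, 0.4), (0.3, 0.7), (0.8, 0.8), (0.6, 0.1)]
print(pairwise_accuracy(example_pairs))  # 0.5
```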

## 📚 Datasets

| Dataset | Description | Link |
|---|---|---|
| **SpatialReward-Train** | 260k spatially aware training samples (SFT + RL) | [🤗 Hub](https://huggingface.co/datasets/SpatialReward/SpatialReward-Train) |
| **MER-Bench** | MultiEditReward-Bench evaluation benchmark | [🤗 Hub](https://huggingface.co/datasets/SpatialReward/MER-Bench) |

## 🎯 Training

### SFT (LLaMA-Factory)
```bash
llamafactory-cli train example/SpatialReward-train/sft/qwen3vl_lora_spatial_reward.yaml
```

### RL (ms-swift / GRPO)

```bash
# Replace ORM first
cp example/SpatialReward-train/rl/orm.py <ms-swift>/swift/plugin/orm.py
bash example/SpatialReward-train/rl/run_mater.sh
```
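
GRPO converts the rewards of several samples generated for the same prompt into group-relative advantages by normalizing within the group. The sketch below shows that normalization for scalar rewards such as those a reward model returns. It assumes the population standard deviation and a small `eps` for numerical stability; `group_advantages` is a hypothetical helper, not the ms-swift implementation.

```python
from statistics import mean, pstdev
from typing import List


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group: (r - mean) / (std + eps).

    Assumes population standard deviation; `eps` avoids division by zero
    when every sample in the group receives the same reward.
    """
    if not rewards:
        raise ValueError("a group needs at least one reward")
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three edits of the same prompt, scored by the reward model (illustrative values).
advantages = group_advantages([1.0, 2.0, 3.0])
print(advantages)

# Identical rewards carry no learning signal: all advantages are zero.
print(group_advantages([0.5, 0.5, 0.5]))  # [0.0, 0.0, 0.0]
```

Because the advantages depend only on differences within a group, a reward model that ranks edits consistently matters more here than one with well-calibrated absolute scores.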

### RL Results on OmniGen2

<p align="center">
  <img src="https://huggingface.co/SpatialReward/SpatialReward-8B/resolve/main/assets/omnigen2_rl_results.png" width="85%">
  <br>
  <em>SpatialReward delivers +0.90 on GEdit-EN Overall, doubling GPT-4.1's gain (+0.45).</em>
</p>

<p align="center">
  <img src="https://huggingface.co/SpatialReward/SpatialReward-8B/resolve/main/assets/rl_training_curves.png" width="95%">
  <br>
  <em>Stable RL training dynamics with SpatialReward as reward signal.</em>
</p>

## 🙏 Acknowledgements

We thank [EditScore](https://github.com/VectorSpaceLab/EditScore) and [EditReward](https://github.com/TIGER-AI-Lab/EditReward) for valuable references.

## ❤️ Citing Us

```bibtex
@article{long2026spatialreward,
  title={SpatialReward: Bridging the Perception Gap in Online RL for Image Editing via Explicit Spatial Reasoning},
  author={Long, Yancheng and Yang, Yankai and Wei, Hongyang and Chen, Wei and Zhang, Tianke and Fan, Haonan and Liu, Changyi and Jiang, Kaiyu and Chen, Jiankang and Tang, Kaiyu and Wen, Bin and Yang, Fan and Gao, Tingting and Li, Han and Yang, Shuo},
  journal={arXiv preprint arXiv:2602.07458},
  year={2026}
}
```

## 📄 License

Apache 2.0