File size: 6,149 Bytes
45f222a
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
---
license: apache-2.0
pipeline_tag: image-text-to-text
---

<p align="center">
  <img src="https://github.com/VectorSpaceLab/EditScore/raw/main/assets/logo.png" width="65%">
</p>

This repository contains **EditScore**, a series of state-of-the-art open-source reward models (7B–72B) designed to evaluate and enhance instruction-guided image editing.

The model was presented in the paper [EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling](https://huggingface.co/papers/2509.23909).

- πŸ“š [Paper](https://huggingface.co/papers/2509.23909)
- 🌐 [Project Page](https://vectorspacelab.github.io/EditScore)
- πŸ’» [Code Repository](https://github.com/VectorSpaceLab/EditScore)

## ✨ Highlights
- **State-of-the-Art Performance**: Effectively matches the performance of leading proprietary VLMs. With a self-ensembling strategy, **our largest model surpasses even GPT-5** on our comprehensive benchmark, **EditReward-Bench**.
- **A Reliable Evaluation Standard**: We introduce **EditReward-Bench**, the first public benchmark specifically designed for evaluating reward models in image editing, featuring 13 subtasks, 11 state-of-the-art editing models (*including proprietary models*) and expert human annotations.
- **Simple and Easy-to-Use**: Get an accurate quality score for your image edits with just a few lines of code.
- **Versatile Applications**: Ready to use as a best-in-class reranker to improve editing outputs, or as a high-fidelity reward signal for **stable and effective Reinforcement Learning (RL) fine-tuning**.

## πŸ“– Introduction
While Reinforcement Learning (RL) holds immense potential for this domain, its progress has been severely hindered by the absence of a high-fidelity, efficient reward signal.

To overcome this barrier, we provide a systematic, two-part solution:

- **A Rigorous Evaluation Standard**: We first introduce **EditReward-Bench**, a new public benchmark for the direct and reliable evaluation of reward models. It features 13 diverse subtasks and expert human annotations, establishing a gold standard for measuring reward signal quality.

- **A Powerful & Versatile Tool**: Guided by our benchmark, we developed the **EditScore** model series. Through meticulous data curation and an effective self-ensembling strategy, EditScore sets a new state of the art for open-source reward models, even surpassing the accuracy of leading proprietary VLMs.

<p align="center">
  <img src="https://github.com/VectorSpaceLab/EditScore/raw/main/assets/table_reward_model_results.png" width="95%">
  <br>
  <em>Benchmark results on EditReward-Bench.</em>
</p>

We demonstrate the practical utility of EditScore through two key applications:

- **As a State-of-the-Art Reranker**: Use EditScore to perform Best-of-*N* selection and instantly improve the output quality of diverse editing models.
- **As a High-Fidelity Reward for RL**: Use EditScore as a robust reward signal to fine-tune models via RL, enabling stable training and unlocking significant performance gains where general-purpose VLMs fail.

This repository releases both the **EditScore** models and the **EditReward-Bench** dataset to facilitate future research in reward modeling, policy optimization, and AI-driven model improvement.

<p align="center">
  <img src="https://github.com/VectorSpaceLab/EditScore/raw/main/assets/figure_edit_results.png" width="95%">
  <br>
  <em>EditScore as a superior reward signal for image editing.</em>
</p>

## πŸš€ Quick Start

### πŸ› οΈ Environment Setup

#### βœ… Recommended Setup

```bash
# 1. Clone the repo
git clone git@github.com:VectorSpaceLab/EditScore.git
cd EditScore

# 2. (Optional) Create a clean Python environment
conda create -n editscore python=3.12
conda activate editscore

# 3. Install dependencies
# 3.1 Install PyTorch (choose correct CUDA version)
pip install torch==2.7.1 torchvision --extra-index-url https://download.pytorch.org/whl/cu126

# 3.2 Install other required packages
pip install -r requirements.txt

# EditScore runs even without vllm, though we recommend install it for best performance.
pip install vllm
```

#### 🌏 For users in Mainland China

```bash
# Install PyTorch from a domestic mirror
pip install torch==2.7.1 torchvision --index-url https://mirror.sjtu.edu.cn/pytorch-wheels/cu126

# Install other dependencies from Tsinghua mirror
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

# EditScore runs even without vllm, though we recommend install it for best performance.
pip install vllm -i https://pypi.tuna.tsinghua.edu.cn/simple
```

---

### πŸ§ͺ Usage Example
Using EditScore is straightforward. The model will be automatically downloaded from the Hugging Face Hub on its first run.
```python
from PIL import Image
from editscore import EditScore

# Load the EditScore model. It will be downloaded automatically.
# Replace with the specific model version you want to use.
model_path = "Qwen/Qwen2.5-VL-7B-Instruct"
lora_path = "EditScore/EditScore-7B"

scorer = EditScore(
    backbone="qwen25vl", # set to "qwen25vl_vllm" for faster inference
    model_name_or_path=model_path,
    enable_lora=True,
    lora_path=lora_path,
    score_range=25,
    num_pass=1, # Increase for better performance via self-ensembling
)

input_image = Image.open("example_images/input.png")
output_image = Image.open("example_images/output.png")
instruction = "Adjust the background to a glass wall."

result = scorer.evaluate([input_image, output_image], instruction)
print(f"Edit Score: {result['final_score']}")
# Expected output: A dictionary containing the final score and other details.
```

---

## ❀️ Citing Us
If you find this repository or our work useful, please consider giving a star ⭐ and citation πŸ¦–, which would be greatly appreciated:

```bibtex
@article{luo2025editscore,
  title={EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling},
  author={Xin Luo and Jiahao Wang and Chenyuan Wu and Shitao Xiao and Xiyan Jiang and Defu Lian and Jiajun Zhang and Dong Liu and Zheng Liu},
  journal={arXiv preprint arXiv:2509.23909},
  year={2025}
}
```