Update pipeline tag, add paper ID, abstract, and GitHub link
#1 opened by nielsr (HF Staff)

README.md CHANGED
````diff
@@ -1,7 +1,8 @@
 ---
 base_model: Qwen/Qwen2.5-VL-7B-Instruct
 library_name: peft
-pipeline_tag: text-
+pipeline_tag: image-text-to-text
+paper: 2509.23909
 tags:
 - base_model:adapter:Qwen/Qwen2.5-VL-7B-Instruct
 - lora
@@ -29,6 +30,16 @@ tags:
 </h4>
 
 **EditScore** is a series of state-of-the-art open-source reward models (7B–72B) designed to evaluate and enhance instruction-guided image editing.
+
+## Paper
+[EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling](https://huggingface.co/papers/2509.23909)
+
+### Abstract
+Instruction-guided image editing has achieved remarkable progress, yet current models still face challenges with complex instructions and often require multiple samples to produce a desired result. Reinforcement Learning (RL) offers a promising solution, but its adoption in image editing has been severely hindered by the lack of a high-fidelity, efficient reward signal. In this work, we present a comprehensive methodology to overcome this barrier, centered on the development of a state-of-the-art, specialized reward model. We first introduce EditReward-Bench, a comprehensive benchmark to systematically evaluate reward models on editing quality. Building on this benchmark, we develop EditScore, a series of reward models (7B-72B) for evaluating the quality of instruction-guided image editing. Through meticulous data curation and filtering, EditScore effectively matches the performance of leading proprietary VLMs. Furthermore, coupled with an effective self-ensemble strategy tailored for the generative nature of EditScore, our largest variant even surpasses GPT-5 on the benchmark. We then demonstrate that a high-fidelity reward model is the key to unlocking online RL for image editing. Our experiments show that, while even the largest open-source VLMs fail to provide an effective learning signal, EditScore enables efficient and robust policy optimization. Applying our framework to a strong base model, OmniGen2, results in a final model that shows a substantial and consistent performance uplift. Overall, this work provides the first systematic path from benchmarking to reward modeling to RL training in image editing, showing that a high-fidelity, domain-specialized reward model is the key to unlocking the full potential of RL in this domain.
+
+## Code Repository
+The official code can be found on GitHub: [https://github.com/VectorSpaceLab/EditScore](https://github.com/VectorSpaceLab/EditScore)
+
 ## ✨ Highlights
 - **State-of-the-Art Performance**: Effectively matches the performance of leading proprietary VLMs. With a self-ensembling strategy, **our largest model surpasses even GPT-5** on our comprehensive benchmark, **EditReward-Bench**.
 - **A Reliable Evaluation Standard**: We introduce **EditReward-Bench**, the first public benchmark specifically designed for evaluating reward models in image editing, featuring 13 subtasks, 11 state-of-the-art editing models (*including proprietary models*), and expert human annotations.
@@ -165,4 +176,4 @@ If you find this repository or our work useful, please consider giving a star
 journal={arXiv preprint arXiv:2509.23909},
 year={2025}
 }
-```
+```
````
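The self-ensemble strategy mentioned in the added abstract (drawing several scores from the generative reward model and aggregating them to reduce variance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the EditScore API: `score_once`, the 0–10 score range, and the Gaussian noise model are hypothetical stand-ins for one stochastic scoring pass of a VLM reward model.

```python
import random
import statistics

def score_once(instruction: str, seed: int) -> float:
    """Hypothetical stand-in for one stochastic pass of a generative reward model.

    A real scorer would sample a score from the VLM given the edit instruction,
    the source image, and the edited image; here we just simulate noisy scores
    so the aggregation logic is runnable.
    """
    rng = random.Random(hash((instruction, seed)) & 0xFFFFFFFF)
    return max(0.0, min(10.0, rng.gauss(7.0, 1.0)))

def self_ensemble_score(instruction: str, k: int = 8) -> float:
    """Average k independent samples of the generative reward.

    Averaging reduces the variance of the stochastic scores, which is the
    self-ensembling idea the abstract credits for the benchmark gains.
    """
    samples = [score_once(instruction, seed) for seed in range(k)]
    return statistics.mean(samples)

score = self_ensemble_score("replace the red car with a bicycle", k=8)
```

In an online RL loop, `self_ensemble_score` would play the role of the reward function applied to each sampled edit; `k` trades compute for a lower-variance learning signal.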