Update README.md
README.md
CHANGED
@@ -1,9 +1,16 @@
+---
+license: apache-2.0
+language:
+- en
+base_model:
+- Qwen/Qwen2.5-VL-3B-Instruct
+pipeline_tag: image-text-to-text
+---
 # 🩺 ImageDoctor: Diagnosing Text-to-Image Generation via Grounded Image Reasoning
 
 **ImageDoctor** is a unified **evaluation model** for **text-to-image (T2I) generation**, capable of producing both **multi-aspect scalar scores** and **spatially grounded heatmaps**.
 It follows a **“look–think–predict”** reasoning paradigm that mimics human visual diagnosis — first localizing flaws, then reasoning about them, and finally producing an interpretable judgment.
 
----
 
 ## 🔍 Key Features
 
@@ -191,10 +198,3 @@ If you use **ImageDoctor**, please cite:
 year = {2025},
 url = {https://arxiv.org/abs/2510.01010},
 '''
----
-
-
-
----
-license: apache-2.0
----