GYX97/ImageDoctor

@@ -1,9 +1,16 @@
+---
+license: apache-2.0
+language:
+- en
+base_model:
+- Qwen/Qwen2.5-VL-3B-Instruct
+pipeline_tag: image-text-to-text
+---
 # 🩺 ImageDoctor: Diagnosing Text-to-Image Generation via Grounded Image Reasoning
 **ImageDoctor** is a unified **evaluation model** for **text-to-image (T2I) generation**, capable of producing both **multi-aspect scalar scores** and **spatially grounded heatmaps**.
 It follows a **“look–think–predict”** reasoning paradigm that mimics human visual diagnosis — first localizing flaws, then reasoning about them, and finally producing an interpretable judgment.
----
 ## 🔍 Key Features
@@ -191,10 +198,3 @@ If you use **ImageDoctor**, please cite:
   year      = {2025},
   url       = {https://arxiv.org/abs/2510.01010},
 '''
----
----
-license: apache-2.0
----
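This commit moves the metadata block because the Hugging Face Hub expects model-card metadata as a YAML block delimited by `---` at the very start of `README.md`. A block at the end of the file is rendered as ordinary Markdown instead. The sketch below illustrates that rule with a hypothetical helper, `parse_front_matter`. It assumes the simple `key: value` / `- item` layout used in this commit. It is not the Hub's own parser, and a real tool should rely on a full YAML library instead.

```python
# Hypothetical sketch: read the leading YAML front matter of a model card.
# Assumption: only "key: value" scalars and "- item" lists appear, as in this
# commit; this is NOT the Hugging Face Hub's parser and not full YAML.


def parse_front_matter(readme: str) -> dict:
    lines = readme.splitlines()
    # The metadata block must open on the very first line of the file.
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            # Closing delimiter: the metadata block is complete.
            return meta
        if stripped.startswith("- ") and isinstance(meta.get(current_key), list):
            # List item under the most recent key that had no scalar value.
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            # Split on the first colon only, so values may contain colons.
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            # An empty value starts a list (e.g. "language:" followed by "- en").
            meta[current_key] = value if value else []
    # No closing "---" was found, so this is not a valid metadata block.
    return {}


new_readme = """---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
pipeline_tag: image-text-to-text
---
# ImageDoctor
"""
old_readme = "# ImageDoctor\n...\n---\nlicense: apache-2.0\n---\n"

print(parse_front_matter(new_readme)["base_model"])  # ['Qwen/Qwen2.5-VL-3B-Instruct']
print(parse_front_matter(old_readme))  # {}
```

With the block at the top, as in the new revision, the license, language, base model, and pipeline tag are all recovered. With the block at the bottom, as in the old revision, nothing is read as metadata.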