MING-ZCH/MetaphorStar-3B

@@ -1,17 +1,21 @@
 ---
-license: apache-2.0
+base_model: Qwen/Qwen2.5-VL
 library_name: transformers
+license: apache-2.0
+pipeline_tag: image-text-to-text
+arxiv: 2602.10575
 tags:
 - vision-language-model
 - reinforcement-learning
 - grpo
 - metaphor-understanding
 - visual-reasoning
-base_model: Qwen/Qwen2.5-VL
 ---
 # MetaphorStar: Image Metaphor Understanding and Reasoning with End-to-End Visual RL
+[**Paper**](https://huggingface.co/papers/2602.10575) | [**Project Page**](https://metaphorstar.github.io) | [**GitHub**](https://github.com/MING-ZCH/MetaphorStar)
 **MetaphorStar** is the first Multi-modal Large Language Model (MLLM) family trained via an **End-to-End Visual Reinforcement Learning (RL)** framework specifically designed to bridge the gap between literal perception ("seeing things as they are") and metaphorical understanding ("seeing things as we are").
 Built upon the Qwen2.5-VL architecture, MetaphorStar achieves State-of-the-Art (SOTA) performance on image implication tasks and demonstrates robust generalization capabilities on complex visual reasoning benchmarks (e.g., MMMU, MathVerse).
@@ -65,7 +69,9 @@ messages = [
         "role": "user",
         "content": [
             {"type": "image", "image": "path/to/metaphor_image.jpg"},
-            {"type": "text", "text": "True-false questions: The wilted plant in the office implies a stressful working environment.\n\nFirst, describe the image, then analyze the image implication, and finally reason to get the answer. Output the thinking process in <think></think> and the final correct answer in <answer></answer> tags."}
+            {"type": "text", "text": "True-false questions: The wilted plant in the office implies a stressful working environment.
+
+First, describe the image, then analyze the image implication, and finally reason to get the answer. Output the thinking process in <think></think> and the final correct answer in <answer></answer> tags."}
         ]
     }
 ]
@@ -85,7 +91,7 @@ print(output_text)
 @article{metaphorstar2026,
   title={MetaphorStar: Image Metaphor Understanding and Reasoning with End-to-End Visual Reinforcement Learning},
   author={Chenhao Zhang, Yazhe Niu, Hongsheng Li},
-  journal={Anonymous},
+  journal={arXiv preprint arXiv:2602.10575},
   year={2026}
 }
 ```
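The added lines in the `messages = [` hunk place real line breaks inside an ordinary double-quoted Python string. Python does not parse that and raises a `SyntaxError` for an unterminated string literal. The removed line used `\n\n` escapes, which kept the prompt valid. Below is a minimal sketch of building the same user message so that the blank line stays inside one literal. It also includes a parser for the `<think>`/`<answer>` tags that the prompt requests. The helper names `build_messages`, `parse_response`, and `compiles` are hypothetical and are not part of the MetaphorStar repository. The message layout simply mirrors the example in the diff, and the regex assumes each tag appears at most once.

```python
import re

# Instruction suffix copied from the example prompt in the model card.
INSTRUCTION = (
    "First, describe the image, then analyze the image implication, "
    "and finally reason to get the answer. Output the thinking process in "
    "<think></think> and the final correct answer in <answer></answer> tags."
)


def build_messages(image_path: str, question: str) -> list:
    """Hypothetical helper: one user turn in the content-list layout shown in the diff."""
    # The "\n\n" escapes put the blank line inside a single valid string literal.
    prompt = question + "\n\n" + INSTRUCTION
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": prompt},
            ],
        }
    ]


def parse_response(text: str) -> dict:
    """Hypothetical helper: pull the first <think> and <answer> spans (None if absent)."""

    def grab(tag: str):
        # DOTALL lets the reasoning span multiple lines; the non-greedy group stops at the first close tag.
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        return match.group(1).strip() if match else None

    return {"think": grab("think"), "answer": grab("answer")}


def compiles(source: str) -> bool:
    """Hypothetical helper: True if `source` parses as Python code."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


# Source text with real newlines inside "..." (the added lines' style) vs. escaped newlines (the removed line's style).
split_literal = 'text = "environment.\n\nFirst, describe the image."'
escaped_literal = 'text = "environment.\\n\\nFirst, describe the image."'

if __name__ == "__main__":
    messages = build_messages(
        "path/to/metaphor_image.jpg",
        "True-false questions: The wilted plant in the office implies "
        "a stressful working environment.",
    )
    print(messages[0]["content"][1]["text"])
    print(compiles(split_literal), compiles(escaped_literal))
```

For example, `parse_response("<think>The plant droops.</think><answer>True</answer>")["answer"]` returns `"True"`. Implicit concatenation of adjacent literals, as in the `__main__` block, is one way to keep a long prompt readable without embedding raw newlines.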