How to use MING-ZCH/MetaphorStar-32B with libraries and local apps.

Libraries

How to use MING-ZCH/MetaphorStar-32B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="MING-ZCH/MetaphorStar-32B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("MING-ZCH/MetaphorStar-32B")
model = AutoModelForMultimodalLM.from_pretrained("MING-ZCH/MetaphorStar-32B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use MING-ZCH/MetaphorStar-32B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "MING-ZCH/MetaphorStar-32B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MING-ZCH/MetaphorStar-32B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
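
The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server from the previous step is listening on localhost:8000, and the helper names `build_chat_payload` and `chat` are illustrative, not part of vLLM.

```python
import json
import urllib.request

# Assumption: a vLLM server started as shown above is listening on localhost:8000.
VLLM_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_payload(model: str, prompt: str, image_url: str) -> dict:
    """Build the same JSON body that the curl example sends."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(payload: dict, url: str = VLLM_URL, timeout: float = 120.0) -> str:
    """POST the payload and return the assistant's reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # OpenAI-compatible responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]


payload = build_chat_payload(
    "MING-ZCH/MetaphorStar-32B",
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)

# Once the server is running, send the request with:
# print(chat(payload))
```

The same helper should work against the SGLang server by passing `url="http://localhost:30000/v1/chat/completions"`, since both expose an OpenAI-compatible endpoint.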

Use Docker

docker model run hf.co/MING-ZCH/MetaphorStar-32B

SGLang

How to use MING-ZCH/MetaphorStar-32B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "MING-ZCH/MetaphorStar-32B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MING-ZCH/MetaphorStar-32B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "MING-ZCH/MetaphorStar-32B" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use MING-ZCH/MetaphorStar-32B with Docker Model Runner:
```
docker model run hf.co/MING-ZCH/MetaphorStar-32B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

MetaphorStar: Image Metaphor Understanding and Reasoning with End-to-End Visual RL

Project Page | Paper | Code

MetaphorStar is the first Multi-modal Large Language Model (MLLM) family trained via an End-to-End Visual Reinforcement Learning (RL) framework specifically designed to bridge the gap between literal perception ("seeing things as they are") and metaphorical understanding ("seeing things as we are").

Built upon the Qwen2.5-VL architecture, MetaphorStar achieves State-of-the-Art (SOTA) performance on image implication tasks and demonstrates robust generalization capabilities on complex visual reasoning benchmarks (e.g., MMMU, MathVerse).

🌟 Key Highlights

SOTA on Image Implication: Significantly outperforms GPT-4o, Claude-3.5-Sonnet, and Gemini-2.5-Pro on True-False and Open-Style image implication questions.
End-to-End Visual RL (TFQ-GRPO): Utilizes the True-False Question (TFQ) format as a dense reward signal for Group Relative Policy Optimization (GRPO), bypassing the limitations of traditional Supervised Fine-Tuning (SFT).
Overcoming the "SFT Curse": Our research identifies that SFT warmup creates an "entropy bottleneck" that harms generalization. MetaphorStar is trained with pure RL to maintain high policy entropy, enabling creative and robust reasoning.
Generalization: Training on metaphors enhances the model's general visual reasoning ability (e.g., +16.2 points on MMMU for the 32B model compared to base).
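
To make "policy entropy" concrete: it measures how spread out the model's next-token distribution is. The sketch below is a generic illustration of that quantity (Shannon entropy of a softmax distribution), not the paper's measurement code, and the logits are made-up values.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats: H = -sum(p * log p), skipping zero entries."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical next-token logits over a 4-token vocabulary.
exploratory = softmax([1.0, 1.0, 1.0, 1.0])  # uniform: maximal entropy (ln 4)
collapsed = softmax([10.0, 0.0, 0.0, 0.0])   # nearly deterministic: low entropy

print(f"exploratory policy entropy: {entropy(exploratory):.3f} nats")
print(f"collapsed policy entropy:   {entropy(collapsed):.3f} nats")
```

A policy whose distribution has collapsed onto one token has little room left to explore alternative reasoning paths during RL, which is the intuition behind the "entropy bottleneck" described above.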

🧠 Methodology: TFQ-GRPO

Current MLLMs struggle with metaphors because they lack the sophisticated multi-hop reasoning and Theory of Mind (ToM) required. We introduce TFQ-GRPO, a framework that leverages:

TFQ-Data: A fine-grained dataset where each image is associated with multiple True/False propositions, probing both literal content and deep implications.
GRPO (Group Relative Policy Optimization): An on-policy RL algorithm that optimizes reasoning trajectories based on a combined reward of Accuracy (correct T/F judgment) and Format (structured thinking process).
Structured Reasoning: The model is trained to explicitly output <think>...</think> traces before the final answer, allowing it to "find" the correct reasoning path through exploration.
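
The reward and advantage computation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the released training code: the exact tag layout, the binary reward values, the `format_weight` of 0.5, and the use of population standard deviation are all hypothetical choices.

```python
import re
import statistics

# Assumed output layout: <think>...</think> followed by <answer>...</answer>.
RESPONSE_PATTERN = re.compile(
    r"^\s*<think>(?P<think>.+?)</think>\s*<answer>(?P<answer>.+?)</answer>\s*$",
    re.DOTALL,
)


def format_reward(response: str) -> float:
    """1.0 if the response follows the think/answer layout, else 0.0."""
    return 1.0 if RESPONSE_PATTERN.match(response) else 0.0


def accuracy_reward(response: str, label: bool) -> float:
    """1.0 if the tagged answer matches the True/False label, else 0.0."""
    match = RESPONSE_PATTERN.match(response)
    if match is None:
        return 0.0
    answer = match.group("answer").strip().lower()
    if answer not in {"true", "false"}:
        return 0.0
    return 1.0 if (answer == "true") == label else 0.0


def total_reward(response: str, label: bool, format_weight: float = 0.5) -> float:
    """Combined reward; format_weight is a hypothetical value, not from the paper."""
    return accuracy_reward(response, label) + format_weight * format_reward(response)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize each reward within its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# A hypothetical group of sampled responses for one proposition labeled True.
group = [
    "<think>The plant is wilted, suggesting neglect.</think><answer>True</answer>",
    "<think>The office looks tidy.</think><answer>False</answer>",
    "True",  # right guess but untagged: scores zero under this sketch
]
rewards = [total_reward(r, label=True) for r in group]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Because advantages are computed relative to the group mean, a response is reinforced only to the extent that it beats its sibling samples for the same image and proposition, so no separate value model is needed.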

📊 Performance

Evaluation on TFQ-Bench and the High-Level Image Implication Benchmark (EN):

Model	TFQ (Acc)	MCQ (Acc)	OSQ (Score 0-5)
MetaphorStar-32B	74%	78%	3.94
MetaphorStar-7B	70%	74%	3.22
MetaphorStar-3B	62%	64%	3.06
Gemini-2.5-Pro	68%	82%	3.38
GPT-4o	50%	60%	2.94
Claude-3.5-Sonnet	38%	68%	3.22

Note: MetaphorStar-32B achieves SOTA on TFQ and OSQ. On MCQ it outperforms GPT-4o and Claude-3.5-Sonnet and trails only Gemini-2.5-Pro (78% vs. 82%).

🚀 Quick Start

from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info
import torch

model_id = "MING-ZCH/MetaphorStar-32B"

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/metaphor_image.jpg"},
            {
                "type": "text",
                "text": (
                    "True-false questions: The wilted plant in the office implies a stressful working environment.\n\n"
                    "First, describe the image, then analyze the image implication, and finally reason to get the answer. "
                    "Output the thinking process in <think></think> and the final correct answer in <answer></answer> tags."
                ),
            },
        ]
    }
]

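# Inference setup (standard Qwen2.5-VL generation)
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Load the image(s) referenced in `messages` with the qwen_vl_utils helper
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt"
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=2048)
# Decode only the newly generated tokens, not the prompt
new_tokens = generated_ids[:, inputs.input_ids.shape[1]:]
output_text = processor.batch_decode(new_tokens, skip_special_tokens=True)
print(output_text)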
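
Since the prompt asks for `<think>` and `<answer>` tags, the decoded text can be split into its reasoning trace and final answer. The helper below is a minimal sketch; `parse_response` is a hypothetical name, and the sample string stands in for one element of `output_text`.

```python
import re


def parse_response(text: str) -> dict:
    """Split a decoded response into its reasoning trace and final answer.

    A missing tag yields None for that field instead of raising.
    """
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return {
        "thinking": think.group(1).strip() if think else None,
        "answer": answer.group(1).strip() if answer else None,
    }


# Hypothetical decoded output, standing in for output_text[0].
sample = (
    "<think>The image shows a wilted plant on an office desk. "
    "Neglected plants often symbolize burnout.</think>\n"
    "<answer>True</answer>"
)
result = parse_response(sample)
print(result["answer"])  # prints: True
```

Returning None for missing tags lets the caller decide how to handle generations that were cut off by `max_new_tokens` before the answer was emitted.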

📜 Citation

@article{metaphorstar2026,
  title={MetaphorStar: Image Metaphor Understanding and Reasoning with End-to-End Visual Reinforcement Learning},
  author={Zhang, Chenhao and Niu, Yazhe and Li, Hongsheng},
  journal={arXiv preprint arXiv:2602.10575},
  year={2026}
}


Safetensors
Model size: 33B params
Tensor type: BF16

Model tree for MING-ZCH/MetaphorStar-32B
Quantizations: 2 models

Collection including MING-ZCH/MetaphorStar-32B
MetaphorStar (collection): MetaphorStar: Image Metaphor Understanding and Reasoning with End-to-End Visual Reinforcement Learning • 8 items • Updated Feb 13

Paper for MING-ZCH/MetaphorStar-32B
MetaphorStar: Image Metaphor Understanding and Reasoning with End-to-End Visual Reinforcement Learning • arXiv:2602.10575 • Published Feb 11