Instructions to use bunny127/SophiaVL-R1-Thinking-Reward-Model-3B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use bunny127/SophiaVL-R1-Thinking-Reward-Model-3B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="bunny127/SophiaVL-R1-Thinking-Reward-Model-3B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("bunny127/SophiaVL-R1-Thinking-Reward-Model-3B")
model = AutoModelForImageTextToText.from_pretrained("bunny127/SophiaVL-R1-Thinking-Reward-Model-3B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
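This checkpoint is a reward model, so its reply to a properly formatted evaluation prompt should be a bare score between 0 and 1 rather than a caption. The helper below is a minimal sketch for turning the decoded text into a number. `parse_reward_score` is a hypothetical name. The sketch assumes the score is the first decimal number in the output, so a trailing special token such as Qwen's `<|im_end|>` (which `processor.decode` may leave in place without `skip_special_tokens=True`) does not interfere.

```python
import re
from typing import Optional

# First (optionally signed) decimal number, e.g. "0.7", "1", "1.0", ".5".
_SCORE_RE = re.compile(r"-?\d*\.?\d+")


def parse_reward_score(text: str) -> Optional[float]:
    """Extract a reward in [0, 1] from decoded model output (hypothetical helper).

    Returns None when no number is found, so callers can decide how to
    handle malformed generations instead of silently scoring them 0.
    """
    match = _SCORE_RE.search(text)
    if match is None:
        return None
    score = float(match.group())
    # Clamp in case the model emits something outside the requested range.
    return min(max(score, 0.0), 1.0)
```

For example, `parse_reward_score(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))`. Returning `None` instead of 0 for unparseable output keeps malformed generations distinguishable from genuinely low scores.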


vLLM

How to use bunny127/SophiaVL-R1-Thinking-Reward-Model-3B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "bunny127/SophiaVL-R1-Thinking-Reward-Model-3B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "bunny127/SophiaVL-R1-Thinking-Reward-Model-3B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
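The request above uses a generic captioning prompt. To query the reward model through the same OpenAI-compatible endpoint, you can build the body in Python instead of hand-writing JSON. The sketch below is illustrative only. `build_reward_request` is a hypothetical helper. The choice of `temperature: 0` with a small `max_tokens` is made here because the model should return only a short score; neither setting is documented for this checkpoint.

```python
from typing import Any, Dict, List, Optional


def build_reward_request(
    model: str,
    prompt_text: str,
    image_url: Optional[str] = None,
    max_tokens: int = 8,
) -> Dict[str, Any]:
    """Build an OpenAI-compatible /v1/chat/completions body (hypothetical helper)."""
    content: List[Dict[str, Any]] = []
    if image_url is not None:
        # Same "image_url" content-part shape as in the curl example.
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    content.append({"type": "text", "text": prompt_text})
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        # Assumption: greedy decoding keeps scores reproducible across calls.
        "temperature": 0,
        "max_tokens": max_tokens,
    }
```

Serialize the result with `json.dumps` and POST it to `http://localhost:8000/v1/chat/completions`, as the curl command does.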

Use Docker

docker model run hf.co/bunny127/SophiaVL-R1-Thinking-Reward-Model-3B

SGLang

How to use bunny127/SophiaVL-R1-Thinking-Reward-Model-3B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "bunny127/SophiaVL-R1-Thinking-Reward-Model-3B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "bunny127/SophiaVL-R1-Thinking-Reward-Model-3B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
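To call the server from Python rather than curl, here is a standard-library sketch. The function names are hypothetical. The code assumes the usual OpenAI chat-completions response shape, where the generated text is at `choices[0].message.content`. The base URL is `http://localhost:30000` for the SGLang server above, or port 8000 for the vLLM server.

```python
import json
import urllib.request
from typing import Any, Dict


def post_chat_completion(base_url: str, body: Dict[str, Any], timeout: float = 60.0) -> Dict[str, Any]:
    """POST a chat-completions body to an OpenAI-compatible server; return parsed JSON."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def completion_text(response: Dict[str, Any]) -> str:
    """Return the assistant message text from an OpenAI-style chat completion."""
    return response["choices"][0]["message"]["content"]
```

For example, `completion_text(post_chat_completion("http://localhost:30000", body))`, where `body` is a dict with the same structure as the JSON in the curl example.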

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "bunny127/SophiaVL-R1-Thinking-Reward-Model-3B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "bunny127/SophiaVL-R1-Thinking-Reward-Model-3B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'


Add pipeline tag, library name and link to Github repo

by nielsr HF Staff - opened Jun 8, 2025

base: refs/heads/main ← from: refs/pr/1

Files changed (1): README.md +16 -2

@@ -1,8 +1,12 @@
 ---
 license: apache-2.0
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---
-This is the Thinking Reward Model of SophiaVL-R1 (https://arxiv.org/abs/2505.17018).
+This is the Thinking Reward Model of SophiaVL-R1 (https://arxiv.org/abs/2505.17018).
+The code for SophiaVL-R1 can be found at https://github.com/kxfan2002/SophiaVL-R1.
 This model is finetuned with the [SophiaVL-R1-Thinking-156k Dataset](https://huggingface.co/datasets/bunny127/SophiaVL-R1-Thinking-156k). The base model is Qwen2.5-VL-3B.
 The input of Thinking Reward Model is a question with model response. Thinking Reward Model will output a score between 0 and 1 indicating the thinking quality of model response.
@@ -36,7 +40,17 @@ def get_process_reward(prompt_str, reasoning_str, image_path=None):
         if "<image>" not in prompt_str:
             prompt_str = f"<image> {prompt_str}"
-    prompt = f"""You are an expert reasoning evaluator. I will give you a multimodal question and an answer. Your goal is to judge a reward process and give a score between 0 and 1. You should focus on whether the reasoning process is good rather than whether the final answer is correct.### Evaluation Criteria:\n- **Logical Soundness**: Does each step follow logically from the previous one?\n- **Correct Reasoning**: Are the methods and steps used appropriate and valid? Are the facts and lemmas correctly stated and applied?\n- **Error Identification**: Are there any logical fallacies, unsupported assumptions, or incorrect steps?\n- **Language Consistency**: Is the reasoning process conducted in a single, consistent language without mixing different languages?\n- **Redundancy**: Is the reasoning concise, without unnecessary repetition or extraneous steps?\nProvide a single score from **{{0, 0.1, 0.2, ..., 1.0}}** based on the reasoning quality, where:\n - **0**: Completely flawed reasoning\n- **1**: Perfectly sound reasoning\n- Intermediate values (e.g., 0.3, 0.7) should reflect partial correctness or minor errors.\nBe strict, reward the good process and punish the bad one. You should only output the score without any explanation.
+    prompt = f"""You are an expert reasoning evaluator. I will give you a multimodal question and an answer. Your goal is to judge a reward process and give a score between 0 and 1. You should focus on whether the reasoning process is good rather than whether the final answer is correct.### Evaluation Criteria:
+- **Logical Soundness**: Does each step follow logically from the previous one?
+- **Correct Reasoning**: Are the methods and steps used appropriate and valid? Are the facts and lemmas correctly stated and applied?
+- **Error Identification**: Are there any logical fallacies, unsupported assumptions, or incorrect steps?
+- **Language Consistency**: Is the reasoning process conducted in a single, consistent language without mixing different languages?
+- **Redundancy**: Is the reasoning concise, without unnecessary repetition or extraneous steps?
+Provide a single score from **{{0, 0.1, 0.2, ..., 1.0}}** based on the reasoning quality, where:
+ - **0**: Completely flawed reasoning
+- **1**: Perfectly sound reasoning
+- Intermediate values (e.g., 0.3, 0.7) should reflect partial correctness or minor errors.
+Be strict, reward the good process and punish the bad one. You should only output the score without any explanation.
     Question: {prompt_str}
     Reasoning process: {reasoning_str}
     """
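The evaluation prompt introduced by this change can be rebuilt outside the README's `get_process_reward` function. The sketch below copies the new template text verbatim, including its irregular spacing, on the assumption that the model was fine-tuned on exactly this wording. `build_thinking_reward_prompt` and its `has_image` flag are hypothetical. The diff shows the `<image>` prefix being added but not the condition that guards it, so tying that prefix to an image being supplied is a guess. It is also unclear whether the literal placeholder is wanted when the image is passed as a separate chat content item.

```python
# Template lines copied from the README diff; the odd spacing is kept on
# purpose (assumption: the reward model saw exactly this text during training).
_TEMPLATE_LINES = (
    "You are an expert reasoning evaluator. I will give you a multimodal question and an answer. "
    "Your goal is to judge a reward process and give a score between 0 and 1. "
    "You should focus on whether the reasoning process is good rather than whether the final answer is correct."
    "### Evaluation Criteria:",
    "- **Logical Soundness**: Does each step follow logically from the previous one?",
    "- **Correct Reasoning**: Are the methods and steps used appropriate and valid? "
    "Are the facts and lemmas correctly stated and applied?",
    "- **Error Identification**: Are there any logical fallacies, unsupported assumptions, or incorrect steps?",
    "- **Language Consistency**: Is the reasoning process conducted in a single, consistent language "
    "without mixing different languages?",
    "- **Redundancy**: Is the reasoning concise, without unnecessary repetition or extraneous steps?",
    "Provide a single score from **{0, 0.1, 0.2, ..., 1.0}** based on the reasoning quality, where:",
    " - **0**: Completely flawed reasoning",
    "- **1**: Perfectly sound reasoning",
    "- Intermediate values (e.g., 0.3, 0.7) should reflect partial correctness or minor errors.",
    "Be strict, reward the good process and punish the bad one. "
    "You should only output the score without any explanation.",
)


def build_thinking_reward_prompt(prompt_str: str, reasoning_str: str, has_image: bool = False) -> str:
    """Assemble the Thinking Reward Model prompt (hypothetical helper)."""
    # Assumption: the "<image>" prefix is only added when an image accompanies the question.
    if has_image and "<image>" not in prompt_str:
        prompt_str = f"<image> {prompt_str}"
    header = "\n".join(_TEMPLATE_LINES)
    # The README's triple-quoted f-string indents these two lines by four spaces
    # and ends with a newline plus four spaces before the closing quotes.
    return f"{header}\n    Question: {prompt_str}\n    Reasoning process: {reasoning_str}\n    "
```

The returned string is the text part of the request. The model's reply can then be parsed as a score between 0 and 1.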