---
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
---

This is the Thinking Reward Model of SophiaVL-R1 (https://arxiv.org/abs/2505.17018).

The code for SophiaVL-R1 can be found at https://github.com/kxfan2002/SophiaVL-R1.

This model is fine-tuned on the [SophiaVL-R1-Thinking-156k Dataset](https://huggingface.co/datasets/bunny127/SophiaVL-R1-Thinking-156k). The base model is Qwen2.5-VL-3B.

The Thinking Reward Model takes a question together with a model response as input, and outputs a score between 0 and 1 that reflects the thinking quality of the response.
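Because the model returns the score as free-form text, a parsed value can occasionally fall outside the expected range or granularity. A minimal post-processing helper (our own hypothetical addition, not part of the SophiaVL-R1 codebase) that clamps a score into [0, 1] and snaps it to 0.1 steps might look like:

```python
def quantize_reward(score: float) -> float:
    """Clamp a raw score into [0, 1] and round it to the 0.1 granularity
    that the thinking-reward prompt asks for.

    Hypothetical helper, not part of the official SophiaVL-R1 code.
    """
    clamped = min(max(score, 0.0), 1.0)
    return round(clamped, 1)
```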

We provide a command to deploy the Thinking Reward Model using vLLM:

```bash
python3 -m vllm.entrypoints.openai.api_server \
    --port 80 \
    --model /path/to/thinking/reward/model \
    --served-model-name thinking-reward-model \
    --tensor-parallel-size 2 \
    --max-num-seqs 64 \
    --max-model-len 32768
```

We provide a script to query the deployed model for a thinking reward:

```python
import base64
import time

import httpx

# Point this at the chat-completions endpoint of the vLLM server,
# e.g. "http://localhost:80/v1/chat/completions" for the command above.
openai_api_base = "vllm-url"
reward_model = "thinking-reward-model"
question = "your question"
image = "your image path"
answer = "your model response"


def encode_image_base64(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


def get_process_reward(prompt_str, reasoning_str, image_path=None):
    image_base64 = None
    if image_path is not None:
        image_base64 = encode_image_base64(image_path)
        if "<image>" not in prompt_str:
            prompt_str = f"<image> {prompt_str}"

    prompt = f"""You are an expert reasoning evaluator. I will give you a multimodal question and an answer. Your goal is to judge a reward process and give a score between 0 and 1. You should focus on whether the reasoning process is good rather than whether the final answer is correct.
### Evaluation Criteria:
- **Logical Soundness**: Does each step follow logically from the previous one?
- **Correct Reasoning**: Are the methods and steps used appropriate and valid? Are the facts and lemmas correctly stated and applied?
- **Error Identification**: Are there any logical fallacies, unsupported assumptions, or incorrect steps?
- **Language Consistency**: Is the reasoning process conducted in a single, consistent language without mixing different languages?
- **Redundancy**: Is the reasoning concise, without unnecessary repetition or extraneous steps?
Provide a single score from **{{0, 0.1, 0.2, ..., 1.0}}** based on the reasoning quality, where:
- **0**: Completely flawed reasoning
- **1**: Perfectly sound reasoning
- Intermediate values (e.g., 0.3, 0.7) should reflect partial correctness or minor errors.
Be strict, reward the good process and punish the bad one. You should only output the score without any explanation.
Question: {prompt_str}
Reasoning process: {reasoning_str}
"""

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": [{"type": "text", "text": prompt}]},
    ]

    if image_base64 is not None:
        messages[1]["content"].append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{image_base64}"},
        })

    payload = {
        "model": reward_model,
        "messages": messages,
        "temperature": 0.0,
    }

    attempt = 0
    max_retry = 10
    while attempt < max_retry:
        try:
            response = httpx.post(
                openai_api_base,
                headers={"Content-Type": "application/json"},
                json=payload,
                timeout=60,
            )
            response.raise_for_status()
            result = response.json()["choices"][0]["message"]["content"]
            # The model replies with a bare score such as "0.7".
            return float(result.strip())
        except Exception as e:
            print(f"[Attempt {attempt+1}] get_process_reward failed: {e}, message: {prompt_str}")
            attempt += 1
            time.sleep(1)
    # Fall back to a zero reward after exhausting all retries.
    return 0.0


print(get_process_reward(question, answer, image))
```
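
During RL training, SophiaVL-R1 combines this thinking reward with a rule-based outcome reward. As a rough illustration only (the paper's trust-weighted scheme is more involved, and `combined_reward` with its default weight of 0.5 is our own hypothetical sketch), a simple weighted sum looks like:

```python
def combined_reward(outcome_reward: float, thinking_reward: float,
                    weight: float = 0.5) -> float:
    """Hypothetical sketch: add a down-weighted thinking reward to the
    rule-based outcome reward. Not the exact weighting from the paper."""
    return outcome_reward + weight * thinking_reward
```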