bunny127 and nielsr (HF Staff) committed 5cb943d (verified) · 1 parent: 2eb95cc

Add pipeline tag, library name and link to Github repo (#1)

- Add pipeline tag, library name and link to Github repo (79b1c9ec911f96d016a57564a5c0e015ba295800)

Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1): README.md (+16, −2)
README.md CHANGED

@@ -1,8 +1,12 @@
 ---
 license: apache-2.0
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---
-This is the Thinking Reward Model of SophiaVL-R1 (https://arxiv.org/abs/2505.17018).
 
+This is the Thinking Reward Model of SophiaVL-R1 (https://arxiv.org/abs/2505.17018).
+
+The code for SophiaVL-R1 can be found at https://github.com/kxfan2002/SophiaVL-R1.
 This model is finetuned with the [SophiaVL-R1-Thinking-156k Dataset](https://huggingface.co/datasets/bunny127/SophiaVL-R1-Thinking-156k). The base model is Qwen2.5-VL-3B.
 
 The input of Thinking Reward Model is a question with model response. Thinking Reward Model will output a score between 0 and 1 indicating the thinking quality of model response.

@@ -36,7 +40,17 @@ def get_process_reward(prompt_str, reasoning_str, image_path=None):
     if "<image>" not in prompt_str:
         prompt_str = f"<image> {prompt_str}"
 
-    prompt = f"""You are an expert reasoning evaluator. I will give you a multimodal question and an answer. Your goal is to judge a reward process and give a score between 0 and 1. You should focus on whether the reasoning process is good rather than whether the final answer is correct.### Evaluation Criteria:\n- **Logical Soundness**: Does each step follow logically from the previous one?\n- **Correct Reasoning**: Are the methods and steps used appropriate and valid? Are the facts and lemmas correctly stated and applied?\n- **Error Identification**: Are there any logical fallacies, unsupported assumptions, or incorrect steps?\n- **Language Consistency**: Is the reasoning process conducted in a single, consistent language without mixing different languages?\n- **Redundancy**: Is the reasoning concise, without unnecessary repetition or extraneous steps?\nProvide a single score from **{{0, 0.1, 0.2, ..., 1.0}}** based on the reasoning quality, where:\n - **0**: Completely flawed reasoning\n- **1**: Perfectly sound reasoning\n- Intermediate values (e.g., 0.3, 0.7) should reflect partial correctness or minor errors.\nBe strict, reward the good process and punish the bad one. You should only output the score without any explanation.
+    prompt = f"""You are an expert reasoning evaluator. I will give you a multimodal question and an answer. Your goal is to judge a reward process and give a score between 0 and 1. You should focus on whether the reasoning process is good rather than whether the final answer is correct.### Evaluation Criteria:
+- **Logical Soundness**: Does each step follow logically from the previous one?
+- **Correct Reasoning**: Are the methods and steps used appropriate and valid? Are the facts and lemmas correctly stated and applied?
+- **Error Identification**: Are there any logical fallacies, unsupported assumptions, or incorrect steps?
+- **Language Consistency**: Is the reasoning process conducted in a single, consistent language without mixing different languages?
+- **Redundancy**: Is the reasoning concise, without unnecessary repetition or extraneous steps?
+Provide a single score from **{{0, 0.1, 0.2, ..., 1.0}}** based on the reasoning quality, where:
+- **0**: Completely flawed reasoning
+- **1**: Perfectly sound reasoning
+- Intermediate values (e.g., 0.3, 0.7) should reflect partial correctness or minor errors.
+Be strict, reward the good process and punish the bad one. You should only output the score without any explanation.
     Question: {prompt_str}
     Reasoning process: {reasoning_str}
     """