OpenRubrics/RubricARM-8B-Judge

Safetensors

qwen3

Add metadata and link to paper

by nielsr HF Staff - opened Feb 4

base: refs/heads/main ← from: refs/pr/1

+108 -45

Files changed (1)

README.md +108 -45

README.md CHANGED

@@ -1,8 +1,17 @@
 # OpenRubrics/RubricARM-8B-Judge
 This is an 8B RubricARM-Judge model, fine-tuned from [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B).
-See our [paper](https://arxiv.org/abs/2602.01511) for more details.
 ## Usage
 ```python
@@ -20,53 +29,108 @@ Here `rubric` should be generated with a `RubricARM-Rubric`
 JUDGE_PROMPT_TEMPLATE = (
     "You are a fair and impartial judge. Your task is to evaluate 'Response A' and 'Response B' "
     "based on a given instruction and a rubric. You will conduct this evaluation in distinct "
-    "phases as outlined below.\n\n"
-    "### Phase 1: Compliance Check Instructions\n"
-    "First, identify the single most important, objective 'Gatekeeper Criterion' from the rubric.\n"
     "- **A rule is objective (and likely a Gatekeeper) if it can be verified without opinion. "
     "Key examples are: word/paragraph limits, required output format (e.g., JSON validity), "
-    "required/forbidden sections, or forbidden content.**\n"
     "- **Conversely, a rule is subjective if it requires interpretation or qualitative judgment. "
     "Subjective rules about quality are NOT Gatekeepers. Examples include criteria like \"be creative,\" "
-    "\"write clearly,\" \"be engaging,\" or \"use a professional tone.\"**\n"
-    f"Think step-by-step to determine this single most important Gatekeeper, then write a 1–2 sentence explanation of your decision.\n\n"
-    "### Phase 2: Analyze Each Response\n"
     "Next, for each Gatekeeper Criterion and all other criteria in the rubric, evaluate each "
-    "response item by item.\n"
-    "For each item, think step-by-step and cite concrete evidence from the response before assigning your judgment.\n\n"
-    "### Phase 3: Final Judgment Instructions\n"
     "Based on the results from the previous phases, determine the winner using these simple rules. "
-    "Provide a final justification explaining your decision first and then give your decision.\n"
-    "Think step-by-step to aggregate the findings and make the decision; keep the reasoning explicit and concise.\n\n"
-    "---\n"
-    "### REQUIRED OUTPUT FORMAT\n"
-    "You must follow this exact output format below.\n\n"
-    "--- Compliance Check ---\n"
-    "Gatekeeper Reasoning: <1–2 sentences citing the relevant rubric text>\n"
-    "Identified Gatekeeper Criterion: <e.g., Criterion 1: Must be under 50 words.>\n\n"
-    "--- Analysis ---\n"
-    "**Response A:**\n"
-    "- Criterion 1 [Hard Rule]: Justification: <...>\n"
-    "- Criterion 2 [Hard Rule]: Justification: <...>\n"
-    "- Criterion 3 [Principle]: Justification: <...>\n"
-    "- ... (and so on for all other criteria)\n\n"
-    "**Response B:**\n"
-    "- Criterion 1 [Hard Rule]: Justification: <...>\n"
-    "- Criterion 2 [Hard Rule]: Justification: <...>\n"
-    "- Criterion 3 [Principle]: Justification: <...>\n"
-    "- ... (and so on for all other criteria)\n\n"
-    "--- Final Judgment ---\n"
-    # "Aggregation Summary: <Provide a detailed, step-by-step explanation (3–6 sentences) of how the Gatekeeper and other criteria led to the decision>\n"
-    "Aggregation Summary: <1–3 sentences explaining how Gatekeeper and other criteria led to the decision>\n"
-    "Justification: <...>\n"
-    "Winner: <Response A / Response B>\n\n\n"
-    "Task to Evaluate:\n"
-    "Instruction:\n{instruction}\n\n"
-    "Rubric:\n{rubric}\n\n"
-    "Response A:\n{response_a}\n\n"
-    "Response B:\n{response_b}"
 )
 user_text = JUDGE_PROMPT_TEMPLATE.format(
@@ -90,13 +154,12 @@ message = tok.apply_chat_template(
 # ...
 # ...
 ```
 If you find our work helpful, please consider citing our paper:
-```
 @misc{xu2026alternating,
       title={Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training},
       author={Ran Xu and Tianci Liu and Zihan Dong and Tony You and Ilgee Hong and Carl Yang and Linjun Zhang and Tao Zhao and Haoyu Wang},
@@ -106,4 +169,4 @@ If you find our work helpful, please consider citing our paper:
       primaryClass={cs.CL},
       url={https://arxiv.org/abs/2602.01511},
 }
-```

+---
+pipeline_tag: text-generation
+library_name: transformers
+base_model: Qwen/Qwen3-8B
+tags:
+- reward-modeling
+- alignment
+- rubric-based-evaluation
+---
 # OpenRubrics/RubricARM-8B-Judge
 This is an 8B RubricARM-Judge model, fine-tuned from [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B).
+It was introduced in the paper [Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training](https://huggingface.co/papers/2602.01511).
 ## Usage
 ```python
 JUDGE_PROMPT_TEMPLATE = (
     "You are a fair and impartial judge. Your task is to evaluate 'Response A' and 'Response B' "
     "based on a given instruction and a rubric. You will conduct this evaluation in distinct "
+    "phases as outlined below.\n\n"
+    "### Phase 1: Compliance Check Instructions\n"
+    "First, identify the single most important, objective 'Gatekeeper Criterion' from the rubric.\n"
     "- **A rule is objective (and likely a Gatekeeper) if it can be verified without opinion. "
     "Key examples are: word/paragraph limits, required output format (e.g., JSON validity), "
+    "required/forbidden sections, or forbidden content.**\n"
     "- **Conversely, a rule is subjective if it requires interpretation or qualitative judgment. "
     "Subjective rules about quality are NOT Gatekeepers. Examples include criteria like \"be creative,\" "
+    "\"write clearly,\" \"be engaging,\" or \"use a professional tone.\"**\n"
+    "Think step-by-step to determine this single most important Gatekeeper, then write a 1–2 sentence explanation of your decision.\n\n"
+    "### Phase 2: Analyze Each Response\n"
     "Next, for each Gatekeeper Criterion and all other criteria in the rubric, evaluate each "
+    "response item by item.\n"
+    "For each item, think step-by-step and cite concrete evidence from the response before assigning your judgment.\n\n"
+    "### Phase 3: Final Judgment Instructions\n"
     "Based on the results from the previous phases, determine the winner using these simple rules. "
+    "Provide a final justification explaining your decision first and then give your decision.\n"
+    "Think step-by-step to aggregate the findings and make the decision; keep the reasoning explicit and concise.\n\n"
+    "---\n"
+    "### REQUIRED OUTPUT FORMAT\n"
+    "You must follow this exact output format below.\n\n"
+    "--- Compliance Check ---\n"
+    "Gatekeeper Reasoning: <1–2 sentences citing the relevant rubric text>\n"
+    "Identified Gatekeeper Criterion: <e.g., Criterion 1: Must be under 50 words.>\n\n"
+    "--- Analysis ---\n"
+    "**Response A:**\n"
+    "- Criterion 1 [Hard Rule]: Justification: <...>\n"
+    "- Criterion 2 [Hard Rule]: Justification: <...>\n"
+    "- Criterion 3 [Principle]: Justification: <...>\n"
+    "- ... (and so on for all other criteria)\n\n"
+    "**Response B:**\n"
+    "- Criterion 1 [Hard Rule]: Justification: <...>\n"
+    "- Criterion 2 [Hard Rule]: Justification: <...>\n"
+    "- Criterion 3 [Principle]: Justification: <...>\n"
+    "- ... (and so on for all other criteria)\n\n"
+    "--- Final Judgment ---\n"
+    # "Aggregation Summary: <Provide a detailed, step-by-step explanation (3–6 sentences) of how the Gatekeeper and other criteria led to the decision>\n"
+    "Aggregation Summary: <1–3 sentences explaining how Gatekeeper and other criteria led to the decision>\n"
+    "Justification: <...>\n"
+    "Winner: <Response A / Response B>\n\n\n"
+    "Task to Evaluate:\n"
+    "Instruction:\n{instruction}\n\n"
+    "Rubric:\n{rubric}\n\n"
+    "Response A:\n{response_a}\n\n"
+    "Response B:\n{response_b}"
 )
 user_text = JUDGE_PROMPT_TEMPLATE.format(
 # ...
 # ...
 ```
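The REQUIRED OUTPUT FORMAT ends every judgment with a `Winner:` line, so calling code usually only needs to read that line back. The following is a minimal sketch under stated assumptions. `parse_winner` and `parse_gatekeeper` are hypothetical helpers, since the model card does not ship a parser. The sketch assumes the model follows the format exactly. The `<think>...</think>` stripping assumes that Qwen3-style thinking text may precede the answer. The sample judgment is made up for illustration.

```python
import re
from typing import Optional

# Hypothetical helpers: not part of the RubricARM release. They assume the
# judge follows the prompt's REQUIRED OUTPUT FORMAT.

# "Winner: Response A" or "Winner: **Response B**" at the start of a line.
WINNER_RE = re.compile(
    r"^\s*Winner:\s*\**\s*Response\s+([AB])\b", re.IGNORECASE | re.MULTILINE
)

# "Identified Gatekeeper Criterion: <text>" from the Compliance Check section.
GATEKEEPER_RE = re.compile(
    r"^\s*Identified Gatekeeper Criterion:\s*(.+?)\s*$", re.MULTILINE
)

# Assumption: Qwen3-style thinking may wrap reasoning in <think>...</think>;
# that text is removed so only the final answer is parsed.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def _strip_thinking(text: str) -> str:
    """Drop any <think>...</think> spans from the generated text."""
    return THINK_RE.sub("", text)


def parse_winner(judge_output: str) -> Optional[str]:
    """Return "A" or "B" from the last Winner line, or None if there is none."""
    matches = WINNER_RE.findall(_strip_thinking(judge_output))
    return matches[-1].upper() if matches else None


def parse_gatekeeper(judge_output: str) -> Optional[str]:
    """Return the text after 'Identified Gatekeeper Criterion:', if present."""
    match = GATEKEEPER_RE.search(_strip_thinking(judge_output))
    return match.group(1) if match else None


if __name__ == "__main__":
    # Made-up judge output in the required format, for illustration only.
    example = (
        "--- Compliance Check ---\n"
        "Gatekeeper Reasoning: The rubric sets a hard word limit.\n"
        "Identified Gatekeeper Criterion: Criterion 1: Must be under 50 words.\n\n"
        "--- Final Judgment ---\n"
        "Aggregation Summary: A meets the limit and B does not.\n"
        "Justification: B exceeds 50 words.\n"
        "Winner: Response A\n"
    )
    print(parse_winner(example))      # A
    print(parse_gatekeeper(example))  # Criterion 1: Must be under 50 words.
```

The pattern requires `Response` to follow the colon directly, apart from optional bold markers. Because of this, the template's own placeholder line (`Winner: <Response A / Response B>`) never matches, even when the prompt is decoded together with the completion. Taking the last match favors the final verdict. Returning `None` lets the caller retry or discard a malformed judgment instead of guessing.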
+## Citation
 If you find our work helpful, please consider citing our paper:
+```bibtex
 @misc{xu2026alternating,
       title={Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training},
       author={Ran Xu and Tianci Liu and Zihan Dong and Tony You and Ilgee Hong and Carl Yang and Linjun Zhang and Tao Zhao and Haoyu Wang},
       primaryClass={cs.CL},
       url={https://arxiv.org/abs/2602.01511},
 }
+```