| # OpenRubrics/RubricARM-8B-Judge | |
| This is an 8B RubricARM-Judge model, fine-tuned from [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B). | |
| See our [paper](https://arxiv.org/abs/2602.01511) for more details. | |
| ## Usage | |
| ```python | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| model_id = "OpenRubrics/RubricARM-8B-Judge" | |
| tok = AutoTokenizer.from_pretrained(model_id, use_fast=True) | |
| model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto") | |
| ``` | |
| To evaluate the model, build the message using the following format. | |
| The `rubric` should be generated with a `RubricARM-Rubric` model. | |
| ```python | |
| JUDGE_PROMPT_TEMPLATE = ( | |
| "You are a fair and impartial judge. Your task is to evaluate 'Response A' and 'Response B' " | |
| "based on a given instruction and a rubric. You will conduct this evaluation in distinct " | |
| "phases as outlined below.\n\n" | |
| "### Phase 1: Compliance Check Instructions\n" | |
| "First, identify the single most important, objective 'Gatekeeper Criterion' from the rubric.\n" | |
| "- **A rule is objective (and likely a Gatekeeper) if it can be verified without opinion. " | |
| "Key examples are: word/paragraph limits, required output format (e.g., JSON validity), " | |
| "required/forbidden sections, or forbidden content.**\n" | |
| "- **Conversely, a rule is subjective if it requires interpretation or qualitative judgment. " | |
| "Subjective rules about quality are NOT Gatekeepers. Examples include criteria like \"be creative,\" " | |
| "\"write clearly,\" \"be engaging,\" or \"use a professional tone.\"**\n" | |
| "Think step-by-step to determine this single most important Gatekeeper, then write a 1–2 sentence explanation of your decision.\n\n" | |
| "### Phase 2: Analyze Each Response\n" | |
| "Next, for each Gatekeeper Criterion and all other criteria in the rubric, evaluate each " | |
| "response item by item.\n" | |
| "For each item, think step-by-step and cite concrete evidence from the response before assigning your judgment.\n\n" | |
| "### Phase 3: Final Judgment Instructions\n" | |
| "Based on the results from the previous phases, determine the winner using these simple rules. " | |
| "Provide a final justification explaining your decision first and then give your decision.\n" | |
| "Think step-by-step to aggregate the findings and make the decision; keep the reasoning explicit and concise.\n\n" | |
| "---\n" | |
| "### REQUIRED OUTPUT FORMAT\n" | |
| "You must follow this exact output format below.\n\n" | |
| "--- Compliance Check ---\n" | |
| "Gatekeeper Reasoning: <1–2 sentences citing the relevant rubric text>\n" | |
| "Identified Gatekeeper Criterion: <e.g., Criterion 1: Must be under 50 words.>\n\n" | |
| "--- Analysis ---\n" | |
| "**Response A:**\n" | |
| "- Criterion 1 [Hard Rule]: Justification: <...>\n" | |
| "- Criterion 2 [Hard Rule]: Justification: <...>\n" | |
| "- Criterion 3 [Principle]: Justification: <...>\n" | |
| "- ... (and so on for all other criteria)\n\n" | |
| "**Response B:**\n" | |
| "- Criterion 1 [Hard Rule]: Justification: <...>\n" | |
| "- Criterion 2 [Hard Rule]: Justification: <...>\n" | |
| "- Criterion 3 [Principle]: Justification: <...>\n" | |
| "- ... (and so on for all other criteria)\n\n" | |
| "--- Final Judgment ---\n" | |
| # "Aggregation Summary: <Provide a detailed, step-by-step explanation (3–6 sentences) of how the Gatekeeper and other criteria led to the decision>\n" | |
| "Aggregation Summary: <1–3 sentences explaining how Gatekeeper and other criteria led to the decision>\n" | |
| "Justification: <...>\n" | |
| "Winner: <Response A / Response B>\n\n\n" | |
| "Task to Evaluate:\n" | |
| "Instruction:\n{instruction}\n\n" | |
| "Rubric:\n{rubric}\n\n" | |
| "Response A:\n{response_a}\n\n" | |
| "Response B:\n{response_b}" | |
| ) | |
| # `instruction`, `rubric`, `response_a`, and `response_b` are strings you supply. | |
| user_text = JUDGE_PROMPT_TEMPLATE.format( | |
| instruction=instruction, | |
| rubric=rubric, | |
| response_a=response_a, | |
| response_b=response_b | |
| ) | |
| messages_list = [ | |
| {"role": "user", "content": user_text}, | |
| ] | |
| message = tok.apply_chat_template( | |
| messages_list, | |
| tokenize=False, | |
| add_generation_prompt=True, | |
| enable_thinking=False | |
| ) | |
| # Remaining step: Use either HF or vLLM for evaluation. | |
| # ... | |
| # ... | |
| ``` | |
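The remaining step can be sketched as follows with Hugging Face Transformers. This is a minimal sketch, not the authors' evaluation script: the helper names `generate_judgment` and `parse_winner`, the greedy decoding setting, and the `max_new_tokens=2048` budget are all assumptions made for illustration. The parser relies only on the `Winner: <Response A / Response B>` line that the prompt template requires.

```python
import re


def generate_judgment(tok, model, message, max_new_tokens=2048):
    """Generate the judge's output for one chat-templated prompt string.

    Hypothetical helper: `tok` and `model` are the tokenizer and model loaded
    above, and `message` is the string returned by `apply_chat_template`.
    The decoding settings are assumptions, not values from the paper.
    """
    inputs = tok(message, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Drop the prompt tokens so only the newly generated text is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tok.decode(new_tokens, skip_special_tokens=True)


# Matches "Winner: Response A" or "Winner: **Response B**", but not the
# unfilled template placeholder "Winner: <Response A / Response B>".
WINNER_RE = re.compile(r"Winner:\s*\**\s*Response\s+([AB])\b")


def parse_winner(judgment):
    """Return "A" or "B" from the last 'Winner:' line, or None if absent."""
    matches = WINNER_RE.findall(judgment)
    return matches[-1] if matches else None
```

With the objects defined earlier, `parse_winner(generate_judgment(tok, model, message))` would return the preferred response label, or `None` when the output does not follow the required format, so malformed generations can be counted separately instead of being scored as a win for either side.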
| If you find our work helpful, please consider citing our paper: | |
| ``` | |
| @misc{xu2026alternating, | |
| title={Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training}, | |
| author={Ran Xu and Tianci Liu and Zihan Dong and Tony You and Ilgee Hong and Carl Yang and Linjun Zhang and Tao Zhao and Haoyu Wang}, | |
| year={2026}, | |
| eprint={2602.01511}, | |
| archivePrefix={arXiv}, | |
| primaryClass={cs.CL}, | |
| url={https://arxiv.org/abs/2602.01511}, | |
| } | |
| ``` | |