Update README.md
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
```

To evaluate the model, please use the following format to build the message:

```python
JUDGE_PROMPT_TEMPLATE = (
    "You are a fair and impartial judge. Your task is to evaluate 'Response A' and 'Response B' "
    "based on a given instruction and a rubric. You will conduct this evaluation in distinct "
    "phases as outlined below.\n\n"
    "### Phase 1: Compliance Check Instructions\n"
    "First, identify the single most important, objective 'Gatekeeper Criterion' from the rubric.\n"
    "- **A rule is objective (and likely a Gatekeeper) if it can be verified without opinion. "
    "Key examples are: word/paragraph limits, required output format (e.g., JSON validity), "
    "required/forbidden sections, or forbidden content.**\n"
    "- **Conversely, a rule is subjective if it requires interpretation or qualitative judgment. "
    "Subjective rules about quality are NOT Gatekeepers. Examples include criteria like \"be creative,\" "
    "\"write clearly,\" \"be engaging,\" or \"use a professional tone.\"**\n\n"
    "### Phase 2: Analyze Each Response\n"
    "Next, for each Gatekeeper Criterion and all other criteria in the rubric, evaluate each "
    "response item by item.\n\n"
    "### Phase 3: Final Judgment Instructions\n"
    "Based on the results from the previous phases, determine the winner using these simple rules. "
    "Provide a final justification explaining your decision first and then give your decision.\n\n"
    "---\n"
    "### REQUIRED OUTPUT FORMAT\n"
    "You must follow this exact output format below.\n\n"
    "--- Compliance Check ---\n"
    "Identified Gatekeeper Criterion: <e.g., Criterion 1: Must be under 50 words.>\n\n"
    "--- Analysis ---\n"
    "**Response A:**\n"
    "- Criterion 1 [Hard Rule]: Justification: <...>\n"
    "- Criterion 2 [Hard Rule]: Justification: <...>\n"
    "- Criterion 3 [Principle]: Justification: <...>\n"
    "- ... (and so on for all other criteria)\n\n"
    "**Response B:**\n"
    "- Criterion 1 [Hard Rule]: Justification: <...>\n"
    "- Criterion 2 [Hard Rule]: Justification: <...>\n"
    "- Criterion 3 [Principle]: Justification: <...>\n"
    "- ... (and so on for all other criteria)\n\n"
    "--- Final Judgment ---\n"
    "Justification: <...>\n"
    "Winner: <Response A / Response B>\n\n\n"
    "Task to Evaluate:\n"
    "Instruction:\n{instruction}\n\n"
    "Rubric:\n{rubric}\n\n"
    "Response A:\n{response_a}\n\n"
    "Response B:\n{response_b}"
)

# instruction, rubric, response_a, and response_b hold the evaluation inputs.
user_text = JUDGE_PROMPT_TEMPLATE.format(
    instruction=instruction,
    rubric=rubric,
    response_a=response_a,
    response_b=response_b,
)

# tok is the tokenizer loaded alongside the model above.
messages_list = [
    {"role": "user", "content": user_text},
]
message = tok.apply_chat_template(
    messages_list,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)

# Remaining step: use either HF or vLLM to run generation on `message`.
# ...
# ...
```

If you find our work helpful, please consider citing our paper:

```