lliutianc committed on
Commit 5b040a1 · verified · 1 Parent(s): 23eac5c

Update README.md

Files changed (1): README.md +74 -0
README.md CHANGED
@@ -11,6 +11,80 @@ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
 ```

To evaluate the model, please use the following format to build up the message.

```python
from transformers import AutoTokenizer

JUDGE_PROMPT_TEMPLATE = (
    "You are a fair and impartial judge. Your task is to evaluate 'Response A' and 'Response B' "
    "based on a given instruction and a rubric. You will conduct this evaluation in distinct "
    "phases as outlined below.\n\n"
    "### Phase 1: Compliance Check Instructions\n"
    "First, identify the single most important, objective 'Gatekeeper Criterion' from the rubric.\n"
    "- **A rule is objective (and likely a Gatekeeper) if it can be verified without opinion. "
    "Key examples are: word/paragraph limits, required output format (e.g., JSON validity), "
    "required/forbidden sections, or forbidden content.**\n"
    "- **Conversely, a rule is subjective if it requires interpretation or qualitative judgment. "
    "Subjective rules about quality are NOT Gatekeepers. Examples include criteria like \"be creative,\" "
    "\"write clearly,\" \"be engaging,\" or \"use a professional tone.\"**\n\n"
    "### Phase 2: Analyze Each Response\n"
    "Next, for each Gatekeeper Criterion and all other criteria in the rubric, evaluate each "
    "response item by item.\n\n"
    "### Phase 3: Final Judgment Instructions\n"
    "Based on the results from the previous phases, determine the winner using these simple rules. "
    "Provide a final justification explaining your decision first, and then give your decision.\n\n"
    "---\n"
    "### REQUIRED OUTPUT FORMAT\n"
    "You must follow the exact output format below.\n\n"
    "--- Compliance Check ---\n"
    "Identified Gatekeeper Criterion: <e.g., Criterion 1: Must be under 50 words.>\n\n"
    "--- Analysis ---\n"
    "**Response A:**\n"
    "- Criterion 1 [Hard Rule]: Justification: <...>\n"
    "- Criterion 2 [Hard Rule]: Justification: <...>\n"
    "- Criterion 3 [Principle]: Justification: <...>\n"
    "- ... (and so on for all other criteria)\n\n"
    "**Response B:**\n"
    "- Criterion 1 [Hard Rule]: Justification: <...>\n"
    "- Criterion 2 [Hard Rule]: Justification: <...>\n"
    "- Criterion 3 [Principle]: Justification: <...>\n"
    "- ... (and so on for all other criteria)\n\n"
    "--- Final Judgment ---\n"
    "Justification: <...>\n"
    "Winner: <Response A / Response B>\n\n\n"
    "Task to Evaluate:\n"
    "Instruction:\n{instruction}\n\n"
    "Rubric:\n{rubric}\n\n"
    "Response A:\n{response_a}\n\n"
    "Response B:\n{response_b}"
)

# Fill the template with the example to judge. `instruction`, `rubric`,
# `response_a`, and `response_b` are the fields of your evaluation sample.
user_text = JUDGE_PROMPT_TEMPLATE.format(
    instruction=instruction,
    rubric=rubric,
    response_a=response_a,
    response_b=response_b,
)

# Build the chat-formatted prompt, reusing `model_id` from the loading
# snippet above.
tok = AutoTokenizer.from_pretrained(model_id)
messages_list = [
    {"role": "user", "content": user_text},
]
message = tok.apply_chat_template(
    messages_list,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)

# Remaining step: Use either HF or vLLM for evaluation.
# ...
# ...
```

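Once the judge model has generated its output, the verdict sits on the final `Winner:` line of the required format. A minimal sketch of reading it back (the `parse_winner` helper below is illustrative and not part of the released code; it assumes the model followed the REQUIRED OUTPUT FORMAT above):

```python
import re

def parse_winner(judge_output: str):
    """Return 'Response A' or 'Response B' from the judge's output, or None.

    Illustrative helper: assumes the output ends with a line like
    'Winner: Response B', as required by the prompt template.
    """
    match = re.search(r"Winner:\s*(Response\s+[AB])", judge_output)
    return match.group(1) if match else None

sample = (
    "--- Final Judgment ---\n"
    "Justification: Response A exceeds the 50-word limit.\n"
    "Winner: Response B\n"
)
print(parse_winner(sample))  # Response B
```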
If you find our work helpful, please consider citing our paper:

```