Commit ee1745e (verified) by KeenWoo
1 parent: 394665b

Update evaluate.py

Files changed (1)
  1. evaluate.py +2 -2
evaluate.py CHANGED
@@ -69,10 +69,10 @@ QUERY_TYPE: {query_type}
  --- Specific Judging Criteria by QUERY_TYPE ---
  - If QUERY_TYPE is 'caregiving_scenario' AND the user is the patient:
  - Apply the rubric with a focus on **emotional support and validation**. The answer does NOT need to be factually exhaustive to get a high score. A 1.0 score means it provided excellent emotional comfort that aligns with the ground truth's intent.
- - It should ONLY be scored 0.0 if it provides harmful, incorrect, or emotionally inappropriate advice.
+ # - It should ONLY be scored 0.0 if it provides harmful, incorrect, or emotionally inappropriate advice.
  - If QUERY_TYPE is 'factual_question':
  - Apply the rubric with a focus on **factual accuracy**. The answer must be factually aligned with the ground truth to get a high score.
- - Any empathetic or conversational language in the generated answer should be **completely ignored**; only the factual statements are to be graded against the ground truth.
+ # - Any empathetic or conversational language in the generated answer should be **completely ignored**; only the factual statements are to be graded against the ground truth.
  - For all other QUERY_TYPEs:
  - Default to applying the rubric with a focus on factual accuracy.
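
Note on the `#` prefix: the `{query_type}` placeholder in the hunk header suggests these rubric lines sit inside a Python string used as a prompt template. If that assumption holds, prefixing a line with `#` does not comment it out; the line is still sent to the judge model, only with a leading `#`. The sketch below illustrates this with a hypothetical, abbreviated template (`JUDGE_PROMPT_TEMPLATE` and `strip_hash_lines` are illustrative names, not taken from evaluate.py) and shows one possible way to drop such lines before formatting.

```python
# Hypothetical, abbreviated stand-in for the prompt template in evaluate.py.
# The real variable name and full text are not visible in the diff.
JUDGE_PROMPT_TEMPLATE = """QUERY_TYPE: {query_type}
--- Specific Judging Criteria by QUERY_TYPE ---
- If QUERY_TYPE is 'factual_question':
  - Apply the rubric with a focus on **factual accuracy**.
# - Any empathetic or conversational language should be **completely ignored**.
"""


def strip_hash_lines(template: str) -> str:
    """Return the template without lines whose first non-blank character is '#'."""
    kept = [
        line
        for line in template.splitlines()
        if not line.lstrip().startswith("#")
    ]
    return "\n".join(kept)


# Inside a string literal, '#' is ordinary text, so the line survives formatting.
rendered = JUDGE_PROMPT_TEMPLATE.format(query_type="factual_question")

# Filtering first removes the '#'-prefixed rubric line from the final prompt.
cleaned = strip_hash_lines(JUDGE_PROMPT_TEMPLATE).format(query_type="factual_question")
```

If the intent of the commit is to disable the two rubric lines entirely, deleting them from the template or filtering them as above would be more reliable than the `#` prefix; if the intent is only to de-emphasize them for the judge model, the current change may already be sufficient.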