Spaces: KeenWoo/AD_Multimodal_Chatbot

KeenWoo committed on Sep 17

Commit ee1745e (verified) · 1 Parent(s): 394665b

Update evaluate.py

Files changed (1)

evaluate.py CHANGED

@@ -69,10 +69,10 @@ QUERY_TYPE: {query_type}
 --- Specific Judging Criteria by QUERY_TYPE ---
 - If QUERY_TYPE is 'caregiving_scenario' AND the user is the patient:
   - Apply the rubric with a focus on **emotional support and validation**. The answer does NOT need to be factually exhaustive to get a high score. A 1.0 score means it provided excellent emotional comfort that aligns with the ground truth's intent.
-  - It should ONLY be scored 0.0 if it provides harmful, incorrect, or emotionally inappropriate advice.
+#  - It should ONLY be scored 0.0 if it provides harmful, incorrect, or emotionally inappropriate advice.
 - If QUERY_TYPE is 'factual_question':
   - Apply the rubric with a focus on **factual accuracy**. The answer must be factually aligned with the ground truth to get a high score.
-  - Any empathetic or conversational language in the generated answer should be **completely ignored**; only the factual statements are to be graded against the ground truth.
+#  - Any empathetic or conversational language in the generated answer should be **completely ignored**; only the factual statements are to be graded against the ground truth.
 - For all other QUERY_TYPEs:
   - Default to applying the rubric with a focus on factual accuracy.
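
Note on the `#` prefix: the `{query_type}` placeholder in the hunk header suggests these criteria sit inside a Python string template rather than in executable code. If that assumption holds, prefixing a line with `#` does not remove it from the prompt; the judge model still receives the line, now with a leading `#`. The sketch below illustrates one possible way to actually drop such lines before formatting. `JUDGE_TEMPLATE`, `strip_disabled_lines`, and `build_prompt` are hypothetical names for illustration and are not taken from evaluate.py, and the template text is abbreviated.

```python
# Hypothetical, abbreviated template; the real one in evaluate.py is longer.
JUDGE_TEMPLATE = """QUERY_TYPE: {query_type}
--- Specific Judging Criteria by QUERY_TYPE ---
- If QUERY_TYPE is 'factual_question':
  - Apply the rubric with a focus on **factual accuracy**.
#  - Any empathetic or conversational language should be **completely ignored**.
- For all other QUERY_TYPEs:
  - Default to applying the rubric with a focus on factual accuracy.
"""


def strip_disabled_lines(template: str) -> str:
    """Drop every line whose first non-blank character is '#'.

    Caveat: this would also drop Markdown headings, so it only suits
    templates that do not use '#' for anything else.
    """
    kept = [
        line
        for line in template.splitlines()
        if not line.lstrip().startswith("#")
    ]
    return "\n".join(kept)


def build_prompt(query_type: str, drop_disabled: bool = True) -> str:
    """Fill in the template, optionally removing '#'-prefixed lines first."""
    template = strip_disabled_lines(JUDGE_TEMPLATE) if drop_disabled else JUDGE_TEMPLATE
    return template.format(query_type=query_type)
```

With `drop_disabled=False` the `#`-prefixed rule is still present in the returned prompt text; with the default it is removed before the prompt is built.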