Spaces: human-labeling/README

tnwjddla2190 committed on Dec 23, 2025

Commit dd80eb0 (verified) · 1 Parent(s): ed6159c

Update README.md

Files changed (1)

README.md +35 -27

README.md CHANGED

@@ -37,7 +37,7 @@ sdk: static
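 - Cases where the interviewer asked abstract questions that make it hard to detect contradictions or verify facts
   (e.g., "What is your hobby?", "What is the most important value in your life?")
-- Questions that have little connection to the interviewee's own information and experiences and must be answered using external knowledge
   (e.g., "I work at Google." → "What year was Google founded?")
   - Exception) Questions closely tied to an event/incident/experience in which the interviewee took part directly are allowed, as in "I am the founder of Google." → "What year was Google founded?". The evaluation must therefore take the previous questions and answers into account.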
@@ -59,50 +59,58 @@ Very Good if all Very Good criteria are met, 1-2 shortcomings
 # Labeling Guideline
 Thank you for participating in this labeling project!
-In this task, you will review interview transcripts of two different AI interviewers engaging with the same interviewee. Your goal is to evaluate the questioning capabilities of each AI interviewer.
 There are two main tasks you need to perform:
-Compare and Choose: Determine which of the two interviewers asks better questions.
-Score: Rate the quality of each interviewer on a 5-point scale.
-# Evaluation Guideline
-The criteria for evaluating questioning ability are as follows. An interviewer who meets all five of the following items is considered to have performed excellently.
-### Criteria for "Very Good"
-- Depth of Inquiry: Did the interviewer continue questioning until the response regarding a specific topic was sufficiently detailed?
-- Verifiability: Did the interviewer focus on questions that extract verifiable information? (i.e., questions that allow for detecting contradictions or can be cross-referenced via external search, such as dates, addresses, IDs, organization names, emails, or names of supervisors/colleagues).
-- Relevance: Were the questions specifically tailored and relevant to the interviewee?
-- Cohesion: Is there a high degree of interconnection between the questions?
-- Conflict Resolution: If contradictions or questionable points arose in previous dialogue, did the interviewer follow up with multiple questions related to those contradictions?
-### "Very Bad" Cases
-Conversely, the following are examples of poor questioning:
-- Premature Topic Switching: Moving to a completely different topic before a single subject has been sufficiently explored.
-- Abstract Questions: Asking vague questions where it is difficult to verify facts or detect contradictions (e.g., "What are your hobbies?", "What is the most important value in your life?").
-- Low Personal Relevance: Asking questions with little connection to the interviewee (e.g., Interviewee: "I work at Google." -> Interviewer: "When is the CEO of Google’s birthday?").
-- Lack of Correlation: Questions are so unrelated that it is impossible to judge internal consistency or contradictions.
-- Ignoring Discrepancies: Moving on to unrelated questions even after a contradiction was detected in the previous conversation.
 ### Scoring Summary
-Very Good: Meets all "Very Good" criteria.
-Good: Misses 1–2 criteria or has minor room for improvement.
-So-so: Fails to meet approximately 3 of the criteria.
-Bad: Meets only 1 of the criteria.
 Very Bad: Meets none of the criteria.
-## Caution
-- Focus on Questions $\rightarrow$ When evaluating the interviewer, ignore the interviewee's response quality and focus only on the interviewer's questioning ability.
-- Holistic Strategy $\rightarrow$ Consider the overall questioning strategy and flow, not just individual questions in isolation.
 📧 minskim010203@gmail.com, imsujeong2190@gmail.com

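 - Cases where the interviewer asked abstract questions that make it hard to detect contradictions or verify facts
   (e.g., "What is your hobby?", "What is the most important value in your life?")
+- Cases where the interviewer asked questions that have little connection to the interviewee's own information and experiences and must be answered using external knowledge
   (e.g., "I work at Google." → "What year was Google founded?")
   - Exception) Questions closely tied to an event/incident/experience in which the interviewee took part directly are allowed, as in "I am the founder of Google." → "What year was Google founded?". The evaluation must therefore take the previous questions and answers into account.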
 # Labeling Guideline
 Thank you for participating in this labeling project!
+In this task, you will review interview transcripts of two different AI interviewers (A and B) interacting with the same interviewee. Your goal is to evaluate the questioning capabilities of these AI interviewers.
 There are two main tasks you need to perform:
+Compare: Determine which of the two interviewers (A or B) asks better questions.
+Rate: Evaluate the quality of each interviewer on a 5-point scale.
+## Evaluation Guideline
+The criteria for evaluating questioning ability are as follows. An interviewer who satisfies all five of the following items is considered to have performed excellently.
+### "Very Good" Criteria
+- Depth: Did the interviewer continue questioning a single topic until the response became sufficiently specific?
+- Verifiability: Did the interviewer focus on questions that extract verifiable information? (i.e., information that can be checked for contradictions or verified through external search).
+- Examples: Questions regarding dates, addresses, affiliation IDs, organization names, emails, names of supervisors/colleagues, etc.
+- Personalization: Are the questions tailored specifically to the interviewee? (i.e., highly relevant to the interviewee's specific experiences and previous answers).
+- Coherence: Is there a high degree of interconnection between the questions?
+- Contradiction Handling: If contradictions or questionable points arose in previous dialogue, did the interviewer follow up with questions related to those contradictions?
+### "Very Bad" Cases
+Conversely, the following cases indicate poor questioning:
+- Premature Topic Switching: Moving to a completely different topic before a current topic has been sufficiently detailed.
+- Abstract Questions: Asking abstract questions that make it difficult to verify facts or detect contradictions.
+  - Examples: "What is your hobby?", "What is the most important value in your life?"
+- External Knowledge Dependency: Asking questions that rely on general external knowledge rather than the interviewee's own information and experiences.
+  - Example: Interviewee says "I work at Google" → Interviewer asks "When was Google founded?"
+  - Exception: If the question is closely linked to an event/experience the interviewee was directly involved in, it is acceptable. (e.g., Interviewee says "I am the founder of Google" → "When was Google founded?"). Evaluation must consider the context of previous questions and answers.
+- Low Relatedness: Questions that are so unrelated that it is impossible to judge mutual contradictions.
+- Ignoring Inconsistencies: Moving on to an unrelated question even though a contradiction was detected in the previous dialogue.
 ### Scoring Summary
+Very Good: Meets all criteria.
+Good: Fails to meet 1–2 criteria.
+So-so: Fails to meet approximately 3 criteria.
+Bad: Meets only 1 criterion.
 Very Bad: Meets none of the criteria.
+# Caution
+When evaluating the interviewer, focus only on the interviewer's questioning ability. Do not let the quality of the interviewee's answers influence your score.
+Consider the overall questioning strategy throughout the transcript, not just individual questions in isolation.
 📧 minskim010203@gmail.com, imsujeong2190@gmail.com
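
The Scoring Summary in the updated guideline can be read as a mapping from the number of satisfied "Very Good" criteria to a label. The sketch below is only an illustration of that reading and is not part of the README: the names `CRITERIA` and `score_label` are hypothetical, and it assumes "approximately 3" failed criteria means exactly two of the five criteria are met. Annotators are still expected to apply judgment rather than a mechanical count.

```python
# Hypothetical helper; the identifiers below do not come from the README.
# The five "Very Good" criteria listed in the updated guideline.
CRITERIA = (
    "depth",
    "verifiability",
    "personalization",
    "coherence",
    "contradiction_handling",
)


def score_label(criteria_met: int) -> str:
    """Map the number of satisfied criteria (0 to 5) to a scoring label."""
    if not 0 <= criteria_met <= len(CRITERIA):
        raise ValueError("criteria_met must be between 0 and 5")
    if criteria_met == len(CRITERIA):
        return "Very Good"  # meets all criteria
    if criteria_met >= 3:
        return "Good"       # fails to meet 1-2 criteria
    if criteria_met == 2:
        return "So-so"      # assumption: "approximately 3" failures means 3
    if criteria_met == 1:
        return "Bad"        # meets only 1 criterion
    return "Very Bad"       # meets none of the criteria


if __name__ == "__main__":
    for met in range(len(CRITERIA), -1, -1):
        print(met, score_label(met))
```

Each interviewer (A and B) would receive its own label this way, while the Compare task remains a separate judgment about which of the two asks better questions.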