tnwjddla2190 committed on
Commit
dd80eb0
·
verified ·
1 Parent(s): ed6159c

Update README.md

Files changed (1)
  1. README.md +35 -27
README.md CHANGED
@@ -37,7 +37,7 @@ sdk: static
37
  - When the interviewer asks abstract questions whose answers are hard to check for contradictions or factual accuracy
38
 
39
  (e.g., "What is your hobby?", "What is the most important value in your life?")
40
- - Questions that have little to do with the interviewee's own information and experience and must be answered using external knowledge
41
 
42
  (e.g., "I work at Google." → "What year was Google founded?")
43
  - Exception) Questions closely tied to an event/incident/experience the interviewee took part in directly are allowed, as in "I am a founder of Google." → "What year was Google founded?". The evaluation must therefore take the previous questions and answers into account.
@@ -59,50 +59,58 @@ Very Good if all Very Good criteria are met, 1-2 shortcomings
59
  # Labeling Guideline
60
  Thank you for participating in this labeling project!
61
 
62
- In this task, you will review interview transcripts of two different AI interviewers engaging with the same interviewee. Your goal is to evaluate the questioning capabilities of each AI interviewer.
63
 
64
  There are two main tasks you need to perform:
65
 
66
- Compare and Choose: Determine which of the two interviewers asks better questions.
67
 
68
- Score: Rate the quality of each interviewer on a 5-point scale.
69
 
70
- # Evaluation Guideline
71
- The criteria for evaluating questioning ability are as follows. An interviewer who meets all five of the following items is considered to have performed excellently.
72
 
73
- ### Criteria for "Very Good"
74
- - Depth of Inquiry: Did the interviewer continue questioning until the response regarding a specific topic was sufficiently detailed?
 
 
 
 
 
75
 
76
- - Verifiability: Did the interviewer focus on questions that extract verifiable information? (i.e., questions that allow for detecting contradictions or can be cross-referenced via external search, such as dates, addresses, IDs, organization names, emails, or names of supervisors/colleagues).
 
77
 
78
- - Relevance: Were the questions specifically tailored and relevant to the interviewee?
79
 
80
- - Cohesion: Is there a high degree of interconnection between the questions?
81
 
82
- - Conflict Resolution: If contradictions or questionable points arose in previous dialogue, did the interviewer follow up with multiple questions related to those contradictions?
83
 
84
- ### "Very Bad" Cases
85
- Conversely, the following are examples of poor questioning:
86
 
87
- - Premature Topic Switching: Moving to a completely different topic before a single subject has been sufficiently explored.
88
 
89
- - Abstract Questions: Asking vague questions where it is difficult to verify facts or detect contradictions (e.g., "What are your hobbies?", "What is the most important value in your life?").
90
 
91
- - Low Personal Relevance: Asking questions with little connection to the interviewee (e.g., Interviewee: "I work at Google." -> Interviewer: "When is the CEO of Google’s birthday?").
92
 
93
- - Lack of Correlation: Questions are so unrelated that it is impossible to judge internal consistency or contradictions.
94
-
95
- - Ignoring Discrepancies: Moving on to unrelated questions even after a contradiction was detected in the previous conversation.
96
 
97
  ### Scoring Summary
98
- Very Good: Meets all "Very Good" criteria.
99
- Good: Misses 1–2 criteria or has minor room for improvement.
100
- So-so: Fails to meet approximately 3 of the criteria.
101
- Bad: Meets only 1 of the criteria.
 
 
 
 
102
  Very Bad: Meets none of the criteria.
103
 
104
- ## Caution
105
- - Focus on Questions $\rightarrow$ When evaluating the interviewer, ignore the interviewee's response quality and focus only on the interviewer's questioning ability.
106
- - Holistic Strategy $\rightarrow$ Consider the overall questioning strategy and flow, not just individual questions in isolation.
 
107
 
108
  📧 minskim010203@gmail.com, imsujeong2190@gmail.com
 
37
  - When the interviewer asks abstract questions whose answers are hard to check for contradictions or factual accuracy
38
 
39
  (e.g., "What is your hobby?", "What is the most important value in your life?")
40
+ - When the interviewer asks questions that have little to do with the interviewee's own information and experience and must be answered using external knowledge
41
 
42
  (e.g., "I work at Google." → "What year was Google founded?")
43
  - Exception) Questions closely tied to an event/incident/experience the interviewee took part in directly are allowed, as in "I am a founder of Google." → "What year was Google founded?". The evaluation must therefore take the previous questions and answers into account.
 
59
  # Labeling Guideline
60
  Thank you for participating in this labeling project!
61
 
62
+ In this task, you will review interview transcripts of two different AI interviewers (A and B) interacting with the same interviewee. Your goal is to evaluate the questioning capabilities of these AI interviewers.
63
 
64
  There are two main tasks you need to perform:
65
 
66
+ Compare: Determine which of the two interviewers (A or B) asks better questions.
67
 
68
+ Rate: Evaluate the quality of each interviewer on a 5-point scale.
69
 
70
+ ## Evaluation Guideline
71
+ The criteria for evaluating questioning ability are as follows. An interviewer who satisfies all five of the following items is considered to have performed excellently.
72
 
73
+ ### "Very Good" Criteria
74
+ - Depth: Did the interviewer continue questioning a single topic until the response became sufficiently specific?
75
+ - Verifiability: Did the interviewer focus on questions that extract verifiable information? (i.e., information that can be checked for contradictions or verified through external search).
76
+ - Examples: Questions regarding dates, addresses, affiliation IDs, organization names, emails, names of supervisors/colleagues, etc.
77
+ - Personalization: Are the questions tailored specifically to the interviewee? (i.e., highly relevant to the interviewee's specific experiences and previous answers).
78
+ - Coherence: Is there a high degree of interconnection between the questions?
79
+ - Contradiction Handling: If contradictions or questionable points arose in previous dialogue, did the interviewer follow up with questions related to those contradictions?
80
 
81
+ ### "Very Bad" Cases
82
+ Conversely, the following cases indicate poor questioning:
83
 
84
+ - Premature Topic Switching: Moving to a completely different topic before a current topic has been sufficiently detailed.
85
 
86
+ - Abstract Questions: Asking abstract questions that make it difficult to verify facts or detect contradictions.
87
 
88
+ - Examples: "What is your hobby?", "What is the most important value in your life?"
89
 
90
+ - External Knowledge Dependency: Asking questions that rely on general external knowledge rather than the interviewee's own information and experiences.
 
91
 
92
+ - Example: Interviewee says "I work at Google" → Interviewer asks "When was Google founded?"
93
 
94
+ - Exception: If the question is closely linked to an event/experience the interviewee was directly involved in, it is acceptable. (e.g., Interviewee says "I am the founder of Google" โ†’ "When was Google founded?"). Evaluation must consider the context of previous questions and answers.
95
 
96
+ - Low Relatedness: Questions that are so unrelated that it is impossible to judge mutual contradictions.
97
 
98
+ - Ignoring Inconsistencies: Moving on to an unrelated question even though a contradiction was detected in the previous dialogue.
 
 
99
 
100
  ### Scoring Summary
101
+ Very Good: Meets all criteria.
102
+
103
+ Good: Fails to meet 1–2 criteria.
104
+
105
+ So-so: Fails to meet approximately 3 criteria.
106
+
107
+ Bad: Meets only 1 criterion.
108
+
109
  Very Bad: Meets none of the criteria.
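
The scoring summary above can be read as a mapping from the number of satisfied "Very Good" criteria to a label. The sketch below is one possible reading, not part of the guideline itself: the criterion keys and the function name `label_from_criteria` are hypothetical, and treating "approximately 3 unmet" as exactly two criteria met is an assumption made to keep the bands non-overlapping.

```python
from typing import Mapping

# Hypothetical keys for the five "Very Good" criteria listed in the guideline.
CRITERIA = (
    "depth",
    "verifiability",
    "personalization",
    "coherence",
    "contradiction_handling",
)


def label_from_criteria(met: Mapping[str, bool]) -> str:
    """Map per-criterion judgments to a 5-point label (assumed reading)."""
    missing = [name for name in CRITERIA if name not in met]
    if missing:
        raise ValueError(f"missing criteria: {missing}")

    n_met = sum(1 for name in CRITERIA if met[name])
    if n_met == len(CRITERIA):
        return "Very Good"  # meets all criteria
    if n_met >= 3:
        return "Good"       # fails to meet 1-2 criteria
    if n_met == 2:
        return "So-so"      # fails to meet approximately 3 criteria
    if n_met == 1:
        return "Bad"        # meets only 1 criterion
    return "Very Bad"       # meets none of the criteria


# Example: every criterion satisfied except coherence.
example = dict.fromkeys(CRITERIA, True)
example["coherence"] = False
print(label_from_criteria(example))  # Good
```

A labeler would still apply judgment for borderline cases; the function only makes the count-based bands explicit.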
110
 
111
+ # Caution
112
+ When evaluating the interviewer, focus only on the interviewer's questioning ability. Do not let the quality of the interviewee's answers influence your score.
113
+
114
+ Consider the overall questioning strategy throughout the transcript, not just individual questions in isolation.
115
 
116
  📧 minskim010203@gmail.com, imsujeong2190@gmail.com