Update README.md
README.md
CHANGED
@@ -37,7 +37,7 @@ sdk: static
  - Asking abstract questions that make it difficult to detect contradictions or verify facts

  (e.g., "What is your hobby?", "What is the most important value in your life?")
- - Has little connection to the interviewee's own information and experiences and must be answered using external knowledge

  (e.g., "I work at Google." → "When was Google founded?")
  - Exception: questions closely tied to an event/incident/experience the interviewee took part in directly are allowed, e.g., "I am the founder of Google." → "When was Google founded?". Evaluation must therefore consider the previous questions and answers together.
@@ -59,50 +59,58 @@ Very Good if all Very Good criteria are met, 1-2 shortcomings
  # Labeling Guideline
  Thank you for participating in this labeling project!

- In this task, you will review interview transcripts of two different AI interviewers

  There are two main tasks you need to perform:

- Compare
- The criteria for evaluating questioning ability are as follows. An interviewer who
- ###
- - Depth
- Conversely, the following are examples of poor questioning:
- - Low
- - Ignoring Discrepancies: Moving on to unrelated questions even after a contradiction was detected in the previous conversation.

  ### Scoring Summary
- Very Good: Meets all
  Very Bad: Meets none of the criteria.

  📧 minskim010203@gmail.com, imsujeong2190@gmail.com

  - Asking abstract questions that make it difficult to detect contradictions or verify facts

  (e.g., "What is your hobby?", "What is the most important value in your life?")
+ - Asking questions that have little connection to the interviewee's own information and experiences and must be answered using external knowledge

  (e.g., "I work at Google." → "When was Google founded?")
  - Exception: questions closely tied to an event/incident/experience the interviewee took part in directly are allowed, e.g., "I am the founder of Google." → "When was Google founded?". Evaluation must therefore consider the previous questions and answers together.

  # Labeling Guideline
  Thank you for participating in this labeling project!

+ In this task, you will review interview transcripts of two different AI interviewers (A and B) interacting with the same interviewee. Your goal is to evaluate the questioning capabilities of these AI interviewers.

  There are two main tasks you need to perform:

+ Compare: Determine which of the two interviewers (A or B) asks better questions.

+ Rate: Evaluate the quality of each interviewer on a 5-point scale.

+ ## Evaluation Guideline
+ The criteria for evaluating questioning ability are as follows. An interviewer who satisfies all five of the following items is considered to have performed excellently.

+ ### "Very Good" Criteria
+ - Depth: Did the interviewer continue questioning a single topic until the response became sufficiently specific?
+ - Verifiability: Did the interviewer focus on questions that extract verifiable information? (i.e., information that can be checked for contradictions or verified through external search).
+   - Examples: Questions regarding dates, addresses, affiliation IDs, organization names, emails, names of supervisors/colleagues, etc.
+ - Personalization: Are the questions tailored specifically to the interviewee? (i.e., highly relevant to the interviewee's specific experiences and previous answers).
+ - Coherence: Is there a high degree of interconnection between the questions?
+ - Contradiction Handling: If contradictions or questionable points arose in previous dialogue, did the interviewer follow up with questions related to those contradictions?

+ ### "Very Bad" Cases
+ Conversely, the following cases indicate poor questioning:

+ - Premature Topic Switching: Moving to a completely different topic before a current topic has been sufficiently detailed.

+ - Abstract Questions: Asking abstract questions that make it difficult to verify facts or detect contradictions.

+   - Examples: "What is your hobby?", "What is the most important value in your life?"

+ - External Knowledge Dependency: Asking questions that rely on general external knowledge rather than the interviewee's own information and experiences.

+   - Example: Interviewee says "I work at Google" → Interviewer asks "When was Google founded?"

+   - Exception: If the question is closely linked to an event/experience the interviewee was directly involved in, it is acceptable. (e.g., Interviewee says "I am the founder of Google" → "When was Google founded?"). Evaluation must consider the context of previous questions and answers.

+ - Low Relatedness: Questions that are so unrelated that it is impossible to judge mutual contradictions.

+ - Ignoring Inconsistencies: Moving on to an unrelated question even though a contradiction was detected in the previous dialogue.

  ### Scoring Summary
+ Very Good: Meets all criteria.
+
+ Good: Fails to meet 1–2 criteria.
+
+ So-so: Fails to meet approximately 3 criteria.
+
+ Bad: Meets only 1 criterion.
+
  Very Bad: Meets none of the criteria.
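
The Scoring Summary can be read as a mapping from the number of "Very Good" criteria met to a label. The sketch below is one possible reading and is not part of the guideline itself: it assumes exactly five criteria (Depth, Verifiability, Personalization, Coherence, Contradiction Handling), interprets "fails to meet approximately 3" as "meets exactly 2", and uses hypothetical names (`CRITERIA`, `rating_from_criteria`) chosen only for illustration.

```python
# Hypothetical identifiers for the five "Very Good" criteria (assumed names).
CRITERIA = (
    "depth",
    "verifiability",
    "personalization",
    "coherence",
    "contradiction_handling",
)


def rating_from_criteria(met):
    """Map the set of criteria an interviewer met to a 5-point label.

    Assumption: "So-so" (fails approximately 3) is read as meeting exactly 2.
    """
    met = set(met)
    unknown = met - set(CRITERIA)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")

    count = len(met)
    if count == len(CRITERIA):
        return "Very Good"  # meets all criteria
    if count >= 3:
        return "Good"       # fails to meet 1-2 criteria
    if count == 2:
        return "So-so"      # fails to meet approximately 3 criteria
    if count == 1:
        return "Bad"        # meets only 1 criterion
    return "Very Bad"       # meets none of the criteria


if __name__ == "__main__":
    print(rating_from_criteria(["depth", "coherence", "verifiability"]))
```

A count-based mapping like this is only a starting point for annotators; whether a criterion counts as met remains a judgment call on the transcript as a whole.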

+ # Caution
+ When evaluating the interviewer, focus only on the interviewer's questioning ability. Do not let the quality of the interviewee's answers influence your score.
+
+ Consider the overall questioning strategy throughout the transcript, not just individual questions in isolation.

  📧 minskim010203@gmail.com, imsujeong2190@gmail.com