Update README.md
README.md CHANGED
@@ -12,12 +12,12 @@ sdk: static

Thank you for participating in this labeling project!

- You … for the same interviewee … **two different AI

You have two tasks to complete, as described below:

- * Of the two different
- * Each

# Evaluation Criteria
This interview is a process for checking whether the interviewee answers consistently about their own personal information, memories, and experiences, and whether those answers are also free of contradictions with the outside world.
@@ -55,7 +55,7 @@ sdk: static
* Moving on to an unrelated question even though a contradiction was found in the previous dialogue

# Important Notes
- *
* Consider the overall questioning strategy, not just individual questions.

# Reference
@@ -70,12 +70,12 @@ sdk: static

Thank you for participating in this labeling project!

- You will review interview transcripts of **two different AI

There are two main tasks to complete:

- * **Comparison:** Determine which of the two
- * **Rating:** Evaluate the quality of each

# Evaluation Criteria
@@ -85,9 +85,9 @@ Consequently, effective questioning should focus on extracting highly detailed a

### Criteria for Good Questions

- * **Depth & Persistence:** Did the
- * If the
- * *Exception:* If the interviewee repeatedly refuses to answer despite paraphrasing, the

* **Verifiability:** Did the questions focus on extracting verifiable information? (i.e., information that can reveal contradictions or be verified through external search).
@@ -96,7 +96,7 @@ Consequently, effective questioning should focus on extracting highly detailed a

* **Personalization:** Are the questions tailored to the interviewee? (i.e., highly relevant to the interviewee's specific experiences and previous answers).
* **Cohesion:** Is there a high degree of interconnection between the questions?
- * **Addressing Contradictions:** If a contradiction or point of doubt was found in previous dialogue, did the

### Criteria for Poor Questions
@@ -121,7 +121,7 @@ Conversely, the following cases indicate poor questioning performance:

# Important Notes

- * When evaluating the
* Consider the **overall questioning strategy** as a whole, rather than just looking at individual questions in isolation.

# Reference
Thank you for participating in this labeling project!

+ You will review the interview transcripts of **two different AI interrogators (A and B)** with the same interviewee and evaluate the AI interrogators' questioning ability.

You have two tasks to complete, as described below:

+ * Of the two different interrogators (A, B), determine which one asked better questions.
+ * Rate the quality of each interrogator on a 5-point scale.

# Evaluation Criteria
This interview is a process for checking whether the interviewee answers consistently about their own personal information, memories, and experiences, and whether those answers are also free of contradictions with the outside world.
* Moving on to an unrelated question even though a contradiction was found in the previous dialogue

# Important Notes
+ * **When evaluating the interrogators, do not take the interviewee's answers into account; evaluate only the interrogators' questioning ability. Focus on the form and quality of the questions, not on the answers.**
* Consider the overall questioning strategy, not just individual questions.

# Reference
Thank you for participating in this labeling project!

+ You will review interview transcripts of **two different AI interrogators (A and B)** interacting with the same interviewee. Your task is to evaluate the questioning capabilities of these AI interrogators.

There are two main tasks to complete:

+ * **Comparison:** Determine which of the two interrogators (A or B) asks better questions.
+ * **Rating:** Evaluate the quality of each interrogator on a 5-point scale.

# Evaluation Criteria
### Criteria for Good Questions

+ * **Depth & Persistence:** Did the interrogator ask follow-up questions until the topic was sufficiently detailed?
+ * If the interrogator needs to ask again because they didn't get a clear answer, they should **paraphrase** the question.
+ * *Exception:* If the interviewee repeatedly refuses to answer despite paraphrasing, the interrogator may move to a different topic.

* **Verifiability:** Did the questions focus on extracting verifiable information? (i.e., information that can reveal contradictions or be verified through external search).
* **Personalization:** Are the questions tailored to the interviewee? (i.e., highly relevant to the interviewee's specific experiences and previous answers).
* **Cohesion:** Is there a high degree of interconnection between the questions?
+ * **Addressing Contradictions:** If a contradiction or point of doubt was found in previous dialogue, did the interrogator focus on questions related to that contradiction?

### Criteria for Poor Questions
# Important Notes

+ * When evaluating the interrogator, **do not judge the interviewee's answers.** Focus solely on the pattern and quality of the interrogator's questions.
* Consider the **overall questioning strategy** as a whole, rather than just looking at individual questions in isolation.

# Reference