---
title: README
emoji: ๐
colorFrom: purple
colorTo: indigo
…: static
pinned: false
sdk: static
---
# Labeling Guideline
Thank you for participating in this labeling project!
You will review excerpts from interview transcripts in which **two different AI interrogators (A and B)** question the same interviewee. Your task is to evaluate the questioning ability of these AI interrogators.
There are two main tasks to complete:
* **Comparison:** Determine which of the two interrogators (A or B) asks better questions.
* **Rating:** Evaluate the quality of each interrogator on a 5-point scale.
# Evaluation Criteria
The purpose of this interview is to check whether the interviewee gives consistent accounts of their own information, memories, and experiences, and whether those accounts are free of contradictions with the outside world.
Consequently, effective questioning should focus on extracting highly detailed and verifiable information from the interviewee.
### Criteria for Good Questions
* **Depth & Persistence:** Did the interrogator ask follow-up questions until the topic was sufficiently detailed?
    * If the interrogator needs to ask again because they didn't get a clear answer, they should **paraphrase** the question.
    * *Exception:* If the interviewee repeatedly refuses to answer despite paraphrasing, the interrogator may move to a different topic.
* **Verifiability:** Did the questions focus on extracting verifiable information? (i.e., information that can reveal contradictions or be verified through external search).
    * Examples: Questions regarding dates, addresses, affiliation IDs, organization names, emails, or names of relevant parties like supervisors.
* **Personalization:** Are the questions tailored to the interviewee? (i.e., highly relevant to the interviewee's specific experiences and previous answers).
* **Cohesion:** Is there a high degree of interconnection between the questions?
* **Addressing Contradictions:** If a contradiction or point of doubt was found in previous dialogue, did the interrogator focus on questions related to that contradiction?
### Criteria for Poor Questions
Conversely, the following cases indicate poor questioning performance:
* **Premature Topic Shifts:** Moving to a completely different topic before the current subject has been sufficiently detailed.
* **Repetition without Paraphrasing:** Repeating the exact same question without changing the phrasing.
* **Abstract/Unverifiable Questions:** Asking abstract questions where it is difficult to judge contradictions or verify facts.
    * Examples: "What are your hobbies?", "What is the most important value in your life?"
* **External Knowledge Over Personal Experience:** Asking questions that require external knowledge rather than the interviewee's own information/experience.
    * Example: "I work at Google." → "What year was Google founded?"
    * *Exception:* Questions closely related to events/experiences the interviewee directly participated in are allowed (e.g., "I am the founder of Google." → "What year was Google founded?"). You must evaluate this based on the context of the previous dialogue.
* **Fact-Checking External Knowledge:** Using the main questioning phase to verify external facts rather than focusing on the interviewee.
    * Question format: "Would you confirm that ..." (e.g., "Would you confirm that the 'KAIST' you mentioned is the research-oriented science and engineering university in South Korea?")
    * There is a separate process for external fact-checking, so the main questioning phase should include only questions about the interviewee.
* **Low Correlation:** Questions that lack relevance to each other, making it difficult to identify mutual contradictions.
* **Ignoring Inconsistencies:** Moving to an unrelated question even though a contradiction was detected in the previous conversation.
# Important Notes
* When evaluating the interrogator, **do not judge the interviewee's answers.** Focus solely on the pattern and quality of the interrogator's questions.
* Consider the **overall questioning strategy**, not just individual questions in isolation.
# Reference
* You may use the Chrome translation tool to view and evaluate the content in your language of choice.
* If you forget the labeling criteria or get confused, you can click the **GUIDELINES** button in the bottom-left corner of the screen to review them.
* If you're having trouble choosing between A and B, you can either 1) pick the interrogator who asked more subjective questions as the better one, or 2) select 'Both are equally good' or 'Both are equally poor.'
📧 minskim010203@gmail.com, imsujeong2190@gmail.com