title: README
emoji: 📊
colorFrom: purple
colorTo: indigo
…: static
pinned: false
sdk: static

Labeling Guideline

Thank you for participating in this labeling project!

In this task, you will review interview transcripts of two different AI interviewers (A and B) interacting with the same interviewee. Your goal is to evaluate the questioning capabilities of these AI interviewers.

There are two main tasks you need to perform:

  • Compare: Determine which of the two interviewers (A or B) asks better questions.
  • Rate: Evaluate the quality of each interviewer on a 5-point scale.

Evaluation Guideline

The criteria for evaluating questioning ability are as follows. An interviewer who satisfies all five of the following items is considered to have performed excellently.

"Very Good" Criteria

  • Depth: Did the interviewer continue questioning a single topic until the response became sufficiently specific?
  • Verifiability: Did the interviewer focus on questions that extract verifiable information (i.e., information that can be checked for contradictions or verified through external search)?
    • Examples: Questions regarding dates, addresses, affiliation IDs, organization names, emails, names of supervisors/colleagues, etc.
  • Personalization: Are the questions tailored specifically to the interviewee? (i.e., highly relevant to the interviewee's specific experiences and previous answers).
  • Coherence: Is there a high degree of interconnection between the questions?
  • Contradiction Handling: If contradictions or questionable points arose in previous dialogue, did the interviewer follow up with questions related to those contradictions?

"Very Bad" Cases

Conversely, the following cases indicate poor questioning:

  • Premature Topic Switching: Moving to a completely different topic before a current topic has been sufficiently detailed.

  • Abstract Questions: Asking abstract questions that make it difficult to verify facts or detect contradictions.

    • Examples: "What is your hobby?", "What is the most important value in your life?"
  • External Knowledge Dependency: Asking questions that rely on general external knowledge rather than the interviewee's own information and experiences.

    • Example: Interviewee says "I work at Google" → Interviewer asks "When was Google founded?"

    • Exception: If the question is closely linked to an event/experience the interviewee was directly involved in, it is acceptable (e.g., Interviewee says "I am the founder of Google" → Interviewer asks "When was Google founded?"). Evaluation must consider the context of previous questions and answers.

  • Low Relatedness: Questions that are so unrelated that it is impossible to judge mutual contradictions.

  • Ignoring Inconsistencies: Moving on to an unrelated question even though a contradiction was detected in the previous dialogue.

Scoring Summary

Very Good: Meets all criteria.

Good: Fails to meet 1-2 criteria.

So-so: Fails to meet approximately 3 criteria.

Bad: Meets only 1 criterion.

Very Bad: Meets none of the criteria.
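
The summary above can be read as a lookup from the number of criteria met to a label. The following is a minimal sketch of that reading, not an official scoring tool: the criterion keys and the `suggest_label` function are hypothetical names chosen for illustration, and the table treats "approximately 3" unmet criteria as exactly 3, which is an assumption. Use the result as a starting point and apply your own judgment.

```python
# Hypothetical keys for the five "Very Good" criteria (names are illustrative).
CRITERIA = (
    "depth",
    "verifiability",
    "personalization",
    "coherence",
    "contradiction_handling",
)

# Criteria met -> label, following the Scoring Summary.
# Assumption: "approximately 3 unmet" is treated as exactly 3 unmet (2 met).
LABEL_BY_MET = {
    5: "Very Good",
    4: "Good",
    3: "Good",
    2: "So-so",
    1: "Bad",
    0: "Very Bad",
}


def suggest_label(met: dict) -> str:
    """Return a suggested label from per-criterion True/False judgments."""
    missing = [name for name in CRITERIA if name not in met]
    if missing:
        raise ValueError(f"missing judgments for: {missing}")
    count = sum(bool(met[name]) for name in CRITERIA)
    return LABEL_BY_MET[count]


# Example: an interviewer who met every criterion except contradiction handling.
example = {name: True for name in CRITERIA}
example["contradiction_handling"] = False
print(suggest_label(example))
```

Judging each criterion separately before picking a label keeps the A/B comparison and the 5-point rating grounded in the same observations.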

Caution

When evaluating the interviewer, focus only on the interviewer's questioning ability. Do not let the quality of the interviewee's answers influence your score.

Consider the overall questioning strategy throughout the transcript, not just individual questions in isolation.

📧 minskim010203@gmail.com, imsujeong2190@gmail.com