
title: README
emoji: 📊
colorFrom: purple
colorTo: indigo
…: static
pinned: false
sdk: static

Labeling Guideline

레이블링에 참여해주셔서 감사합니다!

여러분은 동일한 면접 대상자에 대한 **서로 다른 두 AI 면접관 (A , B)**의 인터뷰 대화 기록을 보고, AI 면접관의 질문 능력을 평가하게 됩니다.

여러분이 해주셔야 할 태스크는 아래와 같이 두 가지입니다.

서로 다른 두 면접관 (A , B) 중, 어떤 면접관이 더 나은 질문을 하는지 판단하세요.
각 면접관의 자질을 5점 척도로 평가해 주세요.

Evaluation Guideline

질문 능력 평가 기준은 다음과 같습니다. 아래 다섯가지 항목에 대해 모두 충족할 경우 가장 질문을 잘한 면접관입니다.

Very Good 기준

한 주제에 대해 답변이 충분히 구체화가 될 때까지 질문했는가
검증 가능한 정보들을 뽑아낼 수 있는 질문 위주로 했는가 (= 모순을 판단할 수 있거나, 외부 검색을 통해 검증할 수 있을 만한 질문인가)

(e.g., "날짜, 주소, 소속 ID, 기관 이름, 이메일, 다니는 회사 상사 등 관계자 이름" 관련 질문들)
질문이 인터뷰이에 특화된 질문인가 (즉 인터뷰이의 구체적인 경험, 답변과 연관성이 높은 질문인가)
질문들 간의 상호 연관성이 높은가
이전 대화에서 모순이나 의문점이 발견되었을 경우, 발생한 모순과 관련된 질문을 많이 했는가

Very Bad 케이스

반대로 질문을 못한 케이스는 다음과 같습니다.

하나의 주제에 대해 충분히 구체화가 되지 않았는데 바로 완전히 다른 주제로 넘어가버린 경우
모순 여부나 사실 관계를 검증하기 어려운 추상적인 질문을 한 경우

(e.g., "너의 취미는 뭐야?", "너의 인생에서 가장 중요한 가치는 뭐야?")
인터뷰이 본인의 정보 및 경험과는 연관이 낮고 외부 지식을 이용해 답변해야 하는 질문을 한 경우

(e.g., "나는 구글에 다녀." → "구글 설립 연도는 언제야?")
- 예외) "나는 구글 창립자야." → "구글의 설립 연도는 언제야?" 처럼 인터뷰이가 직접 참여한 이벤트/사건/경험과 밀접한 질문은 허용함. 따라서 이전 질문과 답변들을 함께 고려해서 평가해야 함.
질문들 사이의 관련성이 낮아 상호 모순을 판단하기 어려운 경우
이전 대화에서 모순이 발견되었음에도 연관성 없는 다른 질문으로 넘어가버린 경우

Scoring Summary

Very Good 기준을 모두 충족할 경우 Very Good, 1-2개 아쉬운 부분이 있다면 Good, 3개 정도 충족하지 못하면 So-so, 1개만 충족한다면 Bad 입니다. 모두 다 충족하지 못하면 Very Bad를 주시면 됩니다.

Caution

면접관을 평가할 때, 인터뷰이의 답변은 고려하지 않고 면접관의 질문 능력만을 평가합니다. 답변이 아닌 질문에 집중해주세요.
개별 질문뿐만 아니라 전체적인 질문 전략을 고려하십시오.

📧 minskim010203@gmail.com, imsujeong2190@gmail.com

Labeling Guideline

Thank you for participating in this labeling project!

In this task, you will review interview transcripts of two different AI interviewers (A and B) interacting with the same interviewee. Your goal is to evaluate the questioning capabilities of these AI interviewers.

There are two main tasks you need to perform:

Compare: Determine which of the two interviewers (A or B) asks better questions.

Rate: Evaluate the quality of each interviewer on a 5-point scale.

Evaluation Guideline

Questioning ability is evaluated against the five criteria below. An interviewer who satisfies all five earns the top rating, Very Good.

"Very Good" Criteria

Depth: Did the interviewer continue questioning a single topic until the response became sufficiently specific?
Verifiability: Did the interviewer focus on questions that extract verifiable information (i.e., information that can be checked for contradictions or verified through external search)?
- Examples: Questions about dates, addresses, affiliation IDs, organization names, emails, names of supervisors or colleagues, etc.
Personalization: Are the questions tailored specifically to the interviewee? (i.e., highly relevant to the interviewee's specific experiences and previous answers).
Coherence: Is there a high degree of interconnection between the questions?
Contradiction Handling: If contradictions or questionable points arose in previous dialogue, did the interviewer follow up with questions related to those contradictions?

"Very Bad" Cases

Conversely, the following cases indicate poor questioning:

Premature Topic Switching: Moving to a completely different topic before a current topic has been sufficiently detailed.
Abstract Questions: Asking abstract questions that make it difficult to verify facts or detect contradictions.
- Examples: "What is your hobby?", "What is the most important value in your life?"
External Knowledge Dependency: Asking questions that rely on general external knowledge rather than the interviewee's own information and experiences.
- Example: Interviewee says "I work at Google" → Interviewer asks "When was Google founded?"
- Exception: A question closely linked to an event or experience the interviewee was directly involved in is acceptable (e.g., interviewee says "I am the founder of Google" → interviewer asks "When was Google founded?"). Judge each question in the context of the preceding questions and answers.
Low Relatedness: Asking questions so loosely related to one another that the answers are hard to check against each other for contradictions.
Ignoring Inconsistencies: Moving on to an unrelated question even though a contradiction was detected in the previous dialogue.

Scoring Summary

Very Good: Meets all criteria.

Good: Fails to meet 1–2 criteria.

So-so: Fails to meet about 3 criteria (that is, meets about 2 of the 5).

Bad: Meets only 1 criterion.

Very Bad: Meets none of the criteria.
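
The summary above can be read as a mapping from the number of satisfied criteria to a rating. The sketch below is one possible reading, not an official scoring rule: the function name `overall_rating` and the criterion identifiers are hypothetical, and "approximately 3" unmet criteria is interpreted as exactly 2 of the 5 being met. Annotators should still use judgment, since the guideline asks for an overall assessment of the questioning strategy.

```python
from typing import Iterable

# Hypothetical identifiers for the five "Very Good" criteria.
CRITERIA = (
    "depth",
    "verifiability",
    "personalization",
    "coherence",
    "contradiction_handling",
)


def overall_rating(satisfied: Iterable[str]) -> str:
    """Map the set of satisfied criteria to a rating name."""
    satisfied = set(satisfied)
    unknown = satisfied - set(CRITERIA)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")

    met = len(satisfied)
    if met == len(CRITERIA):
        return "Very Good"  # meets all criteria
    if met >= 3:
        return "Good"       # fails 1-2 criteria
    if met == 2:
        return "So-so"      # fails about 3 criteria (assumed: exactly 3)
    if met == 1:
        return "Bad"        # meets only 1 criterion
    return "Very Bad"       # meets none


print(overall_rating({"depth", "verifiability", "coherence"}))
```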

Caution

When evaluating the interviewer, focus only on the interviewer's questioning ability. Do not let the quality of the interviewee's answers influence your score.

Consider the overall questioning strategy throughout the transcript, not just individual questions in isolation.

📧 minskim010203@gmail.com, imsujeong2190@gmail.com