"""
core/evaluator.py: Evaluate student answers and provide structured feedback.

Responsibility:
    Compare the student's answer against both the original question and the
    source chunk, then return actionable feedback that includes:
      - Whether the answer is correct / partially correct / incorrect.
      - What the student got right.
      - What is missing or imprecise.
      - A brief model answer for reference.

    The LLM acts as a tutor, not just a judge, so feedback is constructive
    and encourages deeper understanding rather than simply flagging errors.

Public API:
    evaluate_answer(question: str, chunk: str, student_answer: str,
                    language: str = "English") -> str
"""
from model.llm import get_llm
from core.lang import ensure_english
_PROMPT_EN = """\
You are a patient and constructive university tutor.
IMPORTANT: Write your ENTIRE response in English, even if the source material is in another language. Translate everything; do NOT use the source language.
Source material:
{chunk}
Question asked to the student:
{question}
Student's answer:
{answer}
Evaluate using this EXACT 4-section structure (all sections are REQUIRED):
1. Verdict: Correct / Partially correct / Incorrect
2. What was good: Even if the answer is wrong or empty, find something positive to say (e.g., "You attempted the question" or identify any partially correct element). This section is MANDATORY: never skip it.
3. What was missing or imprecise: describe what the student got wrong or omitted.
4. Model answer: Write a concise 2-4 sentence answer IN YOUR OWN WORDS in English. Do NOT copy or quote the source text directly; synthesize it.
Be encouraging and specific. Write in English only; do not use the source language."""
def evaluate_answer(question: str, chunk: str, student_answer: str, language: str = "English") -> str:
    """Return structured feedback for *student_answer* given *question* and *chunk*.

    *language* is accepted but not used here: the prompt always asks for English.
    """
    llm = get_llm()
    prompt = _PROMPT_EN.format(
        chunk=ensure_english(chunk.strip()),
        question=question.strip(),
        answer=student_answer.strip(),
    )
    # The 4-section feedback fits comfortably in 320 tokens, which keeps CPU
    # (llama.cpp) latency inside the UI timeout.
    return llm.generate(prompt, max_new_tokens=320, temperature=0.4).strip()
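

# --- Illustrative sketch (not part of the original module) -------------------
# The prompt asks the model for four numbered sections ("1. Verdict: ...",
# "2. What was good: ...", and so on). A caller that wants to render them
# separately could split the reply roughly as shown below. The helper name
# `split_feedback_sections` and the regex are assumptions made for this
# example, and the parse is best-effort: a model may not follow the requested
# format exactly, in which case fewer (or no) sections are returned.
import re

# Matches a line that starts with "<digit 1-4>. <title>:" and captures the title.
_SECTION_RE = re.compile(r"^\s*([1-4])\.\s*([^:\n]+):\s*", re.MULTILINE)


def split_feedback_sections(feedback: str) -> dict:
    """Map each numbered section title found in *feedback* to its body text."""
    matches = list(_SECTION_RE.finditer(feedback))
    sections = {}
    for i, match in enumerate(matches):
        # A section body runs until the next section header or the end of text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(feedback)
        sections[match.group(2).strip()] = feedback[match.end():end].strip()
    return sections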