EVALUATE_PROMPT = """You are a strict evaluator grading an agent's answer against a source passage. Question: {question} Agent's answer: {answer} Source passage (ground truth): {source} Grade the answer on a scale of 0.0 to 1.0: - 1.0 = complete and accurate, fully supported by the source - 0.5 = partially correct, missing key details or slightly inaccurate - 0.0 = wrong, hallucinated, or not supported by the source Be honest and strict. Do not be generous. Respond in exactly this format: Score: Reasoning: """