Upload 2 files
- CAT_universal_prompt.txt +53 -0
- app.py +400 -0
CAT_universal_prompt.txt (ADDED)
@@ -0,0 +1,53 @@
CAT Universal Prompt (CAT_universal_prompt.txt)

Opening Context
You are an AI Mentor guiding a student through a simulation. The student will help a fictional character, in a realistic NYC entry-level job, navigate a decision or dilemma directly related to the Module Learning Objectives below. You will generate one realistic scenario aligned with these objectives, starting with a vivid, in-character request for help that encourages the student to ask questions and perform the relevant analyses taught in the module.

[LEARNING OBJECTIVES]
{LEARNING_OBJECTIVES}

Rubric Criteria for Evaluation
The evaluator will use the rubric below. Throughout the conversation, gently guide the student to address these items. Do not do analyses for the student; instead, prompt them for inputs and let them run the tools.
{RUBRIC}

Rules for the Simulation
- Stay in character during roleplay until the scene ends.
- Keep responses concise (2–4 sentences) and focused on helping the student think, not on solving the problem for them.
- Let the student lead: do not introduce decision-making tools; if they choose one, ask for their inputs and let them run it.
- Encourage the student to perform the relevant analyses taught in the module as part of their reasoning.
- Aim for about 15–20 meaningful exchanges total.
- Before moving to the wrap-up, ensure the student has addressed the key elements in this module’s rubric and learning objectives. If any appear missing, ask a brief, relevant question to prompt for them.

Scene Wrap & Transition to Evaluation
- After roughly 7–9 student turns, begin closing the scene.
- In character, warmly acknowledge the student’s efforts and invite a final contribution:
  "Thanks for walking through this with me. We’ve covered a lot. Before we wrap up, is there anything else you’d like me to consider before I give you a preliminary assessment?"
- If the student says no:
  "Okay, thank you. Let’s step back and review how you approached this situation."
- If the student says yes:
  Provide one short, neutral acknowledgment only (no new roleplay branches):
  "I appreciate you sharing that. I’ll take it into account."
  Then transition to mentor mode.
- Hard cap: If 10 student turns are reached without the wrap-up, trigger it automatically.

Evaluation Phase (Mentor Mode)
- Drop character completely.
- Ask the student to name at least two decision-making tools they used and confirm whether they applied them accurately.
- Using the module rubric:
  - Assign a score for each category.
  - Provide up to 100 words of feedback per category in a warm, professional tone.
  - Include at least one quote or paraphrase from the conversation to support each score.
  - Be specific and constructive. Award full marks only if all criteria are met.
- Invite revision:
  "Would you like to revise any part of your reasoning or recommendation before receiving your final score?"
- If they revise, reassess and give updated scores and feedback.

Fictional Consequence
- After scoring, describe a fictional but plausible consequence of the character’s decision-making process tied to the student’s performance:
  - Excellent: Significant success or positive impact.
  - Satisfactory: Middling result with some improvement needed.
  - Unsatisfactory: Realistic setback or risk from weaker reasoning.
- Keep it brief (2–3 sentences), professional, and relevant.

Starting the Simulation
Generate one realistic scenario aligned with the learning objectives. Begin with a short, vivid description of the situation and an in-character request for guidance. Then wait for the student to respond.
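The {LEARNING_OBJECTIVES} and {RUBRIC} placeholders are filled from a per-module text file that app.py (below) loads from the modules/ folder and slices on the "LEARNING OBJECTIVES:", "RUBRIC:", and "MODULE NAME:" section markers. A minimal sketch of such a module file; the file name and all wording are illustrative assumptions, only the section labels and bullet style are what the parser expects:

```text
MODULE NAME:
Module 01 — Decision-Making Foundations

LEARNING OBJECTIVES:
- Frame a workplace decision and identify the information needed to make it.
- Apply one decision-making tool from the module to a realistic scenario.

RUBRIC:
- States the decision and information needs clearly
- Applies the appropriate tool or framework correctly
- Justifies the conclusion and notes at least one limitation or tradeoff
```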
app.py (ADDED)
@@ -0,0 +1,400 @@
import json
import re
from pathlib import Path
from typing import List, cast

import gradio as gr
from dotenv import load_dotenv
from openai import OpenAI
from openai.types.chat import ChatCompletionMessageParam

UNIVERSAL_PROMPT_PATH = "CAT_universal_prompt.txt"
MODULE_DIR = "modules"  # module files live in the /modules subfolder

load_dotenv()
client = OpenAI()  # reads OPENAI_API_KEY from the environment


def call_model(system_prompt: str, history: list[dict[str, str]]) -> str:
    # Build as simple dicts first; only user/assistant turns are forwarded
    msgs: list[dict[str, str]] = [{"role": "system", "content": system_prompt}]
    for m in history:
        role = m.get("role")
        content = m.get("content")
        if role in ("user", "assistant") and isinstance(content, str):
            msgs.append({"role": role, "content": content})

    # Cast once at the call site to satisfy the SDK types
    typed_msgs = cast(List[ChatCompletionMessageParam], msgs)

    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=typed_msgs,
        temperature=0.4,
    )
    return resp.choices[0].message.content or ""


def load_text(path) -> str:
    # Accepts either a str or a pathlib.Path
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


def assemble_prompt(universal_prompt_text: str, module_text: str) -> str:
    def extract(label: str) -> str:
        marker = label + ":"
        start = module_text.find(marker)
        if start == -1:
            return ""
        start += len(marker)
        next_markers = ["\nLEARNING OBJECTIVES:", "\nRUBRIC:", "\nMODULE NAME:"]
        end_positions = [module_text.find(m, start) for m in next_markers if module_text.find(m, start) != -1]
        end = min(end_positions) if end_positions else len(module_text)
        return module_text[start:end].strip()

    learning_objectives = extract("LEARNING OBJECTIVES")
    rubric = extract("RUBRIC")

    prompt = universal_prompt_text.replace("{LEARNING_OBJECTIVES}", learning_objectives)
    prompt = prompt.replace("{RUBRIC}", rubric)
    return prompt


def init_state():
    return {
        "assembled_prompt": "",
        "history": [],
        "mode": "roleplay",
        "mentor_step": 0,
        "student_name": ""
    }


def start_session(module_file, student_name=""):
    state = init_state()
    state["student_name"] = student_name

    universal = load_text(UNIVERSAL_PROMPT_PATH)
    module_text = load_text(Path(MODULE_DIR) / module_file)

    # Parse the full RUBRIC section once and keep a structured copy
    state["rubric_items"] = parse_rubric_from_module(module_text)
    print(f"[CAT] Parsed {len(state['rubric_items'])} rubric items for this module.")

    # Personalize lightly with the student's first name
    name_hint = (
        f"\n\n[Student first name: {student_name}. Use it naturally once in the opening; don’t overuse.]"
        if student_name else ""
    )
    state["assembled_prompt"] = assemble_prompt(universal, module_text) + name_hint

    # Keep the system prompt out of the visible history: call_model injects it
    # on every request, and gr.Chatbot(type="messages") only renders
    # user/assistant roles.
    opening = call_model(state["assembled_prompt"], state["history"])
    state["history"].append({"role": "assistant", "content": opening})
    return state, state["history"]


def chat(user_msg, state):
    if not user_msg.strip():
        return "", state["history"], state

    # Shortcut: typing "grade" acts like pressing the Assess button
    if user_msg.strip().lower() == "grade":
        hist, st = assess_fn(state)
        return "", hist, st

    # If the scene is finished, ignore further input and return cleanly
    if state.get("mode") == "done":
        return "", state["history"], state

    # If we've left roleplay (e.g., pressed Assess), stop running roleplay code
    if state.get("mode") != "roleplay":
        return "", state["history"], state

    state["history"].append({"role": "user", "content": user_msg})

    if state["mode"] == "roleplay":
        reply = call_model(state["assembled_prompt"], state["history"])
        state["history"].append({"role": "assistant", "content": reply})
        return "", state["history"], state

    # Legacy mentor-mode flow. Note: the guard above returns early for any
    # non-roleplay mode, so this branch is currently unreachable; it is kept
    # in case typed mentor turns are re-enabled.
    if state["mode"] == "mentor":
        # Step 1: general intro (no assumption of tools)
        if state.get("mentor_step", 0) == 0:
            eval_intro = (
                "Before we wrap up: name two specific concepts, tools, or frameworks you used in this scenario, "
                "and in one short sentence each say how you applied them. If you didn’t use any, name two insights "
                "you learned and how you would apply them next time."
            )
            state["history"].append({"role": "assistant", "content": eval_intro})
            state["mentor_step"] = 1
            return "", state["history"], state

        # Step 2: concise rubric-based evaluation
        else:
            # Concise rubric-based evaluation (hidden): pass instruction via system_prompt
            eval_request = (
                "Evaluate the student's performance using the module rubric. Provide these sections: "
                "Overall rating (Unsatisfactory, Satisfactory, or Excellent) with a one-sentence justification; "
                "Career competencies; Uniquely human capacities; Argument analysis; Ethical frameworks; ESG awareness; "
                "Application; Interaction quality; Strength; Area to improve; Advice for next time; Fictional consequence. "
                "Quote at least one student phrase. Keep the whole evaluation under 180 words."
            )
            try:
                reply = call_model(
                    state["assembled_prompt"] + "\n\n" + eval_request,
                    state["history"]  # call_model ignores system entries in history by design
                )
            except Exception as e:
                # Never let chat return None; show a friendly error and allow retry
                state["history"].append({
                    "role": "assistant",
                    "content": f"[Assessment error: {e}] Please press Assess again in a few seconds."
                })
                return "", state["history"], state

            state["history"].append({"role": "assistant", "content": reply})
            state["mode"] = "done"
            return "", state["history"], state

    # Safety net: ensure consistent return shape if a future branch falls through
    return "", state["history"], state


RUBRIC_FALLBACK = [
    "States the decision and information needs clearly",
    "Applies the appropriate tool or framework correctly",
    "Shows steps or calculations and a decision rule, tool, or framework",
    "Justifies the conclusion and notes at least one limitation or tradeoff",
]


# --- Helpers for rubric evaluation JSON ---

def _safe_json_loads(s: str):
    try:
        return json.loads(s)
    except Exception:
        # crude but robust: try to extract the {...} block if the model wrapped it
        start = s.find("{")
        end = s.rfind("}")
        if start != -1 and end != -1 and end > start:
            try:
                return json.loads(s[start:end + 1])
            except Exception:
                return None
        return None


def _format_assessment_readable(assess_obj):
    """
    Currently unused; kept for a richer scored-rubric output format.

    assess_obj schema:
    {
      "criteria": [{"id": "...", "level": "no|partial|full", "points": 0|0.5|1, "evidence": "..."}],
      "total_points": float,
      "max_points": float,
      "summary": "≤180 words narrative"
    }
    """
    if not isinstance(assess_obj, dict) or "criteria" not in assess_obj:
        return "[Assessment parsing error: invalid JSON]"
    lines = []
    total = assess_obj.get("total_points", 0)
    maxp = assess_obj.get("max_points", 0)
    lines.append(f"Score: {total:g}/{maxp:g}")
    lines.append("")
    for c in assess_obj["criteria"]:
        lid = c.get("id", "?")
        level = c.get("level", "?")
        pts = c.get("points", "?")
        ev = c.get("evidence", "")
        lines.append(f"- {lid}: {level} ({pts}) — {ev}")
    if assess_obj.get("summary"):
        lines.append("")
        lines.append(assess_obj["summary"])
    return "\n".join(lines)

# --- end helpers ---


# --- Assess: rubric-based, no/partial/full per criterion (JSON output) ---
def assess_fn(state):
    """
    One press:
      1) Adds END OF SCENE visibly once.
      2) Runs a rubric-based evaluation using a dedicated evaluator system prompt.
         Output schema is strict JSON with per-criterion no/partial/full.
    If already done: no-op.
    Returns: (chat_history, state)
    """
    # If already finalized, do nothing
    if state.get("mode") == "done":
        return state["history"], state

    # 1) Show the scene break once
    if not (
        state["history"]
        and state["history"][-1].get("role") == "assistant"
        and state["history"][-1].get("content", "").strip() == "END OF SCENE"
    ):
        state["history"].append({"role": "assistant", "content": "END OF SCENE"})

    # Enter mentor mode and skip any intro
    state["mode"] = "mentor"
    state["mentor_step"] = 2

    # 2) Build rubric payload from the current module; fall back only if needed
    raw_items = state.get("rubric_items") or []
    if not isinstance(raw_items, list) or len(raw_items) == 0:
        raw_items = RUBRIC_FALLBACK[:]  # last resort

    rubric = []
    for i, item in enumerate(raw_items, start=1):
        rid = item.get("id") if isinstance(item, dict) else str(item)
        rubric.append({"id": rid or f"Criterion {i}"})

    # 3) Dedicated evaluator prompt: yes/no evidence per item (simple, deterministic)
    assessor_system = (
        "You are the Evaluator for the Conversational Assessment Tool (CAT).\n"
        "For EACH rubric item, decide if the student provided reasonable, college-level evidence.\n"
        "Rules:\n"
        "- 'meets' = true only if the student shows specific, relevant reasoning/evidence for that item.\n"
        "- Otherwise 'meets' = false.\n"
        "Return STRICT JSON ONLY (no prose outside JSON):\n"
        "{\n"
        '  "results": [\n'
        '    {"id": "<criterion id>", "meets": true|false, "evidence": "<short quote or brief reason>"}\n'
        "  ]\n"
        "}"
    )

    # 4) Provide the actual rubric text to the model as context
    module_rubric_text = "\n".join(f"- {c['id']}" for c in rubric)
    history_for_eval = list(state["history"]) + [
        {"role": "assistant", "content": "Evaluate against these rubric items:\n" + module_rubric_text}
    ]

    try:
        model_raw = call_model(assessor_system, history_for_eval)

        # Parse and normalize
        data = _safe_json_loads(model_raw)
        if not data or "results" not in data or not isinstance(data["results"], list):
            raise ValueError("Invalid evaluator JSON")

        # Align results to rubric order
        results = []
        by_id = {str(r.get("id", "")): r for r in data["results"]}
        for c in rubric:
            cid = c["id"]
            r = by_id.get(cid, {})
            meets = bool(r.get("meets") is True)
            evidence = str(r.get("evidence") or "")
            results.append({"id": cid, "meets": meets, "evidence": evidence})

        met = sum(1 for r in results if r["meets"])
        total = len(results)
        pct = (met / total) if total else 0.0

        if total > 0 and met == total:
            overall = "Full Credit"
        elif pct >= 0.50:
            overall = "Partial Credit"
        else:
            overall = "No Credit"

        # Render readable output
        lines = [f"Overall: {overall}", f"Met: {met}/{total} ({round(pct * 100)}%)", ""]
        for r in results:
            mark = "✅" if r["meets"] else "❌"
            ev = f" — {r['evidence']}" if r["evidence"] else ""
            lines.append(f"- {mark} {r['id']}{ev}")
        readable = "\n".join(lines)

    except Exception as e:
        readable = f"[Assessment error: {e}]"

    state["history"].append({"role": "assistant", "content": readable})
    state["mode"] = "done"
    return state["history"], state

# --- end Assess ---


def parse_rubric_from_module(module_text: str):
    """
    Extracts the full RUBRIC section from a module text file and returns a list of items.
    - Captures everything after a line that says 'RUBRIC' (with optional colon)
      until the next ALL-CAPS header or file end.
    - Accepts bullets like -, *, •, or 1), 1., etc.
    - Falls back to RUBRIC_FALLBACK if nothing is found.
    """
    if not module_text:
        return RUBRIC_FALLBACK[:]

    # 1) Slice out the RUBRIC block
    block_re = re.compile(
        r'^\s*RUBRIC\s*:?\s*$([\s\S]*?)(?=^\s*[A-Z][A-Z\s/&\-]{3,}\s*:?\s*$|^\Z)',
        re.MULTILINE
    )
    m = block_re.search(module_text)
    if not m:
        return RUBRIC_FALLBACK[:]
    block = m.group(1).strip()

    # 2) Collect bullet-like lines
    items = []
    for line in block.splitlines():
        # Keep original text but trim whitespace
        raw = line.strip()
        if not raw:
            continue
        # Match common bullet or numbered list starters
        if re.match(r'^(\-|\*|•|\d+[\.\)]|\([a-z]\))\s+', raw, re.IGNORECASE):
            # strip the bullet prefix
            cleaned = re.sub(r'^(\-|\*|•|\d+[\.\)]|\([a-z]\))\s+', '', raw, flags=re.IGNORECASE).strip()
            if cleaned:
                items.append(cleaned)
        else:
            # Some rubrics are paragraph-style; treat non-empty lines as items,
            # but avoid obvious section labels like "Notes:" inside the block
            if not re.match(r'^\s*(notes?|example|weight|scale)\s*:?\s*$', raw, re.IGNORECASE):
                items.append(raw)

    # 3) Deduplicate while preserving order
    seen = set()
    deduped = []
    for it in items:
        if it not in seen:
            seen.add(it)
            deduped.append(it)

    return deduped or RUBRIC_FALLBACK[:]


with gr.Blocks(title="CAT (MVP)") as demo:
    gr.Markdown("## 😼 Conversational Assessment Tool (CAT) — MVP")
    with gr.Row():
        module_file = gr.Dropdown(
            label="Select Module File",
            choices=[p.name for p in sorted(Path(MODULE_DIR).glob("module*.txt"))],
            value="module01.txt",
            interactive=True
        )
        name_tb = gr.Textbox(label="Your first name", placeholder="e.g., Maya", value="", interactive=True)
        start_btn = gr.Button("Start")  # fine to keep inside the row (optional)

    chatbot = gr.Chatbot(label="CAT Conversation", type="messages")
    user_in = gr.Textbox(label="Your message", placeholder="Type here and press Enter")
    state = gr.State(init_state())
    assess_btn = gr.Button("Assess", variant="primary")

    def _start(module_name, student_name):
        student_name = student_name.strip()
        if not student_name:
            # Return a valid state object plus a warning message in the chat
            return init_state(), [{"role": "assistant", "content": "⚠ Please enter your first name before starting."}]
        st, hist = start_session(module_name, student_name)
        return st, hist

    start_btn.click(_start, [module_file, name_tb], [state, chatbot])
    user_in.submit(chat, [user_in, state], [user_in, chatbot, state])
    # Clicking Assess triggers the mentor/evaluator flow
    assess_btn.click(
        fn=assess_fn,
        inputs=[state],
        outputs=[chatbot, state]
    )


if __name__ == "__main__":
    demo.launch()
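The evaluator is told to return strict JSON only, but chat models sometimes wrap the payload in prose. `_safe_json_loads` handles this by falling back to the outermost `{...}` block. A minimal standalone sketch of that salvage strategy, reimplemented here (with an illustrative input string) so it runs on its own:

```python
import json


def safe_json_loads(s: str):
    """Parse strict JSON; on failure, retry on the outermost {...} span."""
    try:
        return json.loads(s)
    except Exception:
        start, end = s.find("{"), s.rfind("}")
        if start != -1 and end > start:
            try:
                return json.loads(s[start:end + 1])
            except Exception:
                return None
        return None


# A payload wrapped in prose still parses to a dict
wrapped = 'Sure! Here is the JSON:\n{"results": [{"id": "C1", "meets": true}]}\nHope that helps.'
print(safe_json_loads(wrapped))
```

This is deliberately crude: it assumes a single top-level object, so a reply containing two separate JSON objects would still fail, which is why `assess_fn` also validates the parsed structure before using it.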