Decision-Fish committed on
Commit aa39c2d · verified · 1 Parent(s): 6e19a5c

Upload 2 files

Files changed (2)
  1. CAT_universal_prompt.txt +53 -0
  2. app.py +400 -0
CAT_universal_prompt.txt ADDED
@@ -0,0 +1,53 @@
+ CAT Universal Prompt (CAT_universal_prompt.txt)
+
+ Opening Context
+ You are an AI Mentor guiding a student through a simulation. The student will help a fictional character, in a realistic NYC entry-level job, navigate a decision or dilemma directly related to the Module Learning Objectives below. You will generate one realistic scenario aligned with these objectives, starting with a vivid, in-character request for help that encourages the student to ask questions and perform the relevant analyses taught in the module.
+
+ [LEARNING OBJECTIVES]
+ {LEARNING_OBJECTIVES}
+
+ Rubric Criteria for Evaluation
+ The evaluator will use the rubric below. Throughout the conversation, gently guide the student to address these items. Do not do analyses for the student; instead, prompt them for inputs and let them run the tools.
+ {RUBRIC}
+
+ Rules for the Simulation
+ - Stay in character during roleplay until the scene ends.
+ - Keep responses concise (2–4 sentences) and focused on helping the student think, not on solving the problem for them.
+ - Let the student lead—do not introduce decision-making tools; if they choose one, ask for their inputs and let them run it.
+ - Encourage the student to perform the relevant analyses taught in the module as part of their reasoning.
+ - Aim for about 15–20 meaningful exchanges total.
+ - Before moving to the wrap-up, ensure the student has addressed the key elements in this module’s rubric and learning objectives. If any appear missing, ask a brief, relevant question to prompt for them.
+
+ Scene Wrap & Transition to Evaluation
+ - After roughly 7–9 student turns, begin closing the scene.
+ - In character, warmly acknowledge the student’s efforts and invite a final contribution:
+   "Thanks for walking through this with me. We’ve covered a lot. Before we wrap up, is there anything else you’d like me to consider before I give you a preliminary assessment?"
+ - If the student says no:
+   "Okay, thank you. Let’s step back and review how you approached this situation."
+ - If the student says yes:
+   Provide one short, neutral acknowledgment only (no new roleplay branches):
+   "I appreciate you sharing that. I’ll take it into account."
+   Then transition to mentor mode.
+ - Hard cap: If 10 student turns are reached without the wrap-up, trigger it automatically.
+
+ Evaluation Phase (Mentor Mode)
+ - Drop character completely.
+ - Ask the student to name at least two decision-making tools they used and confirm whether they applied them accurately.
+ - Using the module rubric:
+   - Assign a score for each category.
+   - Provide up to 100 words of feedback per category in a warm, professional tone.
+   - Include at least one quote or paraphrase from the conversation to support each score.
+ - Be specific and constructive. Award full marks only if all criteria are met.
+ - Invite revision:
+   "Would you like to revise any part of your reasoning or recommendation before receiving your final score?"
+ - If they revise, reassess and give updated scores and feedback.
+
+ Fictional Consequence
+ - After scoring, describe a fictional but plausible consequence of the character’s decision-making process tied to the student’s performance:
+   - Excellent: Significant success or positive impact.
+   - Satisfactory: Middling result with some improvement needed.
+   - Unsatisfactory: Realistic setback or risk from weaker reasoning.
+ - Keep it brief (2–3 sentences), professional, and relevant.
+
+ Starting the Simulation
+ Generate one realistic scenario aligned with the learning objectives. Begin with a short, vivid description of the situation and an in-character request for guidance. Then wait for the student to respond.
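For reference, `assemble_prompt` and `parse_rubric_from_module` in app.py look for `MODULE NAME:`, `LEARNING OBJECTIVES:`, and `RUBRIC:` marker lines, with ALL-CAPS headers ending each section. A hypothetical sketch of a module file in that layout (the section names come from the code; the content itself is invented):

```text
MODULE NAME:
Module 01: Decision Tools (hypothetical example)

LEARNING OBJECTIVES:
- Frame a workplace decision and identify what information is needed
- Apply at least one decision-making tool taught in the module

RUBRIC:
- States the decision and information needs clearly
- Applies the appropriate tool or framework correctly
- Justifies the conclusion and notes at least one limitation or tradeoff
```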
app.py ADDED
@@ -0,0 +1,400 @@
+ import os
+ import re
+ from pathlib import Path
+
+ import gradio as gr
+ from dotenv import load_dotenv
+ from openai import OpenAI
+
+ # Type aliases
+ from typing import List, cast
+ from openai.types.chat import ChatCompletionMessageParam
+
+ UNIVERSAL_PROMPT_PATH = "CAT_universal_prompt.txt"
+ MODULE_DIR = "modules"  # module files live in the /modules subfolder
+
+ load_dotenv()
+ client = OpenAI()
+
+
+ def call_model(system_prompt: str, history: list[dict[str, str]]) -> str:
+     # Build as simple dicts first
+     msgs: list[dict[str, str]] = [{"role": "system", "content": system_prompt}]
+     for m in history:
+         role = m.get("role")
+         content = m.get("content")
+         if role in ("user", "assistant") and isinstance(content, str):
+             msgs.append({"role": role, "content": content})
+
+     # Cast once at the call site to satisfy the SDK types
+     typed_msgs = cast(List[ChatCompletionMessageParam], msgs)
+
+     resp = client.chat.completions.create(
+         model="gpt-4o-mini",
+         messages=typed_msgs,
+         temperature=0.4,
+     )
+     return resp.choices[0].message.content or ""
+
+ def load_text(path: str | Path) -> str:
+     with open(path, "r", encoding="utf-8") as f:
+         return f.read()
+
+
+ def assemble_prompt(universal_prompt_text: str, module_text: str) -> str:
+     def extract(label: str) -> str:
+         marker = label + ":"
+         start = module_text.find(marker)
+         if start == -1:
+             return ""
+         start += len(marker)
+         next_markers = ["\nLEARNING OBJECTIVES:", "\nRUBRIC:", "\nMODULE NAME:"]
+         end_positions = [module_text.find(m, start) for m in next_markers if module_text.find(m, start) != -1]
+         end = min(end_positions) if end_positions else len(module_text)
+         return module_text[start:end].strip()
+
+     learning_objectives = extract("LEARNING OBJECTIVES")
+     rubric = extract("RUBRIC")
+
+     prompt = universal_prompt_text.replace("{LEARNING_OBJECTIVES}", learning_objectives)
+     prompt = prompt.replace("{RUBRIC}", rubric)
+     return prompt
+
+ def init_state():
+     return {
+         "assembled_prompt": "",
+         "history": [],
+         "rubric_items": [],
+         "mode": "roleplay",
+         "mentor_step": 0,
+         "student_name": "",
+     }
+
+
+ def start_session(module_file, student_name=""):
+     state = init_state()
+     state["student_name"] = student_name
+
+     universal = load_text(UNIVERSAL_PROMPT_PATH)
+     module_text = load_text(Path(MODULE_DIR) / module_file)
+
+     # Parse the full RUBRIC section once and keep a structured copy
+     state["rubric_items"] = parse_rubric_from_module(module_text)
+     print(f"[CAT] Parsed {len(state['rubric_items'])} rubric items for this module.")
+
+     # Personalize lightly with the student's first name
+     name_hint = (
+         f"\n\n[Student first name: {student_name}. Use it naturally once in the opening; don’t overuse.]"
+         if student_name else ""
+     )
+     state["assembled_prompt"] = assemble_prompt(universal, module_text) + name_hint
+
+     state["history"].append({"role": "system", "content": state["assembled_prompt"]})
+     opening = call_model(state["assembled_prompt"], state["history"])
+     state["history"].append({"role": "assistant", "content": opening})
+     return state, state["history"]
+
+ def chat(user_msg, state):
+     if not user_msg.strip():
+         return "", state["history"], state
+
+     # Shortcut: typing "grade" acts like pressing the Assess button
+     if user_msg.strip().lower() == "grade":
+         hist, st = assess_fn(state)
+         return "", hist, st
+
+     # If the scene is finished, ignore further input and return cleanly
+     if state.get("mode") == "done":
+         return "", state["history"], state
+
+     # Unknown or legacy modes: ignore input (roleplay and mentor are handled below)
+     if state.get("mode") not in ("roleplay", "mentor"):
+         return "", state["history"], state
+
+     state["history"].append({"role": "user", "content": user_msg})
+
+     if state["mode"] == "roleplay":
+         reply = call_model(state["assembled_prompt"], state["history"])
+         state["history"].append({"role": "assistant", "content": reply})
+         return "", state["history"], state
+
+     if state["mode"] == "mentor":
+         # Step 1: general intro (no assumption of tools)
+         if state.get("mentor_step", 0) == 0:
+             eval_intro = (
+                 "Before we wrap up: name two specific concepts, tools, or frameworks you used in this scenario, "
+                 "and in one short sentence each say how you applied them. If you didn’t use any, name two insights "
+                 "you learned and how you would apply them next time."
+             )
+             state["history"].append({"role": "assistant", "content": eval_intro})
+             state["mentor_step"] = 1
+             return "", state["history"], state
+
+         # Step 2: concise rubric-based evaluation
+         else:
+             # Concise rubric-based evaluation (hidden): pass instruction via system_prompt
+             eval_request = (
+                 "Evaluate the student's performance using the module rubric. Provide these sections: "
+                 "Overall rating (Unsatisfactory, Satisfactory, or Excellent) with a one-sentence justification; "
+                 "Career competencies; Uniquely human capacities; Argument analysis; Ethical frameworks; ESG awareness; "
+                 "Application; Interaction quality; Strength; Area to improve; Advice for next time; Fictional consequence. "
+                 "Quote at least one student phrase. Keep the whole evaluation under 180 words."
+             )
+             try:
+                 reply = call_model(
+                     state["assembled_prompt"] + "\n\n" + eval_request,
+                     state["history"]  # call_model drops system entries from history by design
+                 )
+             except Exception as e:
+                 # Never let chat return None; show a friendly error and allow retry
+                 state["history"].append({
+                     "role": "assistant",
+                     "content": f"[Assessment error: {e}] Please press Assess again in a few seconds."
+                 })
+                 return "", state["history"], state
+
+             state["history"].append({"role": "assistant", "content": reply})
+             state["mode"] = "done"
+             return "", state["history"], state
+
+     # Safety net: ensure a consistent return shape if a future branch falls through
+     return "", state["history"], state
+
+ RUBRIC_FALLBACK = [
+     "States the decision and information needs clearly",
+     "Applies the appropriate tool or framework correctly",
+     "Shows steps or calculations and a decision rule, tool, or framework",
+     "Justifies the conclusion and notes at least one limitation or tradeoff",
+ ]
+
+ # --- Helpers for rubric evaluation JSON ---
+ import json
+
+ def _safe_json_loads(s: str):
+     try:
+         return json.loads(s)
+     except Exception:
+         # crude but robust: try to extract a {...} block if the model wrapped it in prose
+         start = s.find("{")
+         end = s.rfind("}")
+         if start != -1 and end != -1 and end > start:
+             try:
+                 return json.loads(s[start:end + 1])
+             except Exception:
+                 return None
+         return None
+
+ def _format_assessment_readable(assess_obj):
+     """
+     assess_obj schema:
+     {
+       "criteria": [{"id": "...", "level": "no|partial|full", "points": 0|0.5|1, "evidence": "..."}],
+       "total_points": float,
+       "max_points": float,
+       "summary": "≤180 words narrative"
+     }
+     """
+     if not isinstance(assess_obj, dict) or "criteria" not in assess_obj:
+         return "[Assessment parsing error: invalid JSON]"
+     lines = []
+     total = assess_obj.get("total_points", 0)
+     maxp = assess_obj.get("max_points", 0)
+     lines.append(f"Score: {total:g}/{maxp:g}")
+     lines.append("")
+     for c in assess_obj["criteria"]:
+         lid = c.get("id", "?")
+         level = c.get("level", "?")
+         pts = c.get("points", "?")
+         ev = c.get("evidence", "")
+         lines.append(f"- {lid}: {level} ({pts}) — {ev}")
+     if assess_obj.get("summary"):
+         lines.append("")
+         lines.append(assess_obj["summary"])
+     return "\n".join(lines)
+ # --- end helpers ---
+
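The fallback path in `_safe_json_loads` can be exercised standalone. A minimal sketch (the function is copied here under a hypothetical name so it runs on its own, and the wrapped string is invented):

```python
import json


def safe_json_loads(s: str):
    # Mirrors _safe_json_loads above: try strict parsing first, then fall back
    # to the outermost {...} span in case the model wrapped the JSON in prose.
    try:
        return json.loads(s)
    except Exception:
        start, end = s.find("{"), s.rfind("}")
        if start != -1 and end > start:
            try:
                return json.loads(s[start:end + 1])
            except Exception:
                return None
        return None


wrapped = 'Sure! Here is the JSON:\n{"results": [{"id": "C1", "meets": true}]}\nHope that helps.'
print(safe_json_loads(wrapped))
# → {'results': [{'id': 'C1', 'meets': True}]}
```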
+ # --- Assess: rubric-based, per-criterion meets/does-not-meet (JSON output) ---
+ def assess_fn(state):
+     """
+     One press:
+       1) Adds END OF SCENE visibly once.
+       2) Runs a rubric-based evaluation using a dedicated evaluator system prompt.
+          Output schema is strict JSON with a per-criterion meets: true/false verdict.
+     If already done: no-op.
+     Returns: (chat_history, state)
+     """
+     # If already finalized, do nothing
+     if state.get("mode") == "done":
+         return state["history"], state
+
+     # 1) Show the scene break once
+     if not (
+         state["history"]
+         and state["history"][-1].get("role") == "assistant"
+         and state["history"][-1].get("content", "").strip() == "END OF SCENE"
+     ):
+         state["history"].append({"role": "assistant", "content": "END OF SCENE"})
+
+     # Enter mentor mode and skip any intro
+     state["mode"] = "mentor"
+     state["mentor_step"] = 2
+
+     # 2) Build rubric payload from the current module; fall back only if needed
+     raw_items = state.get("rubric_items") or []
+     if not isinstance(raw_items, list) or len(raw_items) == 0:
+         raw_items = RUBRIC_FALLBACK[:]  # last resort
+
+     rubric = []
+     for i, item in enumerate(raw_items, start=1):
+         rid = item.get("id") if isinstance(item, dict) else str(item)
+         rubric.append({"id": rid or f"Criterion {i}"})
+
+     # 3) Dedicated evaluator prompt: yes/no evidence per item (simple, deterministic)
+     assessor_system = (
+         "You are the Evaluator for the Conversational Assessment Tool (CAT).\n"
+         "For EACH rubric item, decide if the student provided reasonable, college-level evidence.\n"
+         "Rules:\n"
+         "- 'meets' = true only if the student shows specific, relevant reasoning/evidence for that item.\n"
+         "- Otherwise 'meets' = false.\n"
+         "Return STRICT JSON ONLY (no prose outside JSON):\n"
+         "{\n"
+         '  "results": [\n'
+         '    {"id": "<criterion id>", "meets": true|false, "evidence": "<short quote or brief reason>"}\n'
+         "  ]\n"
+         "}"
+     )
+
+     # 4) Provide the actual rubric text to the model as context
+     module_rubric_text = "\n".join(f"- {c['id']}" for c in rubric)
+     history_for_eval = list(state["history"]) + [
+         {"role": "assistant", "content": "Evaluate against these rubric items:\n" + module_rubric_text}
+     ]
+
+     try:
+         model_raw = call_model(assessor_system, history_for_eval)
+
+         # Parse and normalize
+         data = _safe_json_loads(model_raw)
+         if not data or "results" not in data or not isinstance(data["results"], list):
+             raise ValueError("Invalid evaluator JSON")
+
+         # Align results to rubric order
+         results = []
+         by_id = {str(r.get("id", "")): r for r in data["results"]}
+         for c in rubric:
+             cid = c["id"]
+             r = by_id.get(cid, {})
+             meets = bool(r.get("meets") is True)
+             evidence = str(r.get("evidence") or "")
+             results.append({"id": cid, "meets": meets, "evidence": evidence})
+
+         met = sum(1 for r in results if r["meets"])
+         total = len(results)
+         pct = (met / total) if total else 0.0
+
+         if total > 0 and met == total:
+             overall = "Full Credit"
+         elif pct >= 0.50:
+             overall = "Partial Credit"
+         else:
+             overall = "No Credit"
+
+         # Render readable output
+         lines = [f"Overall: {overall}", f"Met: {met}/{total} ({round(pct*100)}%)", ""]
+         for r in results:
+             mark = "✅" if r["meets"] else "❌"
+             ev = f" — {r['evidence']}" if r["evidence"] else ""
+             lines.append(f"- {mark} {r['id']}{ev}")
+         readable = "\n".join(lines)
+
+     except Exception as e:
+         readable = f"[Assessment error: {e}]"
+
+     state["history"].append({"role": "assistant", "content": readable})
+     state["mode"] = "done"
+     return state["history"], state
+
+ # --- end Assess ---
+
+ def parse_rubric_from_module(module_text: str):
+     """
+     Extracts the full RUBRIC section from a module text file and returns a list of items.
+     - Captures everything after a line that says 'RUBRIC' (with optional colon)
+       until the next ALL-CAPS header or file end.
+     - Accepts bullets like -, *, •, or 1), 1., etc.
+     - Falls back to RUBRIC_FALLBACK if nothing is found.
+     """
+     if not module_text:
+         return RUBRIC_FALLBACK[:]
+
+     # 1) Slice out the RUBRIC block
+     block_re = re.compile(
+         r'^\s*RUBRIC\s*:?\s*$([\s\S]*?)(?=^\s*[A-Z][A-Z\s/&\-]{3,}\s*:?\s*$|^\Z)',
+         re.MULTILINE
+     )
+     m = block_re.search(module_text)
+     if not m:
+         return RUBRIC_FALLBACK[:]
+     block = m.group(1).strip()
+
+     # 2) Collect bullet-like lines
+     items = []
+     for line in block.splitlines():
+         # Keep original text but trim whitespace
+         raw = line.strip()
+         if not raw:
+             continue
+         # Match common bullet or numbered list starters
+         if re.match(r'^(\-|\*|•|\d+[\.\)]|\([a-z]\))\s+', raw, re.IGNORECASE):
+             # strip the bullet prefix
+             cleaned = re.sub(r'^(\-|\*|•|\d+[\.\)]|\([a-z]\))\s+', '', raw, flags=re.IGNORECASE).strip()
+             if cleaned:
+                 items.append(cleaned)
+         else:
+             # Some rubrics are paragraph-style; treat non-empty lines as items,
+             # but avoid obvious section labels like "Notes:" inside the block
+             if not re.match(r'^\s*(notes?|example|weight|scale)\s*:?\s*$', raw, re.IGNORECASE):
+                 items.append(raw)
+
+     # 3) Deduplicate while preserving order
+     seen = set()
+     deduped = []
+     for it in items:
+         if it not in seen:
+             seen.add(it)
+             deduped.append(it)
+
+     return deduped or RUBRIC_FALLBACK[:]
+
+ with gr.Blocks(title="CAT (MVP)") as demo:
+     gr.Markdown("## 😼 Conversational Assessment Tool (CAT) — MVP")
+     with gr.Row():
+         module_file = gr.Dropdown(
+             label="Select Module File",
+             choices=[p.name for p in sorted(Path(MODULE_DIR).glob("module*.txt"))],
+             value="module01.txt",  # assumes modules/module01.txt exists
+             interactive=True
+         )
+         name_tb = gr.Textbox(label="Your first name", placeholder="e.g., Maya", value="", interactive=True)
+         start_btn = gr.Button("Start")  # kept inside the row
+
+     chatbot = gr.Chatbot(label="CAT Conversation", type="messages")
+     user_in = gr.Textbox(label="Your message", placeholder="Type here and press Enter")
+     state = gr.State(init_state())
+     assess_btn = gr.Button("Assess", variant="primary")
+
+     def _start(module_name, student_name):
+         student_name = student_name.strip()
+         if not student_name:
+             # Return a valid state object plus a warning message in the chat
+             return init_state(), [{"role": "assistant", "content": "⚠ Please enter your first name before starting."}]
+         st, hist = start_session(module_name, student_name)
+         return st, hist
+
+     start_btn.click(_start, [module_file, name_tb], [state, chatbot])
+     user_in.submit(chat, [user_in, state], [user_in, chatbot, state])
+     # Clicking Assess triggers the mentor/evaluator flow
+     assess_btn.click(
+         fn=assess_fn,
+         inputs=[state],
+         outputs=[chatbot, state]
+     )
+
+
+ if __name__ == "__main__":
+     demo.launch()
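The RUBRIC-block regex in `parse_rubric_from_module` is the trickiest part of the file, so a standalone sanity check is useful. This sketch reuses the same pattern on an invented module snippet (the sample text and variable names are hypothetical):

```python
import re

# Same pattern as parse_rubric_from_module: capture everything after a
# "RUBRIC:" line up to the next ALL-CAPS header (e.g., "NOTES:") or end of file.
BLOCK_RE = re.compile(
    r'^\s*RUBRIC\s*:?\s*$([\s\S]*?)(?=^\s*[A-Z][A-Z\s/&\-]{3,}\s*:?\s*$|^\Z)',
    re.MULTILINE,
)

sample = """MODULE NAME:
Decision Tools 101

LEARNING OBJECTIVES:
- Frame a workplace decision

RUBRIC:
- States the decision clearly
- Applies the appropriate tool correctly

NOTES:
Internal use only.
"""

m = BLOCK_RE.search(sample)
block = m.group(1).strip() if m else ""
# Strip common bullet prefixes, as the parser does
items = [
    re.sub(r'^(\-|\*|•|\d+[\.\)])\s+', '', line.strip())
    for line in block.splitlines()
    if line.strip()
]
print(items)
# → ['States the decision clearly', 'Applies the appropriate tool correctly']
```

Note that the "NOTES:" section is correctly excluded by the ALL-CAPS-header lookahead, which is why the parser can tolerate trailing material after the rubric.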