tblaisaacliao committed
Commit dca7537 · 1 Parent(s): 710cde6

refine evaluation and fix CSV download problem

docs/backend-doc/14-evaluation-system.md ADDED
@@ -0,0 +1,413 @@
# Evaluation System

This document explains how the AI-based conversation evaluation system works.

## Overview

The evaluation system uses AI (OpenAI) to assess conversation quality for teacher training purposes. It supports two evaluation modes:

- **Student conversations**: Evaluates how well the AI student simulation helps train teachers
- **Coach-direct conversations**: Evaluates direct teacher-coach interactions (no student)

## Key Files

| File | Purpose |
|------|---------|
| `src/lib/services/evaluation-service.ts` | Core evaluation logic and AI integration |
| `src/lib/repositories/evaluation-repository.ts` | Database operations |
| `src/lib/types/models.ts` | TypeScript types (lines 83-150) |
| `src/app/api/admin/evaluations/` | API endpoints |
| `src/app/admin/evaluations/` | Admin UI pages |

---
## API Endpoints

### List Evaluations

```
GET /api/admin/evaluations
```

**Query Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `page` | number | 1 | Page number |
| `limit` | number | 50 | Results per page |
| `sortBy` | string | `evaluatedAt` | Sort by `evaluatedAt` or `overallScore` |
| `sortOrder` | string | `desc` | `asc` or `desc` |
| `evaluationType` | string | - | Filter by type |
| `studentPromptId` | string | - | Filter by student personality |
| `minScore` | number | - | Minimum score filter |
| `maxScore` | number | - | Maximum score filter |
| `startDate` | string | - | Start date filter (ISO) |
| `endDate` | string | - | End date filter (ISO) |

**Response:**

```json
{
  "evaluations": [...],
  "pagination": {
    "page": 1,
    "limit": 50,
    "total": 100,
    "totalPages": 2
  }
}
```
### Trigger Batch Evaluation

```
POST /api/admin/evaluations
```

**Request Body:**

```json
{
  "conversationIds": ["uuid-1", "uuid-2", "uuid-3"]
}
```

**Constraints:**
- Maximum 10 conversations per batch
- Array must be non-empty

**Response:**

```json
{
  "successful": [...],
  "failed": [{ "conversationId": "...", "error": "..." }],
  "summary": {
    "total": 3,
    "successful": 2,
    "failed": 1
  }
}
```
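Because the endpoint caps a batch at 10 conversations, a caller with more IDs has to split them into multiple requests. A minimal sketch (illustrative, not project code; the HTTP call is abstracted behind a `post` callback so the batching logic stands alone):

```typescript
// Split conversation IDs into batches of at most 10, matching the
// endpoint's documented constraint.
export function chunkIds(ids: string[], size = 10): string[][] {
  const batches: string[][] = [];
  for (let i = 0; i < ids.length; i += size) {
    batches.push(ids.slice(i, i + size));
  }
  return batches;
}

// Run every batch sequentially and aggregate the summary counts from
// each response body.
export async function evaluateAll(
  ids: string[],
  post: (body: { conversationIds: string[] }) =>
    Promise<{ summary: { successful: number; failed: number } }>,
): Promise<{ successful: number; failed: number }> {
  let successful = 0;
  let failed = 0;
  for (const conversationIds of chunkIds(ids)) {
    const { summary } = await post({ conversationIds });
    successful += summary.successful;
    failed += summary.failed;
  }
  return { successful, failed };
}
```

In the admin UI, `post` would wrap a `fetch` of `POST /api/admin/evaluations` with the project's auth headers.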
### Get Single Evaluation

```
GET /api/admin/evaluations/[id]
```

### Get Evaluation by Conversation

```
GET /api/admin/evaluations/conversation/[conversationId]
```

### Trigger Single Evaluation

```
POST /api/admin/evaluations/conversation/[conversationId]
```

**Query Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `force` | boolean | false | Force re-evaluation (deletes the existing evaluation first) |
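A small sketch of how a client might build this request URL; `evaluationUrl` is a hypothetical helper, not project code:

```typescript
// Build the single-evaluation trigger URL. With force=true the endpoint
// deletes any existing evaluation before re-running; without it, an
// existing evaluation is left untouched.
export function evaluationUrl(conversationId: string, force = false): string {
  const base = `/api/admin/evaluations/conversation/${encodeURIComponent(conversationId)}`;
  return force ? `${base}?force=true` : base;
}
```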
### Get Statistics

```
GET /api/admin/evaluations/stats
GET /api/admin/evaluations/stats?studentPromptId=grade_1
```

---
## Evaluation Flow

```
API Request Received
        ↓
1. Load Data
   - Fetch conversation from ConversationRepository
   - Fetch messages from MessageRepository
   - Check if evaluation already exists
        ↓
2. Detect Conversation Type
   - If studentPromptId === 'coach_direct' → coach_direct mode
   - Otherwise → student mode
        ↓
3. Format Conversation for AI
   - Filter out system messages
   - Map roles to Chinese labels:
     • user → "老師" (Teacher)
     • assistant → "學生" (Student) / "教練" (Coach), based on the speaker field
   - Coach-direct: only "老師" and "教練"
        ↓
4. Call AI Model (OpenAI)
   - Model: gpt-4o-mini (or MODEL_NAME env var)
   - Temperature: 0.3 (for consistency)
   - System prompt: based on conversation type
   - User prompt: formatted conversation + system prompt
        ↓
5. Parse AI Response
   - Extract JSON from response
   - Validate required fields
   - Build EvaluationScores and EvaluationFeedback objects
        ↓
6. Save to Database
   - Generate UUID
   - Serialize scores/feedback to JSON
   - Insert into evaluations table
        ↓
Return Evaluation Object
```
---

## Response Schema (Critical)

When modifying evaluation prompts, the AI response **must** follow this exact JSON structure. The `parseEvaluationResponse()` function validates these fields.

### Required JSON Structure

```json
{
  "teacherEngagement": {
    "level": "high|medium|low|none",
    "warning": "string (empty if level is high/medium)"
  },
  "promptDesign": {
    "clarity": 1-5,
    "completeness": 1-5,
    "specificity": 1-5,
    "consistency": 1-5,
    "overall": 1-5,
    "rationale": "string"
  },
  "trainingEffectiveness": {
    "challengeLevel": 1-5,
    "learningOpportunities": 1-5,
    "realisticScenarios": 1-5,
    "engagementDepth": 1-5,
    "overall": 1-5,
    "rationale": "string"
  },
  "conversationQuality": {
    "teacherInsights": 1-5,
    "interactionDepth": 1-5,
    "educationalValue": 1-5,
    "overall": 1-5,
    "rationale": "string"
  },
  "overallScore": 1-5,
  "strengths": ["string array"],
  "improvementAreas": ["string array"],
  "promptSuggestions": ["string array"]
}
```

### Validation Rules

The parser checks:
- The `promptDesign`, `trainingEffectiveness`, and `conversationQuality` objects must exist
- `overallScore` must be a number
- Missing optional fields (`teacherEngagement`, the string arrays) default to empty values
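A sketch of these rules (not the actual `parseEvaluationResponse()` implementation; function names here are illustrative):

```typescript
type RawEvaluation = Record<string, unknown>;

// Collect validation errors: the three score objects are required,
// and overallScore must be numeric.
export function checkRequiredFields(raw: RawEvaluation): string[] {
  const errors: string[] = [];
  for (const key of ['promptDesign', 'trainingEffectiveness', 'conversationQuality']) {
    if (typeof raw[key] !== 'object' || raw[key] === null) {
      errors.push(`missing required object: ${key}`);
    }
  }
  if (typeof raw['overallScore'] !== 'number') {
    errors.push('overallScore must be a number');
  }
  return errors;
}

// Optional string arrays fall back to empty values when absent.
export function withDefaults(raw: RawEvaluation): RawEvaluation {
  return {
    ...raw,
    strengths: Array.isArray(raw['strengths']) ? raw['strengths'] : [],
    improvementAreas: Array.isArray(raw['improvementAreas']) ? raw['improvementAreas'] : [],
    promptSuggestions: Array.isArray(raw['promptSuggestions']) ? raw['promptSuggestions'] : [],
  };
}
```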

---

## Modifying Evaluation Prompts

### Prompt Locations

| Prompt | Variable | Purpose |
|--------|----------|---------|
| Student evaluation | `EVALUATION_SYSTEM_PROMPT` | Evaluates student conversations |
| Coach-direct evaluation | `COACH_DIRECT_EVALUATION_SYSTEM_PROMPT` | Evaluates coach-only conversations |

File: `src/lib/services/evaluation-service.ts` (lines 21-200)

### What You CAN Change

- Evaluation criteria descriptions
- Score weighting explanations
- Chinese text and examples
- `teacherEngagement.level` thresholds

### What You MUST Keep

- The JSON structure defined in the "Response Schema" section above
- Field names (e.g., `promptDesign.clarity`, `conversationQuality.overall`)
- The score range (1-5)
- The instruction: `只返回有效的 JSON,不要其他文字` ("Return only valid JSON, no other text")

### Feedback Field Audience

`improvementAreas` and `promptSuggestions` are for **prompt engineers**, not teachers:

- ❌ Wrong: "老師可以多用開放式問句" (advice aimed at the teacher: "The teacher could use more open-ended questions")
- ✅ Correct: "系統提示應增加學生對教師特定技巧的回應指示" (advice aimed at the prompt engineer: "The system prompt should add instructions for how the student responds to specific teacher techniques")

---
## Conversation Type Handling

### Student Conversations

**Condition:** `studentPromptId !== 'coach_direct'`

**Evaluation Focus:**
- Prompt design quality (weight: 20%)
- Training effectiveness (weight: 20%)
- Teacher experience (weight: 60%, the highest priority)

**Message Labeling:**
- User → "老師" (Teacher)
- Assistant with `speaker === 'student'` → "學生" (Student)
- Assistant with `speaker === 'coach'` → "教練" (Coach)

**Overall Score Calculation:**

```
overallScore = 0.2 × promptDesign.overall
             + 0.2 × trainingEffectiveness.overall
             + 0.6 × conversationQuality.overall
```

### Coach-Direct Conversations

**Condition:** `studentPromptId === 'coach_direct'`

**Evaluation Focus:**
- Coach guidance quality (weight: 50%)
- Teacher learning effectiveness (weight: 50%)

**Message Labeling:**
- User → "老師" (Teacher)
- Assistant → "教練" (Coach)

**Overall Score Calculation:**

```
overallScore = 0.5 × promptDesign.overall
             + 0.5 × trainingEffectiveness.overall
```
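The two weighted averages above can be sketched as one function. This is illustrative code, not the project implementation; the field names follow the response schema:

```typescript
interface SubScores {
  promptDesign: number;          // promptDesign.overall
  trainingEffectiveness: number; // trainingEffectiveness.overall
  conversationQuality: number;   // conversationQuality.overall (unused in coach_direct mode)
}

// Apply the documented per-mode weights: 0.2/0.2/0.6 for student
// conversations, 0.5/0.5 for coach-direct conversations.
export function weightedOverallScore(
  mode: 'student' | 'coach_direct',
  s: SubScores,
): number {
  return mode === 'student'
    ? 0.2 * s.promptDesign + 0.2 * s.trainingEffectiveness + 0.6 * s.conversationQuality
    : 0.5 * s.promptDesign + 0.5 * s.trainingEffectiveness;
}
```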

---

## Scoring System

All scores are 1-5 (1 = Poor, 5 = Excellent). See the "Response Schema" section for field names.

### Score Weights

| Mode | promptDesign | trainingEffectiveness | conversationQuality |
|------|--------------|----------------------|---------------------|
| Student | 20% | 20% | 60% |
| Coach-direct | 50% | 50% | (included in feedback) |

### Score Color Coding (UI)

| Score Range | Color | Meaning |
|-------------|-------|---------|
| ≥ 4.0 | Green | Excellent |
| 3.0 - 3.9 | Yellow | Good |
| < 3.0 | Red | Needs Improvement |
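The color thresholds in the table map to a simple three-way branch; a sketch (the UI's actual helper name and Tailwind classes may differ):

```typescript
// Map an overall score to the UI color band from the table above.
export function scoreColor(score: number): 'green' | 'yellow' | 'red' {
  if (score >= 4.0) return 'green';
  if (score >= 3.0) return 'yellow';
  return 'red';
}
```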

---

## Feedback Structure

See the "Response Schema" section for the full field structure.

### Teacher Engagement Levels

| Level | Description |
|-------|-------------|
| `high` | Actively engaged, meaningful questions |
| `medium` | Participated but shallow interaction |
| `low` | Low participation, brief responses |
| `none` | Meaningless input (random numbers, gibberish) |

**Low scores are triggered by:**
- Random numbers or gibberish input
- Conversations that are too short
- A teacher who is clearly not taking the exercise seriously

---
## Database Schema

### evaluations Table

```sql
CREATE TABLE IF NOT EXISTS evaluations (
  id TEXT PRIMARY KEY,
  conversation_id TEXT NOT NULL,
  student_prompt_id TEXT,
  evaluation_type TEXT NOT NULL,
  model_used TEXT NOT NULL,
  evaluated_at TEXT NOT NULL,
  evaluated_by TEXT,
  overall_score REAL,
  scores TEXT NOT NULL,    -- JSON string
  feedback TEXT NOT NULL,  -- JSON string
  raw_response TEXT,
  created_at TEXT NOT NULL
);
```

### Indexes

```sql
CREATE INDEX idx_evaluations_conversation ON evaluations(conversation_id);
CREATE INDEX idx_evaluations_type ON evaluations(evaluation_type);
CREATE INDEX idx_evaluations_prompt ON evaluations(student_prompt_id);
CREATE INDEX idx_evaluations_score ON evaluations(overall_score);
CREATE INDEX idx_evaluations_date ON evaluations(evaluated_at);
```
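Because `scores` and `feedback` are stored as JSON strings, a repository read has to `JSON.parse` them back into objects. A sketch of that mapping (column names follow the table above; the mapper itself is illustrative, not the repository's actual code):

```typescript
// Subset of an evaluations row, as read from the database.
interface EvaluationRow {
  id: string;
  conversation_id: string;
  overall_score: number | null;
  scores: string;   // JSON string
  feedback: string; // JSON string
}

// Deserialize the JSON columns and convert snake_case columns to the
// camelCase fields used by the TypeScript types.
export function rowToEvaluation(row: EvaluationRow) {
  return {
    id: row.id,
    conversationId: row.conversation_id,
    overallScore: row.overall_score ?? undefined,
    scores: JSON.parse(row.scores),
    feedback: JSON.parse(row.feedback),
  };
}
```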

---

## TypeScript Types

See `src/lib/types/models.ts` (lines 83-150) for the full type definitions.

### Evaluation (main object)

```typescript
interface Evaluation {
  id: string;
  conversationId: string;
  studentPromptId?: string;
  evaluationType: 'conversation_quality';
  evaluationMode?: 'student' | 'coach_direct'; // Derived from studentPromptId
  modelUsed: string;
  evaluatedAt: string;
  evaluatedBy?: string;
  overallScore?: number;
  scores: EvaluationScores;      // See the "Response Schema" section
  feedback: EvaluationFeedback;  // See the "Response Schema" section
  rawResponse?: string;
  createdAt: string;
}
```

---

## Notes

- `evaluationMode` is **not stored** in the database; it is derived from `studentPromptId` at read time
- The raw AI response is stored for debugging and auditing purposes
- Existing evaluations are skipped unless `force=true` is passed
- Batch evaluations continue past individual failures (processing does not stop at the first error)
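The read-time derivation of `evaluationMode` noted above reduces to one comparison; a sketch (the helper name is illustrative):

```typescript
// evaluationMode is not a column: it is derived from studentPromptId,
// using the same condition as the conversation-type detection step.
export function deriveEvaluationMode(
  studentPromptId?: string,
): 'student' | 'coach_direct' {
  return studentPromptId === 'coach_direct' ? 'coach_direct' : 'student';
}
```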
src/app/admin/conversations/[conversationId]/page.tsx CHANGED

```diff
@@ -416,7 +416,7 @@ export default function AdminConversationDetailPage() {
           </div>

           <div className="p-4 bg-yellow-50 rounded-lg border border-yellow-200">
-            <h4 className="text-sm font-semibold text-yellow-800 mb-2">Prompt Improvements</h4>
+            <h4 className="text-sm font-semibold text-yellow-800 mb-2">System Prompt Issues</h4>
             {evaluation.feedback.improvementAreas.length > 0 ? (
               <ul className="text-sm text-yellow-700 space-y-1">
                 {evaluation.feedback.improvementAreas.map((a, i) => (
@@ -435,7 +435,7 @@ export default function AdminConversationDetailPage() {
           {/* Prompt Suggestions */}
           {evaluation.feedback.promptSuggestions.length > 0 && (
             <div className="p-4 bg-blue-50 rounded-lg border border-blue-200">
-              <h4 className="text-sm font-semibold text-blue-800 mb-2">Prompt Suggestions</h4>
+              <h4 className="text-sm font-semibold text-blue-800 mb-2">System Prompt Suggestions</h4>
               <ul className="text-sm text-blue-700 space-y-1">
                 {evaluation.feedback.promptSuggestions.map((s, i) => (
                   <li key={i} className="flex items-start gap-2">
```
src/app/admin/conversations/page.tsx CHANGED

```diff
@@ -76,6 +76,26 @@ export default function AdminConversationsPage() {
     setPage(1); // Reset to first page when filters change
   };

+  const handleDownload = async (conversationId: string, title: string) => {
+    try {
+      const response = await adminFetch(`/api/admin/conversations/${conversationId}/export/tsv`);
+      if (!response.ok) {
+        throw new Error('Failed to download conversation');
+      }
+      const blob = await response.blob();
+      const url = URL.createObjectURL(blob);
+      const a = document.createElement('a');
+      a.href = url;
+      const sanitizedTitle = (title || conversationId).replace(/[<>:"/\\|?*\x00-\x1F]/g, '_').replace(/\s+/g, '_').substring(0, 100);
+      a.download = `conversation_${sanitizedTitle}_${new Date().toISOString().split('T')[0]}.tsv`;
+      a.click();
+      URL.revokeObjectURL(url);
+    } catch (err) {
+      console.error('Download error:', err);
+      alert('Failed to download conversation. Please try again.');
+    }
+  };
+
   return (
     <div className="p-4 md:p-8">
       {/* Header */}
@@ -274,13 +294,12 @@ export default function AdminConversationsPage() {
                         >
                           View Messages
                         </Link>
-                        <a
-                          href={`/api/admin/conversations/${conv.id}/export/tsv`}
+                        <button
+                          onClick={() => handleDownload(conv.id, conv.title)}
                           className="text-green-600 hover:text-green-900"
-                          download
                         >
                           Download
-                        </a>
+                        </button>
                       </div>
                     </td>
                   </tr>
@@ -338,13 +357,12 @@ export default function AdminConversationsPage() {
                   >
                     View Messages
                   </Link>
-                  <a
-                    href={`/api/admin/conversations/${conv.id}/export/tsv`}
+                  <button
+                    onClick={() => handleDownload(conv.id, conv.title)}
                     className="flex-1 text-center px-4 py-2 bg-green-600 text-white rounded-lg hover:bg-green-700"
-                    download
                   >
                     Download
-                  </a>
+                  </button>
                 </div>
               </div>
             ))
```
src/app/admin/evaluations/[id]/page.tsx CHANGED

```diff
@@ -369,7 +369,7 @@ export default function EvaluationDetailPage({
           {/* Improvement Areas */}
           <div className="bg-white rounded-lg shadow p-6">
             <h3 className="text-lg font-semibold text-gray-900 mb-4 flex items-center gap-2">
-              <span className="text-yellow-500">!</span> {evaluation.evaluationMode === 'coach_direct' ? '教練改進空間' : 'Prompt Improvements'}
+              <span className="text-yellow-500">!</span> {evaluation.evaluationMode === 'coach_direct' ? '教練改進空間' : 'System Prompt Issues'}
             </h3>
             {evaluation.feedback.improvementAreas.length > 0 ? (
               <ul className="space-y-2">
@@ -389,7 +389,7 @@ export default function EvaluationDetailPage({
           {/* Prompt Suggestions */}
           <div className="bg-blue-50 border border-blue-200 rounded-lg p-6 mb-6">
             <h3 className="text-lg font-semibold text-blue-900 mb-4 flex items-center gap-2">
-              <span>💡</span> {evaluation.evaluationMode === 'coach_direct' ? '教練提示改進建議' : 'Prompt Improvement Suggestions'}
+              <span>💡</span> {evaluation.evaluationMode === 'coach_direct' ? '教練提示改進建議' : 'System Prompt Suggestions (For Prompt Engineers)'}
             </h3>
             {evaluation.feedback.promptSuggestions.length > 0 ? (
               <ul className="space-y-2">
```
src/app/api/conversations/create/route.ts CHANGED

```diff
@@ -55,6 +55,20 @@ export async function POST(request: NextRequest) {
     systemPrompt = await promptService.getStudentPrompt(studentPromptId);
   }

+  // Generate default title from student prompt if none provided
+  let defaultTitle = title;
+  if (!defaultTitle && studentPromptId !== 'coach_direct' && studentConfig) {
+    // Extract short context from description (last phrase after comma, or full if short)
+    const desc = studentConfig.description;
+    const shortContext = desc.includes(',')
+      ? desc.split(',').slice(-1)[0].substring(0, 20)
+      : desc.substring(0, 20);
+    defaultTitle = `${studentConfig.name} - ${shortContext}`;
+  }
+  if (!defaultTitle && studentPromptId === 'coach_direct') {
+    defaultTitle = '教練對話';
+  }
+
   // Get coach info
   const coach = await promptService.getCoachPrompt(coachPromptId);

@@ -82,7 +96,7 @@ export async function POST(request: NextRequest) {
     userId,
     studentPromptId,
     coachPromptId as CoachType,
-    title,
+    defaultTitle,
     summary,
     systemPrompt
   );
```
src/lib/services/evaluation-service.ts CHANGED

```diff
@@ -91,11 +91,16 @@ overallScore = 0.5 × coachingQuality.overall + 0.5 × teacherLearning.overall
   "rationale": "<對話品質說明>"
 },
 "overallScore": <數字 1-5,按權重計算>,
-"strengths": ["<教練的優點>"],
-"improvementAreas": ["<教練需要改進地方>"],
-"promptSuggestions": ["<具體的教練提示修改建議>"]
+"strengths": ["<教練系統提示的優點 - 系統提示中有效的設計元素>"],
+"improvementAreas": ["<教練系統提示的技術問題 - 提示工程師需要修正提示詞問題,不是對教師的建議>"],
+"promptSuggestions": ["<具體的系統提示修改建議 - 提供給提示工程師的具體文字/指令修改方案>"]
 }

+**重要:improvementAreas 和 promptSuggestions 是給提示工程師的技術建議,用於改進 AI 系統提示,而非給教師的建議。**
+
+錯誤範例(對教師的建議):「老師可以多用開放式問句」「建議老師使用更同理的語氣」
+正確範例(對提示工程師的建議):「系統提示應增加教練主動追問的指示」「建議在提示詞中加入更多情境判斷的引導語句」
+
 **teacherEngagement.level 判斷標準:**
 - "high": 教師積極參與,提出有意義的問題和回應
 - "medium": 教師有參與但互動較淺
@@ -176,11 +181,16 @@ overallScore = 0.2 × promptDesign.overall + 0.2 × trainingEffectiveness.overal
   "rationale": "<教師體驗說明 - 描述教師在對話中的感受和學習>"
 },
 "overallScore": <數字 1-5,按權重計算>,
-"strengths": ["<提示的優點>"],
-"improvementAreas": ["<提示需要改進地方>"],
-"promptSuggestions": ["<具體的提示修改建議>"]
+"strengths": ["<學生系統提示的優點 - 系統提示中有效的設計元素>"],
+"improvementAreas": ["<學生系統提示的技術問題 - 提示工程師需要修正提示詞問題,不是對教師的建議>"],
+"promptSuggestions": ["<具體的系統提示修改建議 - 提供給提示工程師的具體文字/指令修改方案>"]
 }

+**重要:improvementAreas 和 promptSuggestions 是給提示工程師的技術建議,用於改進 AI 學生系統提示,而非給教師的建議。**
+
+錯誤範例(對教師的建議):「老師可以多用開放式問句」「建議老師使用更同理的語氣」「老師的談話多停在鼓勵」
+正確範例(對提示工程師的建議):「系統提示應增加學生對教師特定技巧的回應指示」「建議在提示詞中加入階段轉換的明確標記」「提示詞應指示AI學生在教師使用開放式問句時展現更多情緒開放」
+
 **teacherEngagement.level 判斷標準:**
 - "high": 教師積極參與,提出有意義的問題和回應
 - "medium": 教師有參與但互動較淺
```