ecuartasm Claude Opus 4.6 committed on
Commit
217abc3
·
0 Parent(s):

Initial commit: AI Course Assessment Generator


Quiz generator application that creates learning objectives and multiple-choice
questions from course materials using OpenAI models with structured output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

.gitignore ADDED
@@ -0,0 +1,46 @@
1
+ # Python
2
+ __pycache__/
3
+ *.py[cod]
4
+ *$py.class
5
+ *.so
6
+ .Python
7
+ env/
8
+ venv/
9
+ .venv/
10
+ ENV/
11
+ build/
12
+ develop-eggs/
13
+ dist/
14
+ downloads/
15
+ eggs/
16
+ .eggs/
17
+ lib/
18
+ lib64/
19
+ parts/
20
+ sdist/
21
+ var/
22
+ wheels/
23
+ *.egg-info/
24
+ .installed.cfg
25
+ *.egg
26
+ results/
27
+ incorrect_suggestion_debug/
28
+ Data/
29
+
30
+ # VS Code
31
+ .vscode/
32
+ *.code-workspace
33
+
34
+ # Environment variables
35
+ .env
36
+ .env.local
37
+
38
+ # Claude Code
39
+ .claude/
40
+
41
+ # OS
42
+ .DS_Store
43
+ Thumbs.db
44
+
45
+ # Logs
46
+ *.log
APP_FUNCTIONALITY_REPORT.md ADDED
@@ -0,0 +1,2035 @@
1
+ # AI Course Assessment Generator - Functionality Report
2
+
3
+ ## Table of Contents
4
+ 1. [Overview](#overview)
5
+ 2. [System Architecture](#system-architecture)
6
+ 3. [Data Models](#data-models)
7
+ 4. [Application Entry Point](#application-entry-point)
8
+ 5. [User Interface Structure](#user-interface-structure)
9
+ 6. [Complete Workflow](#complete-workflow)
10
+ 7. [Detailed Component Functionality](#detailed-component-functionality)
11
+ 8. [Quality Standards and Prompts](#quality-standards-and-prompts)
12
+
13
+ ---
14
+
15
+ ## Overview
16
+
17
+ The AI Course Assessment Generator is a sophisticated educational tool that automates the creation of learning objectives and multiple-choice questions from course materials. It leverages OpenAI's language models with structured output generation to produce high-quality educational assessments that adhere to specified quality standards and Bloom's Taxonomy levels.
18
+
19
+ ### Key Capabilities
20
+ - **Multi-format Content Processing**: Accepts `.vtt`, `.srt` (subtitle files), and `.ipynb` (Jupyter notebooks)
21
+ - **AI-Powered Generation**: Uses OpenAI's GPT models with configurable parameters
22
+ - **Quality Assurance**: Implements LLM-based quality assessment and ranking
23
+ - **Source Tracking**: Maintains XML-tagged references from source materials to generated content
24
+ - **Iterative Improvement**: Supports feedback-based regeneration and enhancement
25
+ - **Parallel Processing**: Generates questions concurrently for improved performance
26
+
27
+ ---
28
+
29
+ ## System Architecture
30
+
31
+ ### Architectural Patterns
32
+
33
+ #### 1. **Orchestrator Pattern**
34
+ Both `LearningObjectiveGenerator` and `QuizGenerator` act as orchestrators that coordinate calls to specialized generation functions rather than implementing generation logic directly.
35
+
36
+ #### 2. **Modular Prompt System**
37
+ The `prompts/` directory contains reusable prompt components that are imported and combined in generation modules, allowing for consistent quality standards across different generation tasks.
38
+
39
+ #### 3. **Structured Output Generation**
40
+ Most LLM interactions use Pydantic models with the `instructor` library to produce consistent, validated output through OpenAI's structured output API. The exception is correct-answer generation, which requests a plain-text response.
41
+
42
+ #### 4. **Source Tracking via XML Tags**
43
+ Content is wrapped in XML tags (e.g., `<source file="example.ipynb">content</source>`) throughout the pipeline to maintain traceability from source files to generated questions.
44
+
45
+ ### Technology Stack
46
+ - **Python 3.8+**
47
+ - **Gradio 5.29.0+**: Web-based UI framework
48
+ - **Pydantic 2.8.0+**: Data validation and schema management
49
+ - **OpenAI 1.52.0+**: LLM API integration
50
+ - **Instructor 1.7.9+**: Structured output generation
51
+ - **nbformat 5.9.2**: Jupyter notebook parsing
52
+ - **python-dotenv 1.0.0**: Environment variable management
53
+
54
+ ---
55
+
56
+ ## Data Models
57
+
58
+ ### Learning Objectives Progression
59
+
60
+ The system uses a hierarchical progression of learning objective models:
61
+
62
+ #### 1. **BaseLearningObjectiveWithoutCorrectAnswer**
63
+ ```python
64
+ - id: int
65
+ - learning_objective: str
66
+ - source_reference: Union[List[str], str]
67
+ ```
68
+ Initial generation without correct answers.
69
+
70
+ #### 2. **BaseLearningObjective**
71
+ ```python
72
+ - id: int
73
+ - learning_objective: str
74
+ - source_reference: Union[List[str], str]
75
+ - correct_answer: str
76
+ ```
77
+ Base objectives with correct answers added.
78
+
79
+ #### 3. **LearningObjective**
80
+ ```python
81
+ - id: int
82
+ - learning_objective: str
83
+ - source_reference: Union[List[str], str]
84
+ - correct_answer: str
85
+ - incorrect_answer_options: Union[List[str], str]
86
+ - in_group: Optional[bool]
87
+ - group_members: Optional[List[int]]
88
+ - best_in_group: Optional[bool]
89
+ ```
90
+ Enhanced with incorrect answer suggestions and grouping metadata.
91
+
92
+ #### 4. **GroupedLearningObjective**
93
+ ```python
94
+ (All fields from LearningObjective)
95
+ - in_group: bool (required)
96
+ - group_members: List[int] (required)
97
+ - best_in_group: bool (required)
98
+ ```
99
+ Fully grouped and ranked objectives.
100
+
101
+ ### Question Models Progression
102
+
103
+ #### 1. **MultipleChoiceOption**
104
+ ```python
105
+ - option_text: str
106
+ - is_correct: bool
107
+ - feedback: str
108
+ ```
109
+
110
+ #### 2. **MultipleChoiceQuestion**
111
+ ```python
112
+ - id: int
113
+ - question_text: str
114
+ - options: List[MultipleChoiceOption]
115
+ - learning_objective_id: int
116
+ - learning_objective: str
117
+ - correct_answer: str
118
+ - source_reference: Union[List[str], str]
119
+ - judge_feedback: Optional[str]
120
+ - approved: Optional[bool]
121
+ ```
122
+
123
+ #### 3. **RankedMultipleChoiceQuestion**
124
+ ```python
125
+ (All fields from MultipleChoiceQuestion)
126
+ - rank: int
127
+ - ranking_reasoning: str
128
+ - in_group: bool
129
+ - group_members: List[int]
130
+ - best_in_group: bool
131
+ ```
132
+
133
+ #### 4. **Assessment**
134
+ ```python
135
+ - learning_objectives: List[LearningObjective]
136
+ - questions: List[RankedMultipleChoiceQuestion]
137
+ ```
138
+ Final output containing both objectives and questions.
139
+
140
+ ### Configuration Models
141
+
142
+ #### **MODELS**
143
+ Available OpenAI models: `["o3-mini", "o1", "gpt-4.1", "gpt-4o", "gpt-4o-mini", "gpt-4", "gpt-3.5-turbo", "gpt-5", "gpt-5-mini", "gpt-5-nano"]`
144
+
145
+ #### **TEMPERATURE_UNAVAILABLE**
146
+ Dictionary mapping models to temperature availability (some models like o1, o3-mini, and gpt-5 variants don't support temperature settings).
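A minimal sketch of how such a mapping might gate the temperature parameter. The dictionary contents, the meaning of `True`, and the `build_request_kwargs` helper are assumptions for illustration; the report only states that o1, o3-mini, and the gpt-5 variants do not accept a temperature.

```python
from typing import Any, Dict

# Assumed shape: True means the model does NOT accept a temperature setting.
TEMPERATURE_UNAVAILABLE: Dict[str, bool] = {
    "o3-mini": True,
    "o1": True,
    "gpt-5": True,
    "gpt-5-mini": True,
    "gpt-5-nano": True,
    "gpt-4o": False,
    "gpt-4o-mini": False,
}


def build_request_kwargs(model: str, temperature: float) -> Dict[str, Any]:
    """Return request keyword arguments, omitting temperature where unsupported."""
    kwargs: Dict[str, Any] = {"model": model}
    # Hypothetical policy: unknown models are treated as not supporting temperature.
    if not TEMPERATURE_UNAVAILABLE.get(model, True):
        kwargs["temperature"] = temperature
    return kwargs
```

With this shape, the UI can always collect a temperature value and the request builder simply drops it for models that would reject it.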
147
+
148
+ ---
149
+
150
+ ## Application Entry Point
151
+
152
+ ### `app.py`
153
+ The root-level entry point that:
154
+ 1. Loads environment variables from `.env` file
155
+ 2. Checks for `OPENAI_API_KEY` presence
156
+ 3. Creates the Gradio UI via `ui.app.create_ui()`
157
+ 4. Launches the web interface at `http://127.0.0.1:7860`
158
+
159
+ ```python
160
+ # Workflow:
161
+ load_dotenv() → Check API key → create_ui() → app.launch()
162
+ ```
163
+
164
+ ---
165
+
166
+ ## User Interface Structure
167
+
168
+ ### `ui/app.py` - Gradio Interface
169
+
170
+ The UI is organized into **3 main tabs**:
171
+
172
+ #### **Tab 1: Generate Learning Objectives**
173
+
174
+ **Input Components:**
175
+ - File uploader (accepts `.ipynb`, `.vtt`, `.srt`)
176
+ - Number of objectives per run (slider: 1-20, default: 3)
177
+ - Number of generation runs (dropdown: 1-5, default: 3)
178
+ - Model selection (dropdown, default: "gpt-5")
179
+ - Incorrect answer model selection (dropdown, default: "gpt-5")
180
+ - Temperature setting (dropdown: 0.0-1.0, default: 1.0)
181
+ - Generate button
182
+ - Feedback input textbox
183
+ - Regenerate button
184
+
185
+ **Output Components:**
186
+ - Status textbox
187
+ - Best-in-Group Learning Objectives (JSON)
188
+ - All Grouped Learning Objectives (JSON)
189
+ - Raw Ungrouped Learning Objectives (JSON) - for debugging
190
+
191
+ **Event Handler:** `process_files()` from `objective_handlers.py`
192
+
193
+ #### **Tab 2: Generate Questions**
194
+
195
+ **Input Components:**
196
+ - Learning Objectives JSON (auto-populated from Tab 1)
197
+ - Model selection
198
+ - Temperature setting
199
+ - Number of question generation runs (slider: 1-5, default: 1)
200
+ - Generate Questions button
201
+
202
+ **Output Components:**
203
+ - Status textbox
204
+ - Ranked Best-in-Group Questions (JSON)
205
+ - All Grouped Questions (JSON)
206
+ - Formatted Quiz (human-readable format)
207
+
208
+ **Event Handler:** `generate_questions()` from `question_handlers.py`
209
+
210
+ #### **Tab 3: Propose/Edit Question**
211
+
212
+ **Input Components:**
213
+ - Question guidance/feedback textbox
214
+ - Model selection
215
+ - Temperature setting
216
+ - Generate Question button
217
+
218
+ **Output Components:**
219
+ - Status textbox
220
+ - Generated Question (JSON)
221
+
222
+ **Event Handler:** `propose_question_handler()` from `feedback_handlers.py`
223
+
224
+ ---
225
+
226
+ ## Complete Workflow
227
+
228
+ ### Phase 1: File Upload and Content Processing
229
+
230
+ #### Step 1.1: File Upload
231
+ User uploads one or more files (`.vtt`, `.srt`, `.ipynb`) through the Gradio interface.
232
+
233
+ #### Step 1.2: File Path Extraction (`objective_handlers._extract_file_paths()`)
234
+ ```python
235
+ # Handles different input formats:
236
+ - List of file paths
237
+ - Single file path string
238
+ - File objects with .name attribute
239
+ ```
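The three input shapes listed above could be normalized roughly as follows. This is a sketch only: `extract_file_paths` is a hypothetical stand-in, not the actual body of `_extract_file_paths()`.

```python
from typing import Any, List


def extract_file_paths(files: Any) -> List[str]:
    """Normalize an upload value into a flat list of path strings."""
    if files is None:
        return []
    if isinstance(files, str):
        # Single file path string.
        return [files]
    if not isinstance(files, (list, tuple)):
        # Single file object; treat it as a one-item list.
        files = [files]
    paths: List[str] = []
    for item in files:
        if isinstance(item, str):
            paths.append(item)
        elif hasattr(item, "name"):
            # File-like objects expose their path via .name.
            paths.append(item.name)
    return paths
```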
240
+
241
+ #### Step 1.3: Content Processing (`ui/content_processor.py`)
242
+
243
+ **For Subtitle Files (`.vtt`, `.srt`):**
244
+ ```python
245
+ 1. Read file with UTF-8 encoding
246
+ 2. Split into lines
247
+ 3. Filter out:
248
+ - Empty lines
249
+ - Numeric cue sequence numbers
250
+ - Lines containing '-->' (timestamps)
251
+ - 'WEBVTT' header lines
252
+ 4. Combine remaining text lines
253
+ 5. Wrap in XML tags: <source file='filename.vtt'>content</source>
254
+ ```
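The filtering steps above can be sketched as a small function. It works on already-read text so it stays self-contained; the function name and the choice to join kept lines with newlines are assumptions, not details taken from `content_processor.py`.

```python
import os


def process_subtitle_text(raw_text: str, file_path: str) -> str:
    """Strip subtitle metadata and wrap the remaining text in a source tag."""
    kept = []
    for line in raw_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # empty line
        if stripped.isdigit():
            continue  # numeric cue index
        if "-->" in stripped:
            continue  # timestamp line
        if stripped.startswith("WEBVTT"):
            continue  # WebVTT header
        kept.append(stripped)
    content = "\n".join(kept)
    filename = os.path.basename(file_path)
    return f"<source file='{filename}'>{content}</source>"
```

One side effect of the numeric filter is that a spoken line consisting only of digits would also be dropped.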
255
+
256
+ **For Jupyter Notebooks (`.ipynb`):**
257
+ ```python
258
+ 1. Validate JSON format
259
+ 2. Parse with nbformat.read()
260
+ 3. Extract from cells:
261
+ - Markdown cells: [Markdown]\n{content}
262
+ - Code cells: [Code]\n```python\n{content}\n```
263
+ 4. Combine all cell content
264
+ 5. Wrap in XML tags: <source file='filename.ipynb'>content</source>
265
+ ```
266
+
267
+ **Error Handling:**
268
+ - Invalid JSON: Wraps raw content in code blocks
269
+ - Parsing failures: Falls back to plain text extraction
270
+ - All errors logged to console
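A rough sketch of the notebook path, including the invalid-JSON fallback. To stay dependency-free it parses the notebook with the standard `json` module instead of `nbformat.read()`, which is a deliberate substitution; the function name and the blank-line separator between cells are also assumptions.

```python
import json
import os

FENCE = "`" * 3  # built at runtime so this example does not nest code fences


def process_notebook_text(raw_text: str, file_path: str) -> str:
    """Extract markdown and code cells and wrap them in a source tag."""
    filename = os.path.basename(file_path)
    try:
        notebook = json.loads(raw_text)
        parts = []
        for cell in notebook.get("cells", []):
            source = cell.get("source", "")
            # Notebook JSON stores source as a string or a list of lines.
            if isinstance(source, list):
                source = "".join(source)
            if cell.get("cell_type") == "markdown":
                parts.append(f"[Markdown]\n{source}")
            elif cell.get("cell_type") == "code":
                parts.append(f"[Code]\n{FENCE}python\n{source}\n{FENCE}")
        content = "\n\n".join(parts)
    except (json.JSONDecodeError, AttributeError):
        # Invalid or unexpected JSON: fall back to the raw text in a code block.
        content = f"{FENCE}\n{raw_text}\n{FENCE}"
    return f"<source file='{filename}'>{content}</source>"
```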
271
+
272
+ #### Step 1.4: State Storage
273
+ Processed content is stored in global state (`ui/state.py`):
274
+ ```python
275
+ processed_file_contents = [tagged_content_1, tagged_content_2, ...]
276
+ ```
277
+
278
+ ### Phase 2: Learning Objective Generation
279
+
280
+ #### Step 2.1: Multi-Run Base Generation
281
+
282
+ **Process:** `objective_handlers._generate_multiple_runs()`
283
+
284
+ For each run (user-specified, typically 3 runs):
285
+
286
+ 1. **Call:** `QuizGenerator.generate_base_learning_objectives()`
287
+ 2. **Workflow:**
288
+ ```
289
+ generate_base_learning_objectives()
290
+
291
+ generate_base_learning_objectives_without_correct_answers()
292
+ → Creates prompt with:
293
+ - BASE_LEARNING_OBJECTIVES_PROMPT
294
+ - BLOOMS_TAXONOMY_LEVELS
295
+ - LEARNING_OBJECTIVE_EXAMPLES_WITHOUT_ANSWERS
296
+ - Combined file contents
297
+ → Calls OpenAI API with structured output
298
+ → Returns List[BaseLearningObjectiveWithoutCorrectAnswer]
299
+
300
+ generate_correct_answers_for_objectives()
301
+ → For each objective:
302
+ - Creates prompt with objective and course content
303
+ - Calls OpenAI API (unstructured text response)
304
+ - Extracts correct answer
305
+ → Returns List[BaseLearningObjective]
306
+ ```
307
+
308
+ 3. **ID Assignment:**
309
+ ```python
310
+ # Temporary IDs by run:
311
+ Run 1: 1001, 1002, 1003
312
+ Run 2: 2001, 2002, 2003
313
+ Run 3: 3001, 3002, 3003
314
+ ```
315
+
316
+ 4. **Aggregation:**
317
+ All objectives from all runs are combined into a single list.
318
+
319
+ **Example:** 3 runs × 3 objectives = 9 total base objectives
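The temporary ID scheme and the aggregation step can be illustrated with plain dictionaries. The formula `run_number * 1000 + position` is inferred from the IDs listed above, and `assign_temporary_ids` is a hypothetical helper name.

```python
from typing import Dict, List


def assign_temporary_ids(runs: List[List[Dict]]) -> List[Dict]:
    """Tag each objective with a run-scoped temporary ID and flatten all runs."""
    combined: List[Dict] = []
    for run_number, objectives in enumerate(runs, start=1):
        for position, objective in enumerate(objectives, start=1):
            tagged = dict(objective)  # copy so the input is left untouched
            tagged["id"] = run_number * 1000 + position
            combined.append(tagged)
    return combined
```

Because the last digits encode the position within a run, the first objective of every run can later be recognized by an ID ending in 001.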
320
+
321
+ #### Step 2.2: Grouping and Ranking
322
+
323
+ **Process:** `objective_handlers._group_base_objectives_add_incorrect_answers()`
324
+
325
+ **Step 2.2.1: Group Base Objectives**
326
+ ```python
327
+ QuizGenerator.group_base_learning_objectives()
328
+
329
+ learning_objective_generator/grouping_and_ranking.py
330
+ → group_base_learning_objectives()
331
+ ```
332
+
333
+ **Grouping Logic:**
334
+ 1. Creates prompt containing:
335
+ - Original generation criteria
336
+ - All base objectives with IDs
337
+ - Course content for context
338
+ - Grouping instructions
339
+
340
+ 2. **Special Rule:** All objectives with IDs ending in 1 (1001, 2001, 3001) are grouped together and ONE is marked as best-in-group (this becomes the primary/first objective)
341
+
342
+ 3. **LLM Call:**
343
+ - Model: `gpt-5-mini`
344
+ - Response format: `GroupedBaseLearningObjectivesResponse`
345
+ - Returns: Grouped objectives with metadata
346
+
347
+ 4. **Output Structure:**
348
+ ```python
349
+ {
350
+ "all_grouped": [all objectives with group metadata],
351
+ "best_in_group": [objectives marked as best in their groups]
352
+ }
353
+ ```
354
+
355
+ **Step 2.2.2: ID Reassignment** (`_reassign_objective_ids()`)
356
+ ```python
357
+ 1. Find best objective from the 001 group
358
+ 2. Assign it ID = 1
359
+ 3. Assign remaining objectives IDs starting from 2
360
+ ```
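A sketch of the reassignment, assuming objectives are dictionaries with `id` and `best_in_group` keys. The `previous_id` field and the function name are illustrative additions, and this version does not remap `group_members`, which the real `_reassign_objective_ids()` may well do.

```python
from typing import Dict, List


def reassign_objective_ids(objectives: List[Dict]) -> List[Dict]:
    """Give the best objective of the 001 group ID 1 and number the rest from 2."""
    primary = next(
        (o for o in objectives if o["id"] % 1000 == 1 and o.get("best_in_group")),
        None,
    )
    result: List[Dict] = []
    next_id = 2
    for objective in objectives:
        updated = dict(objective)
        updated["previous_id"] = objective["id"]  # kept only for traceability
        if objective is primary:
            updated["id"] = 1
        else:
            updated["id"] = next_id
            next_id += 1
        result.append(updated)
    return sorted(result, key=lambda o: o["id"])
```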
361
+
362
+ **Step 2.2.3: Generate Incorrect Answer Options**
363
+
364
+ Only for **best-in-group** objectives:
365
+
366
+ ```python
367
+ QuizGenerator.generate_lo_incorrect_answer_options()
368
+
369
+ learning_objective_generator/enhancement.py
370
+ → generate_incorrect_answer_options()
371
+ ```
372
+
373
+ **Process:**
374
+ 1. For each best-in-group objective:
375
+ - Creates prompt with:
376
+ - Objective and correct answer
377
+ - INCORRECT_ANSWER_PROMPT guidelines
378
+ - INCORRECT_ANSWER_EXAMPLES
379
+ - Course content
380
+ - Calls OpenAI API (with optional model override)
381
+ - Generates 5 plausible incorrect answer options
382
+
383
+ 2. **Returns:** `List[LearningObjective]` with incorrect_answer_options populated
384
+
385
+ **Step 2.2.4: Improve Incorrect Answers**
386
+
387
+ ```python
388
+ learning_objective_generator.regenerate_incorrect_answers()
389
+
390
+ learning_objective_generator/suggestion_improvement.py
391
+ ```
392
+
393
+ **Quality Check Process:**
394
+ 1. For each objective's incorrect answers:
395
+ - Checks for red flags (contradictory phrases, absolute terms)
396
+ - Examples of red flags:
397
+ - "but not necessarily"
398
+ - "at the expense of"
399
+ - "rather than"
400
+ - "always", "never", "exclusively"
401
+
402
+ 2. If problems found:
403
+ - Logs issue to `incorrect_suggestion_debug/` directory
404
+ - Regenerates incorrect answers with additional constraints
405
+ - Updates objective with improved answers
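The red-flag check might look something like the sketch below. The phrase lists come from the examples above; the function names, the word-boundary matching for absolute terms, and the return shape are assumptions rather than the contents of `suggestion_improvement.py`.

```python
import re
from typing import List

RED_FLAG_PHRASES = ["but not necessarily", "at the expense of", "rather than"]
ABSOLUTE_TERMS = ["always", "never", "exclusively"]


def find_red_flags(option: str) -> List[str]:
    """Return the red-flag phrases and absolute terms found in one answer option."""
    text = option.lower()
    hits = [phrase for phrase in RED_FLAG_PHRASES if phrase in text]
    # Word boundaries prevent "never" from matching inside "nevertheless".
    hits += [t for t in ABSOLUTE_TERMS if re.search(rf"\b{re.escape(t)}\b", text)]
    return hits


def needs_regeneration(options: List[str]) -> bool:
    """True if any incorrect answer option contains at least one red flag."""
    return any(find_red_flags(option) for option in options)
```

A check like this is cheap enough to run on every option before deciding whether a second LLM call is needed.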
406
+
407
+ **Step 2.2.5: Final Assembly**
408
+
409
+ Creates final list where:
410
+ - Best-in-group objectives have enhanced incorrect answers
411
+ - Non-best-in-group objectives have empty `incorrect_answer_options: []`
412
+
413
+ #### Step 2.3: Display Results
414
+
415
+ **Three output formats:**
416
+
417
+ 1. **Best-in-Group Objectives** (primary output):
418
+ - Only objectives marked as best_in_group
419
+ - Includes incorrect answer options
420
+ - Sorted by ID
421
+ - Formatted as JSON
422
+
423
+ 2. **All Grouped Objectives**:
424
+ - All objectives with grouping metadata
425
+ - Shows group_members arrays
426
+ - Best-in-group flags visible
427
+
428
+ 3. **Raw Ungrouped** (debug):
429
+ - Original objectives from all runs
430
+ - No grouping metadata
431
+ - Original temporary IDs
432
+
433
+ #### Step 2.4: State Update
434
+ ```python
435
+ set_learning_objectives(grouped_result["all_grouped"])
436
+ set_processed_contents(file_contents) # Already set, but persisted
437
+ ```
438
+
439
+ ### Phase 3: Question Generation
440
+
441
+ #### Step 3.1: Parse Learning Objectives
442
+
443
+ **Process:** `question_handlers._parse_learning_objectives()`
444
+
445
+ ```python
446
+ 1. Parse JSON from Tab 1 output
447
+ 2. Create LearningObjective objects from dictionaries
448
+ 3. Validate required fields
449
+ 4. Return List[LearningObjective]
450
+ ```
451
+
452
+ #### Step 3.2: Multi-Run Question Generation
453
+
454
+ **Process:** `question_handlers._generate_questions_multiple_runs()`
455
+
456
+ For each run (user-specified, typically 1 run):
457
+
458
+ ```python
459
+ QuizGenerator.generate_questions_in_parallel()
460
+
461
+ quiz_generator/assessment.py
462
+ → generate_questions_in_parallel()
463
+ ```
464
+
465
+ **Parallel Generation Process:**
466
+
467
+ 1. **Thread Pool Setup:**
468
+ ```python
469
+ max_workers = min(len(learning_objectives), 5)
470
+ ThreadPoolExecutor(max_workers=max_workers)
471
+ ```
472
+
473
+ 2. **For Each Learning Objective (in parallel):**
474
+
475
+ **Step 3.2.1: Question Generation** (`quiz_generator/question_generation.py`)
476
+
477
+ ```python
478
+ generate_multiple_choice_question()
479
+ ```
480
+
481
+ **a) Source Content Matching:**
482
+ ```python
483
+ - Extract source_reference from objective
484
+ - Search file_contents for matching XML tags
485
+ - Exact match: <source file='filename.vtt'>
486
+ - Fallback: Partial filename match
487
+ - Last resort: Use all file contents combined
488
+ ```
489
+
490
+ **b) Multi-Source Handling:**
491
+ ```python
492
+ if len(source_references) > 1:
493
+ Add special instruction:
494
+ "Question should synthesize information across sources"
495
+ ```
496
+
497
+ **c) Prompt Construction:**
498
+ ```python
499
+ Combines:
500
+ - Learning objective
501
+ - Correct answer
502
+ - Incorrect answer options from objective
503
+ - GENERAL_QUALITY_STANDARDS
504
+ - MULTIPLE_CHOICE_STANDARDS
505
+ - EXAMPLE_QUESTIONS
506
+ - QUESTION_SPECIFIC_QUALITY_STANDARDS
507
+ - CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS
508
+ - INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION
509
+ - ANSWER_FEEDBACK_QUALITY_STANDARDS
510
+ - Matched course content
511
+ ```
512
+
513
+ **d) API Call:**
514
+ ```python
515
+ - Model: User-selected (default: gpt-5)
516
+ - Temperature: User-selected (if supported by model)
517
+ - Response format: MultipleChoiceQuestion
518
+ - Returns: Question with 4 options, each with feedback
519
+ ```
520
+
521
+ **e) Post-Processing:**
522
+ ```python
523
+ - Set question ID = learning_objective ID
524
+ - Verify all options have feedback
525
+ - Add default feedback if missing
526
+ ```
527
+
528
+ **Step 3.2.2: Quality Assessment** (`quiz_generator/question_improvement.py`)
529
+
530
+ ```python
531
+ judge_question_quality()
532
+ ```
533
+
534
+ **Quality Judging Process:**
535
+ ```python
536
+ 1. Creates evaluation prompt with:
537
+ - Question text and all options
538
+ - Quality criteria from prompts
539
+ - Evaluation instructions
540
+
541
+ 2. LLM evaluates question for:
542
+ - Clarity and unambiguity
543
+ - Alignment with learning objective
544
+ - Quality of incorrect options
545
+ - Feedback quality
546
+ - Appropriate difficulty
547
+
548
+ 3. Returns:
549
+ - approved: bool
550
+ - feedback: str (reasoning for judgment)
551
+
552
+ 4. Updates question:
553
+ question.approved = approved
554
+ question.judge_feedback = feedback
555
+ ```
556
+
557
+ 3. **Results Collection:**
558
+ ```python
559
+ - Questions collected as futures complete
560
+ - IDs assigned sequentially across runs
561
+ - All questions aggregated into single list
562
+ ```
563
+
564
+ **Example:** 3 objectives × 1 run = 3 questions generated in parallel
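The thread-pool pattern described in this step can be sketched with the standard library. `generate_in_parallel` and `fake_generate` are hypothetical names; `fake_generate` stands in for the real generate-then-judge call so the example runs without an API key.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def generate_in_parallel(
    objectives: List[Dict], generate_one: Callable[[Dict], Dict]
) -> List[Dict]:
    """Run generate_one for every objective using at most five worker threads."""
    if not objectives:
        return []  # ThreadPoolExecutor rejects max_workers=0
    max_workers = min(len(objectives), 5)
    questions: List[Dict] = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(generate_one, obj) for obj in objectives]
        # Results arrive in completion order, not submission order.
        for future in as_completed(futures):
            questions.append(future.result())
    return questions


def fake_generate(objective: Dict) -> Dict:
    """Placeholder for the real LLM-backed question generator."""
    return {"id": objective["id"], "question_text": f"Question for objective {objective['id']}"}
```

Since questions are collected as futures complete, any step that depends on a stable order needs to sort or reassign IDs afterwards.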
565
+
566
+ #### Step 3.3: Grouping Questions
567
+
568
+ **Process:** `quiz_generator/question_ranking.py → group_questions()`
569
+
570
+ ```python
571
+ 1. Creates prompt with:
572
+ - All generated questions
573
+ - Grouping instructions
574
+ - Example format
575
+
576
+ 2. LLM identifies:
577
+ - Questions testing same concept (same learning_objective_id)
578
+ - Groups of similar questions
579
+ - Best question in each group
580
+
581
+ 3. Model: gpt-5-mini
582
+ Response format: GroupedMultipleChoiceQuestionsResponse
583
+
584
+ 4. Returns:
585
+ {
586
+ "grouped": [all questions with group metadata],
587
+ "best_in_group": [best questions from each group]
588
+ }
589
+ ```
590
+
591
+ #### Step 3.4: Ranking Questions
592
+
593
+ **Process:** `quiz_generator/question_ranking.py → rank_questions()`
594
+
595
+ **Only ranks best-in-group questions:**
596
+
597
+ ```python
598
+ 1. Creates prompt with:
599
+ - RANK_QUESTIONS_PROMPT
600
+ - All quality standards
601
+ - Best-in-group questions only
602
+ - Course content for context
603
+
604
+ 2. Ranking Criteria:
605
+ - Question clarity and unambiguity
606
+ - Alignment with learning objective
607
+ - Quality of incorrect options
608
+ - Feedback quality
609
+ - Appropriate difficulty (prefers simple English)
610
+ - Adherence to all guidelines
611
+ - Avoidance of absolute terms
612
+
613
+ 3. Special Instructions:
614
+ - NEVER change question with ID=1
615
+ - Each question gets unique rank (2, 3, 4, ...)
616
+ - Rank 1 is reserved
617
+ - All questions must be returned
618
+
619
+ 4. Model: User-selected
620
+ Response format: RankedMultipleChoiceQuestionsResponse
621
+
622
+ 5. Returns:
623
+ {
624
+ "ranked": [questions with rank and ranking_reasoning]
625
+ }
626
+ ```
627
+
628
+ #### Step 3.5: Format Results
629
+
630
+ **Process:** `question_handlers._format_question_results()`
631
+
632
+ **Three outputs:**
633
+
634
+ 1. **Best-in-Group Ranked Questions:**
635
+ ```python
636
+ - Sorted by rank
637
+ - Includes all question data
638
+ - Includes rank and ranking_reasoning
639
+ - Includes group metadata
640
+ - Formatted as JSON
641
+ ```
642
+
643
+ 2. **All Grouped Questions:**
644
+ ```python
645
+ - All questions with group metadata
646
+ - No ranking information
647
+ - Shows which questions are in groups
648
+ - Formatted as JSON
649
+ ```
650
+
651
+ 3. **Formatted Quiz:**
652
+ ```python
653
+ format_quiz_for_ui() creates human-readable format:
654
+
655
+ **Question 1 [Rank: 2]:** What is...
656
+
657
+ Ranking Reasoning: ...
658
+
659
+ • A [Correct]: Option text
660
+ ◦ Feedback: Correct feedback
661
+
662
+ • B: Option text
663
+ ◦ Feedback: Incorrect feedback
664
+
665
+ [continues for all questions]
666
+ ```
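A sketch of a formatter that produces the layout shown above from plain dictionaries. The field names follow the `RankedMultipleChoiceQuestion` model described earlier, but `format_quiz` itself is a hypothetical stand-in for `format_quiz_for_ui()`.

```python
import string
from typing import Dict, List


def format_quiz(questions: List[Dict]) -> str:
    """Render ranked questions as human-readable text."""
    lines: List[str] = []
    for number, question in enumerate(questions, start=1):
        lines.append(
            f"**Question {number} [Rank: {question['rank']}]:** {question['question_text']}"
        )
        lines.append("")
        lines.append(f"Ranking Reasoning: {question['ranking_reasoning']}")
        lines.append("")
        for letter, option in zip(string.ascii_uppercase, question["options"]):
            marker = " [Correct]" if option["is_correct"] else ""
            lines.append(f"• {letter}{marker}: {option['option_text']}")
            lines.append(f"  ◦ Feedback: {option['feedback']}")
            lines.append("")
    return "\n".join(lines).rstrip()
```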
667
+
668
+ ### Phase 4: Custom Question Generation (Optional)
669
+
670
+ **Tab 3 Workflow:**
671
+
672
+ #### Step 4.1: User Input
673
+ User provides:
674
+ - Free-form guidance/feedback text
675
+ - Model selection
676
+ - Temperature setting
677
+
678
+ #### Step 4.2: Generation
679
+
680
+ **Process:** `feedback_handlers.propose_question_handler()`
681
+
682
+ ```python
683
+ QuizGenerator.generate_multiple_choice_question_from_feedback()
684
+
685
+ quiz_generator/feedback_questions.py
686
+ ```
687
+
688
+ **Workflow:**
689
+ ```python
690
+ 1. Retrieves processed file contents from state
691
+
692
+ 2. Creates prompt combining:
693
+ - User feedback/guidance
694
+ - All quality standards
695
+ - Course content
696
+ - Generation criteria
697
+
698
+ 3. Model generates:
699
+ - Single question
700
+ - With learning objective inferred from guidance
701
+ - 4 options with feedback
702
+ - Source references
703
+
704
+ 4. Returns: MultipleChoiceQuestionFromFeedback object
705
+ (includes user feedback as metadata)
706
+
707
+ 5. Formatted as JSON for display
708
+ ```
709
+
710
+ ### Phase 5: Assessment Export (Automated)
711
+
712
+ The final assessment can be saved using:
713
+
714
+ ```python
715
+ QuizGenerator.save_assessment_to_json()
716
+
717
+ quiz_generator/assessment.py → save_assessment_to_json()
718
+ ```
719
+
720
+ **Process:**
721
+ ```python
722
+ 1. Convert Assessment object to dictionary
723
+ assessment_dict = assessment.model_dump()
724
+
725
+ 2. Write to JSON file with indent=2
726
+ Default filename: "assessment.json"
727
+
728
+ 3. Contains:
729
+ - All learning objectives (best-in-group)
730
+ - All ranked questions
731
+ - Complete metadata
732
+ ```
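The export step amounts to dumping a dictionary to disk. In the sketch below the input is already a plain dictionary (what `assessment.model_dump()` would return for a Pydantic model), and `save_assessment_dict` is a hypothetical name chosen to avoid implying this is the real method.

```python
import json
from typing import Any, Dict


def save_assessment_dict(
    assessment: Dict[str, Any], output_path: str = "assessment.json"
) -> str:
    """Write the assessment dictionary to a JSON file with two-space indentation."""
    with open(output_path, "w", encoding="utf-8") as handle:
        json.dump(assessment, handle, indent=2)
    return output_path
```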
733
+
734
+ ---
735
+
736
+ ## Detailed Component Functionality
737
+
738
+ ### Content Processor (`ui/content_processor.py`)
739
+
740
+ **Class: `ContentProcessor`**
741
+
742
+ **Methods:**
743
+
744
+ 1. **`process_files(file_paths: List[str]) -> List[str]`**
745
+ - Main entry point for processing multiple files
746
+ - Returns list of XML-tagged content strings
747
+ - Stores results in `self.file_contents`
748
+
749
+ 2. **`process_file(file_path: str) -> List[str]`**
750
+ - Routes to appropriate handler based on file extension
751
+ - Returns single-item list with tagged content
752
+
753
+ 3. **`_process_subtitle_file(file_path: str) -> List[str]`**
754
+ - Filters out timestamps and metadata
755
+ - Preserves actual subtitle text
756
+ - Wraps in `<source file='...'>` tags
757
+
758
+ 4. **`_process_notebook_file(file_path: str) -> List[str]`**
759
+ - Validates JSON structure
760
+ - Parses with nbformat
761
+ - Extracts markdown and code cells
762
+ - Falls back to raw text on parsing errors
763
+ - Wraps in `<source file='...'>` tags
764
+
765
+ ### Learning Objective Generator (`learning_objective_generator/`)
766
+
767
+ #### **generator.py - LearningObjectiveGenerator Class**
768
+
769
+ **Orchestrator that delegates to specialized modules:**
770
+
771
+ **Methods:**
772
+
773
+ 1. **`generate_base_learning_objectives()`**
774
+ - Delegates to `base_generation.py`
775
+ - Returns base objectives with correct answers
776
+
777
+ 2. **`group_base_learning_objectives()`**
778
+ - Delegates to `grouping_and_ranking.py`
779
+ - Groups similar objectives
780
+ - Identifies best in each group
781
+
782
+ 3. **`generate_incorrect_answer_options()`**
783
+ - Delegates to `enhancement.py`
784
+ - Adds 5 incorrect answer suggestions per objective
785
+
786
+ 4. **`regenerate_incorrect_answers()`**
787
+ - Delegates to `suggestion_improvement.py`
788
+ - Quality-checks and improves incorrect answers
789
+
790
+ 5. **`generate_and_group_learning_objectives()`**
791
+ - Complete workflow method
792
+ - Combines: base generation → grouping → incorrect answers
793
+ - Returns dict with all_grouped and best_in_group
794
+
795
+ #### **base_generation.py**
796
+
797
+ **Key Functions:**
798
+
799
+ **`generate_base_learning_objectives()`**
800
+ - Wrapper that calls two separate functions
801
+ - First: Generate objectives without correct answers
802
+ - Second: Generate correct answers for those objectives
803
+
804
+ **`generate_base_learning_objectives_without_correct_answers()`**
805
+
806
+ **Process:**
807
+ ```python
808
+ 1. Extract source filenames from XML tags
809
+ 2. Combine all file contents
810
+ 3. Create prompt with:
811
+ - BASE_LEARNING_OBJECTIVES_PROMPT
812
+ - BLOOMS_TAXONOMY_LEVELS
813
+ - LEARNING_OBJECTIVE_EXAMPLES_WITHOUT_ANSWERS
814
+ - Course content
815
+ 4. API call:
816
+ - Model: User-selected
817
+ - Temperature: User-selected (if supported)
818
+ - Response format: BaseLearningObjectivesWithoutCorrectAnswerResponse
819
+ 5. Post-process:
820
+ - Assign sequential IDs
821
+ - Normalize source_reference (extract basenames)
822
+ 6. Returns: List[BaseLearningObjectiveWithoutCorrectAnswer]
823
+ ```
824
+
825
+ **`generate_correct_answers_for_objectives()`**
826
+
827
+ **Process:**
828
+ ```python
829
+ 1. For each objective without answer:
830
+ - Create prompt with objective + course content
831
+ - Call OpenAI API (text response, not structured)
832
+ - Extract correct answer
833
+ - Create BaseLearningObjective with answer
834
+ 2. Error handling: Add "[Error generating correct answer]" on failure
835
+ 3. Returns: List[BaseLearningObjective]
836
+ ```
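The per-objective loop with its error placeholder might look roughly like this. It is a sketch under stated assumptions: `add_correct_answers` and `fake_model` are hypothetical names, objectives are plain dicts rather than Pydantic models, and the model call is stubbed so the example runs offline.

```python
from typing import Callable, Dict, List

ERROR_PLACEHOLDER = "[Error generating correct answer]"


def add_correct_answers(
    objectives: List[Dict[str, str]],
    ask_model: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Attach a correct answer to each objective, one model call per objective."""
    completed = []
    for objective in objectives:
        prompt = f"Learning objective: {objective['learning_objective']}"
        try:
            answer = ask_model(prompt).strip()
        except Exception:
            # A failed call must not abort the remaining objectives.
            answer = ERROR_PLACEHOLDER
        completed.append({**objective, "correct_answer": answer})
    return completed


def fake_model(prompt: str) -> str:
    """Stub in place of the OpenAI call; fails for one prompt on purpose."""
    if "fail" in prompt:
        raise RuntimeError("simulated API error")
    return " A short answer. "


results = add_correct_answers(
    [{"learning_objective": "Define X"}, {"learning_objective": "fail here"}],
    fake_model,
)
```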
837
+
838
+ **Quality Guidelines in Prompt:**
839
+ - Objectives must be assessable via multiple-choice
840
+ - Start with action verbs (identify, describe, define, list, compare)
841
+ - One goal per objective
842
+ - Derived directly from course content
843
+ - Tool/framework agnostic (focus on principles, not specific implementations)
844
+ - First objective should be a relatively easy recall question
845
+ - Avoid objectives about "building" or "creating" (not MC-assessable)
846
+
847
+ #### **grouping_and_ranking.py**
848
+
849
+ **Key Functions:**
850
+
851
+ **`group_base_learning_objectives()`**
852
+
853
+ **Process:**
854
+ ```python
855
+ 1. Format objectives for display in prompt
856
+ 2. Create grouping prompt with:
857
+ - Original generation criteria
858
+ - All base objectives
859
+ - Course content
860
+ - Grouping instructions
861
+ 3. Special rule:
862
+ - All objectives with IDs ending in 001 (e.g., 1001, 2001) grouped together
863
+ - Best one selected from this group
864
+ - Will become primary objective (ID=1)
865
+ 4. API call:
866
+ - Model: "gpt-5-mini" (hardcoded for efficiency)
867
+ - Response format: GroupedBaseLearningObjectivesResponse
868
+ 5. Post-process:
869
+ - Normalize best_in_group to Python bool
870
+ - Filter for best-in-group objectives
871
+ 6. Returns:
872
+ {
873
+ "all_grouped": List[GroupedBaseLearningObjective],
874
+ "best_in_group": List[GroupedBaseLearningObjective]
875
+ }
876
+ ```
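Steps 5 and 6 amount to coercing the flag and filtering on it. A minimal sketch, assuming objectives are plain dicts and that the model may return the flag as a string; `to_bool` and `split_groups` are hypothetical helpers:

```python
from typing import Any, Dict, List


def to_bool(value: Any) -> bool:
    """Coerce values such as "true", "True", 1 or True to a Python bool."""
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return bool(value)


def split_groups(objectives: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Normalize the flag, then return all objectives plus the best of each group."""
    for obj in objectives:
        obj["best_in_group"] = to_bool(obj.get("best_in_group", False))
    return {
        "all_grouped": objectives,
        "best_in_group": [o for o in objectives if o["best_in_group"]],
    }


grouped = split_groups([
    {"id": 1001, "best_in_group": "true"},
    {"id": 2001, "best_in_group": "false"},
    {"id": 1002, "best_in_group": True},
])
```

The string check matters because `bool("false")` is `True` in Python, so a plain `bool()` call would mark every objective as best in group.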
877
+
878
+ **Grouping Criteria:**
879
+ - Topic overlap
880
+ - Similarity of concepts
881
+ - Quality based on original generation criteria
882
+ - Clarity and specificity
883
+ - Alignment with course content
884
+
885
+ #### **enhancement.py**
886
+
887
+ **Key Function: `generate_incorrect_answer_options()`**
888
+
889
+ **Process:**
890
+ ```python
891
+ 1. For each base objective:
892
+ - Create prompt with:
893
+ - Learning objective and correct answer
894
+ - INCORRECT_ANSWER_PROMPT (detailed guidelines)
895
+ - INCORRECT_ANSWER_EXAMPLES
896
+ - Course content
897
+ - Request 5 plausible incorrect options
898
+ 2. API call:
899
+ - Model: model_override or default
900
+ - Temperature: User-selected (if supported)
901
+ - Response format: LearningObjective (includes incorrect_answer_options)
902
+ 3. Returns: List[LearningObjective] with all fields populated
903
+ ```
904
+
905
+ **Incorrect Answer Quality Principles:**
906
+ - Create common misunderstandings
907
+ - Maintain identical structure to correct answer
908
+ - Use course terminology correctly but in wrong contexts
909
+ - Include partially correct information
910
+ - Avoid obviously wrong answers
911
+ - Mirror detail level and style of correct answer
912
+ - Avoid absolute terms ("always", "never", "exclusively")
913
+ - Avoid contradictory second clauses
914
+
915
+ #### **suggestion_improvement.py**
916
+
917
+ **Key Function: `regenerate_incorrect_answers()`**
918
+
919
+ **Process:**
920
+ ```python
921
+ 1. For each learning objective:
922
+ - Call should_regenerate_incorrect_answers()
923
+
924
+ 2. should_regenerate_incorrect_answers():
925
+ - Creates evaluation prompt with:
926
+ - Objective and all incorrect options
927
+ - IMMEDIATE_RED_FLAGS checklist
928
+ - RULES_FOR_SECOND_CLAUSES
929
+ - LLM evaluates each option
930
+ - Returns: needs_regeneration: bool
931
+
932
+ 3. If regeneration needed:
933
+ - Logs to incorrect_suggestion_debug/{id}.txt
934
+ - Creates new prompt with additional constraints
935
+ - Regenerates incorrect answers
936
+ - Validates again
937
+
938
+ 4. Returns: List[LearningObjective] with improved incorrect answers
939
+ ```
940
+
941
+ **Red Flags Checked:**
942
+ - Contradictory second clauses ("but not necessarily")
943
+ - Explicit negations ("without automating")
944
+ - Opposite descriptions ("fixed steps" for flexible systems)
945
+ - Absolute/comparative terms
946
+ - Hedging that creates limitations
947
+ - Trade-off language creating false dichotomies
948
+
949
+ ### Quiz Generator (`quiz_generator/`)
950
+
951
+ #### **generator.py - QuizGenerator Class**
952
+
953
+ **Orchestrator with LearningObjectiveGenerator embedded:**
954
+
955
+ **Initialization:**
956
+ ```python
957
+ def __init__(self, api_key, model="gpt-5", temperature=1.0):
+     self.client = OpenAI(api_key=api_key)
+     self.model = model
+     self.temperature = temperature
+     self.learning_objective_generator = LearningObjectiveGenerator(
+         api_key=api_key, model=model, temperature=temperature
+     )
964
+ ```
965
+
966
+ **Methods (each delegates to a specialized module):**
967
+
968
+ 1. **`generate_base_learning_objectives()`** → delegates to LearningObjectiveGenerator
969
+ 2. **`generate_lo_incorrect_answer_options()`** → delegates to LearningObjectiveGenerator
970
+ 3. **`group_base_learning_objectives()`** → delegates to grouping_and_ranking.py
971
+ 4. **`generate_multiple_choice_question()`** → delegates to question_generation.py
972
+ 5. **`generate_questions_in_parallel()`** → delegates to assessment.py
973
+ 6. **`group_questions()`** → delegates to question_ranking.py
974
+ 7. **`rank_questions()`** → delegates to question_ranking.py
975
+ 8. **`judge_question_quality()`** → delegates to question_improvement.py
976
+ 9. **`regenerate_incorrect_answers()`** → delegates to question_improvement.py
977
+ 10. **`generate_multiple_choice_question_from_feedback()`** → delegates to feedback_questions.py
978
+ 11. **`save_assessment_to_json()`** → delegates to assessment.py
979
+
980
+ #### **question_generation.py**
981
+
982
+ **Key Function: `generate_multiple_choice_question()`**
983
+
984
+ **Detailed Process:**
985
+
986
+ **1. Source Content Matching:**
987
+ ```python
988
+ source_references = learning_objective.source_reference
+ if isinstance(source_references, str):
+     source_references = [source_references]
+
+ combined_content = ""
+ for source_file in source_references:
+     found = False
+     # Try exact match: <source file='filename'>
+     for file_content in file_contents:
+         if f"<source file='{source_file}'>" in file_content:
+             combined_content += file_content
+             found = True
+             break
+
+     # Fallback: partial match
+     if not found:
+         for file_content in file_contents:
+             if source_file in file_content:
+                 combined_content += file_content
+                 break
+
+ # Last resort: use all content
+ if not combined_content:
+     combined_content = "\n\n".join(file_contents)
1010
+ ```
1011
+
1012
+ **2. Multi-Source Instruction:**
1013
+ ```python
1014
+ if len(source_references) > 1:
1015
+ Add special instruction:
1016
+ "This learning objective spans multiple sources.
1017
+ Your question should:
1018
+ 1. Synthesize information across these sources
1019
+ 2. Test understanding of overarching themes
1020
+ 3. Require knowledge from multiple sources"
1021
+ ```
1022
+
1023
+ **3. Prompt Construction:**
1024
+ Combines extensive quality standards:
1025
+ ```python
1026
+ - Learning objective
1027
+ - Correct answer
1028
+ - Incorrect answer options from objective
1029
+ - GENERAL_QUALITY_STANDARDS
1030
+ - MULTIPLE_CHOICE_STANDARDS
1031
+ - EXAMPLE_QUESTIONS
1032
+ - QUESTION_SPECIFIC_QUALITY_STANDARDS
1033
+ - CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS
1034
+ - INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION
1035
+ - ANSWER_FEEDBACK_QUALITY_STANDARDS
1036
+ - Multi-source instruction (if applicable)
1037
+ - Matched course content
1038
+ ```
1039
+
1040
+ **4. API Call:**
1041
+ ```python
1042
+ params = {
+     "model": model,
+     "messages": [
+         {"role": "system", "content": "Expert educational assessment creator"},
+         {"role": "user", "content": prompt}
+     ],
+     "response_format": MultipleChoiceQuestion
+ }
+ if not TEMPERATURE_UNAVAILABLE.get(model, True):
+     params["temperature"] = temperature
+
+ response = client.beta.chat.completions.parse(**params)
1054
+ ```
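Because the lookup defaults to `True`, an unknown model is treated as not supporting temperature, so the parameter is only sent to models explicitly listed as supporting it. The sketch below isolates that gate. The mapping is an illustrative subset, `build_params` is a hypothetical helper, and `response_format` is left out to keep the example dependency-free.

```python
from typing import Any, Dict

# Illustrative subset only; True means "temperature must not be sent".
TEMPERATURE_UNAVAILABLE = {"gpt-4o": False, "gpt-5": True, "o3-mini": True}


def build_params(model: str, prompt: str, temperature: float) -> Dict[str, Any]:
    """Build chat-completion keyword arguments, adding temperature only if allowed."""
    params: Dict[str, Any] = {
        "model": model,
        "messages": [
            {"role": "system", "content": "Expert educational assessment creator"},
            {"role": "user", "content": prompt},
        ],
    }
    # Unknown models fall back to True, i.e. temperature is omitted.
    if not TEMPERATURE_UNAVAILABLE.get(model, True):
        params["temperature"] = temperature
    return params


with_temp = build_params("gpt-4o", "Write a question.", 0.7)
without_temp = build_params("gpt-5", "Write a question.", 0.7)
unknown = build_params("some-future-model", "Write a question.", 0.7)
```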
1055
+
1056
+ **5. Post-Processing:**
1057
+ ```python
1058
+ - Set response.id = learning_objective.id
1059
+ - Set response.learning_objective_id = learning_objective.id
1060
+ - Set response.learning_objective = learning_objective.learning_objective
1061
+ - Set response.source_reference = learning_objective.source_reference
1062
+ - Verify all options have feedback
1063
+ - Add default feedback if missing
1064
+ ```
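A sketch of this post-processing on plain dicts follows. The option field names (`option_text`, `feedback`), the default feedback wording, and the `finalize_question` helper are assumptions made for illustration.

```python
from typing import Any, Dict

DEFAULT_FEEDBACK = "No feedback was generated for this option."  # assumed wording


def finalize_question(question: Dict[str, Any], objective: Dict[str, Any]) -> Dict[str, Any]:
    """Copy identifying fields from the objective and fill in missing feedback."""
    question["id"] = objective["id"]
    question["learning_objective_id"] = objective["id"]
    question["learning_objective"] = objective["learning_objective"]
    question["source_reference"] = objective["source_reference"]
    for option in question["options"]:
        # Treat both a missing key and an empty string as "no feedback".
        if not option.get("feedback"):
            option["feedback"] = DEFAULT_FEEDBACK
    return question


question = finalize_question(
    {
        "question_text": "What is X?",
        "options": [
            {"option_text": "a", "feedback": ""},
            {"option_text": "b", "feedback": "Because."},
        ],
    },
    {"id": 7, "learning_objective": "Define X", "source_reference": ["a.vtt"]},
)
```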
1065
+
1066
+ **6. Error Handling:**
1067
+ ```python
1068
+ On exception:
1069
+ - Create fallback question with 4 generic options
1070
+ - Include error message in question_text
1071
+ - Mark as questionable quality
1072
+ ```
1073
+
1074
+ #### **question_ranking.py**
1075
+
1076
+ **Key Functions:**
1077
+
1078
+ **`group_questions(questions, file_contents)`**
1079
+
1080
+ **Process:**
1081
+ ```python
1082
+ 1. Create prompt with:
1083
+ - GROUP_QUESTIONS_PROMPT
1084
+ - All questions with complete data
1085
+ - Grouping instructions
1086
+
1087
+ 2. Grouping Logic:
1088
+ - Questions with same learning_objective_id are similar
1089
+ - Group by topic overlap
1090
+ - Mark best_in_group within each group
1091
+ - Single-member groups: best_in_group = true by default
1092
+
1093
+ 3. API call:
1094
+ - Model: User-selected
1095
+ - Response format: GroupedMultipleChoiceQuestionsResponse
1096
+
1097
+ 4. Critical Instructions:
1098
+ - MUST return ALL questions
1099
+ - Each question must have group metadata
1100
+ - best_in_group set appropriately
1101
+
1102
+ 5. Returns:
1103
+ {
1104
+ "grouped": List[GroupedMultipleChoiceQuestion],
1105
+ "best_in_group": [questions where best_in_group=true]
1106
+ }
1107
+ ```
1108
+
1109
+ **`rank_questions(questions, file_contents)`**
1110
+
1111
+ **Process:**
1112
+ ```python
1113
+ 1. Create prompt with:
1114
+ - RANK_QUESTIONS_PROMPT
1115
+ - ALL quality standards (comprehensive)
1116
+ - Best-in-group questions only
1117
+ - Course content
1118
+
1119
+ 2. Ranking Criteria (from prompt):
1120
+ - Question clarity and unambiguity
1121
+ - Alignment with learning objective
1122
+ - Quality of incorrect options
1123
+ - Feedback quality
1124
+ - Appropriate difficulty (simple English preferred)
1125
+ - Adherence to all guidelines
1126
+ - Avoidance of problematic words/phrases
1127
+
1128
+ 3. Special Instructions:
1129
+ - DO NOT change question with ID=1
1130
+ - Rank starting from 2 (rank 1 reserved)
1131
+ - Each question gets unique rank
1132
+ - Must return ALL questions
1133
+
1134
+ 4. API call:
1135
+ - Model: User-selected
1136
+ - Response format: RankedMultipleChoiceQuestionsResponse
1137
+
1138
+ 5. Returns:
1139
+ {
1140
+ "ranked": List[RankedMultipleChoiceQuestion]
1141
+ (includes rank and ranking_reasoning for each)
1142
+ }
1143
+ ```
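The special instructions are constraints on model output, so checking them after the call can be useful. The sketch below assumes the question with ID 1 keeps rank 1 and every other question is ranked from 2; that reading, the dict layout, and the `validate_ranking` helper are all assumptions rather than the project's code.

```python
from typing import Dict, List


def validate_ranking(input_ids: List[int], ranked: List[Dict[str, int]]) -> List[str]:
    """Return a list of problems; an empty list means the ranking is usable."""
    problems = []
    if sorted(q["id"] for q in ranked) != sorted(input_ids):
        problems.append("missing or extra questions")
    ranks = [q["rank"] for q in ranked]
    if len(set(ranks)) != len(ranks):
        problems.append("duplicate ranks")
    # Assumption: only the question with ID 1 may hold a rank below 2.
    if any(q["rank"] < 2 for q in ranked if q["id"] != 1):
        problems.append("rank below 2 on a question other than ID 1")
    return problems


ok = validate_ranking(
    [1, 2, 3],
    [{"id": 1, "rank": 1}, {"id": 2, "rank": 3}, {"id": 3, "rank": 2}],
)
bad = validate_ranking([1, 2, 3], [{"id": 2, "rank": 2}, {"id": 3, "rank": 2}])
```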
1144
+
1145
+ **Simple vs Complex English Examples (from ranking criteria):**
1146
+ ```
1147
+ Simple: "AI engineers create computer programs that can learn from data"
1148
+ Complex: "AI engineering practitioners architect computational paradigms
1149
+ exhibiting autonomous erudition capabilities"
1150
+ ```
1151
+
1152
+ #### **question_improvement.py**
1153
+
1154
+ **Key Functions:**
1155
+
1156
+ **`judge_question_quality(client, model, temperature, question)`**
1157
+
1158
+ **Process:**
1159
+ ```python
1160
+ 1. Create evaluation prompt with:
1161
+ - Question text
1162
+ - All options with feedback
1163
+ - Quality criteria
1164
+ - Evaluation instructions
1165
+
1166
+ 2. LLM evaluates:
1167
+ - Clarity and lack of ambiguity
1168
+ - Alignment with learning objective
1169
+ - Quality of distractors (incorrect options)
1170
+ - Feedback quality and helpfulness
1171
+ - Appropriate difficulty level
1172
+ - Adherence to all standards
1173
+
1174
+ 3. API call:
1175
+ - Unstructured text response
1176
+ - LLM returns: APPROVED or NOT APPROVED + reasoning
1177
+
1178
+ 4. Parsing:
1179
+ approved = "APPROVED" in response.upper()
1180
+ feedback = full response text
1181
+
1182
+ 5. Returns: (approved: bool, feedback: str)
1183
+ ```
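One thing to watch in step 4: as written, the substring test also matches "NOT APPROVED", because that phrase contains "APPROVED". Whether the actual module guards against this is not shown here. A stricter parse might look like the following, where `parse_verdict` is a hypothetical helper:

```python
import re


def parse_verdict(response_text: str) -> bool:
    """Return True only for an approval that is not negated."""
    text = response_text.upper()
    if re.search(r"\bNOT\s+APPROVED\b", text):
        return False
    return re.search(r"\bAPPROVED\b", text) is not None


rejection = "NOT APPROVED: the stem is ambiguous."

# The plain substring test accepts the rejection.
naive_result = "APPROVED" in rejection.upper()

# The stricter parse rejects it and still accepts a real approval.
strict_result = parse_verdict(rejection)
approval_result = parse_verdict("Approved. The question is clear.")
```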
1184
+
1185
+ **`should_regenerate_incorrect_answers(client, question, file_contents, model_name)`**
1186
+
1187
+ **Process:**
1188
+ ```python
1189
+ 1. Extract incorrect options from question
1190
+
1191
+ 2. Create evaluation prompt with:
1192
+ - Each incorrect option
1193
+ - IMMEDIATE_RED_FLAGS checklist
1194
+ - Course content for context
1195
+
1196
+ 3. LLM checks each option for:
1197
+ - Contradictory second clauses
1198
+ - Explicit negations
1199
+ - Absolute terms
1200
+ - Opposite descriptions
1201
+ - Trade-off language
1202
+
1203
+ 4. Returns: needs_regeneration: bool
1204
+
1205
+ 5. If true:
1206
+ - Log to wrong_answer_debug/ directory
1207
+ - Provides detailed feedback on issues
1208
+ ```
1209
+
1210
+ **`regenerate_incorrect_answers(client, model, temperature, questions, file_contents)`**
1211
+
1212
+ **Process:**
1213
+ ```python
1214
+ 1. For each question:
1215
+ - Check if regeneration needed
1216
+ - If yes:
1217
+ a. Create new prompt with stricter constraints
1218
+ b. Include original question for context
1219
+ c. Add specific rules about avoiding red flags
1220
+ d. Regenerate options
1221
+ e. Validate again
1222
+ - If no: keep original
1223
+
1224
+ 2. Returns: List of questions with improved incorrect answers
1225
+ ```
1226
+
1227
+ #### **feedback_questions.py**
1228
+
1229
+ **Key Function: `generate_multiple_choice_question_from_feedback()`**
1230
+
1231
+ **Process:**
1232
+ ```python
1233
+ 1. Accept user feedback/guidance as free-form text
1234
+
1235
+ 2. Create prompt combining:
1236
+ - User feedback
1237
+ - All quality standards
1238
+ - Course content
1239
+ - Standard generation criteria
1240
+
1241
+ 3. LLM infers:
1242
+ - Learning objective from feedback
1243
+ - Appropriate question
1244
+ - 4 options with feedback
1245
+ - Source references
1246
+
1247
+ 4. API call:
1248
+ - Model: User-selected
1249
+ - Response format: MultipleChoiceQuestionFromFeedback
1250
+
1251
+ 5. Includes user feedback as metadata in response
1252
+
1253
+ 6. Returns: Single question object
1254
+ ```
1255
+
1256
+ #### **assessment.py**
1257
+
1258
+ **Key Functions:**
1259
+
1260
+ **`generate_questions_in_parallel()`**
1261
+
1262
+ **Parallel Processing Details:**
1263
+
1264
+ ```python
1265
+ 1. Setup:
1266
+ max_workers = min(len(learning_objectives), 5)
1267
+ # Limits to 5 concurrent threads
1268
+
1269
+ 2. Thread Pool Executor:
1270
+ with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
1271
+
1272
+ 3. For each objective (in separate thread):
1273
+
1274
+ Worker function:
1275
+ def generate_question_for_objective(objective, idx):
1276
+ - Generate question
1277
+ - Judge quality
1278
+ - Update with approval and feedback
1279
+ - Handle errors gracefully
1280
+ - Return complete question
1281
+
1282
+ 4. Submit all tasks:
1283
+ future_to_idx = {
1284
+ executor.submit(generate_question_for_objective, obj, i): i
1285
+ for i, obj in enumerate(learning_objectives)
1286
+ }
1287
+
1288
+ 5. Collect results as completed:
1289
+ for future in concurrent.futures.as_completed(future_to_idx):
1290
+ question = future.result()
1291
+ questions.append(question)
1292
+ print progress
1293
+
1294
+ 6. Error handling:
1295
+ - Individual failures don't stop other threads
1296
+ - Placeholder questions created on error
1297
+ - All errors logged
1298
+
1299
+ 7. Returns: List[MultipleChoiceQuestion] with quality judgments
1300
+ ```
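The outline above can be condensed into a runnable sketch. The worker is a stub in place of the real generate-and-judge call, and `generate_in_parallel` is a hypothetical name. The sketch adds two details of its own: a guard for an empty input list, since `ThreadPoolExecutor` rejects `max_workers=0`, and a final sort, since `as_completed` yields results in completion order.

```python
import concurrent.futures
from typing import Dict, List


def generate_question_for_objective(objective: str, idx: int) -> Dict[str, object]:
    """Stub worker; the real one calls the model and the quality judge."""
    if not objective:
        raise ValueError("empty objective")
    return {"idx": idx, "question_text": f"Question about: {objective}", "approved": True}


def generate_in_parallel(learning_objectives: List[str]) -> List[Dict[str, object]]:
    """Run one worker per objective on up to 5 threads."""
    if not learning_objectives:
        return []  # avoids max_workers=0
    max_workers = min(len(learning_objectives), 5)
    questions = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_idx = {
            executor.submit(generate_question_for_objective, obj, i): i
            for i, obj in enumerate(learning_objectives)
        }
        for future in concurrent.futures.as_completed(future_to_idx):
            idx = future_to_idx[future]
            try:
                questions.append(future.result())
            except Exception as exc:
                # One failed objective yields a placeholder, not a crash.
                questions.append(
                    {"idx": idx, "question_text": f"[Error: {exc}]", "approved": False}
                )
    # Restore input order after out-of-order completion.
    return sorted(questions, key=lambda q: q["idx"])


results = generate_in_parallel(["agents", "", "tool use"])
```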
1301
+
1302
+ **`save_assessment_to_json(assessment, output_path)`**
1303
+
1304
+ ```python
1305
+ 1. Convert Pydantic model to dict:
1306
+ assessment_dict = assessment.model_dump()
1307
+
1308
+ 2. Write to JSON file:
1309
+ with open(output_path, "w") as f:
1310
+ json.dump(assessment_dict, f, indent=2)
1311
+
1312
+ 3. File contains:
1313
+ {
1314
+ "learning_objectives": [...],
1315
+ "questions": [...]
1316
+ }
1317
+ ```
1318
+
1319
+ ### State Management (`ui/state.py`)
1320
+
1321
+ **Global State Variables:**
1322
+ ```python
1323
+ processed_file_contents = [] # List of XML-tagged content strings
1324
+ generated_learning_objectives = [] # List of learning objective objects
1325
+ ```
1326
+
1327
+ **Functions:**
1328
+ - `get_processed_contents()` → retrieves file contents
1329
+ - `set_processed_contents(contents)` → stores file contents
1330
+ - `get_learning_objectives()` → retrieves objectives
1331
+ - `set_learning_objectives(objectives)` → stores objectives
1332
+ - `clear_state()` → resets both variables
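A module-level store with these accessors could be as small as the sketch below. The function and variable names come from the lists above; the bodies are assumed.

```python
from typing import Any, List

processed_file_contents: List[str] = []
generated_learning_objectives: List[Any] = []


def get_processed_contents() -> List[str]:
    return processed_file_contents


def set_processed_contents(contents: List[str]) -> None:
    global processed_file_contents
    processed_file_contents = contents


def get_learning_objectives() -> List[Any]:
    return generated_learning_objectives


def set_learning_objectives(objectives: List[Any]) -> None:
    global generated_learning_objectives
    generated_learning_objectives = objectives


def clear_state() -> None:
    """Reset both variables to empty lists."""
    global processed_file_contents, generated_learning_objectives
    processed_file_contents = []
    generated_learning_objectives = []


set_processed_contents(["<source file='a.txt'>text</source>"])
stored = get_processed_contents()
```

Module-level variables live once per Python process, so with this design every session served by the same process would read and write the same state.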
1333
+
1334
+ **Purpose:**
1335
+ - Persists data between UI tabs
1336
+ - Allows Tab 2 to access content processed in Tab 1
1337
+ - Allows Tab 3 to access content for custom questions
1338
+ - Enables regeneration with feedback
1339
+
1340
+ ### UI Handlers
1341
+
1342
+ #### **objective_handlers.py**
1343
+
1344
+ **`process_files(files, num_objectives, num_runs, model_name, incorrect_answer_model_name, temperature)`**
1345
+
1346
+ **Complete Workflow:**
1347
+ ```python
1348
+ 1. Validate inputs (files exist, API key present)
1349
+ 2. Extract file paths from Gradio file objects
1350
+ 3. Process files → get XML-tagged content
1351
+ 4. Store in state
1352
+ 5. Create QuizGenerator
1353
+ 6. Generate multiple runs of base objectives
1354
+ 7. Group and rank objectives
1355
+ 8. Generate incorrect answers for best-in-group
1356
+ 9. Improve incorrect answers
1357
+ 10. Reassign IDs (best from 001 group → ID=1)
1358
+ 11. Format results for display
1359
+ 12. Store in state
1360
+ 13. Return 4 outputs: status, best-in-group, all-grouped, raw
1361
+ ```
1362
+
1363
+ **`regenerate_objectives(objectives_json, feedback, num_objectives, num_runs, model_name, temperature)`**
1364
+
1365
+ **Workflow:**
1366
+ ```python
1367
+ 1. Retrieve processed contents from state
1368
+ 2. Append feedback to content:
1369
+ file_contents_with_feedback.append(f"FEEDBACK: {feedback}")
1370
+ 3. Generate new objectives with feedback context
1371
+ 4. Group and rank
1372
+ 5. Return regenerated objectives
1373
+ ```
1374
+
1375
+ **`_reassign_objective_ids(grouped_objectives)`**
1376
+
1377
+ **ID Assignment Logic:**
1378
+ ```python
1379
+ 1. Find all objectives with IDs ending in 001 (1001, 2001, etc.)
1380
+ 2. Identify their groups
1381
+ 3. Find best_in_group objective from these groups
1382
+ 4. Assign it ID = 1
1383
+ 5. Assign all other objectives sequential IDs starting from 2
1384
+ ```
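The ID reassignment can be sketched as below. It assumes run-prefixed IDs (run 1 yields 1001, 1002, ...; run 2 yields 2001, ...), so that `id % 1000 == 1` selects each run's first objective. Objectives are plain dicts and `reassign_ids` is a hypothetical helper.

```python
from typing import Any, Dict, List


def reassign_ids(objectives: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Give the best 'first objective' ID 1 and number the rest from 2."""
    # Pick the primary objective before any ID is overwritten.
    primary = next(
        (o for o in objectives if o["id"] % 1000 == 1 and o["best_in_group"]),
        None,
    )
    next_id = 2
    for obj in objectives:
        if obj is primary:
            obj["id"] = 1
        else:
            obj["id"] = next_id
            next_id += 1
    return objectives


reassigned = reassign_ids([
    {"id": 1001, "best_in_group": False},
    {"id": 2001, "best_in_group": True},
    {"id": 1002, "best_in_group": True},
])
```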
1385
+
1386
+ **`_format_objective_results(grouped_result, all_learning_objectives)`**
1387
+
1388
+ **Formatting:**
1389
+ ```python
1390
+ 1. Sort by ID
1391
+ 2. Create dictionaries from Pydantic objects
1392
+ 3. Include all metadata fields
1393
+ 4. Convert to JSON with indent=2
1394
+ 5. Return 3 formatted outputs + status message
1395
+ ```
1396
+
1397
+ #### **question_handlers.py**
1398
+
1399
+ **`generate_questions(objectives_json, model_name, temperature, num_runs)`**
1400
+
1401
+ **Complete Workflow:**
1402
+ ```python
1403
+ 1. Validate inputs
1404
+ 2. Parse objectives JSON → create LearningObjective objects
1405
+ 3. Retrieve processed contents from state
1406
+ 4. Create QuizGenerator
1407
+ 5. Generate questions (multiple runs in parallel)
1408
+ 6. Group questions by similarity
1409
+ 7. Rank best-in-group questions
1410
+ 8. Optionally improve incorrect answers (currently commented out)
1411
+ 9. Format results
1412
+ 10. Return 4 outputs: status, best-ranked, all-grouped, formatted
1413
+ ```
1414
+
1415
+ **`_generate_questions_multiple_runs()`**
1416
+
1417
+ ```python
1418
+ For each run:
1419
+ 1. Call generate_questions_in_parallel()
1420
+ 2. Assign unique IDs across runs:
1421
+ start_id = len(all_questions) + 1
1422
+ for i, q in enumerate(run_questions):
1423
+ q.id = start_id + i
1424
+ 3. Aggregate all questions
1425
+ ```
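A runnable version of this loop, with the parallel generation call replaced by a stub callable so it works offline. `generate_multiple_runs` and `run_once` are hypothetical names, and questions are plain dicts instead of Pydantic objects.

```python
from typing import Any, Callable, Dict, List


def generate_multiple_runs(
    num_runs: int,
    run_once: Callable[[], List[Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Aggregate questions from several runs and give each a unique ID."""
    all_questions: List[Dict[str, Any]] = []
    for _ in range(num_runs):
        run_questions = run_once()
        # Continue numbering where the previous run stopped.
        start_id = len(all_questions) + 1
        for i, question in enumerate(run_questions):
            question["id"] = start_id + i
        all_questions.extend(run_questions)
    return all_questions


# Each run returns questions whose IDs collide until they are renumbered.
combined = generate_multiple_runs(2, lambda: [{"id": 1}, {"id": 2}])
```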
1426
+
1427
+ **`_group_and_rank_questions()`**
1428
+
1429
+ ```python
1430
+ 1. Group all questions → get grouped and best_in_group
1431
+ 2. Rank only best_in_group questions
1432
+ 3. Return:
1433
+ {
1434
+ "grouped": all with group metadata,
1435
+ "best_in_group_ranked": best with ranks
1436
+ }
1437
+ ```
1438
+
1439
+ #### **feedback_handlers.py**
1440
+
1441
+ **`propose_question_handler(guidance, model_name, temperature)`**
1442
+
1443
+ **Workflow:**
1444
+ ```python
1445
+ 1. Validate state (processed contents available)
1446
+ 2. Create QuizGenerator
1447
+ 3. Call generate_multiple_choice_question_from_feedback()
1448
+ - Passes user guidance and course content
1449
+ - LLM infers learning objective
1450
+ - Generates complete question
1451
+ 4. Format as JSON
1452
+ 5. Return status and question JSON
1453
+ ```
1454
+
1455
+ ### Formatting Utilities (`ui/formatting.py`)
1456
+
1457
+ **`format_quiz_for_ui(questions_json)`**
1458
+
1459
+ **Process:**
1460
+ ```python
1461
+ 1. Parse JSON to list of question dictionaries
1462
+ 2. Sort by rank if available
1463
+ 3. For each question:
1464
+ - Add header: "**Question N [Rank: X]:** {question_text}"
1465
+ - Add ranking reasoning if available
1466
+ - For each option:
1467
+ - Add letter (A, B, C, D)
1468
+ - Mark correct option
1469
+ - Include option text
1470
+ - Include feedback indented
1471
+ 4. Return formatted string with markdown
1472
+ ```
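The formatting steps can be sketched as follows. Apart from `question_text`, `rank`, and `ranking_reasoning`, the field names (`options`, `option_text`, `is_correct`, `feedback`) are assumptions, and `format_quiz` is a hypothetical stand-in for the real function.

```python
import json
from string import ascii_uppercase


def format_quiz(questions_json: str) -> str:
    """Render a JSON list of questions as markdown-style text."""
    questions = json.loads(questions_json)
    # Unranked questions sort after ranked ones.
    questions.sort(key=lambda q: q.get("rank", float("inf")))
    lines = []
    for number, q in enumerate(questions, start=1):
        rank = f" [Rank: {q['rank']}]" if "rank" in q else ""
        lines.append(f"**Question {number}{rank}:** {q['question_text']}")
        if q.get("ranking_reasoning"):
            lines.append(f"Ranking Reasoning: {q['ranking_reasoning']}")
        for letter, option in zip(ascii_uppercase, q["options"]):
            marker = " [Correct]" if option.get("is_correct") else ""
            lines.append(f"• {letter}{marker}: {option['option_text']}")
            lines.append(f"  ◦ Feedback: {option['feedback']}")
        lines.append("")
    return "\n".join(lines)


sample = json.dumps([
    {"question_text": "Second best", "rank": 3,
     "options": [{"option_text": "x", "is_correct": True, "feedback": "f"}]},
    {"question_text": "Best", "rank": 2, "ranking_reasoning": "Clear.",
     "options": [
         {"option_text": "yes", "is_correct": True, "feedback": "Good."},
         {"option_text": "no", "is_correct": False, "feedback": "Close."},
     ]},
])
formatted = format_quiz(sample)
```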
1473
+
1474
+ **Output Example:**
1475
+ ```
1476
+ **Question 1 [Rank: 2]:** What is the primary purpose of AI agents?
1477
+
1478
+ Ranking Reasoning: Clear question that tests fundamental understanding...
1479
+
1480
+ • A [Correct]: To automate tasks and make decisions
1481
+ ◦ Feedback: AI agents are designed to automate tasks...
1482
+
1483
+ • B: To replace human workers entirely
1484
+ ◦ Feedback: While AI agents can automate tasks, they are not...
1485
+
1486
+ [continues...]
1487
+ ```
1488
+
1489
+ ---
1490
+
1491
+ ## Quality Standards and Prompts
1492
+
1493
+ ### Learning Objectives Quality Standards
1494
+
1495
+ **From `prompts/learning_objectives.py`:**
1496
+
1497
+ **BASE_LEARNING_OBJECTIVES_PROMPT - Key Requirements:**
1498
+
1499
+ 1. **Assessability:**
1500
+ - Must be testable via multiple-choice questions
1501
+ - Cannot be about "building", "creating", "developing"
1502
+ - Should use verbs like: identify, list, describe, define, compare
1503
+
1504
+ 2. **Specificity:**
1505
+ - One goal per objective
1506
+ - Don't combine multiple action verbs
1507
+ - Example of what NOT to do: "identify X and explain Y"
1508
+
1509
+ 3. **Source Alignment:**
1510
+ - Derived DIRECTLY from course content
1511
+ - No topics not covered in content
1512
+ - Appropriate difficulty level for course
1513
+
1514
+ 4. **Independence:**
1515
+ - Each objective stands alone
1516
+ - No dependencies on other objectives
1517
+ - No context required from other objectives
1518
+
1519
+ 5. **Focus:**
1520
+ - Address "why" over "what" when possible
1521
+ - Critical knowledge over trivial facts
1522
+ - Principles over specific implementation details
1523
+
1524
+ 6. **Tool/Framework Agnosticism:**
1525
+ - Don't mention specific tools/frameworks
1526
+ - Focus on underlying principles
1527
+ - Example: Don't ask about "Pandas DataFrame methods",
1528
+ ask about "data filtering concepts"
1529
+
1530
+ 7. **First Objective Rule:**
1531
+ - Should be a relatively easy recall question
1532
+ - Address main topic/concept of course
1533
+ - Format: "Identify what X is" or "Explain why X is important"
1534
+
1535
+ 8. **Answer Length:**
1536
+ - Aim for ≤20 words in correct answer
1537
+ - Avoid unnecessary elaboration
1538
+ - No compound sentences with extra consequences
1539
+
1540
+ **BLOOMS_TAXONOMY_LEVELS:**
1541
+
1542
+ Levels from lowest to highest:
1543
+ - **Recall:** Retention of key concepts (not trivialities)
1544
+ - **Comprehension:** Connect ideas, demonstrate understanding
1545
+ - **Application:** Apply concept to new but similar scenario
1546
+ - **Analysis:** Examine parts, determine relationships, make inferences
1547
+ - **Evaluation:** Make judgments requiring critical thinking
1548
+
1549
+ **LEARNING_OBJECTIVE_EXAMPLES:**
1550
+
1551
+ Includes 7 high-quality examples with:
1552
+ - Appropriate action verbs
1553
+ - Clear learning objectives
1554
+ - Concise correct answers (mostly <20 words)
1555
+ - Multiple source references
1556
+ - Framework-agnostic language
1557
+
1558
+ ### Question Quality Standards
1559
+
1560
+ **From `prompts/questions.py`:**
1561
+
1562
+ **GENERAL_QUALITY_STANDARDS:**
1563
+
1564
+ - Overall goal: Set learner up for success
1565
+ - Perfect score attainable for thoughtful students
1566
+ - Aligned with course content
1567
+ - Aligned with learning objective and correct answer
1568
+ - No references to manual intervention (software/AI course)
1569
+
1570
+ **MULTIPLE_CHOICE_STANDARDS:**
1571
+
1572
+ - **EXACTLY ONE** correct answer per question
1573
+ - Clear, unambiguous correct answer
1574
+ - Plausible distractors representing common misconceptions
1575
+ - Not obviously wrong distractors
1576
+ - All options similar length and detail
1577
+ - Mutually exclusive options
1578
+ - Avoid "all/none of the above"
1579
+ - Typically 4 options (A, B, C, D)
1580
+ - Don't start feedback with "Correct" or "Incorrect"
1581
+
1582
+ **QUESTION_SPECIFIC_QUALITY_STANDARDS:**
1583
+
1584
+ Questions must:
1585
+ - Match language and tone of course
1586
+ - Match difficulty level of course
1587
+ - Assess only course information
1588
+ - Not teach as part of quiz
1589
+ - Use clear, concise language
1590
+ - Not induce confusion
1591
+ - Provide slight (not major) challenge
1592
+ - Be easily interpreted and unambiguous
1593
+ - Have proper grammar and sentence structure
1594
+ - Be thoughtful and specific (not broad and ambiguous)
1595
+ - Be complete in wording (understanding question shouldn't be part of assessment)
1596
+
1597
+ **CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS:**
1598
+
1599
+ Correct answers must:
1600
+ - Be factually correct and unambiguous
1601
+ - Match course language and tone
1602
+ - Be complete sentences
1603
+ - Match course difficulty level
1604
+ - Contain only course information
1605
+ - Not teach during quiz
1606
+ - Use clear, concise language
1607
+ - Be thoughtful and specific
1608
+ - Be complete (identifying correct answer shouldn't require interpretation)
1609
+
1610
+ **INCORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS:**
1611
+
1612
+ Incorrect answers should:
1613
+ - Represent reasonable potential misconceptions
1614
+ - Sound plausible to non-experts
1615
+ - Require thought even from diligent learners
1616
+ - Not be obviously wrong
1617
+ - Use incorrect_answer_suggestions from objective (as starting point)
1618
+
1619
+ **Avoid:**
1620
+ - Obviously wrong options anyone can eliminate
1621
+ - Absolute terms: "always", "never", "only", "exclusively"
1622
+ - Phrases like "used exclusively for scenarios where..."
1623
+
1624
+ **ANSWER_FEEDBACK_QUALITY_STANDARDS:**
1625
+
1626
+ **For Incorrect Answers:**
1627
+ - Be informational and encouraging (not punitive)
1628
+ - Single sentence, concise
1629
+ - Do NOT say "Incorrect" or "Wrong"
1630
+
1631
+ **For Correct Answers:**
1632
+ - Be informational and encouraging
1633
+ - Single sentence, concise
1634
+ - Do NOT say "Correct!" (redundant after "Correct: " prefix)
1635
+
1636
+ ### Incorrect Answer Generation Guidelines
1637
+
1638
+ **From `prompts/incorrect_answers.py`:**
1639
+
1640
+ **Core Principles:**
1641
+
1642
+ 1. **Create Common Misunderstandings:**
1643
+ - Represent how students actually misunderstand
1644
+ - Confuse related concepts
1645
+ - Mix up terminology
1646
+
1647
+ 2. **Maintain Identical Structure:**
1648
+ - Match grammatical pattern of correct answer
1649
+ - Same length and complexity
1650
+ - Same formatting style
1651
+
1652
+ 3. **Use Course Terminology Correctly but in Wrong Contexts:**
1653
+ - Apply correct terms incorrectly
1654
+ - Confuse with related concepts
1655
+ - Example: Describe backpropagation but actually describe forward propagation
1656
+
1657
+ 4. **Include Partially Correct Information:**
1658
+ - First part correct, second part wrong
1659
+ - Correct process but wrong application
1660
+ - Correct concept but incomplete
1661
+
1662
+ 5. **Avoid Obviously Wrong Answers:**
1663
+ - No contradictions with basic knowledge
1664
+ - Not immediately eliminable
1665
+ - Require course knowledge to reject
1666
+
1667
+ 6. **Mirror Detail Level and Style:**
1668
+ - Match technical depth
1669
+ - Match tone
1670
+ - Same level of specificity
1671
+
1672
+ 7. **For Lists, Maintain Consistency:**
1673
+ - Same number of items
1674
+ - Same format
1675
+ - Mix some correct with incorrect items
1676
+
1677
+ 8. **AVOID ABSOLUTE TERMS:**
1678
+ - "always", "never", "exclusively", "primarily"
1679
+ - "all", "every", "none", "nothing", "only"
1680
+ - "must", "required", "impossible"
1681
+ - "rather than", "as opposed to", "instead of"
1682
+
1683
+ **IMMEDIATE_RED_FLAGS** (triggers regeneration):
1684
+
1685
+ **Contradictory Second Clauses:**
1686
+ - "but not necessarily"
1687
+ - "at the expense of"
1688
+ - "rather than [core concept]"
1689
+ - "ensuring X rather than Y"
1690
+ - "without necessarily"
1691
+ - "but has no impact on"
1692
+ - "but cannot", "but prevents", "but limits"
1693
+
1694
+ **Explicit Negations:**
1695
+ - "without automating", "without incorporating"
1696
+ - "preventing [main benefit]"
1697
+ - "limiting [main capability]"
1698
+
1699
+ **Opposite Descriptions:**
1700
+ - "fixed steps" (for flexible systems)
1701
+ - "manual intervention" (for automation)
1702
+ - "simple question answering" (for complex processing)
1703
+
1704
+ **Hedging Creating Limitations:**
1705
+ - "sometimes", "occasionally", "might"
1706
+ - "to some extent", "partially", "somewhat"
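In the application these red flags are evaluated by an LLM. A cheap local pre-check for the literal phrases could complement that, along the lines of this sketch. The phrase list is a subset of the lists above, `find_red_flags` is a hypothetical helper, and plain substring matching will miss paraphrases and context-dependent flags such as "fixed steps".

```python
from typing import List

# Literal phrases only; single short words such as "might" are left out
# because substring matching would produce false positives.
RED_FLAG_PHRASES = [
    "but not necessarily", "at the expense of", "without necessarily",
    "but has no impact on", "but cannot", "but prevents", "but limits",
    "without automating", "without incorporating",
    "sometimes", "occasionally", "to some extent", "partially", "somewhat",
]


def find_red_flags(option_text: str) -> List[str]:
    """Return the red-flag phrases that occur in an answer option."""
    lowered = option_text.lower()
    return [phrase for phrase in RED_FLAG_PHRASES if phrase in lowered]


flagged = find_red_flags("It improves accuracy but not necessarily speed.")
clean = find_red_flags("It plans and executes multi-step tasks.")
```

An option that returns a non-empty list could be sent straight to regeneration without spending an evaluation call on it.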
1707
+
1708
+ **INCORRECT_ANSWER_EXAMPLES:**
1709
+
1710
+ Includes 10 detailed examples showing:
1711
+ - Learning objective
1712
+ - Correct answer
1713
+ - 3 plausible incorrect suggestions
1714
+ - Explanation of why each is plausible but wrong
1715
+ - Consistent formatting across all options
1716
+
1717
+ ### Ranking and Grouping
1718
+
1719
+ **RANK_QUESTIONS_PROMPT:**
1720
+
1721
+ **Criteria:**
1722
+ 1. Question clarity and unambiguity
1723
+ 2. Alignment with learning objective
1724
+ 3. Quality of incorrect options
1725
+ 4. Quality of feedback
1726
+ 5. Appropriate difficulty (simple English preferred)
1727
+ 6. Adherence to all guidelines
1728
+
1729
+ **Critical Instructions:**
1730
+ - DO NOT change question with ID=1
1731
+ - Rank starting from 2
1732
+ - Each question gets a unique rank
1733
+ - Must return ALL questions
1734
+ - No omissions
1735
+ - No duplicate ranks
1736
+
1737
+ **Simple vs Complex English:**
1738
+ ```
1739
+ Simple: "AI engineers create computer programs that learn from data"
1740
+ Complex: "AI engineering practitioners architect computational paradigms
1741
+ exhibiting autonomous erudition capabilities"
1742
+ ```
1743
+
1744
+ **GROUP_QUESTIONS_PROMPT:**
1745
+
1746
+ **Grouping Logic:**
1747
+ - Questions with same learning_objective_id are similar
1748
+ - Identify topic overlap
1749
+ - Mark best_in_group within each group
1750
+ - Single-member groups: best_in_group = true
1751
+
1752
+ **Critical Instructions:**
1753
+ - Must return ALL questions
1754
+ - Each question needs group metadata
1755
+ - No omissions
1756
+ - Best in group marked appropriately
1757
+
1758
+ ---
1759
+
1760
+ ## Summary of Data Flow
1761
+
1762
+ ### Complete End-to-End Flow
1763
+
1764
+ ```
1765
+ User Uploads Files
1766
+
1767
+ ContentProcessor extracts and tags content
1768
+
1769
+ [Stored in global state]
1770
+
1771
+ Generate Base Objectives (multiple runs)
1772
+
1773
+ Group Base Objectives (by similarity)
1774
+
1775
+ Generate Incorrect Answers (for best-in-group only)
1776
+
1777
+ Improve Incorrect Answers (quality check)
1778
+
1779
+ Reassign IDs (best from 001 group → ID=1)
1780
+
1781
+ [Objectives displayed in UI, stored in state]
1782
+
1783
+ Generate Questions (parallel, multiple runs)
1784
+
1785
+ Judge Question Quality (parallel)
1786
+
1787
+ Group Questions (by similarity)
1788
+
1789
+ Rank Questions (best-in-group only)
1790
+
1791
+ [Questions displayed in UI]
1792
+
1793
+ Format for Display
1794
+
1795
+ Export to JSON (optional)
1796
+ ```
1797
+
1798
+ ### Key Optimization Strategies
1799
+
1800
+ 1. **Multiple Generation Runs:**
1801
+ - Generates variety of objectives/questions
1802
+ - Grouping identifies best versions
1803
+ - Reduces the risk of poor-quality individual outputs
1804
+
1805
+ 2. **Hierarchical Processing:**
1806
+ - Generate base → Group → Enhance → Improve
1807
+ - Only enhances best candidates (saves API calls)
1808
+ - Progressive refinement
1809
+
1810
+ 3. **Parallel Processing:**
1811
+ - Questions generated concurrently (up to 5 threads)
1812
+ - Significant time savings for multiple objectives
1813
+ - Independent evaluations
1814
+
1815
+ 4. **Quality Gating:**
1816
+ - LLM judges question quality
1817
+ - Checks for red flags in incorrect answers
1818
+ - Regenerates problematic content
1819
+
1820
+ 5. **Source Tracking:**
1821
+ - XML tags preserve origin
1822
+ - Questions link back to source materials
1823
+ - Enables accurate content matching
1824
+
1825
+ 6. **Modular Prompts:**
1826
+ - Reusable quality standards
1827
+ - Consistent across all generations
1828
+ - Easy to update centrally
1829
+
1830
+ ---
1831
+
1832
+ ## Configuration and Customization
1833
+
1834
+ ### Available Models
1835
+
1836
+ **Configured in `models/config.py`:**
1837
+ ```python
1838
+ MODELS = [
1839
+ "o3-mini", "o1", # Reasoning models (no temperature)
1840
+ "gpt-4.1", "gpt-4o", # GPT-4 variants
1841
+ "gpt-4o-mini", "gpt-4",
1842
+ "gpt-3.5-turbo", # Legacy
1843
+ "gpt-5", # Latest (no temperature)
1844
+ "gpt-5-mini", # Efficient (no temperature)
1845
+ "gpt-5-nano" # Ultra-efficient (no temperature)
1846
+ ]
1847
+ ```
1848
+
1849
+ **Temperature Support:**
1850
+ - Models with reasoning (o1, o3-mini, gpt-5 variants): No temperature
1851
+ - Other models: Temperature 0.0 to 1.0
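The temperature rule can be expressed as a small helper that builds request arguments. This is a sketch under stated assumptions: `NO_TEMPERATURE_MODELS` and `build_request_kwargs` are hypothetical names introduced here, and the set simply mirrors the "no temperature" annotations in the model list. The project exposes its own `TEMPERATURE_UNAVAILABLE` constant, whose shape is not shown in this report.

```python
from typing import Any, Dict

# Assumption: mirrors the "no temperature" annotations in the MODELS list.
NO_TEMPERATURE_MODELS = {"o1", "o3-mini", "gpt-5", "gpt-5-mini", "gpt-5-nano"}


def build_request_kwargs(model: str, temperature: float) -> Dict[str, Any]:
    """Build keyword arguments for a completion call, omitting temperature when unsupported."""
    kwargs: Dict[str, Any] = {"model": model}
    if model not in NO_TEMPERATURE_MODELS:
        # Clamp to the 0.0 to 1.0 range documented for the other models.
        kwargs["temperature"] = max(0.0, min(1.0, temperature))
    return kwargs
```

Omitting the key entirely, rather than passing a default value, avoids sending a parameter that reasoning models would reject.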
1852
+
1853
+ **Model Selection Strategy:**
1854
+ - **Base objectives:** User-selected (default: gpt-5)
1855
+ - **Grouping:** Hardcoded gpt-5-mini (efficiency)
1856
+ - **Incorrect answers:** Separate user selection (default: gpt-5)
1857
+ - **Questions:** User-selected (default: gpt-5)
1858
+ - **Quality judging:** User-selected or gpt-5-mini
1859
+
1860
+ ### Environment Variables
1861
+
1862
+ **Required:**
1863
+ ```
1864
+ OPENAI_API_KEY=your_api_key_here
1865
+ ```
1866
+
1867
+ **Configured via `.env` file in project root**
1868
+
1869
+ ### Customization Points
1870
+
1871
+ 1. **Quality Standards:**
1872
+ - Edit `prompts/learning_objectives.py`
1873
+ - Edit `prompts/questions.py`
1874
+ - Edit `prompts/incorrect_answers.py`
1875
+ - Changes apply to all future generations
1876
+
1877
+ 2. **Example Questions/Objectives:**
1878
+ - Modify LEARNING_OBJECTIVE_EXAMPLES
1879
+ - Modify EXAMPLE_QUESTIONS
1880
+ - Modify INCORRECT_ANSWER_EXAMPLES
1881
+ - LLM learns from these examples
1882
+
1883
+ 3. **Generation Parameters:**
1884
+ - Number of objectives per run
1885
+ - Number of runs (variety)
1886
+ - Temperature (creativity vs consistency)
1887
+ - Model selection (quality vs cost/speed)
1888
+
1889
+ 4. **Parallel Processing:**
1890
+ - `max_workers` in assessment.py
1891
+ - Currently: min(len(objectives), 5)
1892
+ - Adjust for your rate limits
1893
+
1894
+ 5. **Output Formats:**
1895
+ - Modify `formatting.py` for display
1896
+ - Assessment JSON structure in `models/assessment.py`
1897
+
1898
+ ---
1899
+
1900
+ ## Error Handling and Resilience
1901
+
1902
+ ### Content Processing Errors
1903
+
1904
+ - **Invalid JSON notebooks:** Falls back to raw text
1905
+ - **Parse failures:** Wraps in code blocks, continues
1906
+ - **Missing files:** Logged, skipped
1907
+ - **Encoding issues:** UTF-8 fallback
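The notebook fallback behaviour can be illustrated with the standard library alone. This is a simplified sketch, not the `ContentProcessor` code: `extract_notebook_text` is a hypothetical helper, it uses `json` instead of `nbformat`, and it merges the "raw text" and "wrap in code block" fallbacks into a single path.

```python
import json

FENCE = "`" * 3  # Markdown code fence, built here to keep this example readable


def extract_notebook_text(raw: str) -> str:
    """Join cell sources from a notebook; on a parse failure, return the raw text in a code block."""
    try:
        cells = json.loads(raw)["cells"]
    except (json.JSONDecodeError, KeyError, TypeError):
        # Invalid JSON or an unexpected top-level shape: keep the content and continue.
        return f"{FENCE}\n{raw}\n{FENCE}"

    parts = []
    for cell in cells:
        source = cell.get("source", "")
        # Notebook cells store source as either a string or a list of lines.
        parts.append("".join(source) if isinstance(source, list) else source)
    return "\n\n".join(parts)
```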
1908
+
1909
+ ### Generation Errors
1910
+
1911
+ - **API failures:** Logged with traceback
1912
+ - **Structured output parse errors:** Fallback responses created
1913
+ - **Missing required fields:** Default values assigned
1914
+ - **Validation errors:** Caught and logged
1915
+
1916
+ ### Parallel Processing Errors
1917
+
1918
+ - **Individual thread failures:** Don't stop other threads
1919
+ - **Placeholder questions:** Created on error
1920
+ - **Complete error details:** Logged for debugging
1921
+ - **Graceful degradation:** Partial results returned
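The pattern described here, where one failed worker yields a placeholder while the others finish, can be sketched with `concurrent.futures`. The function name, the `generate_one` callable, and the placeholder text are hypothetical; only the `min(len(objectives), 5)` worker cap comes from this report.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List


def generate_in_parallel(objectives: List[str], generate_one: Callable[[str], str]) -> List[str]:
    """Run generate_one per objective; a failure produces a placeholder instead of aborting."""
    if not objectives:
        return []

    results: List[str] = [""] * len(objectives)
    max_workers = min(len(objectives), 5)  # worker cap quoted in this report
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its input position so output order is stable.
        futures = {pool.submit(generate_one, obj): i for i, obj in enumerate(objectives)}
        for future in as_completed(futures):
            index = futures[future]
            try:
                results[index] = future.result()
            except Exception as exc:  # one failed worker must not stop the rest
                results[index] = f"[placeholder: generation failed: {exc}]"
    return results
```

Storing results by index keeps the output aligned with the input objectives even though futures complete in arbitrary order.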
1922
+
1923
+ ### Quality Check Failures
1924
+
1925
+ - **Regeneration failures:** Original kept with warning
1926
+ - **Judge unavailable:** Questions marked unapproved
1927
+ - **Validation failures:** Detailed logs in debug directories
1928
+
1929
+ ---
1930
+
1931
+ ## Debug and Logging
1932
+
1933
+ ### Debug Directories
1934
+
1935
+ 1. **`incorrect_suggestion_debug/`**
1936
+ - Created during objective enhancement
1937
+ - Contains logs of problematic incorrect answers
1938
+ - Format: `{objective_id}.txt`
1939
+ - Includes: Original suggestions, identified issues, regeneration attempts
1940
+
1941
+ 2. **`wrong_answer_debug/`**
1942
+ - Created during question improvement
1943
+ - Logs question-level incorrect answer issues
1944
+ - Regeneration history
1945
+
1946
+ ### Console Logging
1947
+
1948
+ **Extensive logging throughout:**
1949
+ - File processing status
1950
+ - Generation progress (run numbers)
1951
+ - Parallel thread activity (thread IDs)
1952
+ - API call results
1953
+ - Error messages with tracebacks
1954
+ - Timing information (start/end times)
1955
+
1956
+ **Example Log Output:**
1957
+ ```
1958
+ DEBUG - Processing 3 files: ['file1.vtt', 'file2.ipynb', 'file3.srt']
1959
+ DEBUG - Found source file: file1.vtt
1960
+ Generating 3 learning objectives from 3 files
1961
+ Successfully generated 3 learning objectives without correct answers
1962
+ Generated correct answer for objective 1
1963
+ Grouping 9 base learning objectives
1964
+ Received 9 grouped results
1965
+ Generating incorrect answer options only for best-in-group objectives...
1966
+ PARALLEL: Starting ThreadPoolExecutor with 3 workers
1967
+ PARALLEL: Worker 1 (Thread ID: 12345): Starting work on objective...
1968
+ Question generation completed in 45.23 seconds
1969
+ ```
1970
+
1971
+ ---
1972
+
1973
+ ## Performance Considerations
1974
+
1975
+ ### API Call Optimization
1976
+
1977
+ **Calls per Workflow:**
1978
+
1979
+ For 3 objectives × 3 runs = 9 base objectives:
1980
+
1981
+ 1. **Learning Objectives:**
1982
+ - Base generation: 3 calls (one per run)
1983
+ - Correct answers: 9 calls (one per objective)
1984
+ - Grouping: 1 call
1985
+ - Incorrect answers: ~3 calls (best-in-group only)
1986
+ - Improvement checks: ~3 calls
1987
+ - **Total: ~19 calls**
1988
+
1989
+ 2. **Questions (for 3 objectives × 1 run):**
1990
+ - Question generation: 3 calls (parallel)
1991
+ - Quality judging: 3 calls (parallel)
1992
+ - Grouping: 1 call
1993
+ - Ranking: 1 call
1994
+ - **Total: ~8 calls**
1995
+
1996
+ **Total for complete workflow: ~27 API calls**
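The arithmetic behind these totals can be written out as a small estimator. This is a sketch: the function and parameter names are hypothetical, and the assumption that generation and judging scale with the number of question runs while grouping and ranking stay at one call each is an extrapolation from the single-run figures above.

```python
def estimate_api_calls(objectives_per_run: int, objective_runs: int,
                       best_in_group: int, question_objectives: int,
                       question_runs: int = 1) -> dict:
    """Estimate API calls for the objective and question stages."""
    base_objectives = objectives_per_run * objective_runs
    objective_calls = (
        objective_runs       # base generation: one call per run
        + base_objectives    # correct answers: one call per base objective
        + 1                  # grouping
        + best_in_group      # incorrect answers (best-in-group only)
        + best_in_group      # improvement checks
    )
    questions = question_objectives * question_runs
    question_calls = (
        questions            # question generation
        + questions          # quality judging
        + 1                  # grouping
        + 1                  # ranking
    )
    return {
        "objectives": objective_calls,
        "questions": question_calls,
        "total": objective_calls + question_calls,
    }
```

For the example in this section, `estimate_api_calls(3, 3, 3, 3)` gives 19 objective calls, 8 question calls, and 27 in total.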
1997
+
1998
+ ### Time Estimates
1999
+
2000
+ **Typical Execution Times:**
2001
+ - File processing: <1 second
2002
+ - Objective generation (3×3): 30-60 seconds
2003
+ - Question generation (3×1): 20-40 seconds (with parallelization)
2004
+ - **Total: 1-2 minutes for small course**
2005
+
2006
+ **Factors Affecting Speed:**
2007
+ - Model selection (gpt-5 slower than gpt-5-mini)
2008
+ - Number of runs
2009
+ - Number of objectives/questions
2010
+ - API rate limits
2011
+ - Network latency
2012
+ - Parallel worker count
2013
+
2014
+ ### Cost Optimization
2015
+
2016
+ **Strategies:**
2017
+ 1. Use gpt-5-mini for grouping/ranking (hardcoded)
2018
+ 2. Reduce number of runs (trade-off: variety)
2019
+ 3. Generate fewer objectives initially
2020
+ 4. Use faster models for initial exploration
2021
+ 5. Use premium models for final production
2022
+
2023
+ ---
2024
+
2025
+ ## Conclusion
2026
+
2027
+ The AI Course Assessment Generator is a multi-stage system that turns raw course materials into educational assessments. It employs:
2028
+
2029
+ - **Modular architecture** for maintainability
2030
+ - **Structured output generation** for reliability
2031
+ - **Quality-driven iterative refinement** for excellence
2032
+ - **Parallel processing** for efficiency
2033
+ - **Comprehensive error handling** for resilience
2034
+
2035
+ The system balances automation with quality control: LLM judging, grouping, and ranking filter the generated content, Bloom's Taxonomy guides how objectives are framed, and XML source tags keep each question traceable to its source material.
CLAUDE.md ADDED
@@ -0,0 +1,91 @@
1
+ # CLAUDE.md
2
+
3
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
4
+
5
+ ## Development Commands
6
+
7
+ ### Running the Application
8
+ ```bash
9
+ python app.py
10
+ ```
11
+ The application will start a Gradio web interface at http://127.0.0.1:7860
12
+
13
+ ### Environment Setup
14
+ ```bash
15
+ # Using uv (recommended)
16
+ uv venv -p 3.12
17
+ source .venv/bin/activate # On Windows: .venv\Scripts\activate
18
+ uv pip install -r requirements.txt
19
+
20
+ # Using pip
21
+ pip install -r requirements.txt
22
+ ```
23
+
24
+ ### Environment Variables
25
+ Create a `.env` file with:
26
+ ```
27
+ OPENAI_API_KEY=your_api_key_here
28
+ ```
29
+
30
+ ## Architecture Overview
31
+
32
+ This is an AI Course Assessment Generator that creates learning objectives and multiple-choice questions from course materials. The system uses OpenAI's language models with structured output generation via the `instructor` library.
33
+
34
+ ### Core Workflow
35
+ 1. **Content Processing**: Upload course materials (.vtt, .srt, .ipynb) → Extract and tag content with XML source references
36
+ 2. **Learning Objective Generation**: Generate base objectives → Group and rank → Enhance with incorrect answer suggestions
37
+ 3. **Question Generation**: Create multiple-choice questions from objectives → Quality assessment → Ranking and grouping
38
+ 4. **Assessment Export**: Save final assessment to JSON format
39
+
40
+ ### Key Architecture Patterns
41
+
42
+ **Modular Prompt System**: The `prompts/` directory contains reusable prompt components that are imported and combined in generation modules. This allows for consistent quality standards across different generation tasks.
43
+
44
+ **Orchestrator Pattern**: Both `LearningObjectiveGenerator` and `QuizGenerator` act as orchestrators that coordinate calls to specialized generation functions rather than implementing generation logic directly.
45
+
46
+ **Structured Output**: All LLM interactions use Pydantic models with the `instructor` library to ensure consistent, validated output formats.
47
+
48
+ **Source Tracking**: Content is wrapped in XML tags (e.g., `<source file="example.ipynb">content</source>`) throughout the pipeline to maintain traceability from source files to generated questions.
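Source tagging and lookup can be sketched in a few lines. The helper names here are hypothetical. The pattern is the one used in `learning_objective_generator/base_generation.py`, which matches a single-quoted `file` attribute, so this sketch writes tags in that form.

```python
import re
from typing import Optional

# Same pattern as in base_generation.py: matches a single-quoted file attribute.
SOURCE_PATTERN = re.compile(r"<source file='([^']+)'>")


def wrap_with_source(filename: str, content: str) -> str:
    """Wrap content in a source tag that records the originating file."""
    return f"<source file='{filename}'>{content}</source>"


def extract_source(tagged: str) -> Optional[str]:
    """Return the filename from the first source tag, or None if no tag matches."""
    match = SOURCE_PATTERN.search(tagged)
    return match.group(1) if match else None
```

Because the pattern only accepts single quotes, a tag written with a double-quoted attribute would not match and `extract_source` would return `None`.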
49
+
50
+ ## Key Components
51
+
52
+ ### Main Generators
53
+ - `LearningObjectiveGenerator` (`learning_objective_generator/generator.py`): Orchestrates learning objective generation, grouping, and enhancement
54
+ - `QuizGenerator` (`quiz_generator/generator.py`): Orchestrates question generation, quality assessment, and ranking
55
+
56
+ ### Data Models (`models/`)
57
+ - Learning objectives progress from `BaseLearningObjective` → `LearningObjective` (with incorrect answers) → `GroupedLearningObjective`
58
+ - Questions progress from `MultipleChoiceQuestion` → `RankedMultipleChoiceQuestion` → `GroupedMultipleChoiceQuestion`
59
+ - Final output is an `Assessment` containing both objectives and questions
60
+
61
+ ### Generation Pipeline
62
+ 1. **Base Generation**: Create initial learning objectives from content
63
+ 2. **Grouping & Ranking**: Group similar objectives and select best in each group
64
+ 3. **Enhancement**: Add incorrect answer suggestions to selected objectives
65
+ 4. **Question Generation**: Create multiple-choice questions with feedback
66
+ 5. **Quality Assessment**: Use LLM judge to evaluate question quality
67
+ 6. **Final Ranking**: Rank and group questions for output
68
+
69
+ ### UI Structure (`ui/`)
70
+ - `app.py`: Gradio interface with tabs for objectives, questions, and export
71
+ - Handler modules process user interactions and coordinate with generators
72
+ - State management tracks data between UI components
73
+
74
+ ## Development Notes
75
+
76
+ ### Model Configuration
77
+ - Default model: `gpt-5` with temperature `1.0`
78
+ - Separate model selection for incorrect answer generation (typically `o1`)
79
+ - Quality assessment often uses `gpt-5-mini` for cost efficiency
80
+
81
+ ### Content Processing
82
+ - Supports `.vtt/.srt` subtitle files and `.ipynb` Jupyter notebooks
83
+ - All content is tagged with XML source references for traceability
84
+ - Content processor handles multiple file formats uniformly
85
+
86
+ ### Quality Standards
87
+ The system enforces educational quality through modular prompt components:
88
+ - General quality standards apply to all generated content
89
+ - Specific standards for questions, correct answers, and incorrect answers
90
+ - Bloom's taxonomy integration for appropriate learning levels
91
+ - Example-based prompting for consistency
README.md ADDED
@@ -0,0 +1,232 @@
1
+ # AI Course Assessment Generator
2
+
3
+ This application generates learning objectives and multiple-choice questions for AI course materials based on uploaded content files. It uses OpenAI's language models to create high-quality educational assessments that adhere to specified quality standards.
4
+
5
+ ## Features
6
+
7
+ - Upload course materials in various formats (.vtt, .srt, .ipynb)
8
+ - Generate a customizable number of learning objectives
9
+ - Create multiple-choice questions based on learning objectives
10
+ - Evaluate question quality using an LLM judge
11
+ - Save assessments to JSON format
12
+ - Track source references for each learning objective and question
13
+
14
+ ## Setup
15
+
16
+ 1. Clone this repository
17
+ 2. Install the required dependencies:
18
+ ```
19
+ pip install -r requirements.txt
20
+ ```
21
+ 3. Create a `.env` file in the project root with your OpenAI API key:
22
+ ```
23
+ OPENAI_API_KEY=your_api_key_here
24
+ ```
25
+
26
+ ## Usage
27
+
28
+ 1. Run the application:
29
+ ```
30
+ python app.py
31
+ ```
32
+ 2. Open the Gradio interface in your web browser (typically at http://127.0.0.1:7860)
33
+ 3. Upload your course materials (.vtt, .srt, .ipynb files)
34
+ 4. Specify the number of learning objectives to generate
35
+ 5. Select the OpenAI model to use
36
+ 6. Generate learning objectives
37
+ 7. Review and provide feedback on the generated objectives
38
+ 8. Generate multiple-choice questions based on the approved objectives
39
+ 9. Review the generated questions and their quality assessments
40
+ 10. The final assessment will be saved as `assessment.json` in the project directory
41
+
42
+ ## Project Structure
43
+
44
+ - `app.py`: Entry point for the application
45
+
46
+ ### Modules
47
+
48
+ - `models/`: Pydantic data models
49
+ - `__init__.py`: Exports all models
50
+ - `learning_objectives.py`: Learning objective data models
51
+ - `questions.py`: Question and option data models
52
+ - `assessment.py`: Assessment data models
53
+
54
+ - `ui/`: User interface components
55
+ - `__init__.py`: Package initialization
56
+ - `app.py`: Gradio UI implementation
57
+ - `content_processor.py`: Processes uploaded files and extracts content
58
+ - `objective_handlers.py`: Handlers for learning objective generation
59
+ - `question_handlers.py`: Handlers for question generation
60
+ - `feedback_handlers.py`: Handlers for feedback and regeneration
61
+ - `formatting.py`: Formatting utilities for UI display
62
+ - `state.py`: State management for the UI
63
+
64
+ - `quiz_generator/`: Quiz generation components
65
+ - `__init__.py`: Package initialization
66
+ - `generator.py`: Main QuizGenerator class
67
+ - `assessment.py`: Assessment generation logic
68
+ - `question_generation.py`: Question generation logic
69
+ - `question_improvement.py`: Question quality improvement logic
70
+ - `question_ranking.py`: Question ranking and grouping logic
71
+ - `feedback_questions.py`: Feedback-based question generation
72
+
73
+ - `learning_objective_generator/`: Learning objective generation components
74
+ - `__init__.py`: Package initialization
75
+ - `generator.py`: Main generator class
76
+ - `base_generation.py`: Base generation logic
77
+ - `enhancement.py`: Enhancement logic
78
+ - `grouping_and_ranking.py`: Grouping and ranking logic
79
+
80
+ - `prompts/`: Prompt templates and components
81
+ - `questions.py`: Question generation prompts
82
+ - `incorrect_answers.py`: Incorrect answer generation prompts
83
+ - `learning_objectives.py`: Learning objective generation prompts
84
+
85
+ - `obsolete/`: Deprecated files (not used in current implementation)
86
+
87
+ - `specs.md`: Project specifications
88
+ - `project_flow.md`: Detailed description of the project architecture and workflow
89
+
90
+ ## Requirements
91
+
92
+ - Python 3.8+
93
+ - Gradio 4.19.2+
94
+ - Pydantic 2.8.0+
95
+ - OpenAI 1.52.0+
96
+ - nbformat 5.9.2+
97
+ - instructor 1.7.9+
98
+ - python-dotenv 1.0.0+
99
+
100
+ Install dependencies using uv (recommended):
101
+ ```
102
+ uv venv -p 3.12
103
+ source .venv/bin/activate # On Windows use: .venv\Scripts\activate
104
+ uv pip install -r requirements.txt
105
+ ```
106
+
107
+ ## Notes
108
+
109
+ - The application uses XML-style source tags to track which file each piece of content comes from
110
+ - Questions are evaluated against quality standards to ensure they meet educational requirements
111
+ - Each question includes feedback for both correct and incorrect answers
112
+
113
+ ## Prompt Structure
114
+
115
+ The application's prompt system in the `prompts/` package has been refactored into modular components for better maintainability:
116
+
117
+ - `GENERAL_QUALITY_STANDARDS`: Overall quality standards for all generated content
118
+ - `QUESTION_SPECIFIC_QUALITY_STANDARDS`: Standards specific to question generation
119
+ - `CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS`: Standards for correct answer options
120
+ - `INCORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS`: Standards for creating plausible incorrect answers
121
+ - `EXAMPLE_QUESTIONS`: A collection of high-quality example questions for model guidance
122
+ - `MULTIPLE_CHOICE_STANDARDS`: Standards specific to multiple-choice question format
123
+ - `BLOOMS_TAXONOMY_LEVELS`: Educational taxonomy for different levels of learning
124
+ - `ANSWER_FEEDBACK_QUALITY_STANDARDS`: Standards for providing helpful feedback
125
+ - `LEARNING_OBJECTIVES_PROMPT`: Template for generating learning objectives
126
+ - `LEARNING_OBJECTIVE_EXAMPLES`: Examples of well-formulated learning objectives
127
+
128
+ These components are imported and combined in the generation modules (`quiz_generator/` and `learning_objective_generator/`) to create comprehensive prompts for different generation tasks. This modular approach makes it easier to:
129
+
130
+ 1. Update individual aspects of the prompt without affecting others
131
+ 2. Reuse common standards across different generation tasks
132
+ 3. Maintain consistent quality across all generated content
133
+
134
+ ## Detailed Project Flow
135
+
136
+ ### Overview
137
+
138
+ This section provides a more detailed look at how the various components of the system work together to generate educational assessments.
139
+
140
+ ### Core Components
141
+
142
+ 1. **Content Processing**: Handles ingestion of course materials from different file formats
143
+ 2. **Learning Objective Generation**: Creates learning objectives from the processed content
144
+ 3. **Question Generation**: Produces multiple-choice questions for each learning objective
145
+ 4. **Quality Assessment**: Evaluates the generated questions for quality
146
+ 5. **UI Interface**: Provides a Gradio-based web interface for user interaction
147
+
148
+ ### Application Entry Point (`app.py`)
149
+
150
+ - Serves as the entry point for the application
151
+ - Loads environment variables (including OpenAI API key)
152
+ - Creates and launches the Gradio UI
153
+
154
+ ### User Interface (`ui/` module)
155
+
156
+ - Creates the Gradio interface for user interaction
157
+ - Organizes functionality into tabs:
158
+ - File upload and learning objective generation
159
+ - Question generation
160
+ - Preview and export
161
+
162
+ - Key components:
163
+ - `app.py`: Creates the Gradio interface and defines the UI layout
164
+ - `objective_handlers.py`: Handles learning objective generation and regeneration
165
+ - `question_handlers.py`: Handles question generation and regeneration
166
+ - `feedback_handlers.py`: Handles user feedback and custom question generation
167
+ - `formatting.py`: Formats quiz data for UI display
168
+ - `state.py`: Manages state between UI components
169
+
170
+ ### Content Processing (`ui/content_processor.py`)
171
+
172
+ - `ContentProcessor` class processes different file types:
173
+ - `.vtt` and `.srt` subtitle files
174
+ - `.ipynb` Jupyter notebook files
175
+ - For each file, adds XML source tags to track the origin of content
176
+ - Returns structured content for further processing
177
+
178
+ ### Quiz Generation (`quiz_generator/` module)
179
+
180
+ - `QuizGenerator` class is the central component that:
181
+ - Generates learning objectives from processed content
182
+ - Creates multiple-choice questions for each objective
183
+ - Judges question quality
184
+ - Saves assessments to JSON
185
+
186
+ #### Learning Objective Generation
187
+
188
+ 1. Takes processed file contents as input
189
+ 2. Combines content and creates a prompt (utilizing modular components from `prompts/`)
190
+ 3. Uses OpenAI's API with instructor to generate learning objectives
191
+ 4. Returns structured `LearningObjective` objects
192
+
193
+ #### Question Generation
194
+
195
+ 1. For each learning objective:
196
+ - Retrieves relevant content from source files
197
+ - Creates a prompt by combining modular components from `prompts/`
198
+ - Generates a multiple-choice question with feedback for each option
199
+ - Returns a structured `MultipleChoiceQuestion` object
200
+
201
+ ### Data Models (`models/` module)
202
+
203
+ Defines the data structures used throughout the application:
204
+ - `LearningObjective`: Represents a learning objective with ID, text, and source references
205
+ - `MultipleChoiceOption`: Represents an answer option with text, correctness flag, and feedback
206
+ - `MultipleChoiceQuestion`: Represents a complete question with options, linked to learning objectives
207
+ - `RankedMultipleChoiceQuestion`: Extends MultipleChoiceQuestion with ranking information
208
+ - `GroupedMultipleChoiceQuestion`: Extends RankedMultipleChoiceQuestion with grouping information
209
+ - `Assessment`: Collection of learning objectives and questions
210
+
211
+ ### Prompt Component Integration
212
+
213
+ The modular prompt components in the `prompts/` directory are imported into the quiz generation modules and assembled into complete prompts as needed:
214
+
215
+ 1. **Learning Objective Generation**:
216
+ - Components like `LEARNING_OBJECTIVES_PROMPT`, `LEARNING_OBJECTIVE_EXAMPLES`, and `BLOOMS_TAXONOMY_LEVELS` are combined with course content
217
+ - This creates a comprehensive prompt that guides the LLM in generating relevant and well-structured learning objectives
218
+
219
+ 2. **Question Generation**:
220
+ - Components like `GENERAL_QUALITY_STANDARDS`, `MULTIPLE_CHOICE_STANDARDS`, `QUESTION_SPECIFIC_QUALITY_STANDARDS`, etc. are combined
221
+ - Along with the learning objective and course content, these form a detailed prompt that ensures high-quality question generation
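Prompt assembly of this kind can be sketched as plain string composition. The constant values below are placeholders, not the real texts from the `prompts/` package, and `build_question_prompt` is a hypothetical helper; the XML-style section wrappers follow the style used for `<BloomsTaxonomyLevels>` in `base_generation.py`.

```python
# Placeholder texts: the real constants live in the prompts/ package.
GENERAL_QUALITY_STANDARDS = "Ground every statement in the course content."
MULTIPLE_CHOICE_STANDARDS = "Provide exactly one correct option."


def build_question_prompt(learning_objective: str, course_content: str) -> str:
    """Combine reusable standards with the objective and content into one prompt."""
    sections = [
        "Write one multiple-choice question for the learning objective below.",
        f"<GeneralQualityStandards>\n{GENERAL_QUALITY_STANDARDS}\n</GeneralQualityStandards>",
        f"<MultipleChoiceStandards>\n{MULTIPLE_CHOICE_STANDARDS}\n</MultipleChoiceStandards>",
        f"<LearningObjective>\n{learning_objective}\n</LearningObjective>",
        f"<CourseContent>\n{course_content}\n</CourseContent>",
    ]
    return "\n\n".join(sections)
```

Keeping each standard in its own constant means a change to one component is picked up by every prompt that includes it.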
222
+
223
+ ### Workflow Summary
224
+
225
+ 1. User uploads content files (notebooks, subtitles) through the UI
226
+ 2. System processes files and extracts content with source references
227
+ 3. LLM generates learning objectives based on content
228
+ 4. User reviews and approves learning objectives
229
+ 5. System generates multiple-choice questions for each approved objective
230
+ 6. Questions are presented to the user for review and export
231
+
232
+ This modular approach makes it easier to maintain, update, and experiment with different prompt components without disrupting the overall system. Any changes to the components in `prompts/` will affect how learning objectives and questions are generated, potentially changing the style, format, and quality of the output.
app.py ADDED
@@ -0,0 +1,16 @@
+ import os
+ from dotenv import load_dotenv
+ from ui.app import create_ui
+
+ # Load environment variables
+ load_dotenv()
+
+ # Check if API key is set
+ if not os.getenv("OPENAI_API_KEY"):
+     print("Warning: OPENAI_API_KEY environment variable not set.")
+     print("Please set it in a .env file or in your environment variables.")
+
+ if __name__ == "__main__":
+     # Create and launch the Gradio UI
+     app = create_ui()
+     app.launch(share=False)
diagram.mmd ADDED
@@ -0,0 +1,106 @@
1
+ sequenceDiagram
2
+ participant U as User
3
+ participant App as app.py
4
+ participant UI as ui/app.py
5
+ participant OH as objective_handlers.py
6
+ participant QH as question_handlers.py
7
+ participant FH as feedback_handlers.py
8
+ participant CP as ContentProcessor
9
+ participant QG as QuizGenerator
10
+ participant State as state.py
11
+ participant OpenAI as OpenAI API
12
+
13
+ Note over U, OpenAI: Application Startup
14
+ U->>App: python app.py
15
+ App->>App: load_dotenv()
16
+ App->>App: Check OPENAI_API_KEY
17
+ App->>UI: create_ui()
18
+ UI->>UI: Create Gradio interface with 3 tabs
19
+ UI->>U: Launch web interface at http://127.0.0.1:7860
20
+
21
+ Note over U, OpenAI: Tab 1: Generate Learning Objectives
22
+ U->>UI: Upload files (.vtt, .srt, .ipynb)
23
+ U->>UI: Set parameters (objectives, runs, model, temperature)
24
+ U->>UI: Click "Generate Learning Objectives"
25
+
26
+ UI->>OH: process_files(files, params)
27
+ OH->>OH: _extract_file_paths(files)
28
+ OH->>CP: process_files(file_paths)
29
+ CP->>OH: file_contents (with XML tags)
30
+ OH->>State: set_processed_contents(file_contents)
31
+
32
+ OH->>QG: QuizGenerator(api_key, model, temperature)
33
+ OH->>OH: _generate_multiple_runs()
34
+ loop For each run
35
+ OH->>QG: generate_base_learning_objectives()
36
+ QG->>OpenAI: API call for objectives
37
+ OpenAI->>QG: Base learning objectives
38
+ end
39
+
40
+ OH->>OH: _group_base_objectives_add_incorrect_answers()
41
+ OH->>QG: group_base_learning_objectives()
42
+ QG->>OpenAI: API call for grouping/ranking
43
+ OpenAI->>QG: Grouped objectives
44
+ OH->>QG: generate_lo_incorrect_answer_options()
45
+ QG->>OpenAI: API call for incorrect answers
46
+ OpenAI->>QG: Enhanced objectives
47
+
48
+ OH->>State: set_learning_objectives(grouped_result)
49
+ OH->>OH: _format_objective_results()
50
+ OH->>UI: Return formatted results
51
+ UI->>U: Display objectives in 3 text boxes
52
+
53
+ Note over U, OpenAI: Tab 2: Generate Questions
54
+ U->>UI: Review objectives JSON (auto-populated)
55
+ U->>UI: Set question generation parameters
56
+ U->>UI: Click "Generate Questions"
57
+
58
+ UI->>QH: generate_questions(objectives_json, params)
59
+ QH->>QH: _parse_learning_objectives(objectives_json)
60
+ QH->>State: get_processed_contents()
61
+ QH->>QG: QuizGenerator(api_key, model, temperature)
62
+
63
+ QH->>QH: _generate_questions_multiple_runs()
64
+ loop For each run
65
+ QH->>QG: generate_questions_in_parallel()
66
+ QG->>OpenAI: API calls for questions
67
+ OpenAI->>QG: Multiple choice questions
68
+ end
69
+
70
+ QH->>QH: _group_and_rank_questions()
71
+ QH->>QG: group_questions()
72
+ QG->>OpenAI: API call for grouping
73
+ OpenAI->>QG: Grouped questions
74
+ QH->>QG: rank_questions()
75
+ QG->>OpenAI: API call for ranking
76
+ OpenAI->>QG: Ranked questions
77
+
78
+ QH->>QH: _format_question_results()
79
+ QH->>UI: Return formatted quiz results
80
+ UI->>U: Display questions and formatted quiz
81
+
82
+ Note over U, OpenAI: Tab 3: Propose/Edit Question
83
+ U->>UI: Enter question guidance/feedback
84
+ U->>UI: Set model parameters
85
+ U->>UI: Click "Generate Question"
86
+
87
+ UI->>FH: propose_question_handler(guidance, params)
88
+ FH->>State: get_processed_contents()
89
+ FH->>QG: QuizGenerator(api_key, model, temperature)
90
+ FH->>QG: generate_multiple_choice_question_from_feedback()
91
+ QG->>OpenAI: API call with feedback
92
+ OpenAI->>QG: Single question
93
+ FH->>UI: Return formatted question JSON
94
+ UI->>U: Display generated question
95
+
96
+ Note over U, OpenAI: Optional: Regenerate Objectives
97
+ U->>UI: Provide feedback on objectives
98
+ U->>UI: Click "Regenerate Learning Objectives"
99
+ UI->>OH: regenerate_objectives(objectives, feedback, params)
100
+ OH->>State: get_processed_contents()
101
+ OH->>OH: Add feedback to file_contents
102
+ OH->>QG: Generate with feedback context
103
+ QG->>OpenAI: API calls with feedback
104
+ OpenAI->>QG: Improved objectives
105
+ OH->>UI: Return regenerated objectives
106
+ UI->>U: Display updated objectives
learning_objective_generator/__init__.py ADDED
@@ -0,0 +1,3 @@
1
+ from .generator import LearningObjectiveGenerator
2
+
3
+ __all__ = ['LearningObjectiveGenerator']
learning_objective_generator/base_generation.py ADDED
@@ -0,0 +1,201 @@
1
+ from typing import List
2
+ from openai import OpenAI
3
+ import re
4
+ import os
5
+ from models import BaseLearningObjective, BaseLearningObjectiveWithoutCorrectAnswer, BaseLearningObjectivesWithoutCorrectAnswerResponse, TEMPERATURE_UNAVAILABLE
6
+ from prompts.learning_objectives import BASE_LEARNING_OBJECTIVES_PROMPT, BLOOMS_TAXONOMY_LEVELS, LEARNING_OBJECTIVE_EXAMPLES, LEARNING_OBJECTIVE_EXAMPLES_WITHOUT_ANSWERS
7
+
8
+
9
+ def _get_run_manager():
10
+ """Get run manager if available, otherwise return None."""
11
+ try:
12
+ from ui.run_manager import get_run_manager
13
+ return get_run_manager()
14
+ except:
15
+ return None
+
+ def generate_base_learning_objectives(client: OpenAI, model: str, temperature: float, file_contents: List[str], num_objectives: int) -> List[BaseLearningObjective]:
+     """
+     Generate learning objectives with correct answers by first generating the objectives and then adding correct answers.
+     This is a wrapper function that calls the two separate functions for better separation of concerns.
+     """
+     print(f"Generating {num_objectives} learning objectives from {len(file_contents)} files")
+
+     # First, generate the learning objectives without correct answers
+     objectives_without_answers = generate_base_learning_objectives_without_correct_answers(
+         client, model, temperature, file_contents, num_objectives
+     )
+
+     # Then, generate correct answers for those objectives
+     objectives_with_answers = generate_correct_answers_for_objectives(
+         client, model, temperature, file_contents, objectives_without_answers
+     )
+
+     return objectives_with_answers
+
+ def generate_base_learning_objectives_without_correct_answers(client: OpenAI, model: str, temperature: float, file_contents: List[str], num_objectives: int) -> List[BaseLearningObjectiveWithoutCorrectAnswer]:
+     """Generate learning objectives without correct answers from course content."""
+     # Extract the source filenames for reference
+     sources = set()
+     for file_content in file_contents:
+         source_match = re.search(r"<source file='([^']+)'>", file_content)
+         if source_match:
+             source = source_match.group(1)
+             sources.add(source)
+             print(f"DEBUG - Found source file: {source}")
+     print(f"DEBUG - Found {len(sources)} source files: {sources}")
+     print(f"DEBUG - Using {len(file_contents)} files for learning objectives")
+     combined_content = "\n\n".join(file_contents)
+     prompt = f"""
+     You are an expert educational content creator specializing in creating precise, relevant learning objectives from course materials. Based on the following course content, generate {num_objectives} clear and concise learning objectives.
+
+
+     {BASE_LEARNING_OBJECTIVES_PROMPT}
+
+     Consider Bloom's taxonomy in the context of the learning objective you are writing and choose the appropriate framing for the question
+     and answer options in the context of Bloom's taxonomy.
+     <BloomsTaxonomyLevels>
+     {BLOOMS_TAXONOMY_LEVELS}
+     </BloomsTaxonomyLevels>
+
+     Format your response like this, according to the data model provided for each objective:
+     ```json
+     {{
+         id: int = Unique identifier for the learning objective,
+         learning_objective: str = the learning objective text,
+         source_reference: Union[List[str], str] = List of paths to the files from which this learning objective was extracted,
+     }}
+     ```
+
+     Here is an example of high quality learning objectives:
+     <learning objectives>
+     {LEARNING_OBJECTIVE_EXAMPLES_WITHOUT_ANSWERS}
+     </learning objectives>
+
+     Below is the course content. The source references are embedded in xml tags within the context.
+     <course content>
+     {combined_content}
+     </course content>
+     """
+     try:
+         # Use the OpenAI beta parse API for structured output
+
+         params = {
+             "model": model,
+             "messages": [
+                 {"role": "system", "content": "You are an expert educational content creator specializing in creating precise, relevant learning objectives from course materials."},
+                 {"role": "user", "content": prompt}
+             ],
+             "response_format": BaseLearningObjectivesWithoutCorrectAnswerResponse
+         }
+         if not TEMPERATURE_UNAVAILABLE.get(model, True):  # models missing from the table are treated as not accepting a temperature
+             params["temperature"] = temperature
+
+         completion = client.beta.chat.completions.parse(**params)
+         response = completion.choices[0].message.parsed.objectives
+         # Assign sequential IDs and reduce each source_reference to bare file names
+         for i, objective in enumerate(response):
+             objective.id = i + 1
+             if isinstance(objective.source_reference, str):
+                 if "," in objective.source_reference:
+                     source_refs = [os.path.basename(src.strip()) for src in objective.source_reference.split(",")]
+                     objective.source_reference = source_refs
+                 else:
+                     objective.source_reference = os.path.basename(objective.source_reference)
+             elif isinstance(objective.source_reference, list):
+                 objective.source_reference = [os.path.basename(src) for src in objective.source_reference]
+         print(f"Successfully generated {len(response)} learning objectives without correct answers")
+         return response
+     except Exception as e:
+         print(f"Error generating learning objectives without correct answers: {e}")
+         # Re-raise the exception instead of generating fallbacks
+         raise
+
+ def generate_correct_answers_for_objectives(client: OpenAI, model: str, temperature: float, file_contents: List[str], objectives_without_answers: List[BaseLearningObjectiveWithoutCorrectAnswer]) -> List[BaseLearningObjective]:
+     """Generate correct answers for the given learning objectives."""
+     combined_content = "\n\n".join(file_contents)
+     run_manager = _get_run_manager()
+
+     # Create a list to store the objectives with answers
+     objectives_with_answers = []
+
+     # Process each objective to generate a correct answer
+     for objective in objectives_without_answers:
+         prompt = f"""
+         You are an expert educational content creator specializing in creating precise, relevant, and concise correct answers for learning objectives.
+
+
+         Use the below learning objective to generate the correct answer:
+
+         <learning_objective>
+         "id": {objective.id},
+         "learning_objective": "{objective.learning_objective}",
+         "source_reference": "{objective.source_reference}"
+         </learning_objective>
+
+
+         Use the below course content to generate the correct answer:
+         <course_content>
+         {combined_content}
+         </course_content>
+
+
+
+
+         Please provide a clear, concise, and accurate correct answer for this learning objective. The answer should be:
+         1. Directly answering the learning objective
+         2. Concise (preferably under 20 words). Avoids unnecessary length. See example below on avoiding unnecessary length.
+         3. Focused on the core concept without unnecessary elaboration
+         4. Based on the course content provided
+
+         Format your response as a plain text answer only, without any additional explanation or formatting.
+
+         Here are examples of high quality learning objective examples with correct answers:
+
+         <learning_objective_examples>
+         {LEARNING_OBJECTIVE_EXAMPLES}
+         </learning_objective_examples>
+
+         """
+
+         try:
+             params = {
+                 "model": model,
+                 "messages": [
+                     {"role": "system", "content": "You are an expert educational content creator specializing in creating precise, relevant correct answers for learning objectives."},
+                     {"role": "user", "content": prompt}
+                 ]
+             }
+             if not TEMPERATURE_UNAVAILABLE.get(model, True):
+                 params["temperature"] = temperature
+
+             completion = client.chat.completions.create(**params)
+             correct_answer = completion.choices[0].message.content.strip()
+
+             # Create a new BaseLearningObjective with the correct answer
+             objective_with_answer = BaseLearningObjective(
+                 id=objective.id,
+                 learning_objective=objective.learning_objective,
+                 source_reference=objective.source_reference,
+                 correct_answer=correct_answer
+             )
+
+             objectives_with_answers.append(objective_with_answer)
+             if run_manager:
+                 run_manager.log(f"Generated correct answer for objective {objective.id}", level="INFO")
+
+         except Exception as e:
+             if run_manager:
+                 run_manager.log(f"Error generating correct answer for objective {objective.id}: {e}", level="ERROR")
+             # Create an objective with an error message as the correct answer
+             objective_with_answer = BaseLearningObjective(
+                 id=objective.id,
+                 learning_objective=objective.learning_objective,
+                 source_reference=objective.source_reference,
+                 correct_answer="[Error generating correct answer]"
+             )
+             objectives_with_answers.append(objective_with_answer)
+
+     print(f"Successfully generated correct answers for {len(objectives_with_answers)} learning objectives")
+     return objectives_with_answers
+
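The source-reference clean-up in `generate_base_learning_objectives_without_correct_answers` can be read in isolation. The following is a minimal sketch of that step as a standalone function; the name `normalize_source_reference` is hypothetical and does not appear in the commit, and the behaviour is inferred only from the branch logic shown above (comma-separated strings become lists, everything is reduced to a bare file name).

```python
import os
from typing import List, Union


def normalize_source_reference(ref: Union[str, List[str]]) -> Union[str, List[str]]:
    """Reduce source references to bare file names (hypothetical helper, not part of the commit)."""
    if isinstance(ref, str):
        if "," in ref:
            # A comma-separated string becomes a list of file names
            return [os.path.basename(part.strip()) for part in ref.split(",")]
        # A single path stays a string
        return os.path.basename(ref)
    # A list of paths stays a list
    return [os.path.basename(part) for part in ref]


print(normalize_source_reference("Data/lesson1.vtt, Data/lesson2.vtt"))
```

Keeping this logic in one function would probably make the three input shapes (single path, comma-separated string, list) easier to test than the inline version.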
learning_objective_generator/enhancement.py ADDED
@@ -0,0 +1,121 @@
+ from typing import List, Optional
+ from openai import OpenAI
+ from models import BaseLearningObjective, LearningObjective, TEMPERATURE_UNAVAILABLE
+ from prompts.incorrect_answers import INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION
+
+ def generate_incorrect_answer_options(client: OpenAI, model: str, temperature: float, file_contents: List[str], base_objectives: List[BaseLearningObjective], model_override: Optional[str] = None) -> List[LearningObjective]:
+     """
+     Generate incorrect answer options for each base learning objective.
+
+     Args:
+         file_contents: List of file contents with source tags
+         base_objectives: List of base learning objectives to enhance
+
+     Returns:
+         List of learning objectives with incorrect answer suggestions
+     """
+     print(f"Generating incorrect answer options for {len(base_objectives)} learning objectives")
+
+     # Create combined content for context
+     combined_content = "\n\n".join(file_contents)
+     enhanced_objectives = []
+
+     for i, objective in enumerate(base_objectives):
+         print(f"Processing objective {i+1}/{len(base_objectives)}: {objective.learning_objective[:50]}...")
+         print(f"Learning objective: {objective.learning_objective}")
+         print(f"Correct answer: {objective.correct_answer}")
+         # Create the prompt for generating incorrect answer options
+         prompt = f"""
+         Based on the learning objective and correct answer provided below.
+
+         Learning Objective: {objective.learning_objective}
+         Correct Answer: {objective.correct_answer}
+
+
+         Generate 3 incorrect answer options.
+
+
+         Use the examples with explanations below to guide you in generating incorrect answer options:
+         <examples_with_explanation>
+         {INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION}
+         </examples_with_explanation>
+
+
+
+         Here's the course content that the student has been exposed to:
+         <course_content>
+         {combined_content}
+         </course_content>
+
+         Here's the learning objective that was identified:
+
+         <learning_objective>
+         "id": {objective.id},
+         "learning_objective": "{objective.learning_objective}",
+         "source_reference": "{objective.source_reference}",
+         "correct_answer": "{objective.correct_answer}"
+         </learning_objective>
+
+         When creating incorrect answers, refer to the correct answer <correct_answer>{objective.correct_answer}</correct_answer>.
+         Make sure incorrect answers match the correct answer in terms of length, complexity, phrasing, style, and subject matter.
+         Incorrect answers should be of approximate equal length to the correct answer, preferably one sentence and 20 words long. Pay attention to the
+         example in <examples_with_explanation> about avoiding unnecessary length.
+         """
+
+
+
+
+
+         try:
+             model_to_use = model_override if model_override else model
+
+             # Use the OpenAI beta parse API for structured output
+
+
+             system_prompt = "You are an expert in designing effective multiple-choice questions that assess higher-order thinking skills while following established educational best practices."
+             params = {
+                 "model": model_to_use,
+                 "messages": [
+                     {"role": "system", "content": system_prompt},
+                     {"role": "user", "content": prompt}
+                 ],
+                 "response_format": LearningObjective
+             }
+             if not TEMPERATURE_UNAVAILABLE.get(model_to_use, True):
+                 params["temperature"] = temperature  # Caller-supplied temperature; skipped for models that do not accept one
+
+             print(f"DEBUG - Using model {model_to_use} for incorrect answer options with temperature {params.get('temperature', 'N/A')}")
+
+             completion = client.beta.chat.completions.parse(**params)
+             enhanced_obj = completion.choices[0].message.parsed
+             # Simple debugging for incorrect answer suggestions
+             if enhanced_obj.incorrect_answer_options:
+                 print(f" → Got {len(enhanced_obj.incorrect_answer_options)} incorrect answers")
+                 print(f" → First option: {enhanced_obj.incorrect_answer_options[0][:100]}" + ("..." if len(enhanced_obj.incorrect_answer_options[0]) > 100 else ""))
+             else:
+                 print(" → No incorrect answer options received!")
+
+             # Preserve grouping metadata from the original objective
+             enhanced_obj.in_group = getattr(objective, 'in_group', None)
+             enhanced_obj.group_members = getattr(objective, 'group_members', None)
+             enhanced_obj.best_in_group = getattr(objective, 'best_in_group', None)
+
+             enhanced_objectives.append(enhanced_obj)
+
+         except Exception as e:
+             print(f"Error generating incorrect answer options for objective {objective.id}: {e}")
+             # If there's an error, create a learning objective without suggestions
+             enhanced_obj = LearningObjective(
+                 id=objective.id,
+                 learning_objective=objective.learning_objective,
+                 source_reference=objective.source_reference,
+                 correct_answer=objective.correct_answer,
+                 incorrect_answer_options=None,
+                 in_group=getattr(objective, 'in_group', None),
+                 group_members=getattr(objective, 'group_members', None),
+                 best_in_group=getattr(objective, 'best_in_group', None)
+             )
+             enhanced_objectives.append(enhanced_obj)
+
+     print(f"Generated incorrect answer options for {len(enhanced_objectives)} learning objectives")
+ return enhanced_objectives
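Both `base_generation.py` and `enhancement.py` repeat the same pattern for deciding whether to send a `temperature`. The sketch below isolates that pattern. It is an illustration under assumptions: `build_chat_params` is a hypothetical helper, and the table contents here are invented stand-ins, since the real `models.TEMPERATURE_UNAVAILABLE` is not shown in this part of the diff. The one property taken from the code above is the default: a model missing from the table is treated as not accepting a temperature.

```python
from typing import Any, Dict, List

# Stand-in for models.TEMPERATURE_UNAVAILABLE; these entries are invented for illustration.
TEMPERATURE_UNAVAILABLE: Dict[str, bool] = {
    "model-without-temperature": True,
    "model-with-temperature": False,
}


def build_chat_params(model: str, messages: List[Dict[str, str]], temperature: float) -> Dict[str, Any]:
    """Build request parameters, adding temperature only when the model accepts it (hypothetical helper)."""
    params: Dict[str, Any] = {"model": model, "messages": messages}
    # Unknown models default to True, so temperature is omitted for them
    if not TEMPERATURE_UNAVAILABLE.get(model, True):
        params["temperature"] = temperature
    return params


messages = [{"role": "user", "content": "hello"}]
print(build_chat_params("model-with-temperature", messages, 0.7))
print(build_chat_params("some-unlisted-model", messages, 0.7))
```

Defaulting unknown models to "no temperature" is the conservative choice: an omitted parameter falls back to the API default, whereas sending one to a model that rejects it would fail the request.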
learning_objective_generator/generator.py ADDED
@@ -0,0 +1,57 @@
+ from typing import List, Dict, Optional
+ from openai import OpenAI
+ from models import BaseLearningObjective, LearningObjective
+ from .base_generation import generate_base_learning_objectives
+ from .enhancement import generate_incorrect_answer_options
+ from .grouping_and_ranking import group_base_learning_objectives, get_best_in_group_objectives
+ from .suggestion_improvement import regenerate_incorrect_answers
+
+ class LearningObjectiveGenerator:
+     """Simple orchestrator for learning objective generation."""
+
+     def __init__(self, api_key: str, model: str = "gpt-5", temperature: float = 1.0):
+         self.client = OpenAI(api_key=api_key)
+         self.model = model
+         self.temperature = temperature
+
+     def generate_base_learning_objectives(self, file_contents: List[str], num_objectives: int) -> List[BaseLearningObjective]:
+         """Generate base learning objectives without incorrect answer suggestions."""
+         return generate_base_learning_objectives(
+             self.client, self.model, self.temperature, file_contents, num_objectives
+         )
+
+     def group_base_learning_objectives(self, base_objectives: List[BaseLearningObjective], file_contents: List[str]) -> Dict[str, List]:
+         """Group base learning objectives and identify the best in each group."""
+         return group_base_learning_objectives(
+             self.client, self.model, self.temperature, base_objectives, file_contents
+         )
+
+     def generate_incorrect_answer_options(self, file_contents: List[str], base_objectives: List[BaseLearningObjective], model_override: Optional[str] = None) -> List[LearningObjective]:
+         """Generate incorrect answer options for the given base learning objectives."""
+         return generate_incorrect_answer_options(
+             self.client, self.model, self.temperature, file_contents, base_objectives, model_override
+         )
+
+     def generate_and_group_learning_objectives(self, file_contents: List[str], num_objectives: int, model_override: Optional[str] = None) -> Dict[str, List]:
+         """Complete workflow: generate base objectives, group them, and generate incorrect answers only for best in group."""
+         # Step 1: Generate base learning objectives
+         base_objectives = self.generate_base_learning_objectives(file_contents, num_objectives)
+
+         # Step 2: Group base learning objectives and get best in group
+         grouped_result = self.group_base_learning_objectives(base_objectives, file_contents)
+         best_in_group_base = grouped_result["best_in_group"]
+
+         # Step 3: Generate incorrect answer suggestions only for best in group objectives
+         enhanced_best_objectives = self.generate_incorrect_answer_options(file_contents, best_in_group_base, model_override)
+
+         # Return both the full grouped list (without enhancements) and the enhanced best-in-group list
+         return {
+             "all_grouped": grouped_result["all_grouped"],
+             "best_in_group": enhanced_best_objectives
+         }
+
+     def regenerate_incorrect_answers(self, learning_objectives: List[LearningObjective], file_contents: List[str]) -> List[LearningObjective]:
+         """Regenerate incorrect answer suggestions for learning objectives that need improvement."""
+         return regenerate_incorrect_answers(
+             self.client, self.model, self.temperature, learning_objectives, file_contents
+         )
learning_objective_generator/grouping_and_ranking.py ADDED
@@ -0,0 +1,328 @@
+ from typing import List, Dict, Any
+ from openai import OpenAI
+ import json
+ from models import LearningObjective, BaseLearningObjective, GroupedLearningObjectivesResponse, GroupedBaseLearningObjectivesResponse
+ from prompts.learning_objectives import BASE_LEARNING_OBJECTIVES_PROMPT, BLOOMS_TAXONOMY_LEVELS, LEARNING_OBJECTIVE_EXAMPLES
+
+
+ def group_learning_objectives(client: OpenAI, model: str, temperature: float, learning_objectives: List[LearningObjective], file_contents: List[str]) -> dict:
+     """Group learning objectives and return both the full ranked list and the best-in-group list as Python objects."""
+     try:
+         print(f"Grouping {len(learning_objectives)} learning objectives")
+
+         objectives_to_rank = learning_objectives
+
+         if not objectives_to_rank:
+             return {"all_grouped": learning_objectives, "best_in_group": learning_objectives}  # Nothing to rank; keep the dict return shape
+
+         # Create combined content for context
+         combined_content = "\n\n".join(file_contents)
+
+         # Format the objectives for display in the prompt
+         objectives_display = "\n".join([f"ID: {obj.id}\nLearning Objective: {obj.learning_objective}\nSource: {obj.source_reference}\nCorrect Answer: {getattr(obj, 'correct_answer', '')}\nIncorrect Answer Options: {json.dumps(getattr(obj, 'incorrect_answer_options', []))}\n" for obj in objectives_to_rank])
+
+         # Create prompt for ranking using the same context as generation but without duplicating content
+         ranking_prompt = f"""
+         The generation prompt below was used to generate the learning objectives and now your job is to group and determine the best in the group. Group according
+         to topic overlap, and select the best in the group according to the criteria in the generation prompt.
+
+
+         Here's the generation prompt:
+
+         <generation prompt>
+
+         You are an expert educational content creator specializing in creating precise, relevant learning objectives from course materials.
+
+         {BASE_LEARNING_OBJECTIVES_PROMPT}
+
+         <BloomsTaxonomyLevels>
+         {BLOOMS_TAXONOMY_LEVELS}
+         </BloomsTaxonomyLevels>
+
+         Here is an example of high quality learning objectives:
+         <learning objectives>
+         {LEARNING_OBJECTIVE_EXAMPLES}
+         </learning objectives>
+
+         Use the below course content to assess topic overlap. The source references are embedded in xml tags within the context.
+         <course content>
+         {combined_content}
+         </course content>
+
+         </generation prompt>
+
+         The learning objectives below were generated based on the content and criteria in the generation prompt above. Now your task is to group these learning objectives
+         based on how well they meet the criteria described in the generation prompt above.
+
+         IMPORTANT GROUPING INSTRUCTIONS:
+         1. Group learning objectives by similarity, including those that cover the same foundational concept.
+         2. Return a JSON array with each objective's original ID and its group information ("in_group": bool, "group_members": list[int], "best_in_group": bool). See example below.
+         3. Consider clarity, specificity, alignment with the course content, and how well each objective follows the criteria in the generation prompt.
+         4. Identify groups of similar learning objectives that cover essentially the same concept or knowledge area.
+         5. For each objective, indicate whether it belongs to a group of similar objectives by setting "in_group" to true or false.
+         6. For objectives that are part of a group, include a "group_members" field with a list of all IDs in that group (including the objective itself). If an objective is not part of a group, set "group_members" to a list containing only the objective's ID.
+         7. For each objective, add a boolean field "best_in_group": set this to true for the highest-quality objective in each group, and false for all others in the group. For objectives not in a group, set "best_in_group" to true by default.
+         8. SPECIAL INSTRUCTION: All objectives with IDs ending in 1 (like 1001, 2001, etc.) are the first objectives from different generation runs. Group ALL of these together and mark the best one as "best_in_group": true. This is critical for ensuring one of these objectives is selected as the primary objective:
+             a. Group ALL objectives with IDs ending in 1 together in the SAME group.
+             b. Evaluate these objectives carefully and select the SINGLE best one based on clarity, specificity, and alignment with course content.
+             c. Mark ONLY the best one with "best_in_group": true and all others with "best_in_group": false.
+             d. This objective will later be assigned ID=1 and will serve as the primary objective, so choose the highest quality one.
+             e. If you find other objectives that cover the same concept but don't have IDs ending in 1, include them in this group but do NOT mark them as best_in_group.
+         Here are the learning objectives to group:
+
+         <learning objectives>
+         {objectives_display}
+         </learning objectives>
+
+         Return your grouped learning objectives as a JSON array in this format. Each objective must include ALL of the following fields:
+         [
+             {{
+                 "id": int,
+                 "learning_objective": str,
+                 "source_reference": list[str] or str,
+                 "correct_answer": str,
+                 "incorrect_answer_suggestions": list[str],
+                 "in_group": bool,
+                 "group_members": list[int],
+                 "best_in_group": bool
+             }},
+             ...
+         ]
+         Example:
+         [
+             {{
+                 "id": 3,
+                 "learning_objective": "Describe the main applications of AI agents.",
+                 "source_reference": ["sc-Arize-C1-L3-eng.vtt"],
+                 "correct_answer": "AI agents are used for automation, decision-making, and information retrieval.",
+                 "incorrect_answer_suggestions": [
+                     "AI agents are used for automation and data analysis",
+                     "AI agents are designed for information retrieval and prediction",
+                     "AI agents are specialized for either automation or decision-making"
+                 ],
+                 "in_group": true,
+                 "group_members": [3, 5, 7],
+                 "best_in_group": true
+             }}
+         ]
+         """
+
+         # Use the OpenAI beta parse API for structured output
+         try:
+
+             params = {
+                 "model": "gpt-5-mini",  # NOTE: hard-coded; the model and temperature arguments are not used here
+                 "messages": [
+                     {"role": "system", "content": "You are an expert educational content evaluator."},
+                     {"role": "user", "content": ranking_prompt}
+                 ],
+                 "response_format": GroupedLearningObjectivesResponse
+             }
+
+             completion = client.beta.chat.completions.parse(**params)
+             grouped_results = completion.choices[0].message.parsed.grouped_objectives
+             print(f"Received {len(grouped_results)} grouped results")
+
+
+             # Normalize best_in_group to Python bool
+             for obj in grouped_results:
+                 val = getattr(obj, "best_in_group", False)
+                 if isinstance(val, str):
+                     obj.best_in_group = val.lower() == "true"
+                 elif isinstance(val, (bool, int)):
+                     obj.best_in_group = bool(val)
+                 else:
+                     obj.best_in_group = False
+             # if id_one_objective:
+             #     final_objectives[0].best_in_group = True
+             # Initialize final_objectives with the grouped results
+             final_objectives = []
+             for obj in grouped_results:
+                 final_objectives.append(obj)
+
+             # Filter for best-in-group objectives (no special case for id == 1 is applied here)
+             best_in_group_objectives = [obj for obj in final_objectives if getattr(obj, "best_in_group", False) is True]
+
+             return {
+                 "all_grouped": final_objectives,
+                 "best_in_group": best_in_group_objectives
+             }
+
+         except Exception as e:
+             print(f"Error ranking learning objectives: {e}")
+             return {"all_grouped": learning_objectives, "best_in_group": get_best_in_group_objectives(learning_objectives)}
+
+
+
+     except Exception as e:
+         print(f"Error ranking learning objectives: {e}")
+         return {"all_grouped": learning_objectives, "best_in_group": get_best_in_group_objectives(learning_objectives)}
+
+ def get_best_in_group_objectives(grouped_objectives: list) -> list:
+     """Return only objectives where best_in_group is True, coercing the flag to a Python bool first."""
+     best_in_group_objectives = []
+     for obj in grouped_objectives:
+         val = getattr(obj, "best_in_group", False)
+         if isinstance(val, str):
+             obj.best_in_group = val.lower() == "true"
+         elif isinstance(val, (bool, int)):
+             obj.best_in_group = bool(val)
+         else:
+             obj.best_in_group = False
+         if obj.best_in_group is True:
+             best_in_group_objectives.append(obj)
+     return best_in_group_objectives
+
+
+ def group_base_learning_objectives(client: OpenAI, model: str, temperature: float, base_objectives: List[BaseLearningObjective], file_contents: List[str]) -> Dict[str, List]:
+     """Group base learning objectives (without incorrect answer options) and return both the full grouped list and the best-in-group list."""
+     try:
+         print(f"Grouping {len(base_objectives)} base learning objectives")
+
+         objectives_to_group = base_objectives
+
+         if not objectives_to_group:
+             return {"all_grouped": base_objectives, "best_in_group": base_objectives}  # Nothing to group
+
+         # Create combined content for context
+         combined_content = "\n\n".join(file_contents)
+
+         # Format the objectives for display in the prompt
+         objectives_display = "\n".join([f"ID: {obj.id}\nLearning Objective: {obj.learning_objective}\nSource: {obj.source_reference}\nCorrect Answer: {getattr(obj, 'correct_answer', '')}\n" for obj in objectives_to_group])
+
+         # Create prompt for grouping using the same context as generation but without duplicating content
+         grouping_prompt = f"""
+         The generation prompt below was used to generate the learning objectives and now your job is to group and determine the best in the group. Group according
+         to topic overlap, and select the best in the group according to the criteria in the generation prompt.
+
+
+         Here's the generation prompt:
+
+         <generation prompt>
+
+         You are an expert educational content creator specializing in creating precise, relevant learning objectives from course materials.
+
+         {BASE_LEARNING_OBJECTIVES_PROMPT}
+
+         <BloomsTaxonomyLevels>
+         {BLOOMS_TAXONOMY_LEVELS}
+         </BloomsTaxonomyLevels>
+
+         Here is an example of high quality learning objectives:
+         <learning objectives>
+         {LEARNING_OBJECTIVE_EXAMPLES}
+         </learning objectives>
+
+         Below is the course content. The source references are embedded in xml tags within the context.
+         <course content>
+         {combined_content}
+         </course content>
+
+         </generation prompt>
+
+         The learning objectives below were generated based on the content and criteria in the generation prompt above. Now your task is to group these learning objectives
+         based on how well they meet the criteria described in the generation prompt above.
+
+         IMPORTANT GROUPING INSTRUCTIONS:
+         1. Group learning objectives by similarity, including those that cover the same foundational concept.
+         2. Return a JSON array with each objective's original ID and its group information ("in_group": bool, "group_members": list[int], "best_in_group": bool). See example below.
+         3. Consider clarity, specificity, alignment with the course content, and how well each objective follows the criteria in the generation prompt.
+         4. Identify groups of similar learning objectives that cover essentially the same concept or knowledge area.
+         5. For each objective, indicate whether it belongs to a group of similar objectives by setting "in_group" to true or false.
+         6. For objectives that are part of a group, include a "group_members" field with a list of all IDs in that group (including the objective itself). If an objective is not part of a group, set "group_members" to a list containing only the objective's ID.
+         7. For each objective, add a boolean field "best_in_group": set this to true for the highest-quality objective in each group, and false for all others in the group. For objectives not in a group, set "best_in_group" to true by default.
+         8. SPECIAL INSTRUCTION: All objectives with IDs ending in 1 (like 1001, 2001, etc.) are the first objectives from different generation runs. Group ALL of these together and mark the best one as "best_in_group": true. This is critical for ensuring one of these objectives is selected as the primary objective:
+             a. Group ALL objectives with IDs ending in 1 together in the SAME group.
+             b. Evaluate these objectives carefully and select the SINGLE best one based on clarity, specificity, and alignment with course content.
+             c. Mark ONLY the best one with "best_in_group": true and all others with "best_in_group": false.
+             d. This objective will later be assigned ID=1 and will serve as the primary objective, so choose the highest quality one.
+             e. If you find other objectives that cover the same concept but don't have IDs ending in 1, include them in this group but do NOT mark them as best_in_group.
+
+         Here are the learning objectives to group:
+         <learning_objectives>
+         {objectives_display}
+         </learning_objectives>
+
+         Your response should be a JSON array of objects with this structure:
+         [
+             {{
+                 "id": int,
+                 "learning_objective": str,
+                 "source_reference": Union[List[str], str],
+                 "correct_answer": str,
+                 "in_group": bool,
+                 "group_members": list[int],
+                 "best_in_group": bool
+             }},
+             ...
+         ]
+         Example:
+         [
+             {{
+                 "id": 3,
+                 "learning_objective": "Describe the main applications of AI agents.",
+                 "source_reference": ["sc-Arize-C1-L3-eng.vtt"],
+                 "correct_answer": "AI agents are used for automation, decision-making, and information retrieval.",
+                 "in_group": true,
+                 "group_members": [3, 5, 7],
+                 "best_in_group": true
+             }}
+         ]
+         """
272
+
273
+ # Use OpenAI beta API for structured output
274
+ try:
275
+ params = {
276
+ "model": "gpt-5-mini",
277
+ "messages": [
278
+ {"role": "system", "content": "You are an expert educational content evaluator."},
279
+ {"role": "user", "content": grouping_prompt}
280
+ ],
281
+ "response_format": GroupedBaseLearningObjectivesResponse
282
+ }
283
+
284
+ completion = client.beta.chat.completions.parse(**params)
285
+ grouped_results = completion.choices[0].message.parsed.grouped_objectives
286
+ print(f"Received {len(grouped_results)} grouped results")
287
+
288
+ # Normalize best_in_group to Python bool
289
+ for obj in grouped_results:
290
+ val = getattr(obj, "best_in_group", False)
291
+ if isinstance(val, str):
292
+ obj.best_in_group = val.lower() == "true"
293
+ elif isinstance(val, (bool, int)):
294
+ obj.best_in_group = bool(val)
295
+ else:
296
+ obj.best_in_group = False
297
+
298
+ # Initialize final_objectives with the grouped results
299
+ final_objectives = []
300
+ for obj in grouped_results:
301
+ final_objectives.append(obj)
302
+
303
+ # Filter for best-in-group objectives
304
+ best_in_group_objectives = [obj for obj in final_objectives if getattr(obj, "best_in_group", False) is True]
305
+
306
+ return {
307
+ "all_grouped": final_objectives,
308
+ "best_in_group": best_in_group_objectives
309
+ }
310
+
311
+ except Exception as e:
312
+ print(f"Error grouping base learning objectives: {e}")
313
+ # If there's an error, just mark all objectives as best-in-group
314
+ for obj in base_objectives:
315
+ obj.in_group = False
316
+ obj.group_members = [obj.id]
317
+ obj.best_in_group = True
318
+ return {"all_grouped": base_objectives, "best_in_group": base_objectives}
319
+
320
+ except Exception as e:
321
+ print(f"Error grouping base learning objectives: {e}")
322
+ # If there's an error, just mark all objectives as best-in-group
323
+ for obj in base_objectives:
324
+ obj.in_group = False
325
+ obj.group_members = [obj.id]
326
+ obj.best_in_group = True
327
+ return {"all_grouped": base_objectives, "best_in_group": base_objectives}
328
+
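The flag normalization and best-in-group filter above can be tried in isolation. The following is a minimal sketch, not the project's code: `Objective`, `normalize_flag`, and `split_best` are hypothetical names, and a plain dataclass stands in for `GroupedBaseLearningObjective` so that no OpenAI call or Pydantic model is needed.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Objective:
    """Hypothetical stand-in for GroupedBaseLearningObjective."""
    id: int
    best_in_group: Any = False


def normalize_flag(val: Any) -> bool:
    """Coerce a model-returned flag to a real bool, mirroring the loop in the diff."""
    if isinstance(val, str):
        return val.lower() == "true"
    if isinstance(val, (bool, int)):
        return bool(val)
    return False


def split_best(objectives: List[Objective]) -> Dict[str, List[Objective]]:
    """Normalize every flag, then keep the objectives flagged as best in group."""
    for obj in objectives:
        obj.best_in_group = normalize_flag(obj.best_in_group)
    best = [obj for obj in objectives if obj.best_in_group is True]
    return {"all_grouped": objectives, "best_in_group": best}


result = split_best([
    Objective(1001, "True"),  # string flag
    Objective(1002, 0),       # integer flag
    Objective(2001, None),    # missing flag
    Objective(2002, 1),
])
print([obj.id for obj in result["best_in_group"]])  # prints [1001, 2002]
```

Anything that is not a bool, an int, or the string "true" (case-insensitive) is treated as `False`, so a malformed flag can only drop an objective from the best-in-group list, never add one.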
learning_objective_generator/suggestion_improvement.py ADDED
@@ -0,0 +1,393 @@
1
+ from typing import List, Tuple
2
+ import os
3
+ import json
4
+ from openai import OpenAI
5
+ from models import LearningObjective
6
+ from prompts.incorrect_answers import INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION
7
+
8
+ def _get_run_manager():
9
+ """Get run manager if available, otherwise return None."""
10
+ try:
11
+ from ui.run_manager import get_run_manager
12
+ return get_run_manager()
13
+ except Exception:
14
+ return None
15
+
16
+ def should_regenerate_individual_suggestion(client: OpenAI, model: str, temperature: float,
17
+ learning_objective: LearningObjective,
18
+ option: str,
19
+ file_contents: List[str]) -> Tuple[bool, str]:
20
+ """
21
+ Check if an individual incorrect answer option needs regeneration.
22
+
23
+ Args:
24
+ client: OpenAI client
25
+ model: Model name (currently unused; the evaluation request is hardcoded to "gpt-5-mini")
26
+ temperature: Temperature for generation (currently unused)
27
+ learning_objective: Learning objective to check
28
+ option: The individual option to check
29
+ file_contents: List of file contents with source tags
30
+
31
+ Returns:
32
+ Tuple of (needs_regeneration, reason)
33
+ """
34
+ # Extract relevant content from file_contents
35
+ combined_content = ""
36
+ if hasattr(learning_objective, 'source_reference') and learning_objective.source_reference:
37
+ source_references = learning_objective.source_reference if isinstance(learning_objective.source_reference, list) else [learning_objective.source_reference]
38
+
39
+ for source_file in source_references:
40
+ for file_content in file_contents:
41
+ if f"<source file='{source_file}'>" in file_content:
42
+ if combined_content:
43
+ combined_content += "\n\n"
44
+ combined_content += file_content
45
+ break
46
+
47
+ # If no content found, use all content
48
+ if not combined_content:
49
+ combined_content = "\n\n".join(file_contents)
50
+
51
+ # Create a prompt to evaluate the individual suggestion
52
+ prompt = f"""
53
+ You are evaluating the quality of an incorrect answer suggestion for a learning objective. You are going to review the incorrect answer option and determine if it needs to be regenerated.
54
+
55
+ Learning Objective: {learning_objective.learning_objective}
56
+
57
+ Use the correct answer to help you make informed decisions:
58
+
59
+ Correct Answer: {learning_objective.correct_answer}
60
+
61
+ Incorrect Answer Option to Evaluate: {option}
62
+
63
+ Use the relevant content from the course content to help you make informed decisions:
64
+
65
+ COURSE CONTENT:
66
+ {combined_content}
67
+
68
+ Here are some examples of high quality incorrect answer suggestions which you should use to make informed decisions about whether regeneration of options is needed:
69
+ <incorrect_answer_examples_with_explanation>
70
+ {INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION}
71
+ </incorrect_answer_examples_with_explanation>
72
+
73
+ Based on the above examples, evaluate this incorrect answer suggestion.
74
+ Respond with TRUE if the incorrect answer suggestion needs regeneration, or FALSE if it is good quality.
75
+ If TRUE, briefly explain why regeneration is needed in this format: "true – reason for regeneration". Cite the examples with explanation that you used to make your decision.
76
+ If FALSE, respond with just "false".
77
+ """
78
+
79
+ # Use a lightweight model for evaluation
80
+ params = {
81
+ "model": "gpt-5-mini",
82
+ "messages": [
83
+ {"role": "system", "content": "You are an expert in educational assessment design and will determine if an incorrect answer option needs to be regenerated according to a set of quality standards, and examples of good and bad incorrect answer options."},
84
+ {"role": "user", "content": prompt}
85
+ ]
86
+ }
87
+
88
+ try:
89
+ completion = client.chat.completions.create(**params)
90
+ response_text = completion.choices[0].message.content.strip().lower()
91
+
92
+ # Check if regeneration is needed and extract reason
93
+ needs_regeneration = response_text.startswith("true")
94
+ reason = ""
95
+
96
+ if needs_regeneration and "–" in response_text:
97
+ parts = response_text.split("–", 1)
98
+ if len(parts) > 1:
99
+ reason = "– " + parts[1].strip()
100
+
101
+ # Log the evaluation result
102
+ run_manager = _get_run_manager()
103
+ if needs_regeneration:
104
+ # # Create debug directory if it doesn't exist
105
+ # debug_dir = os.path.join("incorrect_suggestion_debug")
106
+ # os.makedirs(debug_dir, exist_ok=True)
107
+
108
+ # suggestion_id = learning_objective.incorrect_answer_options.index(suggestion) if suggestion in learning_objective.incorrect_answer_options else "unknown"
109
+ # with open(os.path.join(debug_dir, f"lo_{learning_objective.id}_suggestion_{suggestion_id}_evaluation.txt"), "w") as f:
110
+ # f.write(f"Learning Objective: {learning_objective.learning_objective}\n")
111
+ # f.write(f"Correct Answer: {learning_objective.correct_answer}\n")
112
+ # f.write(f"Incorrect Answer Option: {option}\n\n")
113
+ # f.write(f"Evaluation Response: {response_text}\n")
114
+
115
+ if run_manager:
116
+ run_manager.log(f"Option '{option[:50]}...' needs regeneration: True - {reason}", level="DEBUG")
117
+ else:
118
+ print(f"Option '{option[:50]}...' needs regeneration: True - {reason}")
119
+ else:
120
+ if run_manager:
121
+ run_manager.log(f"Option '{option[:50]}...' is good quality, keeping as is", level="DEBUG")
122
+ else:
123
+ print(f"Option '{option[:50]}...' is good quality, keeping as is")
124
+
125
+ return needs_regeneration, reason
126
+
127
+ except Exception as e:
128
+ run_manager = _get_run_manager()
129
+ if run_manager:
130
+ run_manager.log(f"Error evaluating option '{option[:50]}...': {e}", level="ERROR")
131
+ else:
132
+ print(f"Error evaluating option '{option[:50]}...': {e}")
133
+ # If there's an error, assume regeneration is needed with a generic reason
134
+ return True, "– error during evaluation"
135
+
136
+ def regenerate_individual_suggestion(client: OpenAI, model: str, temperature: float,
137
+ learning_objective: LearningObjective,
138
+ option_to_replace: str,
139
+ file_contents: List[str],
140
+ reason: str = "") -> str:
141
+ """
142
+ Regenerate an individual incorrect answer option.
143
+
144
+ Args:
145
+ client: OpenAI client
146
+ model: Model name (currently unused; the active request is hardcoded to "gpt-5-mini")
147
+ temperature: Temperature for generation (currently unused)
148
+ learning_objective: Learning objective containing the option
149
+ option_to_replace: The incorrect answer option to replace
150
+ file_contents: List of file contents with source tags
151
+ reason: The reason for regeneration (optional)
152
+
153
+ Returns:
154
+ A new incorrect answer option
155
+ """
156
+ run_manager = _get_run_manager()
157
+ if run_manager:
158
+ run_manager.log(f"Regenerating suggestion for learning objective {learning_objective.id}", level="DEBUG")
159
+ else:
160
+ print(f"Regenerating suggestion for learning objective {learning_objective.id}")
161
+
162
+ # Extract relevant content from file_contents
163
+ combined_content = ""
164
+ if hasattr(learning_objective, 'source_reference') and learning_objective.source_reference:
165
+ source_references = learning_objective.source_reference if isinstance(learning_objective.source_reference, list) else [learning_objective.source_reference]
166
+
167
+ for source_file in source_references:
168
+ for file_content in file_contents:
169
+ if f"<source file='{source_file}'>" in file_content:
170
+ if combined_content:
171
+ combined_content += "\n\n"
172
+ combined_content += file_content
173
+ break
174
+
175
+ # If no content found, use all content
176
+ if not combined_content:
177
+ combined_content = "\n\n".join(file_contents)
178
+
179
+ # If no reason provided, use a default one
180
+ if not reason:
181
+ reason = "– no reason provided"
182
+
183
+ # Create a prompt to regenerate the suggestion
184
+ prompt = f"""
185
+ You are generating a high-quality incorrect answer option for a learning objective.
186
+
187
+ Consider the learning objective and its correct answer to generate an incorrect answer option.
188
+
189
+ Learning Objective: {learning_objective.learning_objective}
190
+ Correct Answer: {learning_objective.correct_answer}
191
+
192
+ Current Incorrect Answer Options:
193
+ {json.dumps(learning_objective.incorrect_answer_options, indent=2)}
194
+
195
+ The following option needs improvement: {option_to_replace}
196
+
197
+ Consider the following reason for improvement in order to make the option better: {reason}
198
+
199
+ Use the relevant content from the course content to help you make informed decisions:
200
+
201
+ COURSE CONTENT:
202
+ {combined_content}
203
+
204
+
205
+ Refer to the examples with explanation below to generate a new incorrect answer option:
206
+ <incorrect_answer_examples_with_explanation>
207
+ {INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION}
208
+ </incorrect_answer_examples_with_explanation>
209
+
210
+ Based on the above quality standards and examples, generate a new incorrect answer option.
211
+ Provide ONLY the new incorrect answer option, with no additional explanation.
212
+ """
213
+
214
+ # # Use the specified model for regeneration
215
+ # params = {
216
+ # "model": model,
217
+ # "messages": [
218
+ # {"role": "system", "content": "You are an expert in educational assessment design."},
219
+ # {"role": "user", "content": prompt}
220
+ # ],
221
+ # "temperature": temperature
222
+ # }
223
+
224
+ params = {
225
+ "model": "gpt-5-mini",
226
+ "messages": [
227
+ {"role": "system", "content": "You are an expert in educational assessment design. You will generate a new incorrect answer option for a learning objective based on a set of quality standards, and examples of good and bad incorrect answer options."},
228
+ {"role": "user", "content": prompt}
229
+ ]
230
+ }
231
+
232
+ try:
233
+ completion = client.chat.completions.create(**params)
234
+ new_suggestion = completion.choices[0].message.content.strip()
235
+
236
+ # Only create debug files if the suggestion actually changed
237
+ run_manager = _get_run_manager()
238
+ if new_suggestion != option_to_replace:
239
+ # Create debug directory if it doesn't exist
240
+ debug_dir = os.path.join("incorrect_suggestion_debug")
241
+ os.makedirs(debug_dir, exist_ok=True)
242
+
243
+ # Log the regeneration in the question-style format
244
+ suggestion_id = learning_objective.incorrect_answer_options.index(option_to_replace) if option_to_replace in learning_objective.incorrect_answer_options else "unknown"
245
+
246
+ # Format the log message in the same format as question regeneration
247
+ log_message = f"""Learning Objective ID: {learning_objective.id}
248
+ Learning Objective: {learning_objective.learning_objective}
249
+
250
+ REASON FOR REGENERATION:
251
+ {reason}
252
+
253
+ BEFORE:
254
+ Option Text: {option_to_replace}
255
+ Feedback: Incorrect answer representing a common misconception.
256
+
257
+ AFTER:
258
+ Option Text: {new_suggestion}
259
+ Feedback: Incorrect answer representing a common misconception.
260
+ """
261
+
262
+ # Write to the log file
263
+ log_file = os.path.join(debug_dir, f"lo_{learning_objective.id}_suggestion_{suggestion_id}.txt")
264
+ with open(log_file, "w") as f:
265
+ f.write(log_message)
266
+
267
+ # Also log to run manager
268
+ if run_manager:
269
+ run_manager.log(f"Regenerated Option for Learning Objective {learning_objective.id}, Option {suggestion_id}", level="DEBUG")
270
+ run_manager.log(f"BEFORE: {option_to_replace[:80]}...", level="DEBUG")
271
+ run_manager.log(f"AFTER: {new_suggestion[:80]}...", level="DEBUG")
272
+ run_manager.log(f"Log saved to {log_file}", level="DEBUG")
273
+ else:
274
+ print(f"\n--- Regenerated Option for Learning Objective {learning_objective.id}, Option {suggestion_id} ---")
275
+ print(f"BEFORE: {option_to_replace}")
276
+ print(f"AFTER: {new_suggestion}")
277
+ print(f"Log saved to {log_file}")
278
+ else:
279
+ if run_manager:
280
+ run_manager.log("Generated option is identical to original, not saving debug file", level="DEBUG")
281
+ else:
282
+ print("Generated option is identical to original, not saving debug file")
283
+
284
+ return new_suggestion
285
+
286
+ except Exception as e:
287
+ run_manager = _get_run_manager()
288
+ if run_manager:
289
+ run_manager.log(f"Error regenerating option: {e}", level="ERROR")
290
+ else:
291
+ print(f"Error regenerating option: {e}")
292
+ # If there's an error, return the original option
293
+ return option_to_replace
294
+
295
+ def regenerate_incorrect_answers(client: OpenAI, model: str, temperature: float,
296
+ learning_objectives: List[LearningObjective],
297
+ file_contents: List[str]) -> List[LearningObjective]:
298
+ """
299
+ Regenerate incorrect answer options for all learning objectives.
300
+
301
+ Args:
302
+ client: OpenAI client
303
+ model: Model name to use for regeneration
304
+ temperature: Temperature for generation
305
+ learning_objectives: List of learning objectives to improve
306
+ file_contents: List of file contents with source tags
307
+
308
+ Returns:
309
+ The same list of learning objectives with improved incorrect answer options
310
+ """
311
+ run_manager = _get_run_manager()
312
+ if run_manager:
313
+ run_manager.log(f"Regenerating incorrect answers for {len(learning_objectives)} learning objectives", level="INFO")
314
+ else:
315
+ print(f"Regenerating incorrect answers for {len(learning_objectives)} learning objectives")
316
+
317
+ for i, lo in enumerate(learning_objectives):
318
+ if run_manager:
319
+ run_manager.log(f"Processing learning objective {i+1}/{len(learning_objectives)}: {lo.id}", level="INFO")
320
+ else:
321
+ print(f"Processing learning objective {i+1}/{len(learning_objectives)}: {lo.id}")
322
+
323
+ # Check each suggestion individually
324
+ if lo.incorrect_answer_options:
325
+ new_suggestions = []
326
+ for j, option in enumerate(lo.incorrect_answer_options):
327
+ # Check if this specific suggestion needs regeneration
328
+ needs_regeneration, reason = should_regenerate_individual_suggestion(client, model, temperature, lo, option, file_contents)
329
+
330
+ if needs_regeneration:
331
+ # Regenerate this specific suggestion with the reason
332
+ if run_manager:
333
+ run_manager.log(f"Regenerating option '{option[:50]}...' for learning objective {lo.id}", level="INFO")
334
+ else:
335
+ print(f"Regenerating option '{option[:50]}...' for learning objective {lo.id}")
336
+
337
+ # Initialize variables for the regeneration loop
338
+ current_option = option
339
+ max_iterations = 5
340
+ iteration = 0
341
+
342
+ # Loop until we get a good option or reach max iterations
343
+ while needs_regeneration and iteration < max_iterations:
344
+ iteration += 1
345
+ if run_manager:
346
+ run_manager.log(f" Regeneration attempt {iteration}/{max_iterations}", level="INFO")
347
+ else:
348
+ print(f" Regeneration attempt {iteration}/{max_iterations}")
349
+
350
+ # Regenerate the option
351
+ new_option = regenerate_individual_suggestion(client, model, temperature, lo, current_option, file_contents, reason)
352
+
353
+ # Check if the new option still needs regeneration
354
+ if iteration < max_iterations: # Skip check on last iteration to save API calls
355
+ needs_regeneration, new_reason = should_regenerate_individual_suggestion(client, model, temperature, lo, new_option, file_contents)
356
+ if needs_regeneration:
357
+ if run_manager:
358
+ run_manager.log(f" Regenerated option still needs improvement: {new_reason}", level="DEBUG")
359
+ else:
360
+ print(f" Regenerated option still needs improvement: {new_reason}")
361
+ current_option = new_option
362
+ reason = new_reason
363
+ else:
364
+ if run_manager:
365
+ run_manager.log(f" Regenerated option passes quality check on attempt {iteration}", level="INFO")
366
+ else:
367
+ print(f" Regenerated option passes quality check on attempt {iteration}")
368
+ else:
369
+ needs_regeneration = False
370
+
371
+ # Use the final regenerated option
372
+ new_suggestions.append(new_option)
373
+ else:
374
+ # Keep the original suggestion
375
+ if run_manager:
376
+ run_manager.log(f"Keeping original option '{option[:50]}...' for learning objective {lo.id}", level="INFO")
377
+ else:
378
+ print(f"Keeping original option '{option[:50]}...' for learning objective {lo.id}")
379
+ new_suggestions.append(option)
380
+
381
+ # Update the learning objective with the new suggestions
382
+ lo.incorrect_answer_options = new_suggestions
383
+ else:
384
+ # If there are no suggestions, generate completely new ones
385
+ if run_manager:
386
+ run_manager.log(f"No incorrect answer options found for learning objective {lo.id}, generating new ones", level="INFO")
387
+ else:
388
+ print(f"No incorrect answer options found for learning objective {lo.id}, generating new ones")
389
+ # This would typically call back to the enhancement.py function, but to avoid circular imports,
390
+ # we'll just leave it empty and let the next generation cycle handle it
391
+ lo.incorrect_answer_options = []
392
+
393
+ return learning_objectives
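The evaluate-then-regenerate loop in `regenerate_incorrect_answers` can be sketched without any API calls. This is an illustrative reduction under stated assumptions: `improve_option`, `fake_evaluate`, and `fake_regenerate` are hypothetical names, and the two callables stand in for `should_regenerate_individual_suggestion` and `regenerate_individual_suggestion`.

```python
from typing import Callable, Tuple

Evaluate = Callable[[str], Tuple[bool, str]]   # returns (needs_regeneration, reason)
Regenerate = Callable[[str, str], str]         # takes (option, reason), returns new option


def improve_option(option: str, evaluate: Evaluate, regenerate: Regenerate,
                   max_iterations: int = 5) -> str:
    """Regenerate an option until it passes evaluation or the attempt cap is hit."""
    needs_regeneration, reason = evaluate(option)
    if not needs_regeneration:
        return option  # keep the original option

    current, candidate, iteration = option, option, 0
    while needs_regeneration and iteration < max_iterations:
        iteration += 1
        candidate = regenerate(current, reason)
        if iteration < max_iterations:
            # Re-check the candidate; on failure, feed it and the new reason back in.
            needs_regeneration, new_reason = evaluate(candidate)
            if needs_regeneration:
                current, reason = candidate, new_reason
        else:
            # The last attempt is accepted unchecked to save one evaluation call.
            needs_regeneration = False
    return candidate


def fake_evaluate(option: str) -> Tuple[bool, str]:
    """Stub evaluator: flags options that contain the absolute term 'always'."""
    bad = "always" in option
    return bad, ("– uses an absolute term" if bad else "")


def fake_regenerate(option: str, reason: str) -> str:
    """Stub generator: removes one occurrence of 'always ' per attempt."""
    return option.replace("always ", "", 1)


print(improve_option("Partitioning always always speeds up queries",
                     fake_evaluate, fake_regenerate))
# prints: Partitioning speeds up queries
```

One consequence of skipping the final check is that an option which never passes evaluation is still returned after the fifth attempt, so callers should not assume the result was approved.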
models/__init__.py ADDED
@@ -0,0 +1,60 @@
1
+ # Learning objectives
2
+ from .learning_objectives import (
3
+ BaseLearningObjective,
4
+ LearningObjective,
5
+ GroupedLearningObjective,
6
+ GroupedBaseLearningObjective,
7
+ LearningObjectivesResponse,
8
+ GroupedLearningObjectivesResponse,
9
+ GroupedBaseLearningObjectivesResponse,
10
+ BaseLearningObjectiveWithoutCorrectAnswer, BaseLearningObjectivesWithoutCorrectAnswerResponse
11
+ )
12
+
13
+ # Config
14
+ from .config import (MODELS, TEMPERATURE_UNAVAILABLE)
15
+
16
+ # Questions
17
+ from .questions import (
18
+ MultipleChoiceOption,
19
+ MultipleChoiceQuestion,
20
+ RankedNoGroupMultipleChoiceQuestion,
21
+ RankedMultipleChoiceQuestion,
22
+ GroupedMultipleChoiceQuestion,
23
+ MultipleChoiceQuestionFromFeedback,
24
+ RankedNoGroupMultipleChoiceQuestionsResponse,
25
+ RankedMultipleChoiceQuestionsResponse,
26
+ GroupedMultipleChoiceQuestionsResponse
27
+ )
28
+
29
+ # Assessment
30
+ from .assessment import Assessment
31
+
32
+ __all__ = [
33
+ # Learning objectives
34
+ 'BaseLearningObjective',
35
+ 'LearningObjective',
36
+ 'GroupedLearningObjective',
37
+ 'GroupedBaseLearningObjective',
38
+ 'LearningObjectivesResponse',
39
+ 'GroupedLearningObjectivesResponse',
40
+ 'GroupedBaseLearningObjectivesResponse',
41
+ 'BaseLearningObjectiveWithoutCorrectAnswer', 'BaseLearningObjectivesWithoutCorrectAnswerResponse',
42
+
43
+ # Config
44
+ 'MODELS',
45
+ 'TEMPERATURE_UNAVAILABLE',
46
+
47
+ # Questions
48
+ 'MultipleChoiceOption',
49
+ 'MultipleChoiceQuestion',
50
+ 'RankedNoGroupMultipleChoiceQuestion',
51
+ 'RankedMultipleChoiceQuestion',
52
+ 'GroupedMultipleChoiceQuestion',
53
+ 'MultipleChoiceQuestionFromFeedback',
54
+ 'RankedNoGroupMultipleChoiceQuestionsResponse',
55
+ 'RankedMultipleChoiceQuestionsResponse',
56
+ 'GroupedMultipleChoiceQuestionsResponse',
57
+
58
+ # Assessment
59
+ 'Assessment'
60
+ ]
models/assessment.py ADDED
@@ -0,0 +1,10 @@
1
+ from typing import List
2
+ from pydantic import BaseModel, Field
3
+ from .learning_objectives import LearningObjective
4
+ from .questions import RankedMultipleChoiceQuestion
5
+
6
+
7
+ class Assessment(BaseModel):
8
+ """Model for an assessment."""
9
+ learning_objectives: List[LearningObjective] = Field(description="List of learning objectives")
10
+ questions: List[RankedMultipleChoiceQuestion] = Field(description="List of ranked questions")
models/config.py ADDED
@@ -0,0 +1,4 @@
1
+ MODELS = ["o3-mini","o1","gpt-4.1", "gpt-4o", "gpt-4o-mini", "gpt-4", "gpt-3.5-turbo", "gpt-5.2", "gpt-5.1", "gpt-5", "gpt-5-mini", "gpt-5-nano"]
2
+
3
+ TEMPERATURE_UNAVAILABLE = {"o3-mini": True,"o1": True,"gpt-4.1": False, "gpt-4o": False, "gpt-4o-mini": False, "gpt-4": False, "gpt-3.5-turbo": False, "gpt-5.2": True, "gpt-5.1": True, "gpt-5": True, "gpt-5-mini": True, "gpt-5-nano": True}
4
+
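One plausible use of `TEMPERATURE_UNAVAILABLE` is to decide whether a request may carry a `temperature` key at all. The sketch below is an assumption about how the table could be consumed, not code from this commit: `build_params` is a hypothetical helper, the table is abbreviated, and treating unknown models as not accepting a temperature is a conservative default chosen for the example.

```python
from typing import Any, Dict, List

# Abbreviated copy of the table in models/config.py (values taken from the diff).
TEMPERATURE_UNAVAILABLE = {"o3-mini": True, "gpt-4o": False, "gpt-5-mini": True}


def build_params(model: str, messages: List[Dict[str, str]],
                 temperature: float) -> Dict[str, Any]:
    """Build request parameters, adding temperature only where the table allows it."""
    params: Dict[str, Any] = {"model": model, "messages": messages}
    # Assumption: a model missing from the table is treated as not accepting temperature.
    if not TEMPERATURE_UNAVAILABLE.get(model, True):
        params["temperature"] = temperature
    return params


messages = [{"role": "user", "content": "Hello"}]
print(sorted(build_params("gpt-4o", messages, 0.7)))      # prints ['messages', 'model', 'temperature']
print(sorted(build_params("gpt-5-mini", messages, 0.7)))  # prints ['messages', 'model']
```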
models/learning_objectives.py ADDED
@@ -0,0 +1,59 @@
1
+ from typing import List, Optional, Union
2
+ from pydantic import BaseModel, Field
3
+
4
+
5
+ class BaseLearningObjectiveWithoutCorrectAnswer(BaseModel):
6
+ """Model for a learning objective without a correct answer."""
7
+ id: int = Field(description="Unique identifier for the learning objective")
8
+ learning_objective: str = Field(description="Description of the learning objective")
9
+ source_reference: Union[List[str], str] = Field(description="Paths to the files from which this learning objective was extracted")
10
+
11
+
12
+ class BaseLearningObjective(BaseModel):
13
+ """Model for a learning objective."""
14
+ id: int = Field(description="Unique identifier for the learning objective")
15
+ learning_objective: str = Field(description="Description of the learning objective")
16
+ source_reference: Union[List[str], str] = Field(description="Paths to the files from which this learning objective was extracted")
17
+ correct_answer: str = Field(description="Correct answer to the learning objective")
18
+
19
+
20
+ class LearningObjective(BaseModel):
21
+ """Model for a learning objective with incorrect answer options and optional grouping metadata."""
22
+ id: int = Field(description="Unique identifier for the learning objective")
23
+ learning_objective: str = Field(description="Description of the learning objective")
24
+ source_reference: Union[List[str], str] = Field(description="Paths to the files from which this learning objective was extracted")
25
+ correct_answer: str = Field(description="Correct answer to the learning objective")
26
+ incorrect_answer_options: Union[List[str], str] = Field(description="A list of five incorrect answer options")
27
+ in_group: Optional[bool] = Field(default=None, description="Whether this objective is part of a group")
28
+ group_members: Optional[List[int]] = Field(default=None, description="List of IDs of objectives in the same group")
29
+ best_in_group: Optional[bool] = Field(default=None, description="Whether this is the best objective in its group")
30
+
31
+
32
+ class GroupedLearningObjective(LearningObjective):
33
+ """Model for a learning objective that has been grouped."""
34
+ in_group: bool = Field(description="Whether this objective is part of a group of similar objectives")
35
+ group_members: List[int] = Field(description="List of IDs of all objectives in the same similarity group, including this one")
36
+ best_in_group: bool = Field(description="True if this objective is the highest ranked in its group")
37
+
38
+
39
+ class GroupedBaseLearningObjective(BaseLearningObjective):
40
+ """Model for a base learning objective that has been grouped (without incorrect answer suggestions)."""
41
+ in_group: bool = Field(description="Whether this objective is part of a group of similar objectives")
42
+ group_members: List[int] = Field(description="List of IDs of all objectives in the same similarity group, including this one")
43
+ best_in_group: bool = Field(description="True if this objective is the highest ranked in its group")
44
+
45
+
46
+ # Response models for learning objectives
47
+ class BaseLearningObjectivesWithoutCorrectAnswerResponse(BaseModel):
48
+ objectives: List[BaseLearningObjectiveWithoutCorrectAnswer] = Field(description="List of learning objectives without correct answers")
49
+
50
+ class LearningObjectivesResponse(BaseModel):
51
+ objectives: List[LearningObjective] = Field(description="List of learning objectives")
52
+
53
+
54
+ class GroupedLearningObjectivesResponse(BaseModel):
55
+ grouped_objectives: List[GroupedLearningObjective] = Field(description="List of grouped learning objectives")
56
+
57
+
58
+ class GroupedBaseLearningObjectivesResponse(BaseModel):
59
+ grouped_objectives: List[GroupedBaseLearningObjective] = Field(description="List of grouped base learning objectives")
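Because `source_reference` is typed `Union[List[str], str]`, any consumer has to normalize it before iterating. The sketch below reproduces the pattern used in `suggestion_improvement.py`, which matches `<source file='...'>` tags and falls back to all content when nothing matches. The helper names `as_list` and `select_content` and the sample `files` are hypothetical.

```python
from typing import List, Optional, Union


def as_list(source_reference: Optional[Union[List[str], str]]) -> List[str]:
    """Normalize a str-or-list source reference to a list (empty if missing)."""
    if not source_reference:
        return []
    return source_reference if isinstance(source_reference, list) else [source_reference]


def select_content(source_reference: Optional[Union[List[str], str]],
                   file_contents: List[str]) -> str:
    """Join the file contents whose source tag matches; fall back to everything."""
    matched: List[str] = []
    for source_file in as_list(source_reference):
        tag = f"<source file='{source_file}'>"
        for content in file_contents:
            if tag in content:
                matched.append(content)
                break  # only the first matching entry per source file is used
    return "\n\n".join(matched or file_contents)


files = [
    "<source file='a.vtt'>Intro to agents</source>",
    "<source file='b.vtt'>Evaluating agents</source>",
]
print(select_content("b.vtt", files))  # prints <source file='b.vtt'>Evaluating agents</source>
```

The fallback means a mistyped or missing file name widens the context to every file, which keeps the prompt usable but sends more text to the model.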
models/questions.py ADDED
@@ -0,0 +1,67 @@
1
+ from typing import List, Optional, Union
2
+ from pydantic import BaseModel, Field
3
+
4
+
5
+ class MultipleChoiceOption(BaseModel):
6
+ """Model for a multiple choice option."""
7
+ option_text: str = Field(description="Text of the option")
8
+ is_correct: bool = Field(description="Whether this option is correct")
9
+ feedback: str = Field(description="Feedback for this option")
10
+
11
+
12
+ class MultipleChoiceQuestion(BaseModel):
13
+ """Model for a multiple choice question."""
14
+ id: int = Field(description="Unique identifier for the question")
15
+ question_text: str = Field(description="Text of the question")
16
+ options: List[MultipleChoiceOption] = Field(description="List of options for the question")
17
+ learning_objective_id: int = Field(description="ID of the learning objective this question addresses")
18
+ learning_objective: str = Field(description="Learning objective this question addresses")
19
+ correct_answer: str = Field(description="Correct answer to the question")
20
+ source_reference: Union[List[str], str] = Field(description="Paths to the files from which this question was extracted")
21
+ judge_feedback: Optional[str] = Field(None, description="Feedback from the LLM judge")
22
+ approved: Optional[bool] = Field(None, description="Whether this question has been approved by the LLM judge")
23
+
24
+
25
+ class RankedNoGroupMultipleChoiceQuestion(MultipleChoiceQuestion):
26
+ """Model for a multiple choice question that has been ranked but not grouped."""
27
+ rank: int = Field(description="Rank assigned to the question (1 = best)")
28
+ ranking_reasoning: str = Field(description="Reasoning for the assigned rank")
29
+
30
+
31
+ class RankedMultipleChoiceQuestion(MultipleChoiceQuestion):
32
+ """Model for a multiple choice question that has been ranked."""
33
+ rank: int = Field(description="Rank assigned to the question (1 = best)")
34
+ ranking_reasoning: str = Field(description="Reasoning for the assigned rank")
35
+ in_group: bool = Field(description="Whether this question is part of a group of similar questions")
36
+ group_members: List[int] = Field(description="IDs of questions in the same group")
37
+ best_in_group: bool = Field(description="Whether this is the best question in its group")
38
+
39
+
40
+ class GroupedMultipleChoiceQuestion(MultipleChoiceQuestion):
41
+ """Model for a multiple choice question that has been grouped but not ranked."""
42
+ in_group: bool = Field(description="Whether this question is part of a group of similar questions")
43
+ group_members: List[int] = Field(description="IDs of questions in the same group")
44
+ best_in_group: bool = Field(description="Whether this is the best question in its group")
45
+
46
+
47
+ class MultipleChoiceQuestionFromFeedback(BaseModel):
48
+ """Model for a multiple choice question paired with the user criticism used to revise it."""
49
+ id: int = Field(description="Unique identifier for the question")
50
+ question_text: str = Field(description="Text of the question")
51
+ options: List[MultipleChoiceOption] = Field(description="List of options for the question")
52
+ learning_objective: str = Field(description="Learning objective this question addresses")
53
+ source_reference: Union[List[str], str] = Field(description="Paths to the files from which this question was extracted")
54
+ feedback: str = Field(description="User criticism for this question, this will be found at the bottom of <QUESTION FOLLOWED BY USER CRITICISM> and it is a criticism of something which suggests a change.")
55
+
56
+
57
+ # Response models for questions
58
+ class RankedNoGroupMultipleChoiceQuestionsResponse(BaseModel):
59
+ ranked_questions: List[RankedNoGroupMultipleChoiceQuestion] = Field(description="List of ranked multiple choice questions without grouping")
60
+
61
+
62
+ class RankedMultipleChoiceQuestionsResponse(BaseModel):
63
+ ranked_questions: List[RankedMultipleChoiceQuestion] = Field(description="List of ranked multiple choice questions")
64
+
65
+
66
+ class GroupedMultipleChoiceQuestionsResponse(BaseModel):
67
+ grouped_questions: List[GroupedMultipleChoiceQuestion] = Field(description="List of grouped multiple choice questions")
prompts/__init__.py ADDED
File without changes
prompts/all_quality_standards.py ADDED
File without changes
prompts/incorrect_answers.py ADDED
@@ -0,0 +1,184 @@
1
+
2
+
3
+ INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION = """
4
+
5
+
6
+ ## 1. Here are examples of inappropriate incorrect answer options that demonstrate the importance of avoiding absolute terms and unnecessary comparisons because these terms and comparisons tend to lead to obviously incorrect options. Specifically, you should avoid using words like "always", "only", "solely", "never", "mainly", "exclusively", "primarily", or phrases like "rather than", "instead of", "regardless of", "as opposed to". These words are absolute or extreme qualifiers and comparative terms that make it too easy to recognize incorrect answer options.
7
+ More words you should avoid are: All, every, entire, complete, none, nothing, no one, only, solely, merely, completely, totally, utterly, always, forever, constantly, never, impossible, must, mandatory, required, instead of, as opposed to, exclusively, purely
8
+ <example>
9
+ Learning Objective: "Explain the purpose of index partitioning in databases."
10
+ Correct Answer: "Index partitioning improves query performance by dividing large indexes into smaller, more manageable segments."
11
+ Inappropriate incorrect answer options to avoid:
12
+
13
+ "Index partitioning always guarantees the fastest possible query performance regardless of data size." (Obviously wrong because it uses absolute terms "always" and "regardless of")
14
+ "Index partitioning improves query performance rather than ensuring data integrity or providing backup functionality." (Obviously wrong with the unnecessary comparison using "rather than")
15
+ "Index partitioning exclusively supports distributed database systems that never operate on a single server." (Obviously wrong because it uses absolute terms "exclusively" and "never")
16
+
17
+ Appropriate incorrect answer option:
18
+
19
+ "Index partitioning improves query performance by distributing data across multiple servers to reduce network latency." (Confuses partitioning with distributed computing but avoids absolute terms)
20
+
21
+ </example>
22
+
23
+
24
+ ## 2. Here is an example of inappropriate incorrect answer options that demonstrate what to avoid when creating distractors that use explicit negation patterns with key terms from the learning objective.
25
+ Rule: Flag options that contain explicit negation patterns combined with key terms from the learning objective.
26
+ Detect these specific patterns:
27
+
28
+ "[Topic/Concept] without [key term]"
29
+ "[Topic/Concept] by avoiding [key action]"
30
+ "[Topic/Concept] by minimizing [core concept]"
31
+ "[Topic/Concept] that skips [essential process]"
32
+ "[Topic/Concept] by eliminating [key component]"
33
+
34
+ Common negation words to watch for: without, avoiding, minimizing, skipping, eliminating, excluding, preventing, reducing, limiting (when used with core concepts)
35
+ Do NOT flag: Options that swap roles/behaviors between related concepts or assign different characteristics to similar methods.
36
+ <example>
37
+ Learning Objective: "Identify the main purpose of evaluation-driven development in the context of AI agents."
38
+ Correct Answer: "The main purpose of evaluation-driven development is to iteratively improve AI agents by using structured evaluations to guide development and ensure consistent performance."
39
+ Inappropriate incorrect answer options with explicit negation patterns:
40
+
41
+ "The main purpose of evaluation-driven development is to automate all agent decisions by relying solely on predefined rules and minimizing the need for ongoing evaluation." (Pattern: "[Topic] by minimizing [key term]")
42
+ "The main purpose of evaluation-driven development is to accelerate agent deployment by skipping iterative testing and focusing on rapid feature addition." (Pattern: "[Topic] by skipping [essential process]")
43
+
44
+ Appropriate incorrect answer option:
45
+
46
+ "The main purpose of evaluation-driven development is to establish standardized benchmarks that ensure consistent performance across different AI agent architectures." (No negation patterns - legitimate misconception)
47
+
48
+ </example>
49
+
50
+ ## 3. Here is an example of inappropriate incorrect answer options that demonstrate what to avoid when using "but" clauses that explicitly negate the core concept in the learning objective, making the options obviously wrong.
51
+
52
+ Avoid contradictory second clauses - Don't add qualifying phrases that explicitly negate the main benefit or create obvious limitations
53
+ Keep second clauses supportive - If you include a second clause, it should reinforce the incorrect direction, not contradict it
54
+ * Look for explicit negations using "without," "lacking", "rather than," "instead of," "but not," "but", "except", or "excluding" that directly contradict the core concept
55
+
56
+ <example>
57
+ Learning Objective: "Explain why observability and tracing are important when developing and evaluating AI agents."
58
+ Correct Answer: "Observability and tracing provide detailed visibility into every step taken by an agent, making it easier to debug, monitor performance, and evaluate agent behavior systematically."
59
+ Inappropriate incorrect answer options with problematic "but" clauses:
60
+
61
+ "Observability and tracing facilitate real-time scaling of agent components in response to usage spikes, improving resource management but not revealing detailed decision logic." (Uses "but not revealing" to explicitly negate a core benefit of observability)
62
+ "Observability and tracing collect aggregated performance metrics and logs for monitoring agent uptime and error trends, making them useful for system health checks but insufficient for diagnosing specific agent decision flows." (Uses "but insufficient for diagnosing" to directly contradict the debugging purpose)
63
+ "Observability and tracing provide high-level summaries of agent outputs, which are useful for presenting results but do not address debugging or understanding the agent's internal processes." (Uses "but do not address debugging" to explicitly exclude the main benefit)
64
+
65
+ Appropriate incorrect answer option:
66
+
67
+ "Observability and tracing provide comprehensive logging capabilities that help teams maintain detailed audit trails for compliance and regulatory reporting requirements." (No contradictory second clause)
68
+
69
+ </example>
70
+
71
+
72
+ ## 4. Here is an example of inappropriate incorrect answer options that demonstrates what to avoid when adding unnecessary clauses that extend beyond the core misconception being tested.
73
+ Rule: Avoid compound sentences where the second clause introduces additional consequences, effects, or elaborations that are not essential to the primary misconception.
74
+ Look for these patterns that indicate unnecessary length:
75
+
76
+ - Clauses beginning with "which," "that," "providing," "enabling," "reducing," "ensuring"
77
+ - Phrases connected by commas that add consequences or effects
78
+ - Additional explanations after the main misconception is established
79
+
80
+ As you can see in the first example, the sentence in the incorrect answer option has two parts. The second part, "reducing the complexity of data transformations in automations," must be discarded, and you should regard any sentence of this type as inappropriate, because the option is already complete without the second part.
81
+ In the second example the section that says "simplifying tool configuration" is unnecessary and should also be discarded. The first part of the sentence would have been enough.
82
+ <example>
83
+ Learning Objective:
84
+ "Identify why integrating internal and external systems is important when building multi-agent AI applications",
85
+ Correct Answer: "Integrating internal and external systems enables AI agents to access, process, and act on real-world data.",
86
+
87
+ Inappropriate incorrect answer options:
88
+
89
+
90
+ "Integrating internal and external systems enables AI agents to unify disparate data schemas into a single standardized format, reducing the complexity of data transformations in automations."
91
+
92
+ Appropriate incorrect answer option:
93
+
94
+ "Integrating internal and external systems enables AI agents to unify disparate data schemas into a single standardized format."
95
+
96
+ </example>
97
+
98
+ <example>
99
+
100
+ "learning_objective": "Describe the importance of integrating AI agents with internal and external systems.",
101
+
102
+ "correct_answer": "Integrating with internal and external systems enables AI agents to access and process relevant real-world data during automation.",
103
+
104
+ Inappropriate incorrect answer options:
105
+ "Integrating with internal and external systems enables AI agents to unify diverse APIs into a standard interface, simplifying tool configuration.",
106
+ Appropriate incorrect answer option:
107
+ "Integrating with internal and external systems enables AI agents to unify diverse APIs into a standard interface"
108
+ </example>
109
+
110
+ ## 5. Here is a learning objective and correct answer with appropriate incorrect answer options that demonstrate structural consistency with the correct answer's grammatical pattern, length, and formatting:
111
+
112
+ <example>
113
+ Learning Objective: "Identify the three primary data structures used in machine learning algorithms."
114
+ Correct Answer: "The three primary data structures used in machine learning algorithms are arrays, matrices, and tensors."
115
+ Appropriate incorrect answer options:
116
+
117
+ "The three primary data structures used in machine learning algorithms are dictionaries, trees, and queues." (Same structure, different data structures)
118
+ "The three primary data structures used in machine learning algorithms are lists, sets, and databases." (Same structure but mixing concepts)
119
+ "The three primary data structures used in machine learning algorithms are features, labels, and parameters." (Same structure, but confuses data structures with other ML concepts)
120
+
121
+ Inappropriate incorrect answer option:
122
+
123
+ "Machine learning algorithms first store data in arrays, then process it using functional programming techniques." (Different grammatical structure - doesn't follow the pattern)
124
+
125
+ </example>
126
+
127
+
128
+
129
+ ## 6. Here is an example of appropriate incorrect answer options for list questions. Here all options give a list of three items:
130
+ <example>
131
+ Learning Objective: "Identify the three key principles of object-oriented programming."
132
+ Correct Answer: "The three key principles of object-oriented programming are encapsulation, inheritance, and polymorphism."
133
+ Appropriate incorrect answer options:
134
+
135
+ "The three key principles of object-oriented programming are encapsulation, inheritance, and composition." (Same structure, two correct, one incorrect)
136
+ "The three key principles of object-oriented programming are abstraction, polymorphism, and delegation." (Same structure, mix of correct/incorrect)
137
+ "The three key principles of object-oriented programming are instantiation, implementation, and isolation." (Same structure, all incorrect terms)
138
+
139
+ Inappropriate incorrect answer option:
140
+
141
+ "Object-oriented programming uses classes and objects to organize code into reusable components." (Different structure and doesn't list three specific principles)
142
+
143
+ </example>
144
+
145
+
146
+
147
+
148
+ ## 7. Here is an example of inappropriate incorrect answer options that demonstrate what to avoid when using unnecessary second clauses with comparative phrases that create obvious contradictions.
149
+
150
+ <example>
151
+ Learning Objective: "Explain the primary benefit of incorporating human feedback into AI agent workflows."
152
+ Correct Answer: "The primary benefit of incorporating human feedback is to enable agents to correct errors and improve their decision-making accuracy."
153
+ Inappropriate incorrect answer options with problematic second clauses:
154
+
155
+ "The primary benefit of incorporating human feedback is to validate agent outputs rather than actually improving their performance over time." (Uses "rather than" to create false dichotomy that makes the option obviously wrong)
156
+ "The primary benefit of incorporating human feedback is to increase processing speed instead of focusing on accuracy or quality improvements." (Uses "instead of" to unnecessarily contrast with core benefits, making it clearly incorrect)
157
+ "The primary benefit of incorporating human feedback is to maintain consistency across outputs but not necessarily to enhance the agent's learning capabilities." (Uses "but not necessarily" to explicitly negate a key benefit, making it obviously eliminable)
158
+
159
+ Appropriate incorrect answer option:
160
+
161
+ "The primary benefit of incorporating human feedback is to establish standardized response formats that ensure consistent output structure across different tasks." (Doesn't have a contradictory second clause)
162
+
163
+ </example>
164
+
165
+
166
+
167
+
168
+ ## 8. Here is an example of inappropriate incorrect answer options that demonstrate what to avoid when the question asks for positive aspects but distractors present negative aspects, making them obviously wrong. The same applies when the question asks for negative aspects and the distractors present positive ones.
169
+ <example>
170
+ Learning Objective: "Identify the main advantages of using cloud computing for business applications."
171
+ Correct Answer: "Cloud computing provides cost savings, scalability, and improved accessibility for business applications."
172
+ Inappropriate incorrect answer options that present disadvantages when asked for advantages:
173
+
174
+ "Cloud computing increases security risks, creates vendor dependency, and requires constant internet connectivity." (Obviously wrong - lists disadvantages when the question asks for advantages)
175
+
176
+ Appropriate incorrect answer option:
177
+
178
+ "Cloud computing provides faster processing speeds, enhanced data encryption, and simplified software licensing." (All purported benefits as the question asks about benefits)
179
+
180
+ </example>
181
+
182
+
183
+
184
+ """
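Rules 1 to 3 in the prompt above describe word-level patterns (absolute terms, comparative phrases, negation words), so they can also be checked mechanically before or after the LLM call. The following is a minimal sketch of such a check, not part of this commit: the function name `lint_distractor` and the shortened word lists are assumptions made for illustration, and a whole-word match is only a rough heuristic that will produce false positives.

```python
import re
from typing import Dict, List

# Hypothetical, shortened word lists taken from rules 1-3 of the prompt.
ABSOLUTE_TERMS = ["always", "only", "solely", "never", "exclusively", "all", "every", "none", "must"]
COMPARATIVE_PHRASES = ["rather than", "instead of", "regardless of", "as opposed to", "but not"]
NEGATION_WORDS = ["without", "avoiding", "minimizing", "skipping", "eliminating", "excluding"]


def _find(terms: List[str], text: str) -> List[str]:
    """Return the terms that occur in text as whole words or phrases (case-insensitive)."""
    return [t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", text, re.IGNORECASE)]


def lint_distractor(option: str) -> Dict[str, List[str]]:
    """Report which flagged terms appear in a single incorrect answer option."""
    return {
        "absolute_terms": _find(ABSOLUTE_TERMS, option),
        "comparative_phrases": _find(COMPARATIVE_PHRASES, option),
        "negation_words": _find(NEGATION_WORDS, option),
    }


if __name__ == "__main__":
    bad = "Index partitioning always guarantees the fastest possible query performance regardless of data size."
    print(lint_distractor(bad))
```

A check like this can only pre-filter candidates; whether a flagged word actually makes an option obviously wrong still needs the judgment the prompt asks the model to apply.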
prompts/learning_objectives.py ADDED
@@ -0,0 +1,216 @@
1
+ BASE_LEARNING_OBJECTIVES_PROMPT = """
2
+ The learning objectives you generate will be assessed through multiple choice
3
+ quiz questions, so the learning objectives cannot be about building, creating,
4
+ developing, etc. Instead, the objectives should be about identifying, listing,
5
+ describing, defining, comparing, etc., i.e., the kinds of things that can be assessed in a multiple choice quiz. For example, a learning objective might be “identify the key reason for providing examples of the response format you’re looking for in an LLM prompt”, for which the correct answer might be “providing an example of the response format you're looking for gives the LLM clear guidance on the expected output”. Limit learning objectives to one goal per objective, i.e., don't say "identify the <key concepts of the course> and explain why they are important in the context of <topic of the course>". Either choose which of these (identify or explain) is most relevant to the learning objective or create two learning objectives.
6
+
7
+ INSTRUCTIONS:
8
+ 1. Each learning objective must be derived DIRECTLY from the COURSE CONTENT provided
9
+ below. Do not create objectives for topics not covered in the content.
10
+ 2. Learning objectives should be specific, measurable, and focused on important
11
+ concepts.
12
+ 3. Each objective should start with an action verb that allows for assessment using
13
+ a multiple choice question (e.g., identify, describe, define, list, compare, etc.). Do not include more than one action verb per learning objective, e.g., do not say "identify and explain" or similar.
14
+ 4. Make each objective unique and independent, covering different aspects of the content. It is ok if two objectives address different aspects or angles of the same topic.
15
+ 5. Learning objectives should not contain part or all of the correct answer associated with them.
16
+ 6. No learning objective should depend on context from another learning objective, or in other words, each learning objective should be able to stand alone without knowing anything about what the other learning objectives are.
17
+ 7. Ensure objectives are at an appropriate level of difficulty for the course, meaning they are consistent with the difficulty level of the course content.
18
+ 8. Write learning objectives that address critical knowledge and skills in the content, not trivial facts or details of a specific use case implementation or coding exercise.
19
+ 9. Wherever possible, write learning objectives that address the “why” of the concepts presented in the course rather than the “what”. For example, if the course presents an implementation of a use case using a particular framework or tool, don’t write learning objectives that ask about the details of exactly what was presented in the implementation or how the framework or tool was used. Rather, write a learning objective that addresses the why behind the example.
20
+ 10. The course content you are provided is presenting principles or methods in the artificial intelligence space, and the means of presentation is through the use of a particular tool or framework. Do not mention the name of whatever tool or framework is used in the course as part of a learning objective. Instead aim for tool or framework agnostic learning objectives that address the principles or methods as well as topics and concepts being presented.
21
+ 11. Do not write learning objectives that address specific tool or framework functionality presented in the course content. Write learning objectives that are completely tool or framework agnostic that get at the core principles or methods being presented.
22
+ 12. Because this is a software development course and an AI development course, refrain from any references to manual intervention unless absolutely relevant.
23
+ 13. Write the first learning objective to address the “what” or "why" of the main topic of the course in a way that will lead to a relatively easy recall question as the first question in the quiz. To write this first learning objective you should identify the main
24
+ topic, concept or principle that the course is about and form an objective
25
+ something like “identify what <important main topic / concept / principle> is” or
26
+ “explain why <important main topic / concept / principle> is important” or something of similar simplicity.
27
+
28
+
29
+ """
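Instruction 3 above (one assessable action verb per objective) can be approximated with a simple post-generation check. This is an illustrative sketch only, not code from this commit: `check_objective` and both verb sets are hypothetical names, and the verb lists are a small subset drawn from the prompt text.

```python
import re
from typing import List

# Hypothetical verb sets, based on the verbs named in the prompt above.
ASSESSABLE_VERBS = {"identify", "list", "describe", "define", "compare", "explain"}
NON_ASSESSABLE_VERBS = {"build", "create", "develop"}


def check_objective(objective: str) -> List[str]:
    """Return a list of problems found in one learning objective (empty if none)."""
    words = re.findall(r"[a-z]+", objective.lower())
    problems = []
    # The objective should open with a verb that a multiple choice question can assess.
    if not words or words[0] not in ASSESSABLE_VERBS:
        problems.append("does not start with an assessable action verb")
    # Flag "identify and explain"-style pairs: a known verb directly after "and".
    all_verbs = ASSESSABLE_VERBS | NON_ASSESSABLE_VERBS
    if any(prev == "and" and word in all_verbs for prev, word in zip(words, words[1:])):
        problems.append("joins a second action verb with 'and'")
    return problems


if __name__ == "__main__":
    print(check_objective("Identify and explain the key concepts of the course."))
```

The check deliberately looks only at a verb that directly follows "and", so an objective such as "Describe why and how agent performance can be tracked" is not flagged.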
30
+
31
+
32
+ BLOOMS_TAXONOMY_LEVELS = """
33
+ The levels of Bloom's taxonomy from lowest to highest are as follows:
34
+ - Recall: Demonstrates the retention of key concepts and facts, not trivialities.
35
+ Avoid a simple recall structure, where there is an opportunity to ask learners a
36
+ question at a higher level of Bloom's taxonomy, for example, asking the learner
37
+ to apply a concept seen in course to a new but similar scenario.
38
+ - Comprehension: Connect ideas and concepts to demonstrate a deeper grasp of
39
+ the material beyond simple recall.
40
+ - Application: Apply a concept to a new / different but similar scenario to that
41
+ seen in course.
42
+ - Analysis: Examine and break information into parts, determine how the parts
43
+ relate, identify motives or causes, make inferences or calculations, and find
44
+ evidence to support generalizations.
45
+ - Evaluation: Make judgments, assessments, or evaluations regarding a scenario,
46
+ statement, or concept. The answer choices should offer different plausible options
47
+ that require critical thinking to discern the most valid or appropriate choice.
48
+ """
49
+
50
+ LEARNING_OBJECTIVE_EXAMPLES = """
51
+
52
+ <appropriate_learning_objectives_and_correct_answers>
53
+
54
+ [
55
+ {
56
+ "id": 1,
57
+ "learning_objective": "Identify what a code agent is.",
58
+ "source_reference": [
59
+ "sc-HuggingFace-C5-L0_v1.vtt",
60
+ "sc-HuggingFace-C5-L1_v4.vtt",
61
+ "sc-HuggingFace-C5-L2_v4.vtt"
62
+ ],
63
+ "correct_answer": "A code agent is a system that uses an AI model to generate and execute code as its way of performing tasks.",
64
+
65
+ },
66
+ {
67
+ "id": 2,
68
+ "learning_objective": "Explain how code agents can be more efficient than traditional tool calling agents.",
69
+ "source_reference": [
70
+ "sc-HuggingFace-C5-L2_v4.vtt"
71
+ ],
72
+ "correct_answer": "Code agent actions are more compact and can execute loops and variables in a single code snippet, which reduces the number of steps, latency, and risk of errors in comparison to traditional tool calling agents.",
73
+ },
74
+ {
75
+ "id": 3,
76
+ "learning_objective": "Describe how running code agents using a custom Python interpreter, like the one used in the course, can mitigate security concerns associated with executing AI-generated code.",
77
+ "source_reference": [
78
+ "sc-HuggingFace-C5-L3_v3.vtt"
79
+ ],
80
+ "correct_answer": "Running code in a dedicated interpreter can help mitigate security risks by restricting imports, blocking undefined commands, and limiting resource usage.",
81
+ },
82
+ {
83
+ "id": 4,
84
+ "learning_objective": "Describe how running code agents in a remote sandbox can mitigate security concerns associated with executing AI-generated code.", "source_reference": [
85
+ "sc-HuggingFace-C5-L3_v3.vtt"
86
+ ],
87
+ "correct_answer": "Running code in a remote sandbox environment helps prevent malicious or accidental damage to your local system."
88
+ },
89
+ {
90
+ "id": 5,
91
+ "learning_objective": "Describe why and how agent performance can be tracked or traced during execution.",
92
+ "source_reference": [
93
+ "sc-HuggingFace-C5-L4_v3.vtt"
94
+ ],
95
+ "correct_answer": "Tracing captures the agent's reasoning steps and tool usage, making it possible to evaluate the correctness, efficiency, and reliability of its decisions over multiple steps."
96
+ },
97
+ {
98
+ "id": 6,
99
+ "learning_objective": "Discuss the benefits of using multiple specialized agents that collaborate on complex tasks.",
100
+ "source_reference": [
101
+ "sc-HuggingFace-C5-L5_v2.vtt"
102
+ ],
103
+ "correct_answer": "Splitting tasks among distinct agents with focused roles, each having its own memory and capabilities, can improve performance, reduce errors, and enable more advanced planning."
104
+ },
105
+ {
106
+ "id": 7,
107
+ "learning_objective": "Explain what it means to calculate advantages by normalizing reward scores across generated responses and centering them around zero in reinforcement fine-tuning.",
108
+ "source_reference": [
109
+ "sc-Predibase-C2-L4.vtt"
110
+ ],
111
+ "correct_answer": "Advantages are calculated by normalizing reward scores across generated responses, centering them around zero to highlight which outputs are better or worse than average."
112
+ }
113
+ ]
114
+ </appropriate_learning_objectives_and_correct_answers>
115
+
116
+ Avoid adding unnecessary length to the correct answer. Aim for correct answers of 20 words or fewer. Below is an example of unnecessary length, which typically occurs in the last part of a long sentence:
117
+
118
+ <inappropriate_learning_objectives_and_correct_answers>
119
+ [
120
+ {
121
+ "id": 1,
122
+ "learning_objective": "Identify why integrating internal and external systems is important when building multi-agent AI applications",
123
+ "source_reference": [
124
+ "sc-CrewAI-C2-L2_eng.vtt"
125
+ ],
126
+ "correct_answer": "Integrating internal and external systems enables AI agents to access, process, and act on real-world data, expanding the usefulness and applicability of automations.",
127
+ },
128
+ {
129
+ "id": 2,
130
+ "learning_objective": "Identify the main advantage of assigning different AI models to different agents in a multi-agent system, as described in the course",
131
+ "source_reference": [
132
+ "sc-CrewAI-C2-L2_eng.vtt"
133
+ ],
134
+ "correct_answer": "Assigning different AI models to different agents enables the system to optimize for factors like speed, quality, or task complexity, making the overall workflow more efficient and effective"
135
+ }
136
+ ]
137
+
138
+
139
+
140
+
141
+ In id: 1 we should avoid "expanding the usefulness and applicability of automations." and in id: 2 we should avoid "making the overall workflow more efficient and effective". These statements are considered unnecessary length.
142
+
143
+ Rule: Avoid compound sentences where the second clause introduces additional consequences, effects, or elaborations that are not essential to the core concept.
144
+ Look for these patterns that indicate unnecessary length:
145
+
146
+ - Clauses beginning with "which," "that," "providing," "enabling," "reducing," "ensuring"
147
+ - Phrases connected by commas that add consequences or effects
148
+ - Additional explanations after the core concept is established
149
+
150
+ </inappropriate_learning_objectives_and_correct_answers>
151
+ """
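The length guidance above (about 20 words, no trailing consequence clause) is concrete enough to check in code. The sketch below is an assumption-laden illustration rather than anything in this commit: `answer_length_issues`, `MAX_WORDS`, and the regular expression are hypothetical, and the pattern generalizes the prompt's list ("which", "that", "providing", ...) to any "-ing" word after a comma.

```python
import re
from typing import List

MAX_WORDS = 20  # taken from the "20 words" guidance in the prompt

# Assumption: a comma followed by "which", "that", or any "-ing" word signals a
# trailing clause. This is broader than the list in the prompt and is a heuristic.
TRAILING_CLAUSE = re.compile(r",\s*(which|that|\w+ing)\b", re.IGNORECASE)


def answer_length_issues(answer: str) -> List[str]:
    """Return length-related issues for one correct answer (empty list if none)."""
    issues = []
    n_words = len(answer.split())
    if n_words > MAX_WORDS:
        issues.append(f"{n_words} words (limit {MAX_WORDS})")
    match = TRAILING_CLAUSE.search(answer)
    if match:
        issues.append(f"trailing clause starting with '{match.group(1).lower()}'")
    return issues


if __name__ == "__main__":
    long_answer = (
        "Integrating internal and external systems enables AI agents to access, process, "
        "and act on real-world data, expanding the usefulness and applicability of automations."
    )
    print(answer_length_issues(long_answer))
```

Applied to the first inappropriate example above, the sketch reports both the word count and the clause that begins with "expanding".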
152
+
153
+
154
+ LEARNING_OBJECTIVE_EXAMPLES_WITHOUT_ANSWERS = """
155
+
156
+ <learning_objectives>
157
+
158
+ [
159
+ {
160
+ "id": 1,
161
+ "learning_objective": "Identify what a code agent is.",
162
+ "source_reference": [
163
+ "sc-HuggingFace-C5-L0_v1.vtt",
164
+ "sc-HuggingFace-C5-L1_v4.vtt",
165
+ "sc-HuggingFace-C5-L2_v4.vtt"
166
+ ],
167
+
168
+ },
169
+ {
170
+ "id": 2,
171
+ "learning_objective": "Explain how code agents can be more efficient than traditional tool calling agents.",
172
+ "source_reference": [
173
+ "sc-HuggingFace-C5-L2_v4.vtt"
174
+ ],
175
+
176
+ },
177
+ {
178
+ "id": 3,
179
+ "learning_objective": "Describe how running code agents using a custom Python interpreter, like the one used in the course, can mitigate security concerns associated with executing AI-generated code.",
180
+ "source_reference": [
181
+ "sc-HuggingFace-C5-L3_v3.vtt"
182
+ ],
183
+ },
184
+ {
185
+ "id": 4,
186
+ "learning_objective": "Describe how running code agents in a remote sandbox can mitigate security concerns associated with executing AI-generated code.",
187
+ "source_reference": [
188
+ "sc-HuggingFace-C5-L3_v3.vtt"
189
+ ],
190
+
191
+ },
192
+ {
193
+ "id": 5,
194
+ "learning_objective": "Describe why and how agent performance can be tracked or traced during execution.",
195
+ "source_reference": [
196
+ "sc-HuggingFace-C5-L4_v3.vtt"
197
+ ],
198
+ },
199
+ {
200
+ "id": 6,
201
+ "learning_objective": "Discuss the benefits of using multiple specialized agents that collaborate on complex tasks.",
202
+ "source_reference": [
203
+ "sc-HuggingFace-C5-L5_v2.vtt"
204
+ ],
205
+ },
206
+ {
207
+ "id": 7,
208
+ "learning_objective": "Explain what it means to calculate advantages by normalizing reward scores across generated responses and centering them around zero in reinforcement fine-tuning.",
209
+ "source_reference": [
210
+ "sc-Predibase-C2-L4.vtt"
211
+ ]
212
+ }
213
+ ]
214
+ </learning_objectives>
215
+
216
+ """
prompts/questions.py ADDED
@@ -0,0 +1,886 @@
1
+ GENERAL_QUALITY_STANDARDS = """
2
+ The overall goal of the quiz is to set the learner up for success in answering
3
+ interesting and non-trivial questions (not to give them a painful or discouraging
4
+ experience). A perfect score should be attainable for someone who thoughtfully
5
+ consumed the course content the quiz is based on. The question you write must be
6
+ aligned with the course content and the provided
7
+ learning objective and correct answer.
8
+
9
+ Because this is a software development course and an AI development course, refrain from any references to manual intervention unless absolutely relevant.
10
+ """
11
+ MULTIPLE_CHOICE_STANDARDS = """
12
+ - Each question must have EXACTLY ONE correct answer (no more, no less)
13
+ - Each question should have a clear, unambiguous correct answer
14
+ - Distractors (wrong answer options) should be plausible, not obviously wrong, and should represent common misconceptions
15
+ - All options should be of similar length and detail
16
+ - Options should be mutually exclusive
17
+ - Avoid "all/none of the above" options unless pedagogically necessary
18
+ - Typically include 4 options (A, B, C, D)
19
+ - IMPORTANT NOTE: do not start the answer feedback with “correct” or “i
20
+
21
+ """
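Several of the standards above are structural (exactly one correct answer, typically four options, similar length, no "all/none of the above") and can be verified on the generated output. The following is a rough sketch under stated assumptions: the `Option` dataclass, `check_options`, and the 2.0 length-ratio threshold are all hypothetical and do not mirror the Pydantic models used elsewhere in this commit.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Option:
    """Hypothetical stand-in for one answer option of a generated question."""
    text: str
    is_correct: bool


def check_options(options: List[Option], max_length_ratio: float = 2.0) -> List[str]:
    """Return structural issues for one question's options (empty list if none)."""
    issues = []
    n_correct = sum(1 for o in options if o.is_correct)
    if n_correct != 1:
        issues.append(f"expected exactly 1 correct option, found {n_correct}")
    if len(options) != 4:
        issues.append(f"expected 4 options, found {len(options)}")
    # "Similar length" is approximated by the ratio of longest to shortest word count.
    lengths = [len(o.text.split()) for o in options]
    if lengths and min(lengths) > 0 and max(lengths) / min(lengths) > max_length_ratio:
        issues.append("options differ too much in length")
    if any(re.search(r"\b(all|none) of the above\b", o.text, re.IGNORECASE) for o in options):
        issues.append("contains an 'all/none of the above' option")
    return issues


if __name__ == "__main__":
    options = [
        Option("Tracing records each step an agent takes.", True),
        Option("Tracing encrypts each message an agent sends.", False),
        Option("Tracing caches each response an agent receives.", False),
        Option("None of the above", False),
    ]
    print(check_options(options))
```

Plausibility and mutual exclusivity are not covered here; those remain qualitative judgments left to the prompt.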
22
+ EXAMPLE_QUESTIONS = """
23
+
24
+ <EXAMPLE_QUESTION_1>
25
+ What is a code agent in the context of an LLM workflow?
26
+
27
+ A: An AI model that generates text responses without executing external actions.
28
+ Feedback: Code agents do more than generate text—they generate and run code to perform actions.
29
+
30
+ *B: An AI agent that can write and execute code as part of its decision-making process.
31
+ Feedback: Well done! Code agents can write and execute code to handle tasks, rather than just output text or follow a strict script.
32
+
33
+ C: A pre-programmed or “coded” AI system that follows a strict decision tree to perform tasks.
34
+ Feedback: Code agents are not fixed, rule-based systems. Code agents can write and execute code, rather than following a single predetermined decision tree.
35
+
36
+ D: An AI assistant that can perform calculations and simple tasks without external interaction.
37
+ Feedback: Code agents can write and execute code to perform complex tasks, not just simple calculations.
38
+ </EXAMPLE_QUESTION_1>
39
+
40
+ <EXAMPLE_QUESTION_2>
41
+ In the context of agent architectures, what is a key performance trade-off between representing agent actions as code (code agents) versus representing them as JSON tool calls, when it comes to complex multi-step tasks?
42
+
43
+ *A: Code agents use fewer tokens, exhibit lower latency, and have reduced error rates, since complex actions can be represented and executed in a single, consolidated code snippet.
44
+ Feedback: Nice work! Code agents can execute loops, reuse variables, and call multiple tools with a single code snippet, so they need far fewer LLM turns. Fewer turns mean lower token usage, shorter round-trip latency, and fewer chances for the model to make mistakes.
45
+
46
+ B: JSON-based agents require fewer tokens because each step is more compact, resulting in faster execution and fewer errors for complex tasks.
47
+ Feedback: JSON tool calls actually increase token usage because each micro-action and its context must be sent back to the model repeatedly. This longer chain of steps results in slower execution and an increased chance of errors.
48
+
49
+ C: Both code-based and JSON-based representations are equivalent in terms of token usage, latency, and error rates for complex multi-step tasks.
50
+ Feedback: Code agents can chain many actions in a single code execution step, which reduces token usage, latency, and error rates, while tool calling agents execute a chain step-by-step, which typically increases token usage, latency, and the chance of errors.
51
+
52
+ D: Code agents have higher latency due to the complexity of parsing code, while JSON action representation avoids errors by breaking down tasks into isolated calls.
53
+ Feedback: Parsing a short code snippet is straightforward and fast, and overall latency tends to be dominated by LLM-token traffic. Code agents can execute complex tasks in one step, so they typically run faster and with a reduced chance of errors. JSON action agents often execute the same complex task step-by-step using many LLM turns, so latency and error rate are higher, not lower.
54
+ </EXAMPLE_QUESTION_2>
55
+
56
+ <EXAMPLE_QUESTION_3>
57
+ What is one of the main risks associated with running code agents on your local computer?
58
+
59
+ A: They might send spam emails from your email account.
60
+ Feedback: The lesson highlights risks such as file deletion, resource abuse, or network compromise, not sending spam emails.
61
+
62
+ *B: They could execute code that deletes critical files or creates many files that bloat your system.
63
+ Feedback: Good work! Letting an agent run code locally can compromise your system in a number of ways, such as deleting vital files, or generating a large number of files.
64
+
65
+ C: They might launch a denial-of-service attack on your own computer.
66
+ Feedback: The lesson focuses on more direct threats such as file deletion and creation, or installing malware. While a denial-of-service is theoretically possible, it isn’t emphasized as a primary risk.
67
+
68
+ D: They might cause your computer to overheat by overusing the CPU.
69
+ Feedback: Overheating from CPU overuse isn’t one of the primary risks discussed in the lesson.
70
+ </EXAMPLE_QUESTION_3>
71
+
72
+ <EXAMPLE_QUESTION_4>
73
+ How does the custom local Python interpreter demonstrated in the course mitigate risks from harmful code execution?
74
+
75
+ A: The local interpreter uses the standard Python interpreter but logs all output for manual review.
76
+ Feedback: The custom interpreter presented in the course does not rely on the normal Python interpreter at all. It enforces safeguards such as blocking imports, ignoring shell commands, and capping the number of loop iterations.
77
+
78
+ *B: It ignores undefined commands, disallows imports outside an explicit whitelist, and sets a hard cap on loop iterations to prevent infinite loops and resource abuse.
79
+ Feedback: That’s right! The custom Python interpreter presented in the course skips undefined shell-style commands, blocks any import not explicitly approved, and stops loop executions that exceed the cap, all of which help mitigate security and resource risks.
80
+
81
+ C: It only allows execution of code that does not require any external packages, preventing all imports regardless of configuration.
82
+ Feedback: The interpreter isn’t a blanket “no-imports” sandbox. It blocks imports by default, but you can pass an explicit whitelist (e.g., LocalPythonExecutor(["numpy", "PIL"])) that lets approved external packages load.
83
+
84
+ D: It prevents all code execution by rejecting any code containing loops or function definitions.
85
+ Feedback: The interpreter doesn’t blanket-ban loops or function definitions. It still runs normal Python code—including loops and functions—but adds safeguards: it caps loop iterations, blocks disallowed imports, and skips undefined commands.
86
+ </EXAMPLE_QUESTION_4>
87
+
88
+ <EXAMPLE_QUESTION_5>
89
+ Which of the following is a key security advantage of using a remote sandbox environment for executing code agents, as discussed in the lesson?
90
+
91
+ A: It ensures faster execution of agents by optimizing code compilation.
+ Feedback: Remote sandboxes primarily protect systems from potential harm, not enhance code execution speed.
92
+
93
+ *B: It allows execution of code without the risk of affecting local systems.
94
+ Feedback: Great job! Running agents in a remote sandbox isolates their code execution so that any errors or malicious actions cannot harm your local system.
95
+
96
+ C: It provides detailed real-time monitoring of all code executions.
97
+ Feedback: A remote sandbox protects your local system by isolating code execution, not by providing real-time monitoring.
98
+
99
+ D: It guarantees execution of the code with reduced computational cost.
100
+ Feedback: Using a remote sandbox protects your local system from malicious or faulty code. It does not guarantee reduced computational cost.
101
+
102
+ </EXAMPLE_QUESTION_5>
103
+
104
+ Note that all example questions follow the general quality standards as well as
105
+ the question specific quality standards. The correct answer (marked with a *) and
106
+ incorrect answer options follow the standards specific to correct and incorrect
107
+ answers.
108
+
109
+ """
110
+ QUESTION_SPECIFIC_QUALITY_STANDARDS = """
111
+ The question you write must:
112
+ - be in the language and tone of the course.
113
+ - be at a similar level of difficulty or complexity as encountered in the course.
114
+ - assess only information from the course and not depend on information that was
115
+ not covered in the course.
116
+ - not attempt to teach something as part of the quiz.
117
+ - use clear and concise language
118
+ - not induce confusion
119
+ - provide a slight (not major) challenge.
120
+ - be easily interpreted and unambiguous.
121
+ - be well written in clear and concise language, proper grammar, good sentence
122
+ structure, and consistent formatting
123
+ - be thoughtful and specific rather than broad and ambiguous
124
+ - be complete in its wording such that understanding the question is not part
125
+ of the assessment
126
+ """
127
+ CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS = """
128
+ The correct answer you write must:
129
+ - be factually correct and unambiguous
130
+ - be in the language and tone of the course and in complete sentence form.
131
+ - be at a similar level of difficulty or complexity as encountered in the course.
132
+ - contain only information from the course and not depend on information that was
133
+ not covered in the course.
134
+ - not attempt to teach something as part of the quiz.
135
+ - use clear and concise language
136
+ - be thoughtful and specific rather than broad and ambiguous
137
+ - be complete in its wording such that understanding which is the correct answer
138
+ is not part of the assessment
139
+ """
140
+ INCORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS = """
141
+ The incorrect answer options you write should ideally represent reasonable potential misconceptions, but they could also be answers that would sound plausible to someone who has not taken the course, or was not paying close enough attention during the course. In that sense, they should require some thought, even by the learner who has diligently completed the course, to determine they are incorrect.
142
+
143
+ When constructing incorrect answer feedback, pay attention to the incorrect_answer_suggestions provided along with the learning objective. These are not your only options for incorrect answers but you can use them directly, or as a starting point or in addition to other plausible incorrect answers.
144
+
145
+ Wrong answers should not be so obviously wrong that a learner who has not taken the course can immediately rule them out.
146
+
147
+ Here are some examples of poorly written incorrect answer options for a particular
148
+ question:
149
+
150
+ <QUESTION>
151
+ Which statement best explains why monitoring an agent's trace can be helpful
152
+ in debugging and establishing the performance of an agent?
153
+ </QUESTION>
154
+
155
+ <POORLY_WRITTEN_INCORRECT_ANSWER_OPTION_1>
156
+ The agent's trace is a visual interface for users that adds limited insight into
157
+ the agent's internal processes.
158
+ </POORLY_WRITTEN_INCORRECT_ANSWER_OPTION_1>
159
+
160
+ The example above is poorly written because it is obviously wrong. The question
161
+ is asking for why monitoring the agent trace can be helpful and this answer option
162
+ states that it is a visual interface that provides limited insight, which is
163
+ certainly incorrect, but does not represent a reasonable potential misconception
164
+ and even a learner who has not taken the course can immediately rule it out.
165
+
166
+ <POORLY_WRITTEN_INCORRECT_ANSWER_OPTION_2>
167
+ The agent trace is used exclusively for scenarios where the agent is underperforming.
168
+ </POORLY_WRITTEN_INCORRECT_ANSWER_OPTION_2>
169
+
170
+ This answer option is also poorly written because it is obviously wrong. The use of the word "exclusively" is a tipoff that this is not the right answer. Similarly, formulating incorrect answer options with words like "only", "always", "never", and similar words that in and of themselves make the answer option wrong represent poor word choice for incorrect answer options.
171
+
172
+ Below is an example of a well written incorrect answer option for the same question:
173
+
174
+ <WELL_WRITTEN_INCORRECT_ANSWER_OPTION>
175
+ The agent trace comprises a list of the error messages generated during an agent's
176
+ execution, which is helpful for debugging.
177
+ </WELL_WRITTEN_INCORRECT_ANSWER_OPTION>
178
+
179
+ The example above is well written because it is not obviously wrong and represents
180
+ a reasonable potential misconception. It requires some thought to determine it is
181
+ incorrect and a learner who has not taken the course will not be able to
182
+ immediately rule it out. In fact, if you changed the word "comprises" to "includes" the answer option would be correct, in a sense, just incomplete. But in this case, the learner needs to be paying close attention to identify this as incorrect.
183
+ """
184
+ ANSWER_FEEDBACK_QUALITY_STANDARDS = """
185
+ Every correct and incorrect answer must include feedback.
186
+
187
+ Incorrect answer feedback should:
188
+ - be informational and encouraging, not punitive.
189
+ - be a single sentence, concise and to the point.
190
+ - not say "Incorrect" or "Wrong".
191
+
192
+ Correct answer feedback should:
193
+ - be informational and encouraging.
194
+ - be a single sentence, concise and to the point.
195
+ - not say "Correct!" or anything that will sound redundant after the string "Correct: ", e.g. "Correct: Correct!".
196
+
197
+ """
198
+ INCORRECT_ANSWER_PROMPT = """
199
+
200
+ # CORE PRINCIPLES WITH EXAMPLES:
201
+
202
+ ## 1. CREATE COMMON MISUNDERSTANDINGS
203
+ Create incorrect answer suggestions that represent how students actually misunderstand the material:
204
+
205
+ <example>
206
+ Learning Objective: "What is version control in software development?"
207
+ Correct Answer: "A system that tracks changes to files over time so specific versions can be recalled later."
208
+
209
+ Plausible Incorrect Answer Suggestions:
210
+ - "A testing method that ensures software works correctly across different operating system versions." (Confuses with cross-platform testing)
211
+ - "A project management approach where each team member works on a separate software version." (Misunderstands the concept entirely)
212
+ - "A release strategy that maintains multiple versions of software for different customer needs." (Confuses with product versioning)
213
+ </example>
214
+
215
+ ## 2. MAINTAIN IDENTICAL STRUCTURE
216
+ All incorrect answer suggestions must match the correct answer's grammatical pattern, length, and formatting:
217
+
218
+ <example>
219
+ Learning Objective: "What are the three primary data structures used in machine learning algorithms?"
220
+ Correct Answer: "Arrays, matrices, and graphs."
221
+
222
+ Good Incorrect Answer Suggestions:
223
+ - "Dictionaries, trees, and queues." (Same structure, different data structures)
224
+ - "Tensors, vectors, and databases." (Same structure but mixing concepts)
225
+ - "Features, labels, and parameters." (Same structure, but confuses data structures with ML concepts)
226
+
227
+ Bad Incorrect Answer Suggestion:
228
+ - "Machine learning algorithms first store data in arrays, then process it using functional programming." (Different structure)
229
+ </example>
230
+
231
+ ## 3. USE COURSE TERMINOLOGY CORRECTLY BUT IN WRONG CONTEXTS
232
+ Use terms from the course material but apply them incorrectly:
233
+
234
+ <example>
235
+ Learning Objective: "What is the purpose of backpropagation in neural networks?"
236
+ Correct Answer: "To calculate gradients used to update weights during training."
237
+
238
+ Plausible Incorrect Answer Suggestions:
239
+ - "To normalize input data across layers to prevent gradient explosion." (Uses correct terms but describes batch normalization)
240
+ - "To optimize the activation functions by adjusting their thresholds during inference." (Misapplies neural network terminology)
241
+ - "To propagate inputs forward through the network during the prediction phase." (Confuses with forward propagation)
242
+ </example>
243
+
244
+ ## 4. INCLUDE PARTIALLY CORRECT INFORMATION
245
+ Create incorrect answer suggestions that contain some correct elements but miss critical aspects:
246
+
247
+ <example>
248
+ Learning Objective: "How does transfer learning improve deep neural network training?"
249
+ Correct Answer: "By reusing features learned from a large dataset to initialize a model that can then be fine-tuned on a smaller, task-specific dataset."
250
+
251
+ Plausible Incorrect Answer Suggestions:
252
+ - "By transferring trained models between different neural network frameworks to improve compatibility and deployment options." (Misunderstands the concept of knowledge transfer)
253
+ - "By reusing features learned from a large dataset and freezing all weights to prevent any updates during task-specific training." (First part correct, second part wrong)
254
+ - "By combining multiple pre-trained models into a committee that votes on final predictions for improved accuracy." (Confuses with ensemble learning)
255
+ </example>
256
+
257
+ ## 5. AVOID OBVIOUSLY WRONG ANSWERS
258
+ Don't create incorrect answer suggestions that anyone with basic knowledge could eliminate:
259
+
260
+ <example>
261
+ Learning Objective: "What is unit testing in software development?"
262
+ Correct Answer: "Testing individual components in isolation to verify they work as expected."
263
+
264
+ Bad Incorrect Answer Suggestions to Avoid:
265
+ - "A process where code is randomly modified to see if it still works." (Too obviously wrong)
266
+ - "Testing that should never be done because it wastes development time." (Contradicts basic principles)
267
+ - "Running the software on different units of hardware like phones and laptops." (Misunderstands the basic concept)
268
+ </example>
269
+
270
+ ## 6. MIRROR THE DETAIL LEVEL AND STYLE
271
+ Match the technical depth and tone of the correct answer:
272
+
273
+ <example>
274
+ Learning Objective: "What is the time complexity of quicksort in the average case?"
275
+ Correct Answer: "O(n log n), where n is the number of elements to be sorted."
276
+
277
+ Good Incorrect Answer Suggestions:
278
+ - "O(n^2), where n is the number of elements to be sorted." (Same level of detail)
279
+ - "O(n), where n is the number of elements to be sorted." (Same structure and detail)
280
+ - "O(log n), where n is the number of elements to be sorted." (Same structure but incorrect complexity)
281
+
282
+ Bad Incorrect Answer Suggestion:
283
+ - "Quicksort is generally faster than bubble sort but can perform poorly on already sorted arrays." (Different style and not answering the specific objective)
284
+ </example>
285
+
286
+ ## 7. FOR LIST QUESTIONS, MAINTAIN CONSISTENCY
287
+ If the correct answer lists specific items, all incorrect answer suggestions should list the same number of items:
288
+
289
+ <example>
290
+ Learning Objective: "What are the three key principles of object-oriented programming?"
291
+ Correct Answer: "Encapsulation, inheritance, and polymorphism."
292
+
293
+ Good Incorrect Answer Suggestions:
294
+ - "Encapsulation, inheritance, and composition." (Same structure, two correct, one incorrect)
295
+ - "Abstraction, polymorphism, and delegation." (Same structure, mix of correct/incorrect)
296
+ - "Instantiation, implementation, and isolation." (Same structure, all incorrect but plausible terms)
297
+ </example>
298
+
299
+ ## 8. AVOID ABSOLUTE TERMS AND UNNECESSARY COMPARISONS
300
+ Don't use words like "always", "never", "mainly", "exclusively", "primarily", or "rather than".
301
+ These words are absolute or extreme qualifiers and comparative terms that artificially limit or overgeneralize statements, creating false dichotomies or unsubstantiated hierarchies.
302
+ More words you should avoid are: All, every, entire, complete, none, nothing, no one, only, solely, merely, completely, totally, utterly, always, forever, constantly, never, impossible, must, mandatory, required, instead of, as opposed to, exclusively, purely
303
+
304
+ <example>
305
+ Learning Objective: "What is the purpose of index partitioning in databases?"
306
+ Correct Answer: "To improve query performance by dividing large indexes into smaller, more manageable segments."
307
+
308
+ Bad Incorrect Answer Suggestions to Avoid:
309
+ - "To always guarantee the fastest possible query performance regardless of data size." (Uses "always")
310
+ - "To improve query performance rather than ensuring data integrity or providing backup functionality." (Unnecessary comparison)
311
+ - "To exclusively support distributed database systems that never operate on a single server." (Uses absolute terms)
312
+ </example>
313
+
314
+
315
+ """
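The word list in principle 8 above lends itself to a mechanical check on generated answer options. The following is a minimal sketch, not part of this commit: the names `FLAGGED_TERMS` and `find_flagged_terms` are hypothetical, and the terms are copied from the list in the prompt. It only flags candidates for review; whether a flagged word actually makes an option poor still needs human or model judgment.

```python
import re

# Terms copied from principle 8 of the prompt above (hypothetical constant name).
FLAGGED_TERMS = [
    "always", "never", "mainly", "exclusively", "primarily", "rather than",
    "all", "every", "entire", "complete", "none", "nothing", "no one", "only",
    "solely", "merely", "completely", "totally", "utterly", "forever",
    "constantly", "impossible", "must", "mandatory", "required",
    "instead of", "as opposed to", "purely",
]

# Longest terms first so multi-word phrases win over any shorter overlap;
# \b keeps "only" from matching inside words such as "commonly".
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(t) for t in sorted(FLAGGED_TERMS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def find_flagged_terms(text: str) -> list[str]:
    """Return flagged terms found in text, lowercased, in order of appearance."""
    return [match.group(0).lower() for match in _PATTERN.finditer(text)]
```

Applied to the three "bad" suggestions in the example above, the check flags "always", "rather than", and "exclusively" plus "never" respectively, while the correct answer passes with no flags.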
316
+ INCORRECT_ANSWER_EXAMPLES = """
317
+
318
+ <example>
319
+ Learning Objective: "What is the purpose of activation functions in neural networks?"
320
+ Correct Answer: "To introduce non-linearity into the network's output."
321
+
322
+ Plausible Incorrect Answer Suggestions:
323
+ - "To normalize input data across different feature scales." (Confuses with data normalization)
324
+ - "To reduce computational complexity during forward propagation." (Misunderstands as performance optimization)
325
+ - "To prevent gradient explosion during backpropagation training." (Confuses with gradient clipping)
326
+ </example>
327
+
328
+ Note: All options follow the same grammatical structure ("To [verb] [object]").
329
+
330
+ <example>
331
+ Learning Objective: "What is the main function of Git branching?"
332
+ Correct Answer: "To separate work on different features or fixes from the main codebase."
333
+
334
+ Plausible Incorrect Answer Suggestions:
335
+ - "To create backup copies of the repository in case of system failure." (Confuses with backup functionality)
336
+ - "To track different versions of files across multiple development environments." (Mixes up with version tracking)
337
+ - "To isolate unstable code until it passes integration testing protocols." (Focuses only on testing aspects)
338
+ </example>
339
+
340
+ Note: All options maintain identical sentence structure ("To [verb] [object phrase]") with similar length and complexity.
341
+
342
+ <example>
343
+ Learning Objective: "Which category of machine learning algorithms does K-means clustering belong to?"
344
+ Correct Answer: "Unsupervised learning algorithms that identify patterns without labeled training data."
345
+
346
+ Plausible Incorrect Answer Suggestions:
347
+ - "Supervised learning algorithms that predict continuous values based on labeled examples." (Confuses with regression)
348
+ - "Reinforcement learning algorithms that optimize decisions through environment interaction." (Misclassifies algorithm type)
349
+ - "Semi-supervised learning algorithms that combine labeled and unlabeled data for training." (Incorrect classification)
350
+ </example>
351
+
352
+ Note: All options follow consistent structure: "[Category] algorithms that [what they do]" while using correct ML terminology in wrong contexts.
353
+
354
+ <example>
355
+ Learning Objective: "How does feature scaling improve the performance of distance-based machine learning models?"
356
+ Correct Answer: "By ensuring all features contribute equally to distance calculations regardless of their original ranges."
357
+
358
+ Plausible Incorrect Answer Suggestions:
359
+ - "By removing redundant features that would otherwise dominate the learning algorithm." (Confuses with feature selection)
360
+ - "By converting categorical variables into numerical representations for mathematical operations." (Mixes up with encoding)
361
+ - "By increasing the dimensionality of the feature space to capture more complex relationships." (Confuses with feature expansion)
362
+ </example>
363
+
364
+ Note: All options maintain consistent grammatical structure ("By [verb+ing] [object] [qualification]") while including partially correct concepts.
365
+
366
+ <example>
367
+ Learning Objective: "How do NoSQL databases differ from relational databases?"
368
+ Correct Answer: "NoSQL databases use flexible schema designs while relational databases enforce strict predefined schemas."
369
+
370
+ Plausible Incorrect Answer Suggestions:
371
+ - "NoSQL databases support ACID transactions while relational databases prioritize eventual consistency." (Reverses actual characteristics)
372
+ - "NoSQL databases require SQL for queries while relational databases support multiple query languages." (Fundamentally incorrect)
373
+ - "NoSQL databases are primarily used for small datasets while relational databases handle big data applications." (Inverts typical use cases)
374
+ </example>
375
+
376
+ Note: All options follow identical grammatical structure: "NoSQL databases [characteristic] while relational databases [contrasting characteristic]" with similar technical detail.
377
+
378
+ <example>
379
+ Learning Objective: "What are the three primary service models in cloud computing?"
380
+ Correct Answer: "Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS)."
381
+
382
+ Plausible Incorrect Answer Suggestions:
383
+ - "Virtual Machines as a Service (VMaaS), Containers as a Service (CaaS), and Functions as a Service (FaaS)." (Confuses with deployment methods)
384
+ - "Storage as a Service (STaaS), Network as a Service (NaaS), and Compute as a Service (CaaS)." (Mixes up with service categories)
385
+ - "Public Cloud, Private Cloud, and Hybrid Cloud." (Confuses with deployment models)
386
+ </example>
387
+
388
+ Note: Each option follows the pattern "[Item 1], [Item 2], and [Item 3]" with consistent abbreviation formatting and exactly three items.
389
+
390
+ <example>
391
+ Learning Objective: "What is the best practice for conducting effective code reviews?"
392
+ Correct Answer: "Review small, focused changes regularly rather than large batches of code infrequently."
393
+
394
+ Plausible Incorrect Answer Suggestions:
395
+ - "Ensure only senior developers conduct reviews to maintain code quality standards." (Overemphasizes seniority)
396
+ - "Focus on identifying bugs rather than architectural or stylistic issues." (Narrows scope too much)
397
+ - "Require code to pass automated tests with 100 percent coverage before human review." (Overstates requirements)
398
+ </example>
399
+
400
+ Note: All options follow similar imperative structure with concrete recommendations while avoiding absolute terms like "always" or "never".
401
+
402
+ <example>
403
+ Learning Objective: "Which statement accurately describes the role of a Scrum Master in agile development?"
404
+ Correct Answer: "A facilitator who removes impediments and ensures the team follows agile practices."
405
+
406
+ Plausible Incorrect Answer Suggestions:
407
+ - "A technical leader who reviews code quality and makes final architectural decisions." (Confuses with tech lead role)
408
+ - "A project manager who assigns tasks and tracks individual team member performance." (Mixes up with traditional PM)
409
+ - "A product owner who prioritizes features and accepts completed work on behalf of stakeholders." (Confuses with Product Owner)
410
+ </example>
411
+
412
+ Note: All options follow consistent grammatical structure ("A [role] who [does something specific]") with parallel descriptions.
413
+
414
+ <example>
415
+ Learning Objective: "What is the most likely cause of an SQL injection vulnerability?"
416
+ Correct Answer: "Directly incorporating user input into database queries without proper validation or parameterization."
417
+
418
+ Plausible Incorrect Answer Suggestions:
419
+ - "Using outdated database management systems that lack modern security features." (Confuses with database vulnerabilities)
420
+ - "Implementing weak password hashing algorithms for user authentication." (Mixes up with authentication issues)
421
+ - "Failing to enable HTTPS for secure data transmission between client and server." (Confuses with transport security)
422
+ </example>
423
+
424
+ Note: All options follow consistent structure describing a security issue while focusing on different security domains that students might confuse.
425
+
426
+
427
+ """
428
+ RANK_QUESTIONS_PROMPT = """
429
+ Rank the following multiple-choice questions based on their quality as assessment items.
430
+
431
+ These questions have been selected as the best in a group of questions already. Your task is to rank them based on their quality as assessment items.
432
+
433
+ <RANKING_CRITERIA>
434
+ 1. Question clarity and unambiguity
435
+ 2. Alignment with the stated learning objective
436
+ 3. Quality of incorrect answer options (see guidelines)
437
+ 4. Quality of feedback for each option
438
+ 5. Appropriate difficulty level and use of simple English. See the examples of simple versus complex English below, and consider simple English better for your ranking.
439
+ <DIFFICULTY_LEVEL_GUIDELINES>
440
+ <EXAMPLE_1>
441
+ <SIMPLE_ENGLISH>AI engineers create computer programs that can learn from data and make decisions.</SIMPLE_ENGLISH>
442
+ <COMPLEX_ENGLISH>AI engineering practitioners architect computational paradigms exhibiting autonomous erudition capabilities via statistical data assimilation and subsequent decisional extrapolation.</COMPLEX_ENGLISH>
443
+ </EXAMPLE_1>
444
+
445
+ <EXAMPLE_2>
446
+ <SIMPLE_ENGLISH>Machine learning models need large amounts of good data to work well.</SIMPLE_ENGLISH>
447
+ <COMPLEX_ENGLISH>Machine learning algorithmic frameworks necessitate voluminous, high-fidelity datasets to achieve optimal efficacy in their inferential capacities.</COMPLEX_ENGLISH>
448
+ </EXAMPLE_2>
449
+ </DIFFICULTY_LEVEL_GUIDELINES>
450
+ 6. Its adherence to the guidelines below:
451
+ <GUIDELINES>
452
+ <General Quality Standards>
453
+ {GENERAL_QUALITY_STANDARDS}
454
+ </General Quality Standards>
455
+
456
+ <Multiple Choice Specific Standards>
457
+ {MULTIPLE_CHOICE_STANDARDS}
458
+ </Multiple Choice Specific Standards>
459
+
460
+ Follows these example questions:
461
+ <Example Questions>
462
+ {EXAMPLE_QUESTIONS}
463
+ </Example Questions>
464
+
465
+ Questions followed these instructions:
466
+ <Question Specific Quality Standards>
467
+ {QUESTION_SPECIFIC_QUALITY_STANDARDS}
468
+ </Question Specific Quality Standards>
469
+
470
+ Correct answers followed these instructions:
471
+ <Correct Answer Specific Quality Standards>
472
+ {CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS}
473
+ </Correct Answer Specific Quality Standards>
474
+
475
+ Incorrect answers followed these instructions:
476
+ <Incorrect Answer Specific Quality Standards>
477
+ {INCORRECT_ANSWER_PROMPT}
478
+ </Incorrect Answer Specific Quality Standards>
479
+
480
+ Here are some examples of high quality incorrect answer suggestions:
481
+ <incorrect_answer_examples>
482
+ {INCORRECT_ANSWER_EXAMPLES}
483
+ </incorrect_answer_examples>
484
+
485
+ Words to avoid:
486
+ <Words To Avoid>
487
+ AVOID ABSOLUTE TERMS AND UNNECESSARY COMPARISONS
488
+ Don't use words like "always", "never", "mainly", "exclusively", "primarily", or "rather than".
489
+ These words are absolute or extreme qualifiers and comparative terms that artificially limit or overgeneralize statements, creating false dichotomies or unsubstantiated hierarchies.
490
+ More words you should avoid are: All, every, entire, complete, none, nothing, no one, only, solely, merely, completely, totally, utterly, always, forever, constantly, never, impossible, must, mandatory, required, instead of, as opposed to, exclusively, purely
491
+ </Words To Avoid>
492
+
493
+
494
+ <Answer Feedback Quality Standards>
495
+ {ANSWER_FEEDBACK_QUALITY_STANDARDS}
496
+ </Answer Feedback Quality Standards>
497
+ </GUIDELINES>
498
+ </RANKING_CRITERIA>
499
+
500
+
501
+ <IMPORTANT RANKING INSTRUCTIONS>
502
+ 1. DO NOT change the question with ID=1 (if present).
503
+ 2. Rank ONLY the questions listed below.
504
+ 3. Return a JSON array with each question's original ID and its rank (2, 3, 4, etc.).
505
+ 4. The best question should have rank 2 (since rank 1 is reserved).
506
+ 5. Consider clarity, specificity, alignment with the learning objectives, and how well each question follows the criteria above.
507
+ 6. CRITICAL: You MUST return ALL questions that were provided for ranking. Do not omit any questions. Each question must be assigned a unique rank.
508
+ 7. CRITICAL: Each question must have a UNIQUE rank. No two questions can have the same rank.
509
+
510
+ <CRITICAL INSTRUCTION - READ CAREFULLY>
511
+ YOU MUST RETURN ALL QUESTIONS THAT WERE PROVIDED FOR RANKING.
512
+ If you receive 30 questions to rank, you must return all 30 questions in your response.
513
+ DO NOT OMIT ANY QUESTIONS.
514
+ EACH QUESTION MUST HAVE A UNIQUE RANK (2, 3, 4, 5, etc. with no duplicates).
515
+ </CRITICAL INSTRUCTION - READ CAREFULLY>
516
+ Your response must be in the following JSON format. Each question must include ALL of the following fields:
517
+
518
+ [
519
+ {{
520
+ "id": int,
521
+ "question_text": str,
522
+ "options": list[dict],
523
+ "learning_objective": str,
524
+ "learning_objective_id": int,
525
+ "correct_answer": str,
526
+ "source_reference": list[str] or str,
527
+ "judge_feedback": str or null,
528
+ "approved": bool or null,
529
+ "rank": int,
530
+ "ranking_reasoning": str,
531
+ "in_group": bool,
532
+ "group_members": list[int],
533
+ "best_in_group": bool
534
+ }},
535
+ ...
536
+ ]
537
+ <RANKING EXAMPLE>
538
+ {
539
+ "id": 2,
540
+ "question_text": "What is the primary purpose of AI agents?",
541
+ "options": [...],
542
+ "learning_objective_id": 3,
543
+ "learning_objective": "Describe the main applications of AI agents.",
544
+ "correct_answer": "To automate tasks and make decisions",
545
+ "source_reference": ["sc-Arize-C1-L3-eng.vtt"],
546
+ "judge_feedback": "This question effectively tests understanding of AI agent applications.",
547
+ "approved": true,
548
+ "rank": 3,
549
+ "ranking_reasoning": "Clear question that tests understanding of AI agents, but could be more specific.",
550
+ "in_group": false,
551
+ "group_members": [2],
552
+ "best_in_group": true
553
+ }
554
+
555
+ {
556
+ "id": 3,
557
+ "question_text": "Which of the following best describes machine learning?",
558
+ "options": [...],
559
+ "learning_objective_id": 2,
560
+ "learning_objective": "Define machine learning.",
561
+ "correct_answer": "A subset of AI that enables systems to learn from data",
562
+ "source_reference": ["sc-Arize-C1-L2-eng.vtt"],
563
+ "judge_feedback": "Good fundamental question.",
564
+ "approved": true,
565
+ "rank": 2,
566
+ "ranking_reasoning": "Excellent clarity and directly addresses a fundamental concept.",
567
+ "in_group": true,
568
+ "group_members": [3, 8],
569
+ "best_in_group": true
570
+ }
571
+
572
+
573
+ {
574
+ "id": 4,
575
+ "question_text": "What is a neural network?",
576
+ "options": [...],
577
+ "learning_objective_id": 4,
578
+ "learning_objective": "Explain neural networks.",
579
+ "correct_answer": "A computing system inspired by biological neural networks",
580
+ "source_reference": ["sc-Arize-C1-L4-eng.vtt"],
581
+ "judge_feedback": "Basic definition question.",
582
+ "approved": true,
583
+ "rank": 4,
584
+ "ranking_reasoning": "Clear but very basic definition question without application context.",
585
+ "in_group": false,
586
+ "group_members": [4],
587
+ "best_in_group": true
588
+ }
589
+
590
+ </RANKING EXAMPLE>
591
+
592
+
593
+
594
+ </IMPORTANT RANKING INSTRUCTIONS>
595
+ """
596
+ GROUP_QUESTIONS_PROMPT = """
597
+ Group the following multiple-choice questions based on their quality as assessment items.
598
+
599
+ <GROUPING_INSTRUCTIONS>
600
+ 1. Identify groups of similar questions that test essentially the same concept or knowledge area.
601
+ 2. You can identify similar groups if the learning_objective.id is the same. If two questions have the same learning_objective.id assume they are testing the same concept.
602
+ 3. For each question, indicate whether it belongs to a group of similar questions by setting "in_group" to true or false.
603
+ 4. For questions that are part of a group, include a "group_members" field with a list of all IDs in that group (including the question itself). If a question has only one group member, set "group_members" to a list with the ID of the question itself.
604
+ 5. For each question, add a boolean field "best_in_group": set this to true for the highest-ranked (lowest rank number) question in each group, and false for all others in the group. For questions not in a group, set "best_in_group" to true by default.
605
+ 6. CRITICAL: You MUST return ALL questions that were provided for grouping. Do not omit any questions.
606
+ 7. CRITICAL: Each question must have a UNIQUE rank. No two questions can have the same rank.
607
+ Your response must be in the following JSON format. Each question must include ALL of the following fields:
608
+ </GROUPING_INSTRUCTIONS>
609
+ <CRITICAL INSTRUCTION - READ CAREFULLY>
610
+ YOU MUST RETURN ALL QUESTIONS THAT WERE PROVIDED FOR GROUPING.
611
+ If you receive 30 questions to group, you must return all 30 questions in your response.
612
+ DO NOT OMIT ANY QUESTIONS.
613
+ </CRITICAL INSTRUCTION - READ CAREFULLY>
614
+
615
+ Your response must be in the following JSON format. Each question must include ALL of the following fields:
616
+
617
+ [
618
+ {{
619
+ "id": int,
620
+ "question_text": str,
621
+ "options": list[dict],
622
+ "learning_objective_id": int,
623
+ "learning_objective": str,
624
+ "correct_answer": str,
625
+ "source_reference": list[str] or str,
626
+ "judge_feedback": str or null,
627
+ "approved": bool or null,
628
+ "in_group": bool,
629
+ "group_members": list[int],
630
+ "best_in_group": bool
631
+ }},
632
+ ...
633
+ ]
634
+ <Example>
635
+ [
636
+ {{
637
+ "id": 2,
638
+ "question_text": "What is the primary purpose of AI agents?",
639
+ "options": [
640
+ {{
641
+ "option_text": "To automate tasks and make decisions",
642
+ "is_correct": true,
643
+ "feedback": "Correct! AI agents are designed to automate tasks and make decisions based on their programming and environment."
644
+ }},
645
+ {{
646
+ "option_text": "To replace human workers entirely",
647
+ "is_correct": false,
648
+ "feedback": "Incorrect. While AI agents can automate certain tasks, they are not designed to replace humans entirely."
649
+ }},
650
+ {{
651
+ "option_text": "To process large amounts of data",
652
+ "is_correct": false,
653
+ "feedback": "Incorrect. While data processing is a capability of some AI systems, it's not the primary purpose of AI agents specifically."
654
+ }},
655
+ {{
656
+ "option_text": "To simulate human emotions",
657
+ "is_correct": false,
658
+ "feedback": "Incorrect. AI agents are not primarily designed to simulate human emotions."
659
+ }}
660
+ ],
661
+ "learning_objective_id": 3,
662
+ "learning_objective": "Describe the main applications of AI agents.",
663
+ "correct_answer": "To automate tasks and make decisions",
664
+ "source_reference": ["sc-Arize-C1-L3-eng.vtt"],
665
+ "judge_feedback": "This question effectively tests understanding of AI agent applications.",
666
+ "approved": true,
667
+ "in_group": true,
668
+ "group_members": [2, 5, 7],
669
+ "best_in_group": true
670
+ }}
671
+ ]
672
+ </Example>
673
+
674
+
675
+ <EXAMPLE OF COMPLETE GROUPING RESPONSE>
676
+ Here's an example of how to properly group a set of 5 questions:
677
+
678
+ Input questions with IDs: [2, 3, 4, 5, 6]
679
+
680
+ Correct output (all questions returned, none omitted):
681
+ [
682
+ {
683
+ "id": 2,
684
+ "question_text": "What is the primary purpose of AI agents?",
685
+ "options": [...],
686
+ "learning_objective_id": 3,
687
+ "learning_objective": "Describe the main applications of AI agents.",
688
+ "correct_answer": "To automate tasks and make decisions",
689
+ "source_reference": ["sc-Arize-C1-L3-eng.vtt"],
690
+ "judge_feedback": "This question effectively tests understanding of AI agent applications.",
691
+ "approved": true,
692
+ "in_group": true,
693
+ "group_members": [2, 5],
694
+ "best_in_group": true
695
+ },
696
+ {
697
+ "id": 3,
698
+ "question_text": "Which of the following best describes machine learning?",
699
+ "options": [...],
700
+ "learning_objective_id": 2,
701
+ "learning_objective": "Define machine learning.",
702
+ "correct_answer": "A subset of AI that enables systems to learn from data",
703
+ "source_reference": ["sc-Arize-C1-L2-eng.vtt"],
704
+ "judge_feedback": "Good fundamental question.",
705
+ "approved": true,
706
+ "in_group": false,
707
+ "group_members": [3],
708
+ "best_in_group": true
709
+ },
710
+ {
711
+ "id": 4,
712
+ "question_text": "What is a neural network?",
713
+ "options": [...],
714
+ "learning_objective_id": 4,
715
+ "learning_objective": "Explain neural networks.",
716
+ "correct_answer": "A computing system inspired by biological neural networks",
717
+ "source_reference": ["sc-Arize-C1-L4-eng.vtt"],
718
+ "judge_feedback": "Basic definition question.",
719
+ "approved": true,
720
+ "in_group": false,
721
+ "group_members": [4],
722
+ "best_in_group": true
723
+ },
724
+ {
725
+ "id": 5,
726
+ "question_text": "How do AI agents help in automation?",
727
+ "options": [...],
728
+ "learning_objective_id": 3,
729
+ "learning_objective": "Describe the main applications of AI agents.",
730
+ "correct_answer": "By performing tasks based on programmed rules or learned patterns",
731
+ "source_reference": ["sc-Arize-C1-L3-eng.vtt"],
732
+ "judge_feedback": "Related to question 2 but more specific.",
733
+ "approved": true,
734
+ "in_group": true,
735
+ "group_members": [2, 5],
736
+ "best_in_group": false
737
+ },
738
+ {
739
+ "id": 6,
740
+ "question_text": "What is deep learning?",
741
+ "options": [...],
742
+ "learning_objective_id": 5,
743
+ "learning_objective": "Differentiate deep learning from traditional machine learning.",
744
+ "correct_answer": "A subset of machine learning using multi-layered neural networks",
745
+ "source_reference": ["sc-Arize-C1-L5-eng.vtt"],
746
+ "judge_feedback": "Good definition question.",
747
+ "approved": true,
748
+ "in_group": false,
749
+ "group_members": [6],
750
+ "best_in_group": true
751
+ }
752
+ ]
753
+
754
+ Notice that:
755
+ 1. ALL 5 input questions are returned in the output
756
+ 2. Each question keeps its own unique ID (2, 3, 4, 5, 6)
757
+ 3. Questions 2 and 5 are identified as being in the same group
758
+ 4. Question 2 is marked as best_in_group=true while question 5 has best_in_group=false
759
+ 5. Questions that aren't in groups with other questions have group_members containing only their own ID
760
+ </EXAMPLE OF COMPLETE GROUPING RESPONSE>
761
+
762
+
763
+
764
+
765
+ </IMPORTANT RANKING INSTRUCTIONS>
766
+ """
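Review note: the grouping prompt asks the model to return every question, to list each question in its own `group_members`, and to mark one `best_in_group` per group. Those rules can be checked after the model responds. The following is a minimal sketch under stated assumptions: `check_grouping` is a hypothetical helper that is not part of this commit, and it assumes the response has already been parsed into a list of plain dicts with the field names shown in the prompt.

```python
from typing import Any, Dict, List


def check_grouping(input_ids: List[int], response: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems; an empty list means the response passed."""
    problems: List[str] = []

    # Rule: every question that was sent must come back.
    returned_ids = {q["id"] for q in response}
    missing = sorted(set(input_ids) - returned_ids)
    if missing:
        problems.append(f"missing question ids: {missing}")

    by_id = {q["id"]: q for q in response}
    for q in response:
        members = q.get("group_members", [])
        # Rule: group_members always includes the question itself.
        if q["id"] not in members:
            problems.append(f"question {q['id']} is not listed in its own group_members")
        # Rule: in_group is true only when the group has other members.
        if q.get("in_group") != (len(members) > 1):
            problems.append(f"question {q['id']} has in_group inconsistent with group_members")

    # Rule: each distinct group has exactly one best_in_group member.
    groups = {tuple(sorted(q.get("group_members", []))) for q in response}
    for group in groups:
        best = [i for i in group if by_id.get(i, {}).get("best_in_group")]
        if len(best) != 1:
            problems.append(f"group {list(group)} has {len(best)} best_in_group members")

    return problems
```

A caller could retry the grouping request, or fall back to treating every question as its own group, whenever the returned list is non-empty.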
767
+ RULES_FOR_SECOND_CLAUSES = """
768
+ Avoid contradictory second clauses - Don't add qualifying phrases that explicitly negate the main benefit or create obvious limitations
769
+
770
+ Bad: "Human feedback enables complex reasoning, allowing workflows to handle cases without any human involvement" (contradicts the premise of human feedback)
771
+ Fixed: "Human feedback enables the agent to develop more sophisticated reasoning patterns for handling complex document structures" (stays positive, just misdirects the benefit)
772
+
773
+
774
+
775
+ Additional guidance:
776
+
777
+ Keep second clauses supportive - If you include a second clause, it should reinforce the incorrect direction, not contradict it
778
+
779
+ Bad: "Context awareness helps agents understand code, but prevents them from adapting to new situations"
780
+ Good: "Context awareness helps agents understand code by focusing on the most recently modified files and functions"
781
+
782
+
783
+ Focus on misdirection, not negation - Wrong answers should point toward a plausible but incorrect benefit, not explicitly limit or negate the concept
784
+
785
+ Bad: "Version control tracks changes but cannot recall previous versions"
786
+ Good: "Version control tracks changes to ensure compatibility across different development environments"
787
+
788
+
789
+ Maintain positive framing - All options should sound like genuine benefits, just targeting the wrong aspect
790
+
791
+ Bad: "Transfer learning reuses features but freezes all weights, preventing any updates"
792
+ Good: "Transfer learning reuses features to establish consistent baseline performance across different model architectures"
793
+
794
+
795
+
796
+ Better versions of those options:
797
+
798
+ B: "Human feedback enables the agent to develop more sophisticated automated reasoning capabilities for handling complex document analysis tasks."
799
+ C: "Human feedback provides the agent with contextual understanding that enhances its decision-making framework for future similar documents."
800
+ D: "Human feedback allows the agent to establish consistent formatting and presentation standards across all processed documents."
801
+
802
+
803
+
804
+
805
+ * Look for explicit negations using "without," "rather than," "instead of," "but not," "but," "except," or "excluding" that directly contradict the core concept
806
+
807
+ Avoid negating phrases that explicitly exclude the main concept:
808
+ - Bad: "provides simple Q&A without automating structured tasks"
809
+ - Good: "provides simple Q&A and basic document classification capabilities"
810
+
811
+ - Bad: "focuses on efficiency rather than handling complex processing"
812
+ - Good: "focuses on optimizing document throughput and processing speed"
813
+
814
+ - Bad: "uses pre-defined rules with agents handling only basic tasks"
815
+ - Good: "uses standardized rule frameworks with agents managing document classification"
816
+
817
+ It is very important to consider the following:
818
+
819
+ <VERY IMPORTANT>
820
+ IMMEDIATE RED FLAGS - Mark as needing regeneration if ANY option contains:
821
+ - "but not necessarily"
822
+ - "at the expense of"
823
+ - "sometimes at the expense"
824
+ - "rather than [core concept]"
825
+ - "ensuring X rather than Y"
826
+ - "without necessarily"
827
+ - "but has no impact on"
828
+
829
+
830
+ </VERY IMPORTANT>
831
+ """
832
+ IMMEDIATE_RED_FLAGS = """
833
+ IMMEDIATE RED FLAGS - Mark as needing regeneration if ANY option contains:
834
+
835
+ CONTRADICTORY SECOND CLAUSES:
836
+ - "but not necessarily"
837
+ - "at the expense of"
838
+ - "sometimes at the expense"
839
+ - "rather than [core concept]"
840
+ - "ensuring X rather than Y"
841
+ - "without necessarily"
842
+ - "but has no impact on"
843
+ - "but cannot"
844
+ - "but prevents"
845
+ - "but limits"
846
+ - "but reduces"
847
+
848
+ EXPLICIT NEGATIONS OF CORE CONCEPTS:
849
+ - "without automating"
850
+ - "without incorporating"
851
+ - "without using"
852
+ - "without supporting"
853
+ - "preventing [main benefit]"
854
+ - "limiting [main capability]"
855
+ - "reducing the need for [core function]"
856
+
857
+ OPPOSITE DESCRIPTIONS:
858
+ - "fixed steps" or "rigid sequences" (when describing flexible systems)
859
+ - "manual intervention" (when describing automation)
860
+ - "passive components" (when describing active agents)
861
+ - "simple question answering" (when describing complex processing)
862
+ - "predefined rules" (when describing adaptive systems)
863
+
864
+ ABSOLUTE/COMPARATIVE TERMS TO AVOID:
865
+ - "always," "never," "exclusively," "purely," "solely," "only"
866
+ - "primarily," "mainly," "instead of," "as opposed to"
867
+ - "all," "every," "none," "nothing," "must," "required"
868
+ - "completely," "totally," "utterly," "impossible"
869
+
870
+ HEDGING THAT CREATES OBVIOUS LIMITATIONS:
871
+ - "sometimes," "occasionally," "might," "could potentially"
872
+ - "generally," "typically," "usually" (when limiting capabilities)
873
+ - "to some extent," "partially," "somewhat"
874
+
875
+ TRADE-OFF LANGUAGE THAT CREATES FALSE DICHOTOMIES:
876
+ - "focusing on X instead of Y"
877
+ - "prioritizing X over Y"
878
+ - "emphasizing X rather than Y"
879
+ - "optimizing for X at the cost of Y"
880
+
881
+
882
+ Check for descriptions of opposite approaches:
883
+ Identify when an answer describes a fundamentally different methodology
884
+ For example, "intuition-based" vs "evaluation-based", "feature-driven" vs "evaluation-driven"
885
+
886
+ """
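Review note: the literal phrases in `IMMEDIATE_RED_FLAGS` lend themselves to a cheap string pre-check before any model call is spent on judging an option. The following is a minimal sketch, not code from this commit. `RED_FLAG_PHRASES` copies only a subset of the literal phrases above, and `find_red_flags` is a hypothetical helper name. Templates with placeholders, such as "rather than [core concept]", are left out because they need semantic judgment.

```python
import re
from typing import List

# Subset of the literal phrases listed in IMMEDIATE_RED_FLAGS.
# Assumption: bracketed templates like "[core concept]" are skipped on purpose.
RED_FLAG_PHRASES = [
    "but not necessarily",
    "at the expense of",
    "without necessarily",
    "but has no impact on",
    "but cannot",
    "but prevents",
    "but limits",
    "but reduces",
]

# One case-insensitive pattern; \b keeps matches on word boundaries.
_RED_FLAG_RE = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in RED_FLAG_PHRASES) + r")\b",
    re.IGNORECASE,
)


def find_red_flags(option_text: str) -> List[str]:
    """Return the red-flag phrases found in one answer option, lowercased."""
    return [m.group(1).lower() for m in _RED_FLAG_RE.finditer(option_text)]
```

An option with a non-empty result could be sent straight to regeneration, while options with an empty result would still go through the model-based check for the subtler categories.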
quiz_generator/__init__.py ADDED
@@ -0,0 +1,3 @@
1
+ from .generator import QuizGenerator
2
+
3
+ __all__ = ['QuizGenerator']
quiz_generator/assessment.py ADDED
@@ -0,0 +1,190 @@
1
+ import concurrent.futures
2
+ from typing import List
3
+ from openai import OpenAI
4
+ from models import Assessment, MultipleChoiceQuestion, MultipleChoiceOption
5
+ from .question_generation import generate_multiple_choice_question
6
+ from .question_improvement import judge_question_quality
7
+ from .question_ranking import rank_questions
8
+ import time
9
+ import threading
10
+ import json
11
+
12
+
13
+ def _get_run_manager():
14
+ """Get run manager if available, otherwise return None."""
15
+ try:
16
+ from ui.run_manager import get_run_manager
17
+ return get_run_manager()
18
+ except Exception:
19
+ return None
20
+
21
+ def generate_assessment(client: OpenAI, model: str, temperature: float, learning_objective_generator, file_contents: List[str], num_objectives: int) -> Assessment:
22
+ """
23
+ Generate a complete assessment with learning objectives and questions.
24
+
25
+ Args:
26
+ file_contents: List of file contents with source tags
27
+ num_objectives: Number of learning objectives to generate
28
+
29
+ Returns:
30
+ Complete assessment
31
+ """
32
+ print(f"Generating assessment with {num_objectives} learning objectives")
33
+ start_time = time.time()
34
+
35
+ # Generate learning objectives using the new optimized workflow
36
+ # This generates base objectives, groups them, and generates incorrect answers only for best-in-group
37
+ result = learning_objective_generator.generate_and_group_learning_objectives(file_contents, num_objectives)
38
+
39
+ # Use the enhanced best-in-group objectives for question generation
40
+ learning_objectives = result["best_in_group"]
41
+
42
+ # Generate questions for each learning objective in parallel
43
+ questions = generate_questions_in_parallel(client, model, temperature, learning_objectives, file_contents)
44
+
45
+
46
+ # Rank questions based on quality criteria
47
+ ranked_questions = rank_questions(questions, file_contents)
48
+ print(f"Ranked {len(ranked_questions)} questions")
49
+
50
+
51
+ # Create assessment
52
+ assessment = Assessment(
53
+ learning_objectives=learning_objectives,
54
+ questions=ranked_questions
55
+ )
56
+
57
+ end_time = time.time()
58
+ print(f"Assessment generation completed in {end_time - start_time:.2f} seconds")
59
+
60
+ return assessment
61
+
62
+
63
+ def generate_questions_in_parallel(client: OpenAI, model: str, temperature: float, learning_objectives: List['RankedLearningObjective'], file_contents: List[str]) -> List[MultipleChoiceQuestion]:
64
+ """
65
+ Generate multiple choice questions in parallel for each learning objective.
66
+
67
+ Args:
68
+ learning_objectives: List of learning objectives
69
+ file_contents: List of file contents with source tags
70
+
71
+ Returns:
72
+ List of generated questions
73
+ """
74
+ run_manager = _get_run_manager()
75
+
76
+ if run_manager:
77
+ run_manager.log(f"Generating {len(learning_objectives)} questions in parallel", level="INFO")
78
+ start_time = time.time()
79
+
80
+ questions = []
81
+
82
+ # Function to generate a single question based on a learning objective
83
+ def generate_question_for_objective(objective, idx):
84
+ try:
85
+ thread_id = threading.get_ident()
86
+ if run_manager:
87
+ run_manager.log(f"PARALLEL: Worker {idx} (Thread ID: {thread_id}): Starting work on objective: {objective.learning_objective[:50]}...", level="DEBUG")
88
+
89
+ # Generate the question
90
+ if run_manager:
91
+ run_manager.log(f"PARALLEL: Worker {idx} (Thread ID: {thread_id}): Generating question...", level="DEBUG")
92
+ question = generate_multiple_choice_question(client, model, temperature, objective, file_contents)
93
+
94
+ # Judge question quality
95
+ if run_manager:
96
+ run_manager.log(f"PARALLEL: Worker {idx} (Thread ID: {thread_id}): Judging question quality...", level="DEBUG")
97
+ approved, feedback = judge_question_quality(client, model, temperature, question)
98
+
99
+ # Update question with judgment
100
+ question.approved = approved
101
+ question.judge_feedback = feedback
102
+
103
+ if run_manager:
104
+ run_manager.log(f"PARALLEL: Worker {idx} (Thread ID: {thread_id}): Question completed with approval: {approved}", level="DEBUG")
105
+ return question
106
+ except Exception as e:
107
+ if run_manager:
108
+ run_manager.log(f"Worker {idx}: Error generating question: {str(e)}", level="ERROR")
109
+ # Create a placeholder question on error
110
+ options = [
111
+ MultipleChoiceOption(option_text=f"Option {i}", is_correct=(i==0), feedback="Feedback")
112
+ for i in range(4)
113
+ ]
114
+ error_question = MultipleChoiceQuestion(
115
+ id=idx,
116
+ question_text=f"Error generating question: {str(e)}",
117
+ options=options,
118
+ learning_objective_id=objective.id,
+ learning_objective=getattr(objective, "learning_objective", "N/A"),
+ correct_answer="N/A",
119
+ source_reference=objective.source_reference,
120
+ approved=False,
121
+ judge_feedback=f"Error: {str(e)}"
122
+ )
123
+ return error_question
124
+
125
+ # Use ThreadPoolExecutor for parallel execution
126
+ max_workers = min(len(learning_objectives), 5) # Limit to 5 concurrent workers
127
+ if run_manager:
128
+ run_manager.log(f"PARALLEL: Starting ThreadPoolExecutor with {max_workers} workers", level="INFO")
129
+
130
+ with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
131
+ # Submit tasks
132
+ if run_manager:
133
+ run_manager.log(f"PARALLEL: Submitting {len(learning_objectives)} tasks to thread pool", level="DEBUG")
134
+ future_to_idx = {executor.submit(generate_question_for_objective, objective, i): i
135
+ for i, objective in enumerate(learning_objectives)}
136
+
137
+ if run_manager:
138
+ run_manager.log(f"PARALLEL: All tasks submitted, waiting for completion", level="DEBUG")
139
+
140
+ # Collect results as they complete
141
+ for future in concurrent.futures.as_completed(future_to_idx):
142
+ idx = future_to_idx[future]
143
+ try:
144
+ question = future.result()
145
+ questions.append(question)
146
+ if run_manager:
147
+ run_manager.log(f"Completed question {idx+1}/{len(learning_objectives)}", level="INFO")
148
+ except Exception as e:
149
+ if run_manager:
150
+ run_manager.log(f"Question {idx+1} generation failed: {str(e)}", level="ERROR")
151
+ # Add a placeholder for failed questions
152
+ options = [
153
+ MultipleChoiceOption(option_text=f"Option {i}", is_correct=(i==0), feedback="Feedback")
154
+ for i in range(4)
155
+ ]
156
+ error_question = MultipleChoiceQuestion(
157
+ id=idx,
158
+ question_text=f"Failed to generate question: {str(e)}",
159
+ options=options,
160
+ learning_objective_id=learning_objectives[idx].id,
161
+ learning_objective=getattr(learning_objectives[idx], "learning_objective", "N/A"),
162
+ correct_answer="N/A",
163
+ source_reference=learning_objectives[idx].source_reference,
164
+ judge_feedback=f"Error: {str(e)}",
165
+ approved=False
166
+ )
167
+ questions.append(error_question)
168
+
169
+
170
+ end_time = time.time()
171
+ if run_manager:
172
+ run_manager.log(f"Question generation completed in {end_time - start_time:.2f} seconds", level="INFO")
173
+
174
+ return questions
175
+
176
+
177
+ def save_assessment_to_json(assessment: Assessment, output_path: str) -> None:
178
+ """
179
+ Save assessment to a JSON file.
180
+
181
+ Args:
182
+ assessment: Assessment to save
183
+ output_path: Path to save the assessment to
184
+ """
185
+ # Convert assessment to dict
186
+ assessment_dict = assessment.model_dump()
187
+
188
+ # Save to file
189
+ with open(output_path, "w") as f:
190
+ json.dump(assessment_dict, f, indent=2)
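Review note: `generate_questions_in_parallel` appends results in completion order, so the output order can differ between runs. It also computes `max_workers = min(len(learning_objectives), 5)`, which is 0 for an empty list, and `ThreadPoolExecutor` raises `ValueError` when `max_workers` is 0. The following is a minimal sketch of the same submit-and-collect pattern that keeps input order and guards the empty case. `run_in_parallel` and `make_placeholder` are hypothetical names, not part of this commit.

```python
import concurrent.futures
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_in_parallel(
    items: List[T],
    worker: Callable[[T, int], R],
    make_placeholder: Callable[[T, int, Exception], R],
    max_workers: int = 5,
) -> List[Optional[R]]:
    """Run worker(item, idx) for each item and return results in input order."""
    if not items:
        # ThreadPoolExecutor rejects max_workers=0, so return early.
        return []

    results: List[Optional[R]] = [None] * len(items)
    pool_size = min(len(items), max_workers)
    with concurrent.futures.ThreadPoolExecutor(max_workers=pool_size) as executor:
        future_to_idx = {
            executor.submit(worker, item, i): i for i, item in enumerate(items)
        }
        for future in concurrent.futures.as_completed(future_to_idx):
            idx = future_to_idx[future]
            try:
                results[idx] = future.result()
            except Exception as exc:
                # Same idea as the commit: substitute a placeholder on failure.
                results[idx] = make_placeholder(items[idx], idx, exc)
    return results
```

Writing each result into its slot by index trades the running log of "completed N of M" ordering for a deterministic result list, which may make downstream ranking easier to reproduce.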
quiz_generator/feedback_questions.py ADDED
@@ -0,0 +1,210 @@
1
+ from openai import OpenAI
2
+ from models import MultipleChoiceQuestionFromFeedback, MultipleChoiceOption, TEMPERATURE_UNAVAILABLE
3
+ import os
+ import re
4
+ from typing import List
5
+
6
+
7
+ def generate_multiple_choice_question_from_feedback(client: OpenAI, model: str, temperature: float, feedback: str, file_contents: List[str]) -> MultipleChoiceQuestionFromFeedback:
8
+ """
9
+ Generate a multiple choice question based on user feedback.
10
+
11
+ Args:
12
+ feedback: User feedback or guidance
13
+ file_contents: List of file contents with source tags
14
+
15
+ Returns:
16
+ Generated multiple choice question
17
+ """
18
+ print(f"Processing feedback: {feedback[:100]}...")
19
+
20
+ # Step 1: Extract structured information from feedback using LLM
21
+ extraction_prompt = f"""
22
+ Extract the following information from the user's feedback to create a multiple choice question:
23
+ 1. Any source references mentioned
24
+ 2. The learning objective
25
+ 3. The difficulty level
26
+ 4. The original question text (if present)
27
+ 5. Any specific feedback about what to change or improve
28
+
29
+ <QUESTION FOLLOWED BY USER CRITICISM>
30
+ {feedback}
31
+ </QUESTION FOLLOWED BY USER CRITICISM>
32
+ """
33
+
34
+ try:
35
+ # Extract structured information
36
+ # Different parameter handling for different model families
37
+ params = {
38
+ "model": model,
39
+ "response_format": MultipleChoiceQuestionFromFeedback,
40
+ "messages": [
41
+ {"role": "system", "content": "You are an expert at extracting structured information from text to prepare for question generation."},
42
+ {"role": "user", "content": extraction_prompt}
43
+ ]
44
+ }
45
+
46
+ # Add temperature parameter only if not using o-series models
47
+ if not TEMPERATURE_UNAVAILABLE.get(model, True):
48
+ params["temperature"] = temperature
49
+
50
+ # Pass the assembled params so temperature is omitted when the model does not accept it
51
+ completion = client.beta.chat.completions.parse(**params)
59
+ extraction = completion.choices[0].message.parsed
60
+ print(f"Extracted question structure")
61
+
62
+ # Step 2: Find relevant content based on extracted source references
63
+ source_references = []
64
+ if extraction.source_reference:
65
+ if isinstance(extraction.source_reference, list):
66
+ source_references = extraction.source_reference
67
+ else:
68
+ source_references = [extraction.source_reference]
69
+
70
+ # If no source references extracted, get all sources from file_contents
71
+ if not source_references:
72
+ for file_content in file_contents:
73
+ source_match = re.search(r"<source file='([^']+)'>", file_content)
74
+ if source_match:
75
+ source = source_match.group(1)
76
+ source_references.append(source)
77
+ print(f"Found source file: {source}")
78
+
79
+ # Find relevant content based on source references
80
+ combined_content = ""
81
+ for source_file in source_references:
82
+ source_found = False
83
+ for file_content in file_contents:
84
+ # Look for the XML source tag with the matching filename
85
+ if f"<source file='{source_file}'>" in file_content:
86
+ print(f"Found matching source content for {source_file}")
87
+ if combined_content:
88
+ combined_content += "\n\n"
89
+ combined_content += file_content
90
+ source_found = True
91
+ break
92
+
93
+ # If no exact match found, try a more flexible match
94
+ if not source_found:
95
+ print(f"No exact match for {source_file}, looking for partial matches")
96
+ for file_content in file_contents:
97
+ if source_file in file_content:
98
+ print(f"Found partial match for {source_file}")
99
+ if combined_content:
100
+ combined_content += "\n\n"
101
+ combined_content += file_content
102
+ source_found = True
103
+ break
104
+
105
+ # If still no matching content, use all file contents combined
106
+ if not combined_content:
107
+ print(f"No content found for any source files, using all content")
108
+ combined_content = "\n\n".join(file_contents)
109
+
110
+ # Step 3: Generate new question using extracted information and content
111
+ generation_prompt = f"""
112
+ Create a multiple choice question based on the following information:
113
+
114
+ USER CRITICISM:
115
+ {extraction.feedback}
116
+
117
+ EXTRACTED QUESTION STRUCTURE:
118
+ Question: {extraction.question_text}
119
+ Learning Objective: {extraction.learning_objective}
120
+
121
+ COURSE CONTENT:
122
+ {combined_content}
123
+
124
+ INSTRUCTIONS:
125
+ 1. Create a question that addresses the user's critique or criticism. This is the top priority. Your response should align with the user's critique or criticism.
126
+ 2. Base your question ONLY on the COURSE CONTENT provided above
127
+ 3. The question must be clear, specific, and unambiguous
128
+ 4. Include EXACTLY 4 options labeled A, B, C, and D
129
+ 5. Have EXACTLY 1 correct answer
130
+ 6. All options MUST include detailed feedback explaining why the answer is correct or incorrect
131
+ 7. For the correct answer, include positive feedback that reinforces the concept
132
+ 8. For incorrect answers, provide informative feedback explaining the misconception
133
+ 9. All options should be plausible - no obviously wrong answers
134
+ 10. Make sure the question tests understanding, not just memorization
135
+ 11. Only refer to specific products if absolutely necessary
136
+ 12. Questions should prioritize core Competencies: Identify the most critical knowledge and skills students must master
137
+ 13. Questions should align with Course Purpose: Ensure objectives directly support the overarching goals of the course
138
+ 14. Questions should Consider Long-term Value: Focus on enduring understandings that students will use beyond the course
139
+
140
+ Available source files: {', '.join([os.path.basename(src) for src in source_references])}
141
+
142
+ IMPORTANT: Every option MUST have feedback. This is required.
143
+ """
144
+
145
+ # Generate new question
146
+ # Different parameter handling for different model families
147
+ params = {
148
+ "model": model,
149
+ "response_format": MultipleChoiceQuestionFromFeedback,
150
+ "messages": [
151
+ {"role": "system", "content": "You are an expert educational assessment creator specializing in creating high-quality multiple choice questions with detailed feedback for each option."},
152
+ {"role": "user", "content": generation_prompt}
153
+ ]
154
+ }
155
+
156
+ # Add temperature parameter only if not using o-series models
157
+ if not TEMPERATURE_UNAVAILABLE.get(model, True):
158
+ params["temperature"] = temperature
159
+
160
+ # Pass the assembled params so temperature is omitted when the model does not accept it
161
+ completion = client.beta.chat.completions.parse(**params)
169
+ response = completion.choices[0].message.parsed
170
+
171
+ # Set ID and source reference if not already set
172
+ response.id = 1
173
+ if not response.source_reference:
174
+ response.source_reference = [os.path.basename(src) for src in source_references]
175
+
176
+ # Set learning objective if not already set
177
+ if not response.learning_objective and extraction.learning_objective:
178
+ response.learning_objective = extraction.learning_objective
179
+
180
+ # Set feedback from the original feedback
181
+ response.feedback = extraction.feedback
182
+
183
+ # Verify all options have feedback
184
+ for i, option in enumerate(response.options):
185
+ if not option.feedback or option.feedback.strip() == "":
186
+ if option.is_correct:
187
+ option.feedback = "Good job! This is the correct answer."
188
+ else:
189
+ option.feedback = f"This answer is incorrect. Please review the material again."
190
+
191
+ return response
192
+
193
+ except Exception as e:
194
+ print(f"Error generating question: {e}")
195
+ # Create a fallback question
196
+ options = [
197
+ MultipleChoiceOption(
198
+ option_text=f"Option {chr(65+i)}",
199
+ is_correct=(i==0),
200
+ feedback=f"{'Correct' if i==0 else 'Incorrect'} answer."
201
+ ) for i in range(4)
202
+ ]
203
+ return MultipleChoiceQuestionFromFeedback(
204
+ id=1,
205
+ question_text=f"Question based on feedback: {feedback[:50]}...",
206
+ options=options,
207
+ learning_objective="Understanding key concepts from the course material",
208
+ source_reference=["unknown"],
209
+ feedback=feedback
210
+ )
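Review note: both calls in `feedback_questions.py` build a `params` dict and then conditionally add `temperature`, so the assembly step can be factored into one pure function that is easy to test without a network call. The following is a minimal sketch under stated assumptions: `build_parse_kwargs` is a hypothetical helper, and the lookup table is passed in as an argument rather than imported, so the sketch does not depend on the actual contents of `TEMPERATURE_UNAVAILABLE` in `models`.

```python
from typing import Any, Dict, List, Mapping


def build_parse_kwargs(
    model: str,
    temperature: float,
    messages: List[Dict[str, str]],
    response_format: Any,
    temperature_unavailable: Mapping[str, bool],
) -> Dict[str, Any]:
    """Assemble keyword arguments, omitting temperature where unsupported."""
    kwargs: Dict[str, Any] = {
        "model": model,
        "messages": messages,
        "response_format": response_format,
    }
    # Same test as the commit: a model missing from the table is treated
    # as not accepting a temperature argument.
    if not temperature_unavailable.get(model, True):
        kwargs["temperature"] = temperature
    return kwargs
```

The result would be unpacked into the call already used in this file, as in `client.beta.chat.completions.parse(**build_parse_kwargs(...))`, which keeps the two call sites from drifting apart.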
quiz_generator/generator.py ADDED
@@ -0,0 +1,89 @@
1
+ from typing import List
2
+ from openai import OpenAI
3
+ from learning_objective_generator import LearningObjectiveGenerator
4
+ from learning_objective_generator.grouping_and_ranking import group_base_learning_objectives
5
+ from .question_generation import generate_multiple_choice_question
6
+ from .question_improvement import (
7
+ should_regenerate_incorrect_answers, regenerate_incorrect_answers, judge_question_quality
8
+ )
9
+ from .question_ranking import rank_questions, group_questions
10
+ from .feedback_questions import generate_multiple_choice_question_from_feedback
11
+ from .assessment import generate_assessment, generate_questions_in_parallel, save_assessment_to_json
12
+
13
+ class QuizGenerator:
14
+ """Simple orchestrator for quiz generation."""
15
+
16
+ def __init__(self, api_key: str, model: str = "gpt-5", temperature: float = 1.0):
17
+ self.client = OpenAI(api_key=api_key)
18
+ self.model = model
19
+ self.temperature = temperature
20
+ self.learning_objective_generator = LearningObjectiveGenerator(
21
+ api_key=api_key, model=model, temperature=temperature
22
+ )
23
+
24
+ def generate_base_learning_objectives(self, file_contents: List[str], num_objectives: int, incorrect_answer_model: str = None):
25
+ """Generate only base learning objectives (no grouping, no incorrect answers). This allows the UI to collect objectives from multiple runs before grouping."""
26
+ return self.learning_objective_generator.generate_base_learning_objectives(
27
+ file_contents, num_objectives
28
+ )
29
+
30
+ def generate_lo_incorrect_answer_options(self, file_contents, base_objectives, model_override=None):
31
+ """Generate incorrect answer options for the given base learning objectives (wrapper for LearningObjectiveGenerator)."""
32
+ return self.learning_objective_generator.generate_incorrect_answer_options(
33
+ file_contents, base_objectives, model_override
34
+ )
35
+
36
+ def group_base_learning_objectives(self, base_learning_objectives, file_contents: List[str]):
37
+ """Group base learning objectives and identify best in group."""
38
+ return group_base_learning_objectives(
39
+ self.client, self.model, self.temperature, base_learning_objectives, file_contents
40
+ )
41
+
42
+ def generate_multiple_choice_question(self, learning_objective, file_contents: List[str]):
43
+ return generate_multiple_choice_question(
44
+ self.client, self.model, self.temperature, learning_objective, file_contents
45
+ )
46
+
47
+ def should_regenerate_incorrect_answers(self, question, file_contents: List[str], model_name: str = "gpt-5-mini"):
48
+ return should_regenerate_incorrect_answers(
49
+ self.client, self.model, self.temperature, question, file_contents, model_name
50
+ )
51
+
52
+ def regenerate_incorrect_answers(self, questions, file_contents: List[str]):
53
+ return regenerate_incorrect_answers(
54
+ self.client, self.model, self.temperature, questions, file_contents
55
+ )
56
+
57
+ def rank_questions(self, questions, file_contents: List[str]):
58
+ return rank_questions(
59
+ self.client, self.model, self.temperature, questions, file_contents
60
+ )
61
+
62
+ def group_questions(self, questions, file_contents: List[str]):
63
+ return group_questions(
64
+ self.client, self.model, self.temperature, questions, file_contents
65
+ )
66
+
67
+ def generate_multiple_choice_question_from_feedback(self, feedback: str, file_contents: List[str]):
68
+ return generate_multiple_choice_question_from_feedback(
69
+ self.client, self.model, self.temperature, feedback, file_contents
70
+ )
71
+
72
+ def judge_question_quality(self, question):
73
+ return judge_question_quality(
74
+ self.client, self.model, self.temperature, question
75
+ )
76
+
77
+ def generate_assessment(self, file_contents: List[str], num_objectives: int):
78
+ return generate_assessment(
79
+ self.client, self.model, self.temperature,
80
+ self.learning_objective_generator, file_contents, num_objectives
81
+ )
82
+
83
+ def generate_questions_in_parallel(self, learning_objectives, file_contents: List[str]):
84
+ return generate_questions_in_parallel(
85
+ self.client, self.model, self.temperature, learning_objectives, file_contents
86
+ )
87
+
88
+ def save_assessment_to_json(self, assessment, output_path: str):
89
+ return save_assessment_to_json(assessment, output_path)
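The wrapper methods above all follow one pattern: the class stores the client, model, and temperature once and forwards them, in that order, to a module-level function of the same name. A minimal sketch of that delegation pattern follows. The helper functions, the `Orchestrator` class, and the `_call` method are hypothetical stand-ins that run offline; they are not part of the repository.

```python
from typing import Callable, List


def rank_items(client: str, model: str, temperature: float, items: List[str]) -> List[str]:
    """Hypothetical module-level helper; it only sorts so the sketch runs offline."""
    return sorted(items)


def describe_call(client: str, model: str, temperature: float, item: str) -> str:
    """Hypothetical helper that reports which shared settings it received."""
    return f"{item} via {model} at T={temperature}"


class Orchestrator:
    """Holds shared settings once and forwards them to module-level helpers."""

    def __init__(self, client: str, model: str = "example-model", temperature: float = 1.0):
        self.client = client
        self.model = model
        self.temperature = temperature

    def _call(self, func: Callable, *args):
        # Prepend the shared arguments in the order every helper expects them.
        return func(self.client, self.model, self.temperature, *args)

    def rank_items(self, items: List[str]) -> List[str]:
        # Inside a method body, the bare name resolves to the module-level function.
        return self._call(rank_items, items)

    def describe_call(self, item: str) -> str:
        return self._call(describe_call, item)
```

Routing every wrapper through one `_call` helper keeps the argument order in a single place, which guards against a wrapper drifting out of sync with the signature it delegates to.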
quiz_generator/question_generation.py ADDED
@@ -0,0 +1,217 @@
1
+ from typing import List
2
+ from openai import OpenAI
3
+ from models import MultipleChoiceQuestion, MultipleChoiceOption, TEMPERATURE_UNAVAILABLE
4
+ from prompts.questions import (
5
+ GENERAL_QUALITY_STANDARDS, MULTIPLE_CHOICE_STANDARDS,
6
+ EXAMPLE_QUESTIONS, QUESTION_SPECIFIC_QUALITY_STANDARDS,
7
+ CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS,
8
+ ANSWER_FEEDBACK_QUALITY_STANDARDS,
9
+ )
10
+ from prompts.incorrect_answers import (
11
+ INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION
12
+ )
13
+
14
+
15
+ def _get_run_manager():
16
+ """Get run manager if available, otherwise return None."""
17
+ try:
18
+ from ui.run_manager import get_run_manager
19
+ return get_run_manager()
20
+ except Exception:
21
+ return None
22
+
23
+ def generate_multiple_choice_question(client: OpenAI,
24
+ model: str,
25
+ temperature: float,
26
+ learning_objective: 'RankedLearningObjective',
27
+ file_contents: List[str]) -> MultipleChoiceQuestion:
28
+ """
29
+ Generate a multiple choice question for a learning objective.
30
+
31
+ Args:
32
+ learning_objective: Learning objective to generate a question for
33
+ file_contents: List of file contents with source tags
34
+
35
+ Returns:
36
+ Generated multiple choice question
37
+ """
38
+ run_manager = _get_run_manager()
39
+
40
+ # Handle source references (could be string or list)
41
+ source_references = learning_objective.source_reference
42
+ if isinstance(source_references, str):
43
+ source_references = [source_references]
44
+
45
+ if run_manager:
46
+ run_manager.log(f"Looking for content from source files: {source_references}", level="DEBUG")
47
+
48
+ # Simply collect all content that matches any of the source references
49
+ combined_content = ""
50
+ for source_file in source_references:
51
+ source_found = False
52
+ for file_content in file_contents:
53
+ # Look for the XML source tag with the matching filename
54
+ if f"<source file='{source_file}'>" in file_content:
55
+ if run_manager:
56
+ run_manager.log(f"Found matching source content for {source_file}", level="DEBUG")
57
+ if combined_content:
58
+ combined_content += "\n\n"
59
+ combined_content += file_content
60
+ source_found = True
61
+ break
62
+
63
+ # If no exact match found, try a more flexible match
64
+ if not source_found:
65
+ if run_manager:
66
+ run_manager.log(f"No exact match for {source_file}, looking for partial matches", level="DEBUG")
67
+ for file_content in file_contents:
68
+ if source_file in file_content:
69
+ if run_manager:
70
+ run_manager.log(f"Found partial match for {source_file}", level="DEBUG")
71
+ if combined_content:
72
+ combined_content += "\n\n"
73
+ combined_content += file_content
74
+ source_found = True
75
+ break
76
+
77
+ # If still no matching content, use all file contents combined
78
+ if not combined_content:
79
+ if run_manager:
80
+ run_manager.log("No content found for any source files, using all content", level="DEBUG")
81
+ combined_content = "\n\n".join(file_contents)
82
+
83
+ # Add multi-source instruction if needed
84
+ multi_source_instruction = ""
85
+ if len(source_references) > 1:
86
+ multi_source_instruction = """
87
+ <IMPORTANT FOR MULTI-SOURCE QUESTIONS>
88
+ This learning objective spans multiple sources. Your question should:
89
+ 1. Synthesize information across these sources
90
+ 2. Test understanding of overarching themes or connections
91
+ 3. Require knowledge from multiple sources to answer correctly
92
+ </IMPORTANT FOR MULTI-SOURCE QUESTIONS>
93
+ """
94
+
95
+ # Create the prompt
96
+ prompt = f"""
97
+ Create a multiple choice question based on the following learning objective:
98
+
99
+ <LEARNING OBJECTIVE>
100
+ {learning_objective.learning_objective}
101
+ </LEARNING OBJECTIVE>
102
+
103
+ The correct answer to this is
104
+
105
+ <CORRECT ANSWER>
106
+ {learning_objective.correct_answer}
107
+ </CORRECT ANSWER>
108
+
109
+ Follow these important instructions for writing the quiz question:
110
+
111
+ <INSTRUCTIONS>
112
+ <General Quality Standards>
113
+ {GENERAL_QUALITY_STANDARDS}
114
+ </General Quality Standards>
115
+
116
+ <Multiple Choice Specific Standards>
117
+ {MULTIPLE_CHOICE_STANDARDS}
118
+ </Multiple Choice Specific Standards>
119
+
120
+ <Example Questions>
121
+ {EXAMPLE_QUESTIONS}
122
+ </Example Questions>
123
+
124
+ <Question Specific Quality Standards>
125
+ {QUESTION_SPECIFIC_QUALITY_STANDARDS}
126
+ </Question Specific Quality Standards>
127
+
128
+ <Correct Answer Specific Quality Standards>
129
+ {CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS}
130
+ </Correct Answer Specific Quality Standards>
131
+
132
+ These are the incorrect answer options:
133
+
134
+ <INCORRECT_ANSWER_OPTIONS>
135
+ {learning_objective.incorrect_answer_options}
136
+ </INCORRECT_ANSWER_OPTIONS>
137
+
138
+ Incorrect answers should follow the following examples with explanations:
139
+
140
+ Here are some examples of high quality incorrect answer options for each learning objective:
141
+ <incorrect_answer_examples>
142
+ {INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION}
143
+ </incorrect_answer_examples>
144
+
145
+ IMPORTANT:
146
+ AVOID ABSOLUTE TERMS AND UNNECESSARY COMPARISONS
147
+ Don't use words like "always", "never", "mainly", "exclusively", "primarily", or "rather than".
148
+ These words are absolute or extreme qualifiers and comparative terms that artificially limit or overgeneralize statements, creating false dichotomies or unsubstantiated hierarchies.
149
+ More words you should avoid are: All, every, entire, complete, none, nothing, no one, only, solely, merely, completely, totally, utterly, always, forever, constantly, never, impossible, must, mandatory, required, instead of, as opposed to, exclusively, purely
150
+
151
+
152
+
153
+ <Answer Feedback Quality Standards>
154
+ {ANSWER_FEEDBACK_QUALITY_STANDARDS}
155
+ </Answer Feedback Quality Standards>
156
+
157
+ </INSTRUCTIONS>
158
+
159
+ {multi_source_instruction}
160
+
161
+ Below is the course content that the quiz question is based on:
162
+
163
+ <COURSE CONTENT>
164
+ {combined_content}
165
+ </COURSE CONTENT>
166
+ """
167
+
168
+ # Generate question using the OpenAI client's structured-output parsing
169
+ try:
170
+
171
+ params = {
172
+ "model": model,
173
+ "messages": [
174
+ {"role": "system", "content": "You are an expert educational assessment creator specializing in creating high-quality multiple choice questions with detailed feedback for each option."},
175
+ {"role": "user", "content": prompt}
176
+ ],
177
+ "response_format": MultipleChoiceQuestion
178
+ }
179
+ if not TEMPERATURE_UNAVAILABLE.get(model, True):
180
+ params["temperature"] = temperature
181
+
182
+ completion = client.beta.chat.completions.parse(**params)
183
+ response = completion.choices[0].message.parsed
184
+
185
+ # Set learning objective ID and source reference
186
+ response.id = learning_objective.id
187
+ response.learning_objective_id = learning_objective.id
188
+ response.learning_objective = learning_objective.learning_objective
189
+ response.source_reference = learning_objective.source_reference
190
+
191
+ # Verify all options have feedback
192
+ for i, option in enumerate(response.options):
193
+ if not option.feedback or option.feedback.strip() == "":
194
+ if option.is_correct:
195
+ option.feedback = "Good job! This is the correct answer."
196
+ else:
197
+ option.feedback = "This answer is incorrect. Please review the material again."
198
+ return response
199
+
200
+ except Exception as e:
201
+ print(f"Error generating question: {e}")
202
+ # Create a fallback question
203
+ options = [
204
+ MultipleChoiceOption(
205
+ option_text=f"Option {chr(65+i)}",
206
+ is_correct=(i==0),
207
+ feedback=f"{'Correct' if i==0 else 'Incorrect'} answer."
208
+ ) for i in range(4)
209
+ ]
210
+ return MultipleChoiceQuestion(
211
+ id=learning_objective.id,
212
+ question_text=f"Question for learning objective: {learning_objective.learning_objective}",
213
+ options=options,
214
+ learning_objective_id=learning_objective.id,
215
+ learning_objective=learning_objective.learning_objective,
216
+ source_reference=learning_objective.source_reference,
217
+ )
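The source-matching logic in `generate_multiple_choice_question` (exact `<source file='...'>` tag match first, then a substring match, then all content as a last resort) can be pulled into a small pure function. This is a minimal sketch under that reading of the code above; the function name `select_source_content` is an assumption, not something defined in the repository.

```python
from typing import List, Union


def select_source_content(source_references: Union[str, List[str]],
                          file_contents: List[str]) -> str:
    """Collect the content blocks that match the given source references."""
    # A single reference may arrive as a plain string.
    if isinstance(source_references, str):
        source_references = [source_references]

    selected: List[str] = []
    for source_file in source_references:
        tag = f"<source file='{source_file}'>"
        # First pass: exact match on the XML source tag.
        match = next((c for c in file_contents if tag in c), None)
        if match is None:
            # Second pass: looser substring match on the file name.
            match = next((c for c in file_contents if source_file in c), None)
        if match is not None:
            selected.append(match)

    # Nothing matched at all: fall back to every content block.
    if not selected:
        selected = list(file_contents)
    return "\n\n".join(selected)
```

The same lookup appears again in `question_improvement.py`, so a shared helper of this shape would keep the two call sites in agreement.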
quiz_generator/question_improvement.py ADDED
@@ -0,0 +1,578 @@
1
+ from typing import List, Tuple, Optional
2
+ from openai import OpenAI
3
+ from models import MultipleChoiceQuestion, MultipleChoiceOption, TEMPERATURE_UNAVAILABLE
4
+ from prompts.incorrect_answers import INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION
5
+ from prompts.questions import GENERAL_QUALITY_STANDARDS, MULTIPLE_CHOICE_STANDARDS, EXAMPLE_QUESTIONS
6
+ import json
7
+ import os
8
+
9
+
10
+ def _get_run_manager():
11
+ """Get run manager if available, otherwise return None."""
12
+ try:
13
+ from ui.run_manager import get_run_manager
14
+ return get_run_manager()
15
+ except Exception:
16
+ return None
17
+
18
+ def should_regenerate_incorrect_answers(client: OpenAI, model: str, temperature: float, question: MultipleChoiceQuestion, file_contents: List[str], model_name: str = "gpt-5-mini") -> bool:
19
+ """
20
+ Check if a question needs regeneration of incorrect answer options using a lightweight model.
21
+
22
+ Args:
23
+ question: Question to check
24
+ file_contents: List of file contents with source tags
25
+ model_name: Model to use for checking (default: gpt-5-mini)
26
+
27
+ Returns:
28
+ Boolean indicating whether the question needs regeneration
29
+ """
30
+ print(f"Checking if question ID {question.id} needs incorrect answer regeneration using {model_name}")
31
+
32
+ # Format the question for display in the prompt
33
+ question_display = (
34
+ f"ID: {question.id}\n"
35
+ f"Question: {question.question_text}\n"
36
+ f"Options: {json.dumps([{'text': o.option_text, 'is_correct': o.is_correct, 'feedback': o.feedback} for o in question.options])}\n"
37
+ f"Learning Objective: {question.learning_objective}\n"
38
+ f"Learning Objective ID: {question.learning_objective_id}\n"
39
+ f"Correct Answer: {question.correct_answer}\n"
40
+ f"Source Reference: {question.source_reference}"
41
+ )
42
+
43
+ # Extract relevant content based on source references (simplified version)
44
+ combined_content = ""
45
+ if question.source_reference:
46
+ source_references = question.source_reference if isinstance(question.source_reference, list) else [question.source_reference]
47
+
48
+ for source_file in source_references:
49
+ for file_content in file_contents:
50
+ if f"<source file='{source_file}'>" in file_content:
51
+ if combined_content:
52
+ combined_content += "\n\n"
53
+ combined_content += file_content
54
+ break
55
+
56
+ # If no content found, fall back to all content
57
+ if not combined_content:
58
+ combined_content = "\n\n".join(file_contents)
59
+
60
+ # Create a simplified prompt focused just on checking if regeneration is needed
61
+ check_prompt = f"""
62
+ Below is a multiple choice question. Evaluate ONLY the INCORRECT answer options against the guidelines below.
62
+ Respond with only TRUE if the incorrect options need regeneration, or FALSE if they do not.
64
+
65
+ {question_display}
66
+
67
+ Consider the course content to help you make informed decisions:
68
+
69
+ COURSE CONTENT:
70
+ {combined_content}
71
+
72
+
73
+
74
+ Here are some examples of high quality incorrect answer suggestions which you should follow:
75
+ <incorrect_answer_examples>
76
+ {INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION}
77
+ </incorrect_answer_examples>
78
+
79
+ Refer to the correct answer <correct_answer>{question.correct_answer}</correct_answer>.
80
+ Make sure incorrect answers match the correct answer in terms of length, complexity, phrasing, style, and subject matter.
81
+ Incorrect answers should be of approximately equal length to the correct answer, preferably one sentence long.
82
+
83
+
84
+ {IMMEDIATE_RED_FLAGS}
85
+
86
+ """
87
+
88
+
89
+ # Call OpenAI API with the lightweight model
90
+ try:
91
+ params = {
92
+ "model": model_name,
93
+ "messages": [
94
+ {"role": "system", "content": "You are an expert educational assessment evaluator. You determine if incorrect answer options meet quality standards."},
95
+ {"role": "user", "content": check_prompt}
96
+ ],
97
+ #"temperature": 0.7
98
+ }
99
+
100
+ completion = client.chat.completions.create(**params)
101
+ response_text = completion.choices[0].message.content.strip().lower()
102
+
103
+ print(f"Checking response text output: {response_text}")
104
+
105
+ # Check if regeneration is needed
106
+ needs_regeneration = "true" in response_text
107
+ #needs_regeneration = True
108
+ print(f"Question ID {question.id} needs regeneration: {needs_regeneration} ({response_text})")
109
+ return needs_regeneration
110
+
111
+ except Exception as e:
112
+ print(f"Error checking regeneration need for question ID {question.id}: {str(e)}")
113
+ # If there's an error, assume regeneration is not needed
114
+ return False
115
+
116
+
117
+ def regenerate_incorrect_answers(client: OpenAI, model: str, temperature: float, questions: List[MultipleChoiceQuestion], file_contents: List[str]) -> List[MultipleChoiceQuestion]:
118
+ """
119
+ Regenerate incorrect answer options for questions.
120
+
121
+ Args:
122
+ questions: List of questions to improve
123
+ file_contents: List of file contents with source tags
124
+
125
+ Returns:
126
+ The same list of questions with improved incorrect answer options
127
+ """
128
+ print(f"Regenerating incorrect answer options for {len(questions)} questions")
129
+
130
+ for i, question in enumerate(questions):
131
+ # Check if this question needs regeneration
132
+ # if not self.should_regenerate_incorrect_answers(question, file_contents):
133
+ # print(f"Question ID {question.id} does not need regeneration. Skipping.")
134
+ # continue
135
+
136
+ # Extract relevant content based on source references
137
+ combined_content = ""
138
+ if question.source_reference:
139
+ source_references = question.source_reference if isinstance(question.source_reference, list) else [question.source_reference]
140
+
141
+ for source_file in source_references:
142
+ source_found = False
143
+ for file_content in file_contents:
144
+ # Look for the XML source tag with the matching filename
145
+ if f"<source file='{source_file}'>" in file_content:
146
+ print(f"Found matching source content for {source_file}")
147
+ if combined_content:
148
+ combined_content += "\n\n"
149
+ combined_content += file_content
150
+ source_found = True
151
+ break
152
+
153
+ # If no exact match found, try a more flexible match
154
+ if not source_found:
155
+ print(f"No exact match for {source_file}, looking for partial matches")
156
+ for file_content in file_contents:
157
+ if source_file in file_content:
158
+ print(f"Found partial match for {source_file}")
159
+ if combined_content:
160
+ combined_content += "\n\n"
161
+ combined_content += file_content
162
+ source_found = True
163
+ break
164
+
165
+ # If still no matching content, use all file contents combined
166
+ if not combined_content:
167
+ print(f"No content found for any source files, using all content")
168
+ combined_content = "\n\n".join(file_contents)
169
+ else:
170
+ # If no source references, use all content
171
+ combined_content = "\n\n".join(file_contents)
172
+
173
+ # Find the correct option
174
+ correct_option = None
175
+ for opt in question.options:
176
+ if opt.is_correct:
177
+ correct_option = opt
178
+ break
179
+
180
+ if not correct_option:
181
+ print(f"Warning: No correct option found in question ID {question.id}. Skipping.")
182
+ continue
183
+
184
+ # Process each incorrect option individually
185
+ updated_options = [correct_option] # Start with the correct option
186
+ options_regenerated = 0
187
+
188
+ for opt in question.options:
189
+ if opt.is_correct:
190
+ continue # Skip the correct option, already added
191
+
192
+ # Check if this specific option needs regeneration
193
+ needs_regeneration, reason = should_regenerate_individual_option(client, model, temperature, question, opt, combined_content)
194
+ if needs_regeneration:
195
+ # Regenerate this specific option
196
+ print(f"Regenerating option '{opt.option_text}' for question ID {question.id}")
197
+ new_option = regenerate_individual_option(client, model, temperature, question, opt, correct_option, combined_content, reason)
198
+ if new_option:
199
+ updated_options.append(new_option)
200
+ options_regenerated += 1
201
+ else:
202
+ # If regeneration failed, keep the original
203
+ updated_options.append(opt)
204
+ else:
205
+ # Option doesn't need regeneration, keep as is
206
+ print(f"Option '{opt.option_text}' for question ID {question.id} does not need regeneration")
207
+ updated_options.append(opt)
208
+
209
+ # Update the question with the new options
210
+ questions[i].options = updated_options
211
+ print(f"Regenerated {options_regenerated} options for question ID {question.id}")
212
+
213
+ return questions
214
+
215
+ def should_regenerate_individual_option(client: OpenAI, model: str, temperature: float, question: MultipleChoiceQuestion, option: MultipleChoiceOption, content: str) -> Tuple[bool, str]:
216
+ """
217
+ Check if a specific incorrect option needs regeneration.
218
+
219
+ Args:
220
+ question: The question containing the option
221
+ option: The specific option to check
222
+ content: The relevant content for context
223
+
224
+ Returns:
225
+ Tuple of (Boolean indicating whether the option needs regeneration, Reason for the decision)
226
+ """
227
+ print(f"Checking if option '{option.option_text}' needs regeneration")
228
+
229
+ # Format the question and option for display
230
+ question_display = (
231
+ f"Question: {question.question_text}\n"
232
+ f"Learning Objective: {question.learning_objective}\n"
233
+ f"Correct Answer: {question.correct_answer}\n"
234
+ )
235
+
236
+ option_display = (
237
+ f"Option Text: {option.option_text}\n"
238
+ f"Feedback: {option.feedback}\n"
239
+ )
240
+
241
+ # Create a simplified prompt focused just on checking this option
242
+ check_prompt = f"""
243
+ Below is a multiple choice question and ONE incorrect answer option. Evaluate ONLY THIS OPTION against the quality guidelines.
244
+
245
+ {question_display}
246
+
247
+ INCORRECT OPTION TO EVALUATE:
248
+ {option_display}
249
+
250
+ Consider the course content to help you make informed decisions:
251
+
252
+ COURSE CONTENT:
253
+ {content}
254
+
255
+
256
+
257
+
258
+
259
+
260
+ Here are some examples of high quality incorrect answer suggestions which you should follow:
261
+ <incorrect_answer_examples>
262
+ {INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION}
263
+ </incorrect_answer_examples>
264
+
265
+ Refer to the correct answer <correct_answer>{question.correct_answer}</correct_answer>.
266
+ Make sure incorrect answers match the correct answer in terms of length, complexity, phrasing, style, and subject matter.
267
+ Incorrect answers should be of approximately equal length to the correct answer, preferably one sentence long.
268
+
269
+
270
+
271
+
272
+ {IMMEDIATE_RED_FLAGS}
273
+
274
+ TASK: Determine if this specific incorrect option needs improvement based on the guidelines.
275
+ Begin your answer with "true" if improvements are needed or "false" if no improvements are needed, followed by a one-sentence explanation of the decision.
276
+ """
277
+
278
+ # Call OpenAI API with the lightweight model
279
+ try:
280
+ params = {
281
+ "model": "gpt-5-mini",
282
+ "messages": [
283
+ {"role": "system", "content": "You are an expert educational assessment evaluator. You determine if incorrect answer options meet quality standards."},
284
+ {"role": "user", "content": check_prompt}
285
+ ]
286
+ }
287
+
288
+ completion = client.chat.completions.create(**params)
289
+ response_text = completion.choices[0].message.content.strip().lower()
290
+
291
+ print(f"Checking option response: {response_text}")
292
+
293
+ # Check if regeneration is needed
294
+ needs_regeneration = "true" in response_text
295
+
296
+ # Extract reason if available (everything after true/false)
297
+ reason = "No specific reason provided"
298
+ if " " in response_text:
299
+ parts = response_text.split(" ", 1)
300
+ if len(parts) > 1:
301
+ reason = parts[1].strip()
302
+
303
+ print(f"Option '{option.option_text}' needs regeneration: {needs_regeneration}")
304
+ return needs_regeneration, reason
305
+
306
+ except Exception as e:
307
+ print(f"Error checking option regeneration need: {str(e)}")
308
+ # If there's an error, assume regeneration is not needed
309
+ return False, f"Error during evaluation: {str(e)}"
310
+
311
+ def regenerate_individual_option(client: OpenAI, model: str, temperature: float, question: MultipleChoiceQuestion, option: MultipleChoiceOption,
312
+ correct_option: MultipleChoiceOption, content: str, reason: str) -> Optional[MultipleChoiceOption]:
313
+ """
314
+ Regenerate a specific incorrect option.
315
+
316
+ Args:
317
+ question: The question containing the option
318
+ option: The specific option to regenerate
319
+ correct_option: The correct option for context
320
+ content: The relevant content for context
321
+ reason: Reason why the option needs regeneration
322
+
323
+ Returns:
324
+ A new MultipleChoiceOption or None if regeneration failed
325
+ """
326
+ print(f"Regenerating option '{option.option_text}'")
327
+
328
+
329
+
330
+
331
+
332
+ # Format the question and options for display
333
+ question_display = (
334
+ f"Question: {question.question_text}\n"
335
+ f"Learning Objective: {question.learning_objective}\n"
336
+ f"Correct Answer: {question.correct_answer}\n"
337
+ f"Correct Option: {correct_option.option_text}\n"
338
+ f"Correct Option Feedback: {correct_option.feedback}\n"
339
+ )
340
+
341
+ option_display = (
342
+ f"Incorrect Option to Improve: {option.option_text}\n"
343
+ f"Current Feedback: {option.feedback}\n"
344
+ )
345
+
346
+ # Create a prompt focused on regenerating this specific option
347
+ regeneration_prompt = f"""
348
+ Below is a multiple choice question with the CORRECT option and ONE INCORRECT option that needs improvement.
349
+
350
+ {question_display}
351
+
352
+ INCORRECT OPTION TO IMPROVE:
353
+ {option_display}
354
+
355
+ Consider the course content to help you make informed decisions:
356
+
357
+ COURSE CONTENT:
358
+ {content}
359
+
360
+
361
+
362
+ Here are some examples of high quality incorrect answer suggestions which you should follow:
363
+ <incorrect_answer_examples>
364
+ {INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION}
365
+ </incorrect_answer_examples>
366
+
367
+ Consider also the quality standards for writing options and feedback:
368
+
369
+ <General Quality Standards>
370
+ {GENERAL_QUALITY_STANDARDS}
371
+ </General Quality Standards>
372
+ These are some examples of questions and their answer options along with their feedback which you should follow:
373
+ <Example Questions>
374
+ {EXAMPLE_QUESTIONS}
375
+ </Example Questions>
376
+
377
+ <Additional Guidelines>
378
+ - be in the language and tone of the course.
379
+ - be at a similar level of difficulty or complexity as encountered in the course.
380
+ - assess only information from the course and not depend on information that was
381
+ not covered in the course.
382
+ - not attempt to teach something as part of the quiz.
383
+ - use clear and concise language
384
+ - not induce confusion
385
+ - provide a slight (not major) challenge.
386
+ - be easily interpreted and unambiguous.
387
+ - be well written in clear and concise language, proper grammar, good sentence
388
+ structure, and consistent formatting
389
+ - be thoughtful and specific rather than broad and ambiguous
390
+ - be complete in its wording such that understanding the question is not part
391
+ of the assessment
392
+
393
+ Incorrect answer feedback should:
394
+ - be informational and encouraging, not punitive.
395
+ - be a single sentence, concise and to the point.
396
+ - Do not say "Incorrect" or "Wrong".
397
+
398
+ </Additional Guidelines>
399
+
400
+ {IMMEDIATE_RED_FLAGS}
401
+
402
+ Return ONLY the improved incorrect option and its feedback in this exact JSON format:
403
+ {{"option_text": "Your improved incorrect option text here", "feedback": "Your improved feedback explaining why this option is incorrect"}}
404
+
405
+ The option_text must be factually incorrect but plausible, and the feedback must explain why it's incorrect.
406
+ """
407
+
408
+ # Call OpenAI API
409
+ try:
410
+
411
+
412
+ params = {
413
+ "model": model,
414
+ "messages": [
415
+ {"role": "system", "content": "You are an expert educational assessment creator specializing in creating high-quality multiple choice questions with detailed feedback for each option."},
416
+ {"role": "user", "content": regeneration_prompt}
417
+ ],
418
+ "response_format": {"type": "json_object"}
419
+ }
420
+
421
+ if not TEMPERATURE_UNAVAILABLE.get(model, True):
422
+ params["temperature"] = temperature
423
+
424
+ completion = client.chat.completions.create(**params)
425
+ response_text = completion.choices[0].message.content
426
+
427
+ # Parse the JSON response
428
+ try:
429
+ response_data = json.loads(response_text)
430
+ new_option_text = response_data.get("option_text", "")
431
+ new_feedback = response_data.get("feedback", "")
432
+
433
+ if not new_option_text or not new_feedback:
434
+ print(f"Error: Missing option_text or feedback in response")
435
+ return None
436
+
437
+ # Create a new option with the regenerated text and feedback
438
+ new_option = MultipleChoiceOption(
439
+ option_text=new_option_text,
440
+ is_correct=False,
441
+ feedback=new_feedback
442
+ )
443
+
444
+ #print(f"Successfully regenerated option: '{new_option_text}'")
445
+
446
+ # Log the regeneration for debugging
447
+ option_index = next((i for i, opt in enumerate(question.options) if opt.option_text == option.option_text), -1)
448
+
449
+
450
+ debug_dir = os.path.join("wrong_answer_debug")
451
+ os.makedirs(debug_dir, exist_ok=True)
452
+
453
+ # Create a log file for this question
454
+ log_file = os.path.join(debug_dir, f"question_{question.id}_option_{option_index}.txt")
455
+
456
+ # Format the log message
457
+ log_message = f"""
458
+ Question ID: {question.id}
459
+ Question: {question.question_text}
460
+
461
+ REASON FOR REGENERATION:
462
+ {reason}
463
+
464
+ BEFORE:
465
+ Option Text: {option.option_text}
466
+ Feedback: {option.feedback}
467
+
468
+ AFTER:
469
+ Option Text: {new_option.option_text}
470
+ Feedback: {new_option.feedback}
471
+ """
472
+
473
+ # Write to the log file
474
+ with open(log_file, "w") as f:
475
+ f.write(log_message)
476
+
477
+ # Also print to console
478
+ print(f"\n--- Regenerated Option for Question {question.id}, Option {option_index} ---")
479
+ print(f"BEFORE: {option.option_text}")
480
+ print(f"AFTER: {new_option.option_text}")
481
+ print(f"Log saved to {log_file}")
482
+
483
+ return new_option
484
+
485
+ except json.JSONDecodeError as e:
486
+ print(f"Error parsing JSON response: {str(e)}")
487
+ print(f"Raw response: {response_text}")
488
+ return None
489
+
490
+ except Exception as e:
491
+ print(f"Error regenerating option: {str(e)}")
492
+ return None
493
+
494
+ def judge_question_quality(client: OpenAI, model: str, temperature: float, question: MultipleChoiceQuestion) -> Tuple[bool, str]:
495
+ """
496
+ Judge the quality of a question based on quality standards.
497
+
498
+ Args:
499
+ question: Question to judge
500
+
501
+ Returns:
502
+ Tuple of (approved, feedback)
503
+ """
504
+ run_manager = _get_run_manager()
505
+
506
+ # Create the prompt
507
+ prompt = f"""
508
+ Evaluate the quality of the following multiple choice question based on the provided quality standards.
509
+
510
+ Question: {question.question_text}
511
+
512
+ Options:
513
+ {json.dumps([{"text": opt.option_text, "is_correct": opt.is_correct, "feedback": opt.feedback} for opt in question.options], indent=2)}
514
+
515
+ Learning Objective: The question is testing the following learning objective:
516
+ {question.learning_objective}
517
+
518
+ Quality Standards to evaluate against:
519
+ 1. Alignment: The question should align with the learning objective and test understanding of course content.
520
+ 2. Clarity: The question should be clear, unambiguous, and well-written.
521
+ 3. Difficulty: The question should be challenging but fair for someone who has studied the material.
522
+ 4. Options: The options should be plausible, with one clearly correct answer.
523
+ 5. Feedback: Each option should have appropriate feedback that explains why it is correct or incorrect.
524
+
525
+ Provide:
526
+ 1. Detailed feedback on the question's strengths and weaknesses. Two or three sentences
527
+ 2. A final decision on whether to approve the question (true/false)
528
+
529
+ Format your response as a JSON object with the following fields:
530
+ {{
531
+ "feedback": string,
532
+ "approved": boolean
533
+ }}
534
+ """
535
+
536
+ # Generate judgment
537
+ # Different parameter handling for different model families
538
+ params = {
539
+ "model": model,
540
+ "messages": [
541
+ {"role": "system", "content": "You are an expert educational assessment evaluator."},
542
+ {"role": "user", "content": prompt}
543
+ ]
544
+ }
545
+
546
+ # Add temperature parameter only if not using o-series models
547
+ if not TEMPERATURE_UNAVAILABLE.get(model, True):
548
+ params["temperature"] = temperature
549
+
550
+ response = client.chat.completions.create(**params)
551
+
552
+ # Parse the response
553
+ try:
554
+ # Get the raw response content
555
+ raw_content = response.choices[0].message.content
556
+ if run_manager:
557
+ # Log full response in DEBUG for detailed tracking
558
+ run_manager.log(f"DEBUG - Raw judge response: {raw_content}", level="DEBUG")
559
+
560
+ # Try to extract JSON from the response if it's not pure JSON
561
+ # Sometimes the model includes explanatory text before or after the JSON
562
+ import re
563
+ json_match = re.search(r'\{[\s\S]*\}', raw_content)
564
+
565
+ if json_match:
566
+ json_str = json_match.group(0)
567
+ result = json.loads(json_str)
568
+ else:
569
+ # If no JSON pattern found, try the raw content
570
+ result = json.loads(raw_content)
571
+
572
+ return result["approved"], result["feedback"]
573
+ except Exception as e:
574
+ if run_manager:
575
+ run_manager.log(f"Error parsing judge response: {e}", level="ERROR")
576
+ run_manager.log(f"Raw response content: {response.choices[0].message.content[:200]}...", level="DEBUG")
577
+ # Return default values if parsing fails
578
+ return True, "Question meets basic quality standards"
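Review note (illustrative, not part of the commit): the parsing step at the end of `judge_question_quality` could be factored into a standalone helper, which also makes the two-value return explicit. The sketch below is a minimal version under stated assumptions: the helper name `parse_judge_response` and the constant `DEFAULT_JUDGMENT` are hypothetical, and the fail-open default simply mirrors the fallback used in the diff.

```python
import json
import re
from typing import Tuple

# Hypothetical names for illustration; neither exists in the commit.
DEFAULT_JUDGMENT: Tuple[bool, str] = (True, "Question meets basic quality standards")


def parse_judge_response(raw_content: str) -> Tuple[bool, str]:
    """Extract (approved, feedback) from a judge reply that may wrap JSON in prose."""
    text = raw_content or ""
    # Greedy match from the first "{" to the last "}", as in the diff.
    match = re.search(r"\{[\s\S]*\}", text)
    candidate = match.group(0) if match else text
    try:
        result = json.loads(candidate)
        return bool(result["approved"]), str(result["feedback"])
    except (json.JSONDecodeError, KeyError, TypeError):
        # Fail open, mirroring the fallback in the diff.
        return DEFAULT_JUDGMENT
```

Keeping the fallback in one place would make it straightforward to switch to a fail-closed default (`False`) later, should approving unparseable judgments prove too lenient.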
quiz_generator/question_ranking.py ADDED
@@ -0,0 +1,474 @@
1
+ from typing import List
2
+ from openai import OpenAI
3
+ from models import MultipleChoiceQuestion, GroupedMultipleChoiceQuestion, RankedMultipleChoiceQuestion, RankedMultipleChoiceQuestionsResponse, GroupedMultipleChoiceQuestionsResponse
4
+ from prompts.questions import RANK_QUESTIONS_PROMPT, GROUP_QUESTIONS_PROMPT
5
+ import json
6
+
7
+ def rank_questions(client: OpenAI, model: str, temperature: float, questions: List[GroupedMultipleChoiceQuestion], file_contents: List[str]) -> dict:
8
+ """
9
+ Rank multiple choice questions based on quality criteria.
10
+
11
+ Args:
12
+ questions: List of questions to rank
13
+ file_contents: List of file contents with source tags
14
+
15
+ Returns:
16
+ Dict with key "ranked" mapping to the list of ranked questions with ranking explanations
17
+ """
18
+ try:
19
+ print(f"Ranking {len(questions)} questions")
20
+ # Separate out the question for learning objective ID=1 (if present)
21
+ lo_one_question = None
22
+ questions_to_rank = []
23
+
24
+ for q in questions:
25
+ if q.learning_objective_id == 1:
26
+ lo_one_question = q
27
+ else:
28
+ questions_to_rank.append(q)
29
+
30
+ if not questions_to_rank:
31
+ return {"ranked": questions} # Nothing to rank
32
+
33
+ # Format questions for display in the prompt
34
+ questions_display = "\n\n".join([
35
+ f"ID: {q.id}\n"
36
+ f"Question: {q.question_text}\n"
37
+ f"Options: {json.dumps([{'text': o.option_text, 'is_correct': o.is_correct, 'feedback': o.feedback} for o in q.options])}\n"
38
+ f"Learning Objective: {q.learning_objective}\n"
39
+ f"Learning Objective ID: {q.learning_objective_id}\n"
40
+ f"Correct Answer: {q.correct_answer}\n"
41
+ f"Source Reference: {q.source_reference}\n"
42
+ f"Judge Feedback: {getattr(q, 'judge_feedback', 'N/A')}\n"
43
+ f"Approved: {getattr(q, 'approved', 'N/A')}\n"
44
+ f"Group Members: {q.group_members}\n"
45
+ f"In Group: {q.in_group}\n"
46
+ f"Best in Group: {q.best_in_group}\n"
47
+ for q in questions_to_rank
48
+ ])
49
+
50
+ # Extract all unique source references from questions
51
+ all_source_references = set()
52
+ for q in questions:
53
+ if isinstance(q.source_reference, list):
54
+ all_source_references.update(q.source_reference)
55
+ else:
56
+ all_source_references.add(q.source_reference)
57
+
58
+ # Combine content from all source references
59
+ combined_content = ""
60
+ for source_file in all_source_references:
61
+ source_found = False
62
+ for file_content in file_contents:
63
+ # Look for the XML source tag with the matching filename
64
+ if f"<source file='{source_file}'>" in file_content:
65
+ print(f"Found matching source content for {source_file}")
66
+ if combined_content:
67
+ combined_content += "\n\n"
68
+ combined_content += file_content
69
+ source_found = True
70
+ break
71
+
72
+ # If no exact match found, try a more flexible match
73
+ if not source_found:
74
+ print(f"No exact match for {source_file}, looking for partial matches")
75
+ for file_content in file_contents:
76
+ if source_file in file_content:
77
+ print(f"Found partial match for {source_file}")
78
+ if combined_content:
79
+ combined_content += "\n\n"
80
+ combined_content += file_content
81
+ source_found = True
82
+ break
83
+
84
+ # If still no matching content, use all file contents combined
85
+ if not combined_content:
86
+ print(f"No content found for any source files, using all content")
87
+ combined_content = "\n\n".join(file_contents)
88
+
89
+
90
+
91
+ # Create ranking prompt
92
+ ranking_prompt = f"""
93
+
94
+ {RANK_QUESTIONS_PROMPT}
95
+
96
+ Consider the questions' relevance with respect to the course content as well:
97
+
98
+ <course_content>
99
+ {combined_content}
100
+ </course_content>
101
+
102
+ Here are the questions to rank:
103
+
104
+ <questions>
105
+ {questions_display}
106
+ </questions>
107
+
108
+ For each question:
109
+ 1. Assign a rank (1 = best, 2 = second best, etc.)
110
+ 2. Provide a brief explanation for the ranking (2-3 sentences)
111
+ """
112
+ # Count tokens in the prompt
113
+ # try:
114
+ # encoding = tiktoken.get_encoding("cl100k_base")
115
+ # token_count = len(encoding.encode(ranking_prompt))
116
+ # print(f"DEBUG - Ranking prompt token count: {token_count}")
117
+
118
+ # estimated_output_tokens = len(questions_to_rank) * 250 # ~250 tokens per question in output
119
+ # print(f"DEBUG - Estimated output tokens: {estimated_output_tokens}")
120
+ # except ImportError:
121
+ # print("DEBUG - Tiktoken not installed, cannot count tokens")
122
+ # except Exception as e:
123
+ # print(f"DEBUG - Error counting tokens: {str(e)}")
124
+ # # Create a simple list of dictionaries for the response
125
+ # class RankingItem(BaseModel):
126
+ # id: int
127
+ # rank: int
128
+ # ranking_reasoning: str
129
+
130
+ # Call OpenAI API
131
+ print(f"DEBUG - Using model {model} for question ranking (temperature {temperature} is not sent to the API)")
132
+ print(f"DEBUG - Sending {len(questions_to_rank)} questions to rank")
133
+ print(f"DEBUG - Question IDs being sent: {[q.id for q in questions_to_rank]}")
134
+
135
+
136
+ system_prompt = "You are an expert educational content evaluator"
137
+ params = {
138
+ #"model": self.model,
139
+ "model": model,
140
+ "messages": [
141
+ {"role": "system", "content": system_prompt},
142
+ {"role": "user", "content": ranking_prompt}
143
+ ],
144
+ "response_format": RankedMultipleChoiceQuestionsResponse
145
+ }
146
+
147
+ # if not is_o_series_model:
148
+ # params["temperature"] = self.temperature
149
+
150
+ print(f"DEBUG - Making API call to rank questions")
151
+ completion = client.beta.chat.completions.parse(**params)
152
+ ranking_results = completion.choices[0].message.parsed.ranked_questions
153
+ print(f"DEBUG - API call successful")
154
+ print(f"Received {len(ranking_results)} ranking results")
155
+ print(f"DEBUG - Question IDs received in ranking: {[q.id for q in ranking_results]}")
156
+ # Check for missing questions
157
+ sent_ids = set(q.id for q in questions_to_rank)
158
+ received_ids = set(q.id for q in ranking_results)
159
+ missing_ids = sent_ids - received_ids
160
+ if missing_ids:
161
+ print(f"DEBUG - Missing questions with IDs: {missing_ids}")
162
+ # Always keep ID=1 as the first question if present
163
+ final_questions = []
164
+ if lo_one_question:
165
+ # Convert to RankedMultipleChoiceQuestion for consistency
166
+ lo_one_ranked = RankedMultipleChoiceQuestion(
167
+ **lo_one_question.model_dump(),
168
+ rank=1,
169
+ ranking_reasoning="First question, always rank 1"
170
+ )
171
+ final_questions.append(lo_one_ranked)
172
+
173
+ # Sort questions by their original rank and then reassign ranks sequentially
174
+ # If we have a learning_objective_id=1 question, start from rank 2, otherwise start from rank 1
175
+ start_rank = 2 if lo_one_question else 1
176
+ sorted_ranking_results = sorted(ranking_results, key=lambda x: x.rank)
177
+
178
+ # Assign sequential ranks in one go
179
+ for i, q in enumerate(sorted_ranking_results):
180
+ q.rank = i + start_rank
181
+
182
+ final_questions.extend(sorted_ranking_results)
183
+ # Ensure all questions have grouping information
184
+ for q in final_questions:
185
+ if not hasattr(q, "in_group") or q.in_group is None:
186
+ q.in_group = False
187
+ if not hasattr(q, "group_members") or q.group_members is None:
188
+ q.group_members = [q.id]
189
+ if not hasattr(q, "best_in_group") or q.best_in_group is None:
190
+ q.best_in_group = q.id == 1 # ID=1 is always best in group
191
+
192
+
193
+
194
+ return {
195
+ "ranked": final_questions,
196
+ }
197
+
198
+ # # Sort by rank
199
+ # ranked_questions = sorted(ranking_results, key=lambda x: x.rank)
200
+
201
+ # return ranked_questions
202
+
203
+ except Exception as e:
204
+ print(f"Error ranking questions: {str(e)}")
205
+ # Return original questions with default ranking
206
+ return {
+ "ranked": [
+ RankedMultipleChoiceQuestion(
+ **q.model_dump(),
+ rank=i+1,
+ ranking_reasoning="Ranking failed"
+ ) for i, q in enumerate(questions)
+ ]
+ }
213
+
214
+ def group_questions(client: OpenAI, model: str, temperature: float, questions: List[MultipleChoiceQuestion], file_contents: List[str]) -> dict:
215
+ """
216
+ Group multiple choice questions based on quality criteria.
217
+
218
+ Args:
219
+ questions: List of questions to group
220
+ file_contents: List of file contents with source tags
221
+
222
+ Returns:
223
+ Dict with keys "grouped" (all grouped questions) and "best_in_group" (the best question of each group)
224
+ """
225
+ try:
226
+ print(f"Grouping {len(questions)} questions")
227
+ # Return early when there is nothing to group
228
+
229
+ if not questions:
230
+ return {"grouped": questions, "best_in_group": questions} # Nothing to group
231
+
232
+
233
+
234
+ # Find all questions with learning_objective_id=1
235
+ lo_one_questions = [q for q in questions if q.learning_objective_id == 1]
236
+ if lo_one_questions:
237
+ print(f"Found {len(lo_one_questions)} questions with learning_objective_id=1")
238
+
239
+ # lo_one_question = None
240
+ # questions_to_group = []
241
+
242
+ # for q in questions:
243
+ # if q.learning_objective_id == 1:
244
+ # lo_one_question = q
245
+ # else:
246
+ # questions_to_group.append(q)
247
+
248
+ # if not questions_to_group:
249
+ # return {"grouped": questions, "best_in_group": questions} # Nothing to rank
250
+
251
+ # Format questions for display in the prompt
252
+ questions_display = "\n\n".join([
253
+ f"ID: {q.id}\n"
254
+ f"Question: {q.question_text}\n"
255
+ f"Options: {json.dumps([{'text': o.option_text, 'is_correct': o.is_correct, 'feedback': o.feedback} for o in q.options])}\n"
256
+ f"Learning Objective: {q.learning_objective}\n"
257
+ f"Learning Objective ID: {q.learning_objective_id}\n"
258
+ f"Correct Answer: {q.correct_answer}\n"
259
+ f"Source Reference: {q.source_reference}\n"
260
+ f"Judge Feedback: {getattr(q, 'judge_feedback', 'N/A')}\n"
261
+ f"Approved: {getattr(q, 'approved', 'N/A')}\n"
262
+ for q in questions
263
+ ])
264
+
265
+ # Extract all unique source references from questions
266
+ all_source_references = set()
267
+ for q in questions:
268
+ if isinstance(q.source_reference, list):
269
+ all_source_references.update(q.source_reference)
270
+ else:
271
+ all_source_references.add(q.source_reference)
272
+
273
+ # Combine content from all source references
274
+ combined_content = ""
275
+ for source_file in all_source_references:
276
+ source_found = False
277
+ for file_content in file_contents:
278
+ # Look for the XML source tag with the matching filename
279
+ if f"<source file='{source_file}'>" in file_content:
280
+ print(f"Found matching source content for {source_file}")
281
+ if combined_content:
282
+ combined_content += "\n\n"
283
+ combined_content += file_content
284
+ source_found = True
285
+ break
286
+
287
+ # If no exact match found, try a more flexible match
288
+ if not source_found:
289
+ print(f"No exact match for {source_file}, looking for partial matches")
290
+ for file_content in file_contents:
291
+ if source_file in file_content:
292
+ print(f"Found partial match for {source_file}")
293
+ if combined_content:
294
+ combined_content += "\n\n"
295
+ combined_content += file_content
296
+ source_found = True
297
+ break
298
+
299
+ # If still no matching content, use all file contents combined
300
+ if not combined_content:
301
+ print(f"No content found for any source files, using all content")
302
+ combined_content = "\n\n".join(file_contents)
303
+
304
+
305
+
306
+ # Create grouping prompt
307
+ grouping_prompt = f"""
308
+
309
+ {GROUP_QUESTIONS_PROMPT}
310
+
311
+ For grouping, consider the questions' relevance with respect to the course content as well:
312
+
313
+ <course_content>
314
+ {combined_content}
315
+ </course_content>
316
+
317
+ Here are the questions to group:
318
+
319
+ <questions>
320
+ {questions_display}
321
+ </questions>
322
+
323
+ """
324
+ # # Count tokens in the prompt
325
+ # try:
326
+ # encoding = tiktoken.get_encoding("cl100k_base")
327
+ # token_count = len(encoding.encode(grouping_prompt))
328
+ # print(f"DEBUG - Grouping prompt token count: {token_count}")
329
+
330
+ # estimated_output_tokens = len(questions_to_rank) * 250 # ~250 tokens per question in output
331
+ # print(f"DEBUG - Estimated output tokens: {estimated_output_tokens}")
332
+ # except ImportError:
333
+ # print("DEBUG - Tiktoken not installed, cannot count tokens")
334
+ # except Exception as e:
335
+ # print(f"DEBUG - Error counting tokens: {str(e)}")
336
+ # # Create a simple list of dictionaries for the response
337
+ # class RankingItem(BaseModel):
338
+ # id: int
339
+ # rank: int
340
+ # ranking_reasoning: str
341
+
342
+ # Call OpenAI API
343
+ print(f"DEBUG - Using model {model} for question grouping with temperature {temperature}")
344
+ print(f"DEBUG - Sending {len(questions)} questions to group")
345
+ print(f"DEBUG - Question IDs being sent: {[q.id for q in questions]}")
346
+
347
+ system_prompt = "You are an expert educational content evaluator"
348
+ params = {
349
+ #"model": self.model,
350
+ "model": model,
351
+ "messages": [
352
+ {"role": "system", "content": system_prompt},
353
+ {"role": "user", "content": grouping_prompt}
354
+ ],
355
+ "response_format": GroupedMultipleChoiceQuestionsResponse
356
+ }
357
+
358
+ # if not is_o_series_model:
359
+ # params["temperature"] = self.temperature
360
+
361
+ print(f"DEBUG - Making API call to group questions")
362
+ completion = client.beta.chat.completions.parse(**params)
363
+ grouping_results = completion.choices[0].message.parsed.grouped_questions
364
+ print(f"DEBUG - API call successful")
365
+ print(f"Received {len(grouping_results)} grouping results")
366
+ print(f"DEBUG - Question IDs received in grouping: {[q.id for q in grouping_results]}")
367
+ # Check for missing questions
368
+ sent_ids = set(q.id for q in questions)
369
+ received_ids = set(q.id for q in grouping_results)
370
+ missing_ids = sent_ids - received_ids
371
+ if missing_ids:
372
+ print(f"DEBUG - Missing questions with IDs: {missing_ids}")
373
+ # Collect the grouped questions returned by the model
374
+ final_questions = []
375
+ # if lo_one_question:
376
+ # # Convert to GroupedMultipleChoiceQuestion for consistency
377
+ # lo_one_grouped = GroupedMultipleChoiceQuestion(
378
+ # id=lo_one_question.id,
379
+ # question_text=lo_one_question.question_text,
380
+ # options=lo_one_question.options,
381
+ # learning_objective_id=lo_one_question.learning_objective_id,
382
+ # learning_objective=lo_one_question.learning_objective,
383
+ # correct_answer=lo_one_question.correct_answer,
384
+ # source_reference=lo_one_question.source_reference,
385
+ # judge_feedback=getattr(lo_one_question, "judge_feedback", None),
386
+ # approved=getattr(lo_one_question, "approved", None),
387
+ # #rank=1,
388
+ # #ranking_reasoning="First question, always rank 1",
389
+ # in_group=False,
390
+ # group_members=[lo_one_question.id],
391
+ # best_in_group=True
392
+ # )
393
+ # final_questions.append(lo_one_grouped)
394
+
395
+ # Add the rest of the questions in their ranked order
396
+ #sorted_grouping_results = sorted(grouping_results, key=lambda x: x.rank)
397
+
398
+ # Normalize best_in_group to Python bool
399
+ for q in grouping_results:
400
+ val = getattr(q, "best_in_group", False)
401
+ if isinstance(val, str):
402
+ q.best_in_group = val.lower() == "true"
403
+ elif isinstance(val, (bool, int)):
404
+ q.best_in_group = bool(val)
405
+ else:
406
+ q.best_in_group = False
407
+
408
+ # if lo_one_question:
409
+ # final_questions[0].best_in_group = True
410
+
411
+ final_questions.extend(grouping_results)
412
+
413
+ # # Filter for best-in-group questions (including id==1 always)
414
+ # best_in_group_questions = [q for q in final_questions if (q.learning_objective_id == 1 and getattr(q, "best_in_group", False) is True)
415
+ # or getattr(q, "best_in_group", False) is True]
416
+ best_in_group_questions = [q for q in final_questions if getattr(q, "best_in_group", False) is True]
417
+ # Check if any learning objective ID=1 question is already in best_in_group
418
+ lo_one_in_best = any(q.learning_objective_id == 1 for q in best_in_group_questions)
419
+
420
+ # If no learning objective ID=1 question is in best_in_group, add the best one
421
+ if not lo_one_in_best and lo_one_questions:
422
+ print(f"No learning objective ID=1 question in best_in_group, adding one")
423
+
424
+ # Find the best question for learning objective ID=1
425
+ # First, check if any are already in a group
426
+ lo_one_in_group = [q for q in lo_one_questions if getattr(q, "in_group", False)]
427
+
428
+ if lo_one_in_group:
429
+ # Use the first one that's already in a group
430
+ best_lo_one = lo_one_in_group[0]
431
+ else:
432
+ # Otherwise, use the first one
433
+ best_lo_one = lo_one_questions[0]
434
+
435
+ print(f"Selected question ID={best_lo_one.id} for learning objective ID=1")
436
+
437
+ # Mark it as best_in_group
438
+ best_lo_one.best_in_group = True
439
+
440
+ # Make sure it has the other grouping attributes
441
+ if not hasattr(best_lo_one, "in_group") or best_lo_one.in_group is None:
442
+ best_lo_one.in_group = True
443
+ if not hasattr(best_lo_one, "group_members") or best_lo_one.group_members is None:
444
+ best_lo_one.group_members = [best_lo_one.id]
445
+
446
+ # Add it to best_in_group_questions if not already there
447
+ if best_lo_one.id not in [q.id for q in best_in_group_questions]:
448
+ best_in_group_questions.append(best_lo_one)
449
+
450
+ # Update it in final_questions if it's already there
451
+ for i, q in enumerate(final_questions):
452
+ if q.id == best_lo_one.id:
453
+ final_questions[i] = best_lo_one
454
+ break
455
+
456
+
457
+
458
+ return {
459
+ "grouped": final_questions,
460
+ "best_in_group": best_in_group_questions
461
+ }
462
+
463
+ # # Sort by rank
464
+ # ranked_questions = sorted(ranking_results, key=lambda x: x.rank)
465
+
466
+ # return ranked_questions
467
+
468
+ except Exception as e:
469
+ print(f"Error grouping questions: {str(e)}")
470
+ # Return original questions with default grouping
471
+ return {
472
+ "grouped": questions,
473
+ "best_in_group": [q for q in questions if getattr(q, "best_in_group", False) is True]
474
+ }
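Review note (illustrative, not part of the commit): `rank_questions` and `group_questions` repeat the same source-matching loop, so it may be worth extracting. The following is a sketch only; `collect_source_content` is a hypothetical name, and unlike the code in the diff it skips a file that has already been matched, so the same content is not appended twice.

```python
from typing import Iterable, List


def collect_source_content(source_references: Iterable[str], file_contents: List[str]) -> str:
    """Join the file contents that correspond to the given source references."""
    matched: List[str] = []
    for source_file in source_references:
        tag = f"<source file='{source_file}'>"
        # Prefer an exact match on the XML source tag.
        hit = next((c for c in file_contents if tag in c), None)
        if hit is None:
            # Fall back to a plain substring match on the reference.
            hit = next((c for c in file_contents if source_file in c), None)
        # Assumption: de-duplicate, which the code in the diff does not do.
        if hit is not None and hit not in matched:
            matched.append(hit)
    # If nothing matched at all, use every file, as the diff does.
    return "\n\n".join(matched if matched else file_contents)
```

Passing `sorted(all_source_references)` instead of the raw set would also make the order of the combined content, and therefore the prompt, reproducible between runs.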
requirements.txt ADDED
@@ -0,0 +1,5 @@
1
+ gradio
2
+ openai
3
+ pydantic
4
+ python-dotenv
5
+ nbformat
ui/__init__.py ADDED
@@ -0,0 +1,3 @@
1
+ from .app import create_ui
2
+
3
+ __all__ = ['create_ui']
ui/app.py ADDED
@@ -0,0 +1,182 @@
1
+ import gradio as gr
2
+ from dotenv import load_dotenv
3
+ from .objective_handlers import process_files, regenerate_objectives, process_files_and_generate_questions
4
+ from .question_handlers import generate_questions
5
+ from .edit_handlers import load_quiz_for_editing, accept_and_next, go_previous, save_and_download
6
+ from .formatting import format_quiz_for_ui
7
+ from .run_manager import get_run_manager
8
+ from models import MODELS
9
+
10
+ # Load environment variables
11
+ load_dotenv()
12
+
13
+ # Set to False to disable saving output files (folders, logs, JSON, markdown).
14
+ # Tab 3 download of edited questions still works.
15
+ SAVE_OUTPUTS = False
16
+
17
+ def create_ui():
18
+ """Create the Gradio UI."""
19
+ get_run_manager().save_outputs = SAVE_OUTPUTS
20
+ with gr.Blocks(title="AI Course Assessment Generator") as app:
21
+ gr.Markdown("# AI Course Assessment Generator")
22
+ gr.Markdown("Upload course materials and generate learning objectives and quiz questions.")
23
+
24
+ with gr.Tab("Generate Learning Objectives"):
25
+ with gr.Row():
26
+ with gr.Column():
27
+ files_input = gr.File(
28
+ file_count="multiple",
29
+ label="Upload Course Materials (.vtt, .srt, .ipynb, .md)",
30
+ file_types=[".ipynb", ".vtt", ".srt", ".md"],
31
+ type="filepath"
32
+ )
33
+ num_objectives = gr.Slider(minimum=1, maximum=20, value=4, step=1, label="Number of Learning Objectives per Run")
34
+ num_runs = gr.Dropdown(
35
+ choices=["1", "2", "3", "4", "5"],
36
+ value="2",
37
+ label="Number of Generation Runs"
38
+ )
39
+ model_dropdown = gr.Dropdown(
40
+ choices=MODELS,
41
+ value="gpt-5.2",
42
+ label="Model"
43
+ )
44
+ incorrect_answer_model_dropdown = gr.Dropdown(
45
+ choices=MODELS,
46
+ value="gpt-5.2",
47
+ label="Model for Incorrect Answer Suggestions"
48
+ )
49
+ temperature_dropdown = gr.Dropdown(
50
+ choices=["0.0", "0.1", "0.2", "0.3", "0.4", "0.5", "0.6", "0.7", "0.8", "0.9", "1.0"],
51
+ value="1.0",
52
+ label="Temperature (0.0: Deterministic, 1.0: Creative)"
53
+ )
54
+ generate_button = gr.Button("Generate Learning Objectives")
55
+ generate_all_button = gr.Button("Generate all", variant="primary")
56
+
57
+ with gr.Column():
58
+ status_output = gr.Textbox(label="Status")
59
+ objectives_output = gr.Textbox(label="Best-in-Group Learning Objectives", lines=10)
60
+ grouped_output = gr.Textbox(label="All Grouped Learning Objectives", lines=10)
61
+ raw_ungrouped_output = gr.Textbox(label="Raw Ungrouped Learning Objectives (Debug)", lines=10)
62
+ feedback_input = gr.Textbox(label="Feedback on Learning Objectives")
63
+ regenerate_button = gr.Button("Regenerate Learning Objectives Based on Feedback")
64
+
65
+ with gr.Tab("Generate Questions"):
66
+ with gr.Row():
67
+ with gr.Column():
68
+ objectives_input = gr.Textbox(label="Learning Objectives JSON", lines=10, max_lines=10)
69
+ model_dropdown_q = gr.Dropdown(
70
+ choices=MODELS,
71
+ value="gpt-5.2",
72
+ label="Model"
73
+ )
74
+ temperature_dropdown_q = gr.Dropdown(
75
+ choices=["0.0", "0.1", "0.2", "0.3", "0.4", "0.5", "0.6", "0.7", "0.8", "0.9", "1.0"],
76
+ value="1.0",
77
+ label="Temperature (0.0: Deterministic, 1.0: Creative)"
78
+ )
79
+ num_questions_slider = gr.Slider(minimum=1, maximum=10, value=10, step=1, label="Number of questions")
80
+ num_runs_q = gr.Slider(minimum=1, maximum=5, value=2, step=1, label="Number of Question Generation Runs")
81
+ generate_q_button = gr.Button("Generate Questions")
82
+
83
+ with gr.Column():
84
+ status_q_output = gr.Textbox(label="Status")
85
+ best_questions_output = gr.Textbox(label="Ranked Best-in-Group Questions", lines=10)
86
+ all_questions_output = gr.Textbox(label="All Grouped Questions", lines=10)
87
+ formatted_quiz_output = gr.Textbox(label="Formatted Quiz", lines=15)
88
+
89
+ with gr.Tab("Propose/Edit Question"):
90
+ # State for editing flow
91
+ questions_state = gr.State([])
92
+ index_state = gr.State(0)
93
+ edited_state = gr.State([])
94
+
95
+ with gr.Row():
96
+ with gr.Column():
97
+ edit_status = gr.Textbox(label="Status", interactive=False)
98
+ edit_button = gr.Button("Edit questions", variant="primary")
99
+ question_editor = gr.Textbox(
100
+ label="Question",
101
+ lines=15,
102
+ interactive=True,
103
+ placeholder="Click 'Edit questions' to load the generated quiz."
104
+ )
105
+ with gr.Row():
106
+ prev_button = gr.Button("Previous")
107
+ next_button = gr.Button("Accept & Next", variant="primary")
108
+ download_button = gr.Button("Download edited quiz")
109
+ download_file = gr.File(label="Download", interactive=False)
110
+
111
+ # Set up event handlers
112
+ generate_button.click(
113
+ process_files,
114
+ inputs=[files_input, num_objectives, num_runs, model_dropdown, incorrect_answer_model_dropdown, temperature_dropdown],
115
+ outputs=[status_output, objectives_output, grouped_output, raw_ungrouped_output]
116
+ )
117
+
118
+ generate_all_button.click(
119
+ process_files_and_generate_questions,
120
+ inputs=[
121
+ files_input, num_objectives, num_runs, model_dropdown, incorrect_answer_model_dropdown, temperature_dropdown,
122
+ model_dropdown_q, temperature_dropdown_q, num_questions_slider, num_runs_q
123
+ ],
124
+ outputs=[
125
+ status_output, objectives_output, grouped_output, raw_ungrouped_output,
126
+ status_q_output, best_questions_output, all_questions_output, formatted_quiz_output
127
+ ]
128
+ )
129
+
130
+ regenerate_button.click(
131
+ regenerate_objectives,
132
+ inputs=[objectives_output, feedback_input, num_objectives, num_runs, model_dropdown, temperature_dropdown],
133
+ outputs=[status_output, objectives_output]
134
+ )
135
+
136
+ objectives_output.change(
137
+ lambda x: x,
138
+ inputs=[objectives_output],
139
+ outputs=[objectives_input]
140
+ )
141
+
142
+ generate_q_button.click(
143
+ generate_questions,
144
+ inputs=[objectives_input, model_dropdown_q, temperature_dropdown_q, num_questions_slider, num_runs_q],
145
+ outputs=[status_q_output, best_questions_output, all_questions_output, formatted_quiz_output]
146
+ )
147
+
148
+ best_questions_output.change(
149
+ format_quiz_for_ui,
150
+ inputs=[best_questions_output],
151
+ outputs=[formatted_quiz_output]
152
+ )
153
+
154
+ edit_button.click(
155
+ load_quiz_for_editing,
156
+ inputs=[formatted_quiz_output],
157
+ outputs=[edit_status, question_editor, questions_state, index_state, edited_state, next_button]
158
+ )
159
+
160
+ next_button.click(
161
+ accept_and_next,
162
+ inputs=[question_editor, questions_state, index_state, edited_state],
163
+ outputs=[edit_status, question_editor, questions_state, index_state, edited_state, next_button]
164
+ )
165
+
166
+ prev_button.click(
167
+ go_previous,
168
+ inputs=[question_editor, questions_state, index_state, edited_state],
169
+ outputs=[edit_status, question_editor, questions_state, index_state, edited_state, next_button]
170
+ )
171
+
172
+ download_button.click(
173
+ save_and_download,
174
+ inputs=[question_editor, questions_state, index_state, edited_state],
175
+ outputs=[edit_status, download_file]
176
+ )
177
+
178
+ return app
179
+
180
+ if __name__ == "__main__":
181
+ app = create_ui()
182
+ app.launch()
ui/content_processor.py ADDED
@@ -0,0 +1,186 @@
1
+ import os
2
+ import nbformat
3
+ from typing import List, Dict, Any, Tuple
4
+
5
+
6
+ def _get_run_manager():
7
+ """Get run manager if available, otherwise return None."""
8
+ try:
9
+ from .run_manager import get_run_manager
10
+ return get_run_manager()
11
+ except Exception:
12
+ return None
13
+
14
+
15
+ class ContentProcessor:
16
+ """Processes content from .vtt, .srt, .ipynb, and .md files."""
17
+
18
+ def __init__(self):
19
+ """Initialize the ContentProcessor."""
20
+ self.file_contents = []
21
+ self.run_manager = _get_run_manager()
22
+
23
+ def process_file(self, file_path: str) -> List[str]:
24
+ """
25
+ Process a file based on its extension and return the content.
26
+
27
+ Args:
28
+ file_path: Path to the file to process
29
+
30
+ Returns:
31
+ List containing the file content with source tags
32
+ """
33
+ _, ext = os.path.splitext(file_path)
34
+
35
+ if ext.lower() in ['.vtt', '.srt']:
36
+ return self._process_subtitle_file(file_path)
37
+ elif ext.lower() == '.ipynb':
38
+ return self._process_notebook_file(file_path)
39
+ elif ext.lower() == '.md':
40
+ return self._process_markdown_file(file_path)
41
+ else:
42
+ raise ValueError(f"Unsupported file type: {ext}")
43
+
44
+ def _process_subtitle_file(self, file_path: str) -> List[str]:
45
+ """Process a subtitle file (.vtt or .srt)."""
46
+ try:
47
+ filename = os.path.basename(file_path)
48
+ if self.run_manager:
49
+ self.run_manager.log(f"Found source file: {filename}", level="DEBUG")
50
+
51
+ with open(file_path, 'r', encoding='utf-8') as f:
52
+ content = f.read()
53
+
54
+ # Simple processing for subtitle files
55
+ # Remove timestamp lines and other metadata
56
+ lines = content.split('\n')
57
+ text_content = []
58
+
59
+ for line in lines:
60
+ # Skip empty lines, timestamp lines, and subtitle numbers
61
+ if (line.strip() and
62
+ not line.strip().isdigit() and
63
+ '-->' not in line and
64
+ not line.strip().startswith('WEBVTT')):
65
+ text_content.append(line.strip())
66
+
67
+ # Combine all text into a single content string
68
+ combined_text = "\n".join(text_content)
69
+
70
+ # Add XML source tags at the beginning and end of the content
71
+ tagged_content = f"<source file='{filename}'>\n{combined_text}\n</source>"
72
+
73
+ return [tagged_content]
74
+
75
+ except Exception as e:
76
+ if self.run_manager:
77
+ self.run_manager.log(f"Error processing subtitle file {file_path}: {e}", level="ERROR")
78
+ return []
79
+
80
+ def _process_markdown_file(self, file_path: str) -> List[str]:
81
+ """Process a Markdown file (.md)."""
82
+ try:
83
+ filename = os.path.basename(file_path)
84
+ if self.run_manager:
85
+ self.run_manager.log(f"Found source file: {filename}", level="DEBUG")
86
+
87
+ with open(file_path, 'r', encoding='utf-8') as f:
88
+ content = f.read()
89
+
90
+ # Add XML source tags at the beginning and end of the content
91
+ tagged_content = f"<source file='{filename}'>\n{content}\n</source>"
92
+
93
+ return [tagged_content]
94
+
95
+ except Exception as e:
96
+ if self.run_manager:
97
+ self.run_manager.log(f"Error processing markdown file {file_path}: {e}", level="ERROR")
98
+ return []
99
+
100
+ def _process_notebook_file(self, file_path: str) -> List[str]:
101
+ """Process a Jupyter notebook file (.ipynb)."""
102
+ try:
103
+ filename = os.path.basename(file_path)
104
+ if self.run_manager:
105
+ self.run_manager.log(f"Found source file: {filename}", level="DEBUG")
106
+
107
+ # First check if the file is valid JSON
108
+ import json  # imported before the try so json.JSONDecodeError is bound even if open() fails
109
+ try:
110
+ with open(file_path, 'r', encoding='utf-8') as f:
111
+ # Try to parse as JSON first
112
+ json.load(f)
113
+ except json.JSONDecodeError as json_err:
114
+ if self.run_manager:
115
+ self.run_manager.log(f"File {file_path} is not valid JSON: {json_err}", level="DEBUG")
116
+ # If it's not valid JSON, add it as plain text with a source tag
117
+ with open(file_path, 'r', encoding='utf-8') as f:
118
+ content = f.read()
119
+ tagged_content = f"<source file='{filename}'>\n```\n{content}\n```\n</source>"
120
+ return [tagged_content]
121
+
122
+ # If we get here, the file is valid JSON, try to parse as notebook
123
+ with open(file_path, 'r', encoding='utf-8') as f:
124
+ notebook = nbformat.read(f, as_version=4)
125
+
126
+ # Extract text from markdown and code cells
127
+ content_parts = []
128
+ for cell in notebook.cells:
129
+ if cell.cell_type == 'markdown':
130
+ content_parts.append(f"[Markdown]\n{cell.source}")
131
+ elif cell.cell_type == 'code':
132
+ content_parts.append(f"[Code]\n```python\n{cell.source}\n```")
133
+
134
+ # # Include output if present
135
+ # if hasattr(cell, 'outputs') and cell.outputs:
136
+ # for output in cell.outputs:
137
+ # if 'text' in output:
138
+ # content_parts.append(f"[Output]\n```\n{output.text}\n```")
139
+ # elif 'data' in output and 'text/plain' in output.data:
140
+ # content_parts.append(f"[Output]\n```\n{output.data['text/plain']}\n```")
141
+
142
+ # Combine all content into a single string
143
+ combined_content = "\n\n".join(content_parts)
144
+
145
+ # Add XML source tags at the beginning and end of the content
146
+ tagged_content = f"<source file='{filename}'>\n{combined_content}\n</source>"
147
+
148
+ return [tagged_content]
149
+
150
+ except Exception as e:
151
+ if self.run_manager:
152
+ self.run_manager.log(f"Error processing notebook file {file_path}: {e}", level="ERROR")
153
+ # Try to extract content as plain text if notebook parsing fails
154
+ try:
155
+ with open(file_path, 'r', encoding='utf-8') as f:
156
+ content = f.read()
157
+ tagged_content = f"<source file='{filename}'>\n```\n{content}\n```\n</source>"
158
+ return [tagged_content]
159
+ except Exception as read_err:
160
+ if self.run_manager:
161
+ self.run_manager.log(f"Could not read file as text either: {read_err}", level="ERROR")
162
+ return []
163
+
164
+ def process_files(self, file_paths: List[str]) -> List[str]:
165
+ """
166
+ Process multiple files and combine their content.
167
+
168
+ Args:
169
+ file_paths: List of paths to files to process
170
+
171
+ Returns:
172
+ List of file contents with source tags
173
+ """
174
+ all_file_contents = []
175
+
176
+ for file_path in file_paths:
177
+ file_content = self.process_file(file_path)
178
+ all_file_contents.extend(file_content)
179
+
180
+ # Store the processed file contents
181
+ self.file_contents = all_file_contents
182
+
183
+ # The entire content of each file is used as context
184
+ # Each file's content is wrapped in XML source tags
185
+ # This approach ensures that the LLM has access to the complete context
186
+ return all_file_contents
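The subtitle branch of `ContentProcessor` above drops cue counters, timestamp lines, and the `WEBVTT` header, then wraps what is left in a `<source>` tag. The following is a minimal standalone sketch of that filtering, not the repository's code: the helper names `extract_cue_text` and `wrap_in_source_tag` and the sample transcript are assumptions made for illustration.

```python
from typing import List


def extract_cue_text(raw: str) -> List[str]:
    """Return only the spoken-text lines of a .vtt/.srt transcript (hypothetical helper)."""
    kept = []
    for line in raw.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank separator between cues
        if stripped.isdigit():
            continue  # SRT cue counter such as "1" or "2"
        if "-->" in stripped:
            continue  # timestamp line
        if stripped.startswith("WEBVTT"):
            continue  # WebVTT file header
        kept.append(stripped)
    return kept


def wrap_in_source_tag(filename: str, lines: List[str]) -> str:
    """Wrap extracted lines in the XML-style source tag used as LLM context."""
    body = "\n".join(lines)
    return f"<source file='{filename}'>\n{body}\n</source>"


# Made-up sample transcript, used only to exercise the helpers.
sample = """WEBVTT

1
00:00:01.000 --> 00:00:04.000
Welcome to the course.

2
00:00:04.500 --> 00:00:07.000
Today we cover gradient descent.
"""

cue_lines = extract_cue_text(sample)
tagged = wrap_in_source_tag("lecture1.vtt", cue_lines)
print(tagged)
```

One consequence of the `isdigit()` check is that a spoken line consisting only of digits (for example a caption that reads "42") is discarded along with the cue counters.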
ui/edit_handlers.py ADDED
@@ -0,0 +1,197 @@
1
+ import re
2
+ import tempfile
3
+ import gradio as gr
4
+ from .run_manager import get_run_manager
5
+
6
+
7
+ def _next_button_label(index, total):
8
+ """Return 'Accept & Finish' for the last question, 'Accept & Next' otherwise."""
9
+ if total > 0 and index >= total - 1:
10
+ return gr.update(value="Accept & Finish")
11
+ return gr.update(value="Accept & Next")
12
+
13
+
14
+ def _parse_questions(md_content):
15
+ """Split formatted_quiz.md content into individual question blocks."""
16
+ parts = re.split(r'(?=\*\*Question \d+)', md_content.strip())
17
+ return [p.strip() for p in parts if p.strip()]
18
+
19
+
20
+ def _parse_question_block(block_text):
21
+ """Parse a single markdown question block into structured data."""
22
+ prompt = ""
23
+ options = []
24
+ current_option = None
25
+
26
+ for line in block_text.split('\n'):
27
+ stripped = line.strip()
28
+
29
+ # Question text line (colon may be inside or outside the bold markers)
30
+ q_match = re.match(r'\*\*Question \d+.*?\*\*:?\s*(.+)', stripped)
31
+ if q_match:
32
+ prompt = q_match.group(1).strip()
33
+ continue
34
+
35
+ # Skip ranking reasoning
36
+ if stripped.startswith('Ranking Reasoning:'):
37
+ continue
38
+
39
+ # Option line: • A [Correct]: text or • A: text
40
+ opt_match = re.match(r'•\s*[A-D]\s*(\[Correct\])?\s*:\s*(.+)', stripped)
41
+ if opt_match:
42
+ if current_option:
43
+ options.append(current_option)
44
+ current_option = {
45
+ 'answer': opt_match.group(2).strip(),
46
+ 'isCorrect': opt_match.group(1) is not None,
47
+ 'feedback': ''
48
+ }
49
+ continue
50
+
51
+ # Feedback line
52
+ fb_match = re.match(r'◦\s*Feedback:\s*(.+)', stripped)
53
+ if fb_match and current_option:
54
+ current_option['feedback'] = fb_match.group(1).strip()
55
+ continue
56
+
57
+ if current_option:
58
+ options.append(current_option)
59
+
60
+ return {'prompt': prompt, 'options': options}
61
+
62
+
63
+ def _generate_yml(questions_data):
64
+ """Generate YAML quiz format from parsed question data."""
65
+ lines = [
66
+ "name: Quiz 1",
67
+ "passingThreshold: 4",
68
+ "estimatedTimeSec: 600",
69
+ "maxTrialsPer24Hrs: 3",
70
+ "courseSlug: course_Slug",
71
+ "insertAfterConclusion: true",
72
+ "RandomQuestionPosition: true",
73
+ "questions:",
74
+ ]
75
+
76
+ for q in questions_data:
77
+ lines.append(" - typeName: multipleChoice")
78
+ lines.append(" points: 1")
79
+ lines.append(" shuffle: true")
80
+ lines.append(" prompt: |-")
81
+ for prompt_line in q['prompt'].split('\n'):
82
+ lines.append(f" {prompt_line}")
83
+ lines.append(" options:")
84
+ for opt in q['options']:
85
+ answer = opt['answer'].replace('\\', '\\\\').replace('"', '\\"')  # escape backslashes before quotes for a YAML double-quoted string
86
+ is_correct = 'true' if opt['isCorrect'] else 'false'
87
+ lines.append(f' - answer: "{answer}"')
88
+ lines.append(f" isCorrect: {is_correct}")
89
+ lines.append(f" feedback: {opt['feedback']}")
90
+
91
+ return '\n'.join(lines) + '\n'
92
+
93
+
94
+ def load_quiz_for_editing(formatted_quiz_text=""):
95
+ """Load formatted quiz for editing. Tries disk first, falls back to UI text."""
96
+ run_manager = get_run_manager()
97
+ content = None
98
+
99
+ # Try loading from disk
100
+ quiz_path = run_manager.get_latest_formatted_quiz_path()
101
+ if quiz_path is not None:
102
+ with open(quiz_path, "r", encoding="utf-8") as f:
103
+ content = f.read()
104
+
105
+ # Fall back to the formatted quiz text from the UI
106
+ if not content and formatted_quiz_text:
107
+ content = formatted_quiz_text
108
+
109
+ if not content:
110
+ return (
111
+ "No formatted quiz found. Generate questions in the 'Generate Questions' tab first.",
112
+ "",
113
+ [],
114
+ 0,
115
+ [],
116
+ gr.update(),
117
+ )
118
+
119
+ questions = _parse_questions(content)
120
+ if not questions:
121
+ return "The quiz file is empty.", "", [], 0, [], gr.update()
122
+
123
+ status = f"Question 1 of {len(questions)}"
124
+ edited = list(questions) # start with originals
125
+ return status, questions[0], questions, 0, edited, _next_button_label(0, len(questions))
126
+
127
+
128
+ def accept_and_next(current_text, questions, index, edited):
129
+ """Save current edit and advance to the next question."""
130
+ if not questions:
131
+ return "No quiz loaded.", "", questions, index, edited, gr.update()
132
+
133
+ # Save the current edit
134
+ edited[index] = current_text
135
+
136
+ if index + 1 < len(questions):
137
+ new_index = index + 1
138
+ status = f"Question {new_index + 1} of {len(questions)}"
139
+ return status, edited[new_index], questions, new_index, edited, _next_button_label(new_index, len(questions))
140
+ else:
141
+ # All questions reviewed
142
+ return (
143
+ f"All {len(questions)} questions reviewed. Click 'Download edited quiz' to save.",
144
+ current_text,
145
+ questions,
146
+ index,
147
+ edited,
148
+ gr.update(value="Accept & Finish"),
149
+ )
150
+
151
+
152
+ def go_previous(current_text, questions, index, edited):
153
+ """Save current edit and go back to the previous question."""
154
+ if not questions:
155
+ return "No quiz loaded.", "", questions, index, edited, gr.update()
156
+
157
+ # Save the current edit before moving
158
+ edited[index] = current_text
159
+
160
+ if index > 0:
161
+ new_index = index - 1
162
+ status = f"Question {new_index + 1} of {len(questions)}"
163
+ return status, edited[new_index], questions, new_index, edited, _next_button_label(new_index, len(questions))
164
+ else:
165
+ return f"Question 1 of {len(questions)} (already at first question)", current_text, questions, index, edited, _next_button_label(index, len(questions))
166
+
167
+
168
+ def save_and_download(current_text, questions, index, edited):
169
+ """Join edited questions, save to output folder, and return files for download."""
170
+ if not edited:
171
+ return "No edited questions to save.", None
172
+
173
+ # Save the current edit in case user didn't click accept
174
+ edited[index] = current_text
175
+
176
+ combined_md = "\n\n".join(edited) + "\n"
177
+
178
+ # Generate YAML
179
+ questions_data = [_parse_question_block(q) for q in edited]
180
+ yml_content = _generate_yml(questions_data)
181
+
182
+ # Save to output folder
183
+ run_manager = get_run_manager()
184
+ saved_path = run_manager.save_edited_quiz(combined_md, "formatted_quiz_edited.md")
185
+ run_manager.save_edited_quiz(yml_content, "formatted_quiz_edited.yml")
186
+
187
+ # Create temp files for Gradio download
188
+ tmp_md = tempfile.NamedTemporaryFile(delete=False, suffix=".md", mode="w", encoding="utf-8")
189
+ tmp_md.write(combined_md)
190
+ tmp_md.close()
191
+
192
+ tmp_yml = tempfile.NamedTemporaryFile(delete=False, suffix=".yml", mode="w", encoding="utf-8")
193
+ tmp_yml.write(yml_content)
194
+ tmp_yml.close()
195
+
196
+ status = f"Saved to {saved_path}" if saved_path else "Download ready."
197
+ return status, [tmp_md.name, tmp_yml.name]
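The edit handlers above depend on two parsing steps: splitting the formatted quiz on `**Question N` markers with a zero-width lookahead, and reading option and feedback lines with regular expressions. Below is a small self-contained sketch of those two steps. The function names `split_questions` and `parse_options`, the captured `letter` field, and the sample quiz text are assumptions for illustration; the exact whitespace of the real quiz file may differ.

```python
import re

# Same shapes as the patterns in edit_handlers.py, with the option letter captured as well.
OPTION_RE = re.compile(r"•\s*([A-D])\s*(\[Correct\])?\s*:\s*(.+)")
FEEDBACK_RE = re.compile(r"◦\s*Feedback:\s*(.+)")


def split_questions(md: str):
    """Split on a lookahead so each block keeps its own '**Question N' header."""
    parts = re.split(r"(?=\*\*Question \d+)", md.strip())
    return [p.strip() for p in parts if p.strip()]


def parse_options(block: str):
    """Collect options from one question block, attaching feedback to the latest option."""
    options = []
    for line in block.split("\n"):
        stripped = line.strip()
        opt = OPTION_RE.match(stripped)
        if opt:
            options.append({
                "letter": opt.group(1),
                "answer": opt.group(3).strip(),
                "isCorrect": opt.group(2) is not None,
                "feedback": "",
            })
            continue
        fb = FEEDBACK_RE.match(stripped)
        if fb and options:
            options[-1]["feedback"] = fb.group(1).strip()
    return options


# Made-up quiz text in the layout produced by the formatter.
sample_md = (
    "**Question 1 [Rank: 1]:** What does a learning rate control?\n\n"
    "\t• A [Correct]: The step size of each update\n"
    "\t  ◦ Feedback: Right, it scales the gradient step.\n\n"
    "\t• B: The number of layers\n"
    "\t  ◦ Feedback: That is an architecture choice.\n\n"
    "**Question 2:** Which file types are supported?\n\n"
    "\t• A: Only .pdf\n"
    "\t  ◦ Feedback: PDFs are not handled.\n"
)

blocks = split_questions(sample_md)
first = parse_options(blocks[0])
print(len(blocks), first)
```

Splitting on a lookahead rather than on the marker itself means the header text is not consumed, so each edited block can be written back unchanged when the user accepts it without edits.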
ui/feedback_handlers.py ADDED
@@ -0,0 +1,37 @@
+ import os
+ import json
+ from quiz_generator import QuizGenerator
+ from .state import get_processed_contents
+
+ def propose_question_handler(guidance, model_name, temperature):
+     """Generate a single question based on user guidance or feedback."""
+
+     if not get_processed_contents():
+         return "Please upload and process files in the 'Generate Learning Objectives' tab first.", None
+
+     if not os.getenv("OPENAI_API_KEY"):
+         return "OpenAI API key not found.", None
+
+     try:
+         quiz_generator = QuizGenerator(
+             api_key=os.getenv("OPENAI_API_KEY"),
+             model=model_name,
+             temperature=float(temperature)
+         )
+
+         question = quiz_generator.generate_multiple_choice_question_from_feedback(
+             guidance, get_processed_contents()
+         )
+
+         formatted_question = {
+             "id": question.id,
+             "question_text": question.question_text,
+             "options": [{"text": opt.option_text, "is_correct": opt.is_correct, "feedback": opt.feedback} for opt in question.options],
+             "learning_objective": question.learning_objective,
+             "source_reference": question.source_reference,
+             "feedback": question.feedback
+         }
+
+         return "Question generated successfully.", json.dumps(formatted_question, indent=2)
+     except Exception as e:
+         return f"Error: {str(e)}", None
ui/formatting.py ADDED
@@ -0,0 +1,46 @@
+ import json
+
+ def format_quiz_for_ui(questions_json):
+     """Format quiz questions for display in the UI."""
+     if not questions_json:
+         return "No questions to format."
+
+     try:
+         questions = json.loads(questions_json)
+
+         # Sort questions by rank if available
+         try:
+             questions = sorted(questions, key=lambda q: q.get('rank', 999))
+         except Exception as e:
+             print(f"Warning: Could not sort by rank: {e}")
+
+         formatted_output = ""
+         for i, question in enumerate(questions, 1):
+             # Add question with rank if available
+             rank_info = ""
+             if 'rank' in question:
+                 rank_info = f" [Rank: {question['rank']}]"
+
+             formatted_output += f"**Question {i}{rank_info}:** {question['question_text']}\n\n"
+
+             # Add ranking reasoning if available
+             if 'ranking_reasoning' in question:
+                 formatted_output += f"Ranking Reasoning: {question['ranking_reasoning']}\n\n"
+
+             options = question['options']
+             option_letters = ['A', 'B', 'C', 'D']
+
+             # Add each option with its letter
+             for j, option in enumerate(options):
+                 letter = option_letters[j]
+                 correct_marker = " [Correct]" if option['is_correct'] else ""
+
+                 formatted_output += f"\t• {letter}{correct_marker}: {option['text']}\n"
+                 formatted_output += f"\t ◦ Feedback: {option['feedback']}\n\n"
+
+             formatted_output += "\n"
+
+         return formatted_output
+
+     except Exception as e:
+         return f"Error formatting quiz: {str(e)}"
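`format_quiz_for_ui` above sorts questions by `rank`, treating a missing rank as 999, and letters each option. The sketch below illustrates that ordering and layout on made-up data. It is an assumption-laden variant rather than the repository's function: `format_question` is a hypothetical helper, it takes letters from `string.ascii_uppercase` so that a fifth option would not raise an `IndexError`, and the spacing before the feedback bullet is a guess.

```python
import json
import string


def format_question(index: int, question: dict) -> str:
    """Render one question in the bullet layout used by the quiz editor (hypothetical helper)."""
    rank = f" [Rank: {question['rank']}]" if "rank" in question else ""
    out = f"**Question {index}{rank}:** {question['question_text']}\n\n"
    # zip() stops at the shorter sequence, so up to 26 options get a letter.
    for letter, option in zip(string.ascii_uppercase, question["options"]):
        marker = " [Correct]" if option["is_correct"] else ""
        out += f"\t• {letter}{marker}: {option['text']}\n"
        out += f"\t  ◦ Feedback: {option['feedback']}\n\n"
    return out


# Made-up questions: one without a rank, two with ranks out of order.
questions = json.loads("""[
  {"rank": 2, "question_text": "Second?",
   "options": [{"text": "x", "is_correct": true, "feedback": "ok"}]},
  {"question_text": "Unranked?",
   "options": [{"text": "y", "is_correct": false, "feedback": "no"}]},
  {"rank": 1, "question_text": "First?",
   "options": [{"text": "z", "is_correct": true, "feedback": "yes"}]}
]""")

# Missing ranks default to 999, so unranked questions sort last.
ordered = sorted(questions, key=lambda q: q.get("rank", 999))
formatted = "".join(format_question(i, q) for i, q in enumerate(ordered, 1))
print(formatted)
```

Because the displayed question number comes from `enumerate` after sorting, it reflects the ranked position rather than the order in which the questions were generated.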
ui/objective_handlers.py ADDED
@@ -0,0 +1,403 @@
1
+ import os
2
+ import json
3
+ import shutil
4
+ from typing import List
5
+ from models.learning_objectives import LearningObjective
6
+ from .content_processor import ContentProcessor
7
+ from quiz_generator import QuizGenerator
8
+ from .state import get_processed_contents, set_processed_contents, set_learning_objectives
9
+ from .run_manager import get_run_manager
10
+ from .question_handlers import generate_questions
11
+
12
+ def process_files(files, num_objectives, num_runs, model_name, incorrect_answer_model_name, temperature):
13
+ """Process uploaded files and generate learning objectives."""
14
+
15
+ run_manager = get_run_manager()
16
+
17
+ # Input validation
18
+ if not files:
19
+ return "Please upload at least one file.", None, None, None
20
+
21
+ if not os.getenv("OPENAI_API_KEY"):
22
+ return "OpenAI API key not found. Please set the OPENAI_API_KEY environment variable.", None, None, None
23
+
24
+ # Extract file paths
25
+ file_paths = _extract_file_paths(files)
26
+ if not file_paths:
27
+ return "No valid files found. Please upload valid .ipynb, .vtt, .srt, or .md files.", None, None, None
28
+
29
+ # Start run and logging
30
+ run_id = run_manager.start_objective_run(
31
+ files=file_paths,
32
+ num_objectives=num_objectives,
33
+ num_runs=num_runs,
34
+ model=model_name,
35
+ incorrect_answer_model=incorrect_answer_model_name,
36
+ temperature=temperature
37
+ )
38
+
39
+ run_manager.log(f"Processing {len(file_paths)} files: {[os.path.basename(f) for f in file_paths]}", level="DEBUG")
40
+
41
+ # Process files
42
+ processor = ContentProcessor()
43
+ file_contents = processor.process_files(file_paths)
44
+
45
+ if not file_contents:
46
+ run_manager.log("No content extracted from the uploaded files", level="ERROR")
47
+ return "No content extracted from the uploaded files.", None, None, None
48
+
49
+ run_manager.log(f"Successfully extracted content from {len(file_contents)} files", level="INFO")
50
+
51
+ # Store file contents for later use
52
+ set_processed_contents(file_contents)
53
+
54
+ # Generate learning objectives
55
+ run_manager.log(f"Creating QuizGenerator with model={model_name}, temperature={temperature}", level="INFO")
56
+ quiz_generator = QuizGenerator(
57
+ api_key=os.getenv("OPENAI_API_KEY"),
58
+ model=model_name,
59
+ temperature=float(temperature)
60
+ )
61
+
62
+ all_learning_objectives = _generate_multiple_runs(
63
+ quiz_generator, file_contents, num_objectives, num_runs, incorrect_answer_model_name, run_manager
64
+ )
65
+
66
+ # Group and rank objectives
67
+ grouped_result = _group_base_objectives_add_incorrect_answers(
68
+ quiz_generator, all_learning_objectives, file_contents, incorrect_answer_model_name, run_manager
69
+ )
70
+
71
+ # Format results for display
72
+ formatted_results = _format_objective_results(grouped_result, all_learning_objectives, num_objectives, run_manager)
73
+
74
+ # Store results
75
+ set_learning_objectives(grouped_result["all_grouped"])
76
+
77
+ # Save outputs to files
78
+ params = {
79
+ "files": [os.path.basename(f) for f in file_paths],
80
+ "num_objectives": num_objectives,
81
+ "num_runs": num_runs,
82
+ "model": model_name,
83
+ "incorrect_answer_model": incorrect_answer_model_name,
84
+ "temperature": temperature
85
+ }
86
+ run_manager.save_objectives_outputs(
87
+ best_in_group=formatted_results[1],
88
+ all_grouped=formatted_results[2],
89
+ raw_ungrouped=formatted_results[3],
90
+ params=params
91
+ )
92
+
93
+ # End run
94
+ run_manager.end_run(run_type="Learning Objectives")
95
+
96
+ return formatted_results
97
+
98
+ def regenerate_objectives(objectives_json, feedback, num_objectives, num_runs, model_name, temperature):
99
+ """Regenerate learning objectives based on feedback."""
100
+
101
+ if not get_processed_contents():
102
+ return "No processed content available. Please upload files first.", objectives_json, objectives_json
103
+
104
+ if not os.getenv("OPENAI_API_KEY"):
105
+ return "OpenAI API key not found.", objectives_json, objectives_json
106
+
107
+ if not feedback:
108
+ return "Please provide feedback to regenerate learning objectives.", objectives_json, objectives_json
109
+
110
+ # Add feedback to file contents
111
+ file_contents_with_feedback = get_processed_contents().copy()
112
+ file_contents_with_feedback.append(f"FEEDBACK ON PREVIOUS OBJECTIVES: {feedback}")
113
+
114
+ # Generate with feedback
115
+ quiz_generator = QuizGenerator(
116
+ api_key=os.getenv("OPENAI_API_KEY"),
117
+ model=model_name,
118
+ temperature=float(temperature)
119
+ )
120
+
121
+ try:
122
+ # Generate multiple runs of learning objectives with feedback
123
+ all_learning_objectives = _generate_multiple_runs(
124
+ quiz_generator,
125
+ file_contents_with_feedback,
126
+ num_objectives,
127
+ num_runs,
128
+ model_name, get_run_manager() # Use the same model for incorrect answer suggestions; the run manager is required for logging
129
+ )
130
+
131
+ # Group and rank the objectives
132
+ grouping_result = _group_base_objectives_add_incorrect_answers(quiz_generator, all_learning_objectives, file_contents_with_feedback, model_name, get_run_manager())
133
+
134
+ # Get the results
135
+ grouped_objectives = grouping_result["all_grouped"]
136
+ best_in_group_objectives = grouping_result["best_in_group"]
137
+
138
+ # Convert to JSON
139
+ grouped_objectives_json = json.dumps([obj.dict() for obj in grouped_objectives])
140
+ best_in_group_json = json.dumps([obj.dict() for obj in best_in_group_objectives])
141
+
142
+ return f"Generated {len(all_learning_objectives)} learning objectives, {len(best_in_group_objectives)} unique after grouping.", grouped_objectives_json, best_in_group_json
143
+
144
+ except Exception as e:
145
+ print(f"Error regenerating learning objectives: {e}")
146
+ import traceback
147
+ traceback.print_exc()
148
+ return f"Error regenerating learning objectives: {str(e)}", objectives_json, objectives_json
149
+
150
+ def _extract_file_paths(files):
151
+ """Extract file paths from different input formats."""
152
+ file_paths = []
153
+
154
+ if isinstance(files, list):
155
+ for file in files:
156
+ if file and os.path.exists(file):
157
+ file_paths.append(file)
158
+ elif isinstance(files, str) and os.path.exists(files):
159
+ file_paths.append(files)
160
+ elif hasattr(files, 'name') and os.path.exists(files.name):
161
+ file_paths.append(files.name)
162
+
163
+ return file_paths
164
+
165
+ def _generate_multiple_runs(quiz_generator, file_contents, num_objectives, num_runs, incorrect_answer_model_name, run_manager):
166
+ """Generate learning objectives across multiple runs."""
167
+ all_learning_objectives = []
168
+ num_runs_int = int(num_runs)
169
+
170
+ for run in range(num_runs_int):
171
+ run_manager.log(f"Starting generation run {run+1}/{num_runs_int}", level="INFO")
172
+
173
+ # Generate base learning objectives without grouping or incorrect answers
174
+ learning_objectives = quiz_generator.generate_base_learning_objectives(
175
+ file_contents, num_objectives, incorrect_answer_model_name
176
+ )
177
+
178
+ run_manager.log(f"Generated {len(learning_objectives)} learning objectives in run {run+1}", level="INFO")
179
+
180
+ # Assign temporary IDs
181
+ for i, obj in enumerate(learning_objectives):
182
+ obj.id = 1000 * (run + 1) + (i + 1)
183
+
184
+ all_learning_objectives.extend(learning_objectives)
185
+
186
+ run_manager.log(f"Total learning objectives from all runs: {len(all_learning_objectives)}", level="INFO")
187
+ return all_learning_objectives
188
+
189
+ def _group_base_objectives_add_incorrect_answers(quiz_generator, all_base_learning_objectives, file_contents, incorrect_answer_model_name=None, run_manager=None):
190
+ """Group base learning objectives and add incorrect answers to best-in-group objectives."""
191
+ run_manager.log("Grouping base learning objectives...", level="INFO")
192
+ grouping_result = quiz_generator.group_base_learning_objectives(all_base_learning_objectives, file_contents)
193
+
194
+ grouped_objectives = grouping_result["all_grouped"]
195
+ best_in_group_objectives = grouping_result["best_in_group"]
196
+
197
+ run_manager.log(f"Grouped into {len(best_in_group_objectives)} best-in-group objectives", level="INFO")
198
+
199
+ # Find and reassign the best first objective to ID=1
200
+ _reassign_objective_ids(grouped_objectives, run_manager)
201
+
202
+ # Step 1: Generate incorrect answer suggestions only for best-in-group objectives
203
+ run_manager.log("Generating incorrect answer options only for best-in-group objectives...", level="INFO")
204
+ enhanced_best_in_group = quiz_generator.generate_lo_incorrect_answer_options(
205
+ file_contents, best_in_group_objectives, incorrect_answer_model_name
206
+ )
207
+
208
+ run_manager.log("Generated incorrect answer options", level="INFO")
209
+
210
+ # Clear debug directory for incorrect answer regeneration logs
211
+ debug_dir = os.path.join("incorrect_suggestion_debug")
212
+ if os.path.exists(debug_dir):
213
+ shutil.rmtree(debug_dir)
214
+ os.makedirs(debug_dir, exist_ok=True)
215
+
216
+ # Step 2: Run the improvement workflow on the generated incorrect answers
217
+ run_manager.log("Improving incorrect answer options for best-in-group objectives...", level="INFO")
218
+ improved_best_in_group = quiz_generator.learning_objective_generator.regenerate_incorrect_answers(
219
+ enhanced_best_in_group, file_contents
220
+ )
221
+
222
+ run_manager.log("Completed improvement of incorrect answer options", level="INFO")
223
+
224
+ # Create a map of best-in-group objectives by ID for easy lookup
225
+ best_in_group_map = {obj.id: obj for obj in improved_best_in_group}
226
+
227
+ # Process all grouped objectives
228
+ final_grouped_objectives = []
229
+
230
+ for grouped_obj in grouped_objectives:
231
+ if getattr(grouped_obj, "best_in_group", False):
232
+ # For best-in-group objectives, use the enhanced version with incorrect answers
233
+ if grouped_obj.id in best_in_group_map:
234
+ final_grouped_objectives.append(best_in_group_map[grouped_obj.id])
235
+ else:
236
+ # This shouldn't happen, but just in case
237
+ final_grouped_objectives.append(grouped_obj)
238
+ else:
239
+ # For non-best-in-group objectives, ensure they have empty incorrect answers
240
+ final_grouped_objectives.append(LearningObjective(
241
+ id=grouped_obj.id,
242
+ learning_objective=grouped_obj.learning_objective,
243
+ source_reference=grouped_obj.source_reference,
244
+ correct_answer=grouped_obj.correct_answer,
245
+ incorrect_answer_options=[], # Empty list for non-best-in-group
246
+ in_group=getattr(grouped_obj, 'in_group', None),
247
+ group_members=getattr(grouped_obj, 'group_members', None),
248
+ best_in_group=getattr(grouped_obj, 'best_in_group', None)
249
+ ))
250
+
251
+ return {
252
+ "all_grouped": final_grouped_objectives,
253
+ "best_in_group": improved_best_in_group
254
+ }
255
+
256
+ def _reassign_objective_ids(grouped_objectives, run_manager):
257
+ """Reassign IDs to ensure best first objective gets ID=1."""
258
+ # Find best first objective
259
+ best_first_objective = None
260
+
261
+ # First identify all groups containing objectives with IDs ending in 001
262
+ groups_with_001 = {}
263
+ for obj in grouped_objectives:
264
+ if obj.id % 1000 == 1: # ID ends in 001
265
+ group_members = getattr(obj, "group_members", None) or [obj.id]
266
+ for member_id in group_members:
267
+ if member_id not in groups_with_001:
268
+ groups_with_001[member_id] = True
269
+
270
+ # Now find the best_in_group objective from these groups
271
+ for obj in grouped_objectives:
272
+ obj_id = getattr(obj, "id", 0)
273
+ group_members = getattr(obj, "group_members", None) or [obj_id]
274
+
275
+ # Check if this objective is in a group with 001 objectives
276
+ is_in_001_group = any(member_id in groups_with_001 for member_id in group_members)
277
+
278
+ if is_in_001_group and getattr(obj, "best_in_group", False):
279
+ best_first_objective = obj
280
+ run_manager.log(f"Found best_in_group objective in a 001 group with ID={obj.id}", level="DEBUG")
281
+ break
282
+
283
+ # If no best_in_group from 001 groups found, fall back to the first 001 objective
284
+ if not best_first_objective:
285
+ for obj in grouped_objectives:
286
+ if obj.id % 1000 == 1: # First objective from a run
287
+ best_first_objective = obj
288
+ run_manager.log(f"No best_in_group from 001 groups found, using first 001 with ID={obj.id}", level="DEBUG")
289
+ break
290
+ # Reassign IDs
291
+ id_counter = 2
292
+ if best_first_objective:
293
+ best_first_objective.id = 1
294
+ run_manager.log("Reassigned primary objective to ID=1", level="INFO")
295
+
296
+ for obj in grouped_objectives:
297
+ if obj is best_first_objective:
298
+ continue
299
+ obj.id = id_counter
300
+ id_counter += 1
301
+
302
+ def _format_objective_results(grouped_result, all_learning_objectives, num_objectives, run_manager):
303
+ """Format objective results for display."""
304
+ sorted_best_in_group = sorted(grouped_result["best_in_group"], key=lambda obj: obj.id)
305
+ sorted_all_grouped = sorted(grouped_result["all_grouped"], key=lambda obj: obj.id)
306
+
307
+ # Limit best-in-group to the requested number of objectives
308
+ sorted_best_in_group = sorted_best_in_group[:int(num_objectives)]
309
+
310
+ run_manager.log("Formatting objective results for display", level="INFO")
311
+ run_manager.log(f"Best-in-group objectives limited to top {len(sorted_best_in_group)} (requested: {num_objectives})", level="INFO")
312
+
313
+ # Format best-in-group
314
+ formatted_best_in_group = []
315
+ for obj in sorted_best_in_group:
316
+ formatted_best_in_group.append({
317
+ "id": obj.id,
318
+ "learning_objective": obj.learning_objective,
319
+ "source_reference": obj.source_reference,
320
+ "correct_answer": obj.correct_answer,
321
+ "incorrect_answer_options": getattr(obj, 'incorrect_answer_options', None),
322
+ "in_group": getattr(obj, 'in_group', None),
323
+ "group_members": getattr(obj, 'group_members', None),
324
+ "best_in_group": getattr(obj, 'best_in_group', None)
325
+ })
326
+
327
+ # Format grouped
328
+ formatted_grouped = []
329
+ for obj in sorted_all_grouped:
330
+ formatted_grouped.append({
331
+ "id": obj.id,
332
+ "learning_objective": obj.learning_objective,
333
+ "source_reference": obj.source_reference,
334
+ "correct_answer": obj.correct_answer,
335
+ "incorrect_answer_options": getattr(obj, 'incorrect_answer_options', None),
336
+ "in_group": getattr(obj, 'in_group', None),
337
+ "group_members": getattr(obj, 'group_members', None),
338
+ "best_in_group": getattr(obj, 'best_in_group', None)
339
+ })
340
+
341
+ # Format unranked
342
+ formatted_unranked = []
343
+ for obj in all_learning_objectives:
344
+ formatted_unranked.append({
345
+ "id": obj.id,
346
+ "learning_objective": obj.learning_objective,
347
+ "source_reference": obj.source_reference,
348
+ "correct_answer": obj.correct_answer
349
+ })
350
+
351
+ run_manager.log(f"Formatted {len(formatted_best_in_group)} best-in-group, {len(formatted_grouped)} grouped, {len(formatted_unranked)} raw objectives", level="INFO")
352
+
353
+ return (
354
+ f"Generated and grouped {len(formatted_best_in_group)} unique learning objectives successfully. Saved to run: {run_manager.get_current_run_id()}",
355
+ json.dumps(formatted_best_in_group, indent=2),
356
+ json.dumps(formatted_grouped, indent=2),
357
+ json.dumps(formatted_unranked, indent=2)
358
+ )
359
+
360
+ def process_files_and_generate_questions(files, num_objectives, num_runs, model_name, incorrect_answer_model_name,
361
+ temperature, model_name_q, temperature_q, num_questions, num_runs_q):
362
+ """Process files, generate learning objectives, and then generate questions in one flow."""
363
+
364
+ # First, generate learning objectives
365
+ obj_results = process_files(files, num_objectives, num_runs, model_name, incorrect_answer_model_name, temperature)
366
+
367
+ # obj_results contains: (status, objectives_output, grouped_output, raw_ungrouped_output)
368
+ status_obj, objectives_output, grouped_output, raw_ungrouped_output = obj_results
369
+
370
+ # Check if objectives generation failed
371
+ if not objectives_output:
372
+ # Return error status for objectives and empty values for questions
373
+ return (
374
+ status_obj, # status_output
375
+ objectives_output, # objectives_output
376
+ grouped_output, # grouped_output
377
+ raw_ungrouped_output, # raw_ungrouped_output
378
+ "Learning objectives generation failed. Cannot proceed with questions.", # status_q_output
379
+ None, # best_questions_output
380
+ None, # all_questions_output
381
+ None # formatted_quiz_output
382
+ )
383
+
384
+ # Now generate questions using the objectives
385
+ question_results = generate_questions(objectives_output, model_name_q, temperature_q, num_questions, num_runs_q)
386
+
387
+ # question_results contains: (status_q, best_questions_output, all_questions_output, formatted_quiz_output)
388
+ status_q, best_questions_output, all_questions_output, formatted_quiz_output = question_results
389
+
390
+ # Combine the status messages
391
+ combined_status = f"{status_obj}\n\nThen:\n{status_q}"
392
+
393
+ # Return all 8 outputs
394
+ return (
395
+ combined_status, # status_output
396
+ objectives_output, # objectives_output
397
+ grouped_output, # grouped_output
398
+ raw_ungrouped_output, # raw_ungrouped_output
399
+ status_q, # status_q_output
400
+ best_questions_output, # best_questions_output
401
+ all_questions_output, # all_questions_output
402
+ formatted_quiz_output # formatted_quiz_output
403
+ )
ui/question_handlers.py ADDED
@@ -0,0 +1,245 @@
1
+ import os
2
+ import json
3
+ import shutil
4
+ from typing import List
5
+ from quiz_generator import QuizGenerator
6
+ from models import LearningObjective
7
+ from .state import get_processed_contents
8
+ from .formatting import format_quiz_for_ui
9
+ from .run_manager import get_run_manager
10
+
11
+ def generate_questions(objectives_json, model_name, temperature, num_questions, num_runs):
12
+ """Generate questions based on approved learning objectives."""
13
+
14
+ run_manager = get_run_manager()
15
+
16
+ # Input validation
17
+ if not objectives_json:
18
+ return "No learning objectives provided.", None, None, None
19
+
20
+ if not os.getenv("OPENAI_API_KEY"):
21
+ return "OpenAI API key not found.", None, None, None
22
+
23
+ if not get_processed_contents():
24
+ return "No processed content available. Please go back to the first tab and upload files.", None, None, None
25
+
26
+ # Parse and create learning objectives
27
+ learning_objectives = _parse_learning_objectives(objectives_json)
28
+ if not learning_objectives:
29
+ run_manager.log("Invalid learning objectives JSON", level="ERROR")
30
+ return "Invalid learning objectives JSON.", None, None, None
31
+
32
+ # Start question run
33
+ run_id = run_manager.start_question_run(
34
+ objectives_count=len(learning_objectives),
35
+ model=model_name,
36
+ temperature=temperature,
37
+ num_questions=int(num_questions),
38
+ num_runs=int(num_runs)
39
+ )
40
+
41
+ run_manager.log(f"Parsed {len(learning_objectives)} learning objectives", level="INFO")
42
+ run_manager.log(f"Target total questions: {num_questions}", level="INFO")
43
+
44
+ # Generate questions
45
+ run_manager.log(f"Creating QuizGenerator with model={model_name}, temperature={temperature}", level="INFO")
46
+ quiz_generator = QuizGenerator(
47
+ api_key=os.getenv("OPENAI_API_KEY"),
48
+ model=model_name,
49
+ temperature=float(temperature)
50
+ )
51
+
52
+ all_questions = _generate_questions_multiple_runs(
53
+ quiz_generator, learning_objectives, int(num_questions), num_runs, run_manager
54
+ )
55
+
56
+ # Group and rank questions
57
+ results = _group_and_rank_questions(quiz_generator, all_questions, run_manager)
58
+
59
+ # Improve incorrect answers
60
+ #_improve_incorrect_answers(quiz_generator, results["best_in_group_ranked"])
61
+
62
+ # Format results
63
+ formatted_results = _format_question_results(results, int(num_questions), run_manager)
64
+
65
+ # Save outputs to files
66
+ params = {
67
+ "objectives_count": len(learning_objectives),
68
+ "model": model_name,
69
+ "temperature": temperature,
70
+ "num_questions": int(num_questions),
71
+ "num_runs": int(num_runs)
72
+ }
73
+ run_manager.save_questions_outputs(
74
+ best_ranked=formatted_results[1],
75
+ all_grouped=formatted_results[2],
76
+ formatted_quiz=formatted_results[3],
77
+ params=params
78
+ )
79
+
80
+ # End run
81
+ run_manager.end_run(run_type="Questions")
82
+
83
+ return formatted_results
84
+
85
+ def _parse_learning_objectives(objectives_json):
86
+ """Parse learning objectives from JSON."""
87
+ try:
88
+ objectives_data = json.loads(objectives_json)
89
+ learning_objectives = []
90
+
91
+ for obj_data in objectives_data:
92
+ obj = LearningObjective(
93
+ id=obj_data["id"],
94
+ learning_objective=obj_data["learning_objective"],
95
+ source_reference=obj_data["source_reference"],
96
+ correct_answer=obj_data["correct_answer"],
97
+ incorrect_answer_options=obj_data["incorrect_answer_options"]
98
+ )
99
+ learning_objectives.append(obj)
100
+
101
+ return learning_objectives
102
+ except (json.JSONDecodeError, KeyError, TypeError): # also covers valid JSON with missing keys or an unexpected shape
103
+ return None
104
+
105
+ def _generate_questions_multiple_runs(quiz_generator, learning_objectives, num_questions, num_runs, run_manager):
106
+ """Generate questions across multiple runs with proportional distribution."""
107
+ all_questions = []
108
+ num_runs_int = int(num_runs)
109
+ num_objectives = len(learning_objectives)
110
+
111
+ # Calculate proportional distribution of questions across objectives
112
+ distribution = _calculate_proportional_distribution(num_questions, num_objectives)
113
+ run_manager.log(f"Question distribution across {num_objectives} objectives: {distribution}", level="INFO")
114
+
115
+ # Select which objectives to use based on distribution
116
+ objectives_to_use = []
117
+ for i, count in enumerate(distribution):
118
+ if count > 0 and i < len(learning_objectives):
119
+ objectives_to_use.append((learning_objectives[i], count))
120
+
121
+ run_manager.log(f"Using {len(objectives_to_use)} learning objectives for question generation", level="INFO")
122
+
123
+ for run in range(num_runs_int):
124
+ run_manager.log(f"Starting question generation run {run+1}/{num_runs_int}", level="INFO")
125
+
126
+ # Generate questions for each selected objective with its assigned count
127
+ for obj, question_count in objectives_to_use:
128
+ run_manager.log(f"Generating {question_count} question(s) for objective {obj.id}: {obj.learning_objective[:80]}...", level="INFO")
129
+
130
+ for q_num in range(question_count):
131
+ run_questions = quiz_generator.generate_questions_in_parallel(
132
+ [obj], get_processed_contents()
133
+ )
134
+
135
+ if run_questions:
136
+ run_manager.log(f"Generated question {q_num+1}/{question_count} for objective {obj.id}", level="DEBUG")
137
+ all_questions.extend(run_questions)
138
+
139
+ run_manager.log(f"Generated {len(all_questions)} questions in total after run {run+1}", level="INFO")
140
+
141
+ # Assign unique IDs
142
+ for i, q in enumerate(all_questions):
143
+ q.id = i + 1
144
+
145
+ run_manager.log(f"Total questions from all runs: {len(all_questions)}", level="INFO")
146
+ return all_questions
147
+
148
+ def _calculate_proportional_distribution(num_questions, num_objectives):
149
+ """Distribute N questions across M objectives as evenly as possible (earlier objectives receive any remainder)."""
150
+ if num_questions <= 0 or num_objectives <= 0:
151
+ return []
152
+
153
+ # If we have more objectives than questions, only use as many objectives as we have questions
154
+ if num_questions < num_objectives:
155
+ distribution = [1] * num_questions + [0] * (num_objectives - num_questions)
156
+ return distribution
157
+
158
+ # Calculate base questions per objective and remainder
159
+ base_per_objective = num_questions // num_objectives
160
+ remainder = num_questions % num_objectives
161
+
162
+ # Distribute evenly, giving extra questions to the first 'remainder' objectives
163
+ distribution = [base_per_objective + (1 if i < remainder else 0) for i in range(num_objectives)]
164
+
165
+ return distribution
166
+
167
+ def _group_and_rank_questions(quiz_generator, all_questions, run_manager):
168
+ """Group and rank questions."""
169
+ run_manager.log(f"Grouping {len(all_questions)} questions by similarity...", level="INFO")
170
+ grouping_result = quiz_generator.group_questions(all_questions, get_processed_contents())
171
+
172
+ run_manager.log(f"Grouped into {len(grouping_result['best_in_group'])} best-in-group questions", level="INFO")
173
+
174
+ # Rank ALL grouped questions (not just best-in-group) to ensure we have enough questions for selection
175
+ run_manager.log(f"Ranking all {len(grouping_result['grouped'])} grouped questions...", level="INFO")
176
+ ranking_result = quiz_generator.rank_questions(grouping_result['grouped'], get_processed_contents())
177
+
178
+ run_manager.log("Completed ranking of questions", level="INFO")
179
+
180
+ return {
181
+ "grouped": grouping_result["grouped"],
182
+ "all_ranked": ranking_result["ranked"]
183
+ }
184
+
185
+ def _improve_incorrect_answers(quiz_generator, questions):
186
+ """Improve incorrect answer options."""
187
+ # Clear debug directory
188
+ debug_dir = "wrong_answer_debug"
189
+ if os.path.exists(debug_dir):
190
+ shutil.rmtree(debug_dir)
191
+ os.makedirs(debug_dir, exist_ok=True)
192
+
193
+ quiz_generator.regenerate_incorrect_answers(questions, get_processed_contents())
194
+
195
+ def _format_question_results(results, num_questions, run_manager):
196
+ """Format question results for display."""
197
+ run_manager.log("Formatting question results for display", level="INFO")
198
+
199
+ # Format all ranked questions (these will be the top N questions from all grouped questions)
200
+ formatted_best_questions = []
201
+ for q in results["all_ranked"]:
202
+ formatted_best_questions.append({
203
+ "id": q.id,
204
+ "question_text": q.question_text,
205
+ "options": [{"text": opt.option_text, "is_correct": opt.is_correct, "feedback": opt.feedback} for opt in q.options],
206
+ "learning_objective_id": q.learning_objective_id,
207
+ "learning_objective": q.learning_objective,
208
+ "correct_answer": q.correct_answer,
209
+ "source_reference": q.source_reference,
210
+ "rank": getattr(q, "rank", None),
211
+ "ranking_reasoning": getattr(q, "ranking_reasoning", None),
212
+ "in_group": getattr(q, "in_group", None),
213
+ "group_members": getattr(q, "group_members", None),
214
+ "best_in_group": getattr(q, "best_in_group", None)
215
+ })
216
+
217
+ # Format all grouped questions
218
+ formatted_all_questions = []
219
+ for q in results["grouped"]:
220
+ formatted_all_questions.append({
221
+ "id": q.id,
222
+ "question_text": q.question_text,
223
+ "options": [{"text": opt.option_text, "is_correct": opt.is_correct, "feedback": opt.feedback} for opt in q.options],
224
+ "learning_objective_id": q.learning_objective_id,
225
+ "learning_objective": q.learning_objective,
226
+ "correct_answer": q.correct_answer,
227
+ "source_reference": q.source_reference,
228
+ "in_group": getattr(q, "in_group", None),
229
+ "group_members": getattr(q, "group_members", None),
230
+ "best_in_group": getattr(q, "best_in_group", None)
231
+ })
232
+
233
+ # Limit formatted quiz and best-ranked to the requested number of questions
234
+ formatted_best_questions_limited = formatted_best_questions[:num_questions]
235
+ formatted_quiz = format_quiz_for_ui(json.dumps(formatted_best_questions_limited, indent=2))
236
+
237
+ run_manager.log(f"Formatted {len(formatted_best_questions)} best-ranked, {len(formatted_all_questions)} grouped questions", level="INFO")
238
+ run_manager.log(f"Best-ranked and formatted quiz limited to top {len(formatted_best_questions_limited)} questions (requested: {num_questions})", level="INFO")
239
+
240
+ return (
241
+ f"Generated and ranked {len(formatted_best_questions_limited)} unique questions successfully. Saved to run: {run_manager.get_current_run_id()}/{run_manager.get_current_question_run_id()}",
242
+ json.dumps(formatted_best_questions_limited, indent=2),
243
+ json.dumps(formatted_all_questions, indent=2),
244
+ formatted_quiz
245
+ )
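The even split computed by `_calculate_proportional_distribution` in the file above can be tried in isolation. What follows is a minimal sketch rather than the committed code: the name `distribute_evenly` is hypothetical, and it uses `divmod` in place of separate `//` and `%` operations. Because the remainder is always smaller than the number of objectives, the "fewer questions than objectives" case falls out of the same expression without a special branch.

```python
from typing import List


def distribute_evenly(num_questions: int, num_objectives: int) -> List[int]:
    """Return per-objective question counts that sum to num_questions.

    Hypothetical helper for illustration; not part of the commit.
    """
    if num_questions <= 0 or num_objectives <= 0:
        return []
    base, remainder = divmod(num_questions, num_objectives)
    # The first `remainder` objectives each receive one extra question.
    return [base + (1 if i < remainder else 0) for i in range(num_objectives)]


if __name__ == "__main__":
    print(distribute_evenly(7, 3))
    print(distribute_evenly(2, 5))
```

Note that objectives later in the list are the ones that receive fewer (or zero) questions, so the ordering of the objectives passed in determines which ones are covered when the question budget is small.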
ui/run_manager.py ADDED
@@ -0,0 +1,323 @@
1
+ """Run manager for tracking test runs and managing output folders."""
2
+
3
+ import os
4
+ import json
5
+ import time
6
+ from datetime import datetime
7
+ from typing import Dict, Any, Optional, List
8
+ from pathlib import Path
9
+
10
+
11
+ class RunManager:
12
+ """Manages test runs, folders, and logging."""
13
+
14
+ def __init__(self, base_dir: str = "results", save_outputs: bool = True):
15
+ self.base_dir = base_dir
16
+ self.save_outputs = save_outputs
17
+ self.current_run_id: Optional[str] = None
18
+ self.current_run_dir: Optional[str] = None
19
+ self.current_question_run_id: Optional[str] = None # Track current question run ID
20
+ self.log_file: Optional[str] = None
21
+ self.run_start_time: Optional[float] = None
22
+ self.last_objective_params: Optional[Dict[str, Any]] = None
23
+
24
+ # Create base results directory
25
+ if self.save_outputs:
26
+ os.makedirs(self.base_dir, exist_ok=True)
27
+
28
+ def _get_next_run_id(self) -> str:
29
+ """Generate the next unique run ID."""
30
+ existing_runs = [d for d in (os.listdir(self.base_dir) if os.path.isdir(self.base_dir) else [])
31
+ if d.startswith("test_id") and os.path.isdir(os.path.join(self.base_dir, d))]
32
+
33
+ if not existing_runs:
34
+ return "test_id00001"
35
+
36
+ # Extract numbers and find max
37
+ numbers = []
38
+ for run in existing_runs:
39
+ try:
40
+ num = int(run.replace("test_id", ""))
41
+ numbers.append(num)
42
+ except ValueError:
43
+ continue
44
+
45
+ next_num = max(numbers) + 1 if numbers else 1
46
+ return f"test_id{next_num:05d}"
47
+
48
+ def _create_run_structure(self, run_id: str) -> str:
49
+ """Create folder structure for a run."""
50
+ run_dir = os.path.join(self.base_dir, run_id)
51
+ if self.save_outputs:
52
+ os.makedirs(run_dir, exist_ok=True)
53
+ os.makedirs(os.path.join(run_dir, "learning objectives"), exist_ok=True)
54
+ os.makedirs(os.path.join(run_dir, "questions"), exist_ok=True)
55
+ return run_dir
56
+
57
+ def _get_next_question_run_id(self) -> str:
58
+ """Generate the next unique question run ID for the current test run."""
59
+ if self.current_run_dir is None:
60
+ return "q_run_001"
61
+
62
+ questions_dir = os.path.join(self.current_run_dir, "questions")
63
+ if not os.path.exists(questions_dir):
64
+ return "q_run_001"
65
+
66
+ # Find existing question run folders
67
+ existing_q_runs = [d for d in os.listdir(questions_dir)
68
+ if d.startswith("q_run_") and os.path.isdir(os.path.join(questions_dir, d))]
69
+
70
+ if not existing_q_runs:
71
+ return "q_run_001"
72
+
73
+ # Extract numbers and find max
74
+ numbers = []
75
+ for run in existing_q_runs:
76
+ try:
77
+ num = int(run.replace("q_run_", ""))
78
+ numbers.append(num)
79
+ except ValueError:
80
+ continue
81
+
82
+ next_num = max(numbers) + 1 if numbers else 1
83
+ return f"q_run_{next_num:03d}"
84
+
85
+ def _params_changed(self, new_params: Dict[str, Any]) -> bool:
86
+ """Check if objective generation parameters have changed."""
87
+ if self.last_objective_params is None:
88
+ return True
89
+
90
+ # Compare relevant parameters
91
+ keys_to_compare = ["files", "num_objectives", "num_runs", "model",
92
+ "incorrect_answer_model", "temperature"]
93
+
94
+ for key in keys_to_compare:
95
+ if new_params.get(key) != self.last_objective_params.get(key):
96
+ return True
97
+
98
+ return False
99
+
100
+ def start_objective_run(self, files: List[str], num_objectives: int, num_runs: str,
101
+ model: str, incorrect_answer_model: str, temperature: str) -> str:
102
+ """
103
+ Start a new objective generation run or continue existing one.
104
+ Returns the run ID.
105
+ """
106
+ params = {
107
+ "files": sorted(files), # Sort for consistent comparison
108
+ "num_objectives": num_objectives,
109
+ "num_runs": num_runs,
110
+ "model": model,
111
+ "incorrect_answer_model": incorrect_answer_model,
112
+ "temperature": temperature
113
+ }
114
+
115
+ # Check if we need a new run
116
+ if self._params_changed(params):
117
+ # Create new run
118
+ self.current_run_id = self._get_next_run_id()
119
+ self.current_run_dir = self._create_run_structure(self.current_run_id)
120
+ self.log_file = os.path.join(self.current_run_dir, "log.log")
121
+ self.last_objective_params = params
122
+
123
+ # Log header
124
+ self.log(f"=== New Learning Objectives Run: {self.current_run_id} ===", level="INFO")
125
+ self.log(f"Inputs: {[os.path.basename(f) for f in files]}", level="INFO")
126
+ self.log("Variables:", level="INFO")
127
+ self.log(f" Number of Learning Objectives per Run: {num_objectives}", level="INFO")
128
+ self.log(f" Number of Generation Runs: {num_runs}", level="INFO")
129
+ self.log(f" Model: {model}", level="INFO")
130
+ self.log(f" Model for Incorrect Answer Suggestions: {incorrect_answer_model}", level="INFO")
131
+ self.log(f" Temperature (0.0: Deterministic, 1.0: Creative): {temperature}", level="INFO")
132
+ self.log("", level="INFO") # Blank line
133
+ else:
134
+ # Continue existing run
135
+ self.log("", level="INFO") # Blank line
136
+ self.log(f"=== Continuing Learning Objectives Run: {self.current_run_id} ===", level="INFO")
137
+
138
+ self.run_start_time = time.time()
139
+ return self.current_run_id
140
+
141
+ def start_question_run(self, objectives_count: int, model: str,
142
+ temperature: str, num_questions: int, num_runs: int) -> str:
143
+ """
144
+ Start a question generation run (continues logging to same run).
145
+ Returns the run ID.
146
+ """
147
+ if self.current_run_id is None:
148
+ # No objective run exists, create new run
149
+ self.current_run_id = self._get_next_run_id()
150
+ self.current_run_dir = self._create_run_structure(self.current_run_id)
151
+ self.log_file = os.path.join(self.current_run_dir, "log.log")
152
+ self.log(f"=== New Questions Run: {self.current_run_id} ===", level="INFO")
153
+ else:
154
+ self.log("", level="INFO") # Blank line
155
+ self.log("=== Generate Questions Run ===", level="INFO")
156
+
157
+ # Get next question run ID for this test run
158
+ self.current_question_run_id = self._get_next_question_run_id()
159
+ self.log(f"Question Run ID: {self.current_question_run_id}", level="INFO")
160
+
161
+ self.log("Variables:", level="INFO")
162
+ self.log(f" Number of Learning Objectives: {objectives_count}", level="INFO")
163
+ self.log(f" Number of Questions to Generate: {num_questions}", level="INFO")
164
+ self.log(f" Model: {model}", level="INFO")
165
+ self.log(f" Temperature (0.0: Deterministic, 1.0: Creative): {temperature}", level="INFO")
166
+ self.log(f" Number of Question Generation Runs: {num_runs}", level="INFO")
167
+ self.log("", level="INFO") # Blank line
168
+
169
+ self.run_start_time = time.time()
170
+ return self.current_run_id
171
+
172
+ def log(self, message: str, level: str = "INFO"):
173
+ """Write a log message with timestamp."""
174
+ # Always print to console
175
+ print(f"[{level}] {message}")
176
+
177
+ if not self.save_outputs or self.log_file is None:
178
+ return
179
+
180
+ timestamp = datetime.now().strftime("%m/%d %H:%M:%S")
181
+ log_line = f"[{timestamp}][{level}] {message}\n"
182
+
183
+ with open(self.log_file, "a", encoding="utf-8") as f:
184
+ f.write(log_line)
185
+
186
+ def end_run(self, run_type: str = "Learning Objectives"):
187
+ """End the current run and log total time."""
188
+ if self.run_start_time is None:
189
+ return
190
+
191
+ elapsed = time.time() - self.run_start_time
192
+ self.log(f"Total time for {run_type}: +{elapsed:.0f}s", level="INFO")
193
+ self.log("", level="INFO") # Blank line
194
+
195
+ def save_objectives_outputs(self, best_in_group: str, all_grouped: str,
196
+ raw_ungrouped: str, params: Dict[str, Any]):
197
+ """Save learning objectives outputs to files."""
198
+ if not self.save_outputs or self.current_run_dir is None:
199
+ return
200
+
201
+ obj_dir = os.path.join(self.current_run_dir, "learning objectives")
202
+
203
+ # Save JSON outputs
204
+ with open(os.path.join(obj_dir, "best_in_group.json"), "w", encoding="utf-8") as f:
205
+ f.write(best_in_group)
206
+
207
+ with open(os.path.join(obj_dir, "all_grouped.json"), "w", encoding="utf-8") as f:
208
+ f.write(all_grouped)
209
+
210
+ with open(os.path.join(obj_dir, "raw_ungrouped.json"), "w", encoding="utf-8") as f:
211
+ f.write(raw_ungrouped)
212
+
213
+ # Save input parameters
214
+ with open(os.path.join(obj_dir, "input_parameters.json"), "w", encoding="utf-8") as f:
215
+ json.dump(params, f, indent=2)
216
+
217
+ # Save best-in-group learning objectives as Markdown
218
+ try:
219
+ objectives_data = json.loads(best_in_group)
220
+ md_content = "# Learning Objectives\n\n"
221
+ for i, obj in enumerate(objectives_data, 1):
222
+ learning_objective = obj.get("learning_objective", "")
223
+ md_content += f"{i}. {learning_objective}\n"
224
+
225
+ with open(os.path.join(obj_dir, "best_in_group.md"), "w", encoding="utf-8") as f:
226
+ f.write(md_content)
227
+ except Exception as e:
228
+ self.log(f"Error creating markdown output: {e}", level="ERROR")
229
+
230
+ self.log(f"Saved learning objectives outputs to {obj_dir}", level="INFO")
231
+
232
+ def save_questions_outputs(self, best_ranked: str, all_grouped: str,
233
+ formatted_quiz: str, params: Dict[str, Any]):
234
+ """Save questions outputs to files in a numbered subfolder."""
235
+ if not self.save_outputs or self.current_run_dir is None:
236
+ return
237
+
238
+ # Create subfolder for this question run
239
+ q_base_dir = os.path.join(self.current_run_dir, "questions")
240
+ q_run_dir = os.path.join(q_base_dir, self.current_question_run_id if self.current_question_run_id else "q_run_001")
241
+ os.makedirs(q_run_dir, exist_ok=True)
242
+
243
+ # Save JSON outputs
244
+ with open(os.path.join(q_run_dir, "best_ranked.json"), "w", encoding="utf-8") as f:
245
+ f.write(best_ranked)
246
+
247
+ with open(os.path.join(q_run_dir, "all_grouped.json"), "w", encoding="utf-8") as f:
248
+ f.write(all_grouped)
249
+
250
+ # Save formatted quiz as markdown
251
+ with open(os.path.join(q_run_dir, "formatted_quiz.md"), "w", encoding="utf-8") as f:
252
+ f.write(formatted_quiz)
253
+
254
+ # Save input parameters
255
+ with open(os.path.join(q_run_dir, "input_parameters.json"), "w", encoding="utf-8") as f:
256
+ json.dump(params, f, indent=2)
257
+
258
+ self.log(f"Saved questions outputs to {q_run_dir}", level="INFO")
259
+
260
+ def get_current_run_id(self) -> Optional[str]:
261
+ """Get the current run ID."""
262
+ return self.current_run_id
263
+
264
+ def get_current_run_dir(self) -> Optional[str]:
265
+ """Get the current run directory."""
266
+ return self.current_run_dir
267
+
268
+ def get_current_question_run_id(self) -> Optional[str]:
269
+ """Get the current question run ID."""
270
+ return self.current_question_run_id
271
+
272
+ def get_latest_formatted_quiz_path(self) -> Optional[str]:
273
+ """Find the formatted_quiz.md from the latest question run."""
274
+ if self.current_run_dir is None:
275
+ return None
276
+
277
+ questions_dir = os.path.join(self.current_run_dir, "questions")
278
+ if not os.path.exists(questions_dir):
279
+ return None
280
+
281
+ q_runs = sorted([
282
+ d for d in os.listdir(questions_dir)
283
+ if d.startswith("q_run_") and os.path.isdir(os.path.join(questions_dir, d))
284
+ ])
285
+ if not q_runs:
286
+ return None
287
+
288
+ quiz_path = os.path.join(questions_dir, q_runs[-1], "formatted_quiz.md")
289
+ return quiz_path if os.path.exists(quiz_path) else None
290
+
291
+ def save_edited_quiz(self, content: str, filename: str = "formatted_quiz_edited.md") -> Optional[str]:
292
+ """Save edited quiz to the latest question run folder."""
293
+ if not self.save_outputs or self.current_run_dir is None:
294
+ return None
295
+
296
+ questions_dir = os.path.join(self.current_run_dir, "questions")
297
+ if not os.path.exists(questions_dir):
298
+ return None
299
+
300
+ q_runs = sorted([
301
+ d for d in os.listdir(questions_dir)
302
+ if d.startswith("q_run_") and os.path.isdir(os.path.join(questions_dir, d))
303
+ ])
304
+ if not q_runs:
305
+ return None
306
+
307
+ output_path = os.path.join(questions_dir, q_runs[-1], filename)
308
+ with open(output_path, "w", encoding="utf-8") as f:
309
+ f.write(content)
310
+
311
+ self.log(f"Saved edited quiz to {output_path}", level="INFO")
312
+ return output_path
313
+
314
+
315
+ # Global run manager instance
316
+ _run_manager = None
317
+
318
+ def get_run_manager() -> RunManager:
319
+ """Get or create the global run manager instance."""
320
+ global _run_manager
321
+ if _run_manager is None:
322
+ _run_manager = RunManager()
323
+ return _run_manager
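`_get_next_run_id` and `_get_next_question_run_id` in the file above follow the same pattern: collect the numeric suffixes of existing folder names, take the maximum, and zero-pad the next number. The sketch below isolates that pattern as a pure function so it can be exercised without touching the filesystem. It is an illustration under stated assumptions, not the committed implementation: `next_run_id` and its `prefix` and `width` parameters are hypothetical names, and it slices the prefix off instead of calling `str.replace`.

```python
from typing import Iterable


def next_run_id(existing_names: Iterable[str], prefix: str = "test_id", width: int = 5) -> str:
    """Return the next zero-padded ID after the highest one in existing_names.

    Hypothetical helper for illustration; not part of the commit.
    """
    numbers = []
    for name in existing_names:
        if not name.startswith(prefix):
            continue
        try:
            numbers.append(int(name[len(prefix):]))
        except ValueError:
            # Ignore names whose suffix is not a number, e.g. "test_id_backup".
            continue
    next_num = max(numbers, default=0) + 1
    return f"{prefix}{next_num:0{width}d}"


if __name__ == "__main__":
    print(next_run_id(["test_id00001", "test_id00007", "notes"]))
    print(next_run_id(["q_run_002"], prefix="q_run_", width=3))
```

Passing the names in as an argument (for example the result of `os.listdir`) keeps the numbering logic testable and lets the caller decide what to do when the results directory does not exist yet.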
ui/state.py ADDED
@@ -0,0 +1,29 @@
1
+ """Global state management for the UI."""
2
+
3
+ # Global variables to store processed content and generated objectives
4
+ processed_file_contents = []
5
+ generated_learning_objectives = []
6
+
7
+ def get_processed_contents():
8
+ """Get the current processed file contents."""
9
+ return processed_file_contents
10
+
11
+ def set_processed_contents(contents):
12
+ """Set the processed file contents."""
13
+ global processed_file_contents
14
+ processed_file_contents = contents
15
+
16
+ def get_learning_objectives():
17
+ """Get the current learning objectives."""
18
+ return generated_learning_objectives
19
+
20
+ def set_learning_objectives(objectives):
21
+ """Set the learning objectives."""
22
+ global generated_learning_objectives
23
+ generated_learning_objectives = objectives
24
+
25
+ def clear_state():
26
+ """Clear all state."""
27
+ global processed_file_contents, generated_learning_objectives
28
+ processed_file_contents = []
29
+ generated_learning_objectives = []