# AI Course Assessment Generator - Functionality Report
## Table of Contents
1. [Overview](#overview)
2. [System Architecture](#system-architecture)
3. [Data Models](#data-models)
4. [Application Entry Point](#application-entry-point)
5. [User Interface Structure](#user-interface-structure)
6. [Complete Workflow](#complete-workflow)
7. [Detailed Component Functionality](#detailed-component-functionality)
8. [Quality Standards and Prompts](#quality-standards-and-prompts)
---
## Overview
The AI Course Assessment Generator is a sophisticated educational tool that automates the creation of learning objectives and multiple-choice questions from course materials. It leverages OpenAI's language models with structured output generation to produce high-quality educational assessments that adhere to specified quality standards and Bloom's Taxonomy levels.
### Key Capabilities
- **Multi-format Content Processing**: Accepts `.vtt`, `.srt` (subtitle files), and `.ipynb` (Jupyter notebooks)
- **AI-Powered Generation**: Uses OpenAI's GPT models with configurable parameters
- **Quality Assurance**: Implements LLM-based quality assessment and ranking
- **Source Tracking**: Maintains XML-tagged references from source materials to generated content
- **Iterative Improvement**: Supports feedback-based regeneration and enhancement
- **Parallel Processing**: Generates questions concurrently for improved performance
---
## System Architecture
### Architectural Patterns
#### 1. **Orchestrator Pattern**
Both `LearningObjectiveGenerator` and `QuizGenerator` act as orchestrators that coordinate calls to specialized generation functions rather than implementing generation logic directly.
#### 2. **Modular Prompt System**
The `prompts/` directory contains reusable prompt components that are imported and combined in generation modules, allowing for consistent quality standards across different generation tasks.
#### 3. **Structured Output Generation**
All LLM interactions use Pydantic models with the `instructor` library to ensure consistent, validated output formats using OpenAI's structured output API.
#### 4. **Source Tracking via XML Tags**
Content is wrapped in XML-style tags named after the source file (e.g., `<source_filename>content</source_filename>`) throughout the pipeline to maintain traceability from source files to generated questions.
### Technology Stack
- **Python 3.8+**
- **Gradio 5.29.0+**: Web-based UI framework
- **Pydantic 2.8.0+**: Data validation and schema management
- **OpenAI 1.52.0+**: LLM API integration
- **Instructor 1.7.9+**: Structured output generation
- **nbformat 5.9.2**: Jupyter notebook parsing
- **python-dotenv 1.0.0**: Environment variable management
---
## Data Models
### Learning Objectives Progression
The system uses a hierarchical progression of learning objective models:
#### 1. **BaseLearningObjectiveWithoutCorrectAnswer**
```python
- id: int
- learning_objective: str
- source_reference: Union[List[str], str]
```
Initial generation without correct answers.
#### 2. **BaseLearningObjective**
```python
- id: int
- learning_objective: str
- source_reference: Union[List[str], str]
- correct_answer: str
```
Base objectives with correct answers added.
#### 3. **LearningObjective**
```python
- id: int
- learning_objective: str
- source_reference: Union[List[str], str]
- correct_answer: str
- incorrect_answer_options: Union[List[str], str]
- in_group: Optional[bool]
- group_members: Optional[List[int]]
- best_in_group: Optional[bool]
```
Enhanced with incorrect answer suggestions and grouping metadata.
#### 4. **GroupedLearningObjective**
```python
(All fields from LearningObjective)
- in_group: bool (required)
- group_members: List[int] (required)
- best_in_group: bool (required)
```
Fully grouped and ranked objectives.
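The field progression above can be mirrored in a minimal sketch, here using stdlib dataclasses standing in for the project's Pydantic models (field names follow the report; the defaults are assumptions):

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

@dataclass
class BaseLearningObjective:
    id: int
    learning_objective: str
    source_reference: Union[List[str], str]
    correct_answer: str

@dataclass
class LearningObjective(BaseLearningObjective):
    # Enhancement stage: distractor suggestions plus grouping metadata
    incorrect_answer_options: Union[List[str], str] = field(default_factory=list)
    in_group: Optional[bool] = None
    group_members: Optional[List[int]] = None
    best_in_group: Optional[bool] = None

lo = LearningObjective(
    id=1,
    learning_objective="Identify the purpose of structured output generation",
    source_reference=["lesson1.vtt"],
    correct_answer="It guarantees validated, schema-conforming LLM responses",
)
print(lo.incorrect_answer_options)  # []
```

The `GroupedLearningObjective` stage then tightens the three optional grouping fields to required values.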
### Question Models Progression
#### 1. **MultipleChoiceOption**
```python
- option_text: str
- is_correct: bool
- feedback: str
```
#### 2. **MultipleChoiceQuestion**
```python
- id: int
- question_text: str
- options: List[MultipleChoiceOption]
- learning_objective_id: int
- learning_objective: str
- correct_answer: str
- source_reference: Union[List[str], str]
- judge_feedback: Optional[str]
- approved: Optional[bool]
```
#### 3. **RankedMultipleChoiceQuestion**
```python
(All fields from MultipleChoiceQuestion)
- rank: int
- ranking_reasoning: str
- in_group: bool
- group_members: List[int]
- best_in_group: bool
```
#### 4. **Assessment**
```python
- learning_objectives: List[LearningObjective]
- questions: List[RankedMultipleChoiceQuestion]
```
Final output containing both objectives and questions.
### Configuration Models
#### **MODELS**
Available OpenAI models: `["o3-mini", "o1", "gpt-4.1", "gpt-4o", "gpt-4o-mini", "gpt-4", "gpt-3.5-turbo", "gpt-5", "gpt-5-mini", "gpt-5-nano"]`
#### **TEMPERATURE_UNAVAILABLE**
Dictionary mapping model names to whether temperature is unsupported; models such as o1, o3-mini, and the gpt-5 variants do not accept a temperature setting, so the parameter is omitted for them.
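The temperature gate can be sketched as follows (a minimal stand-in; the dictionary contents here are illustrative, not the project's exact mapping):

```python
# Models whose APIs reject an explicit temperature parameter (illustrative subset)
TEMPERATURE_UNAVAILABLE = {
    "o1": True,
    "o3-mini": True,
    "gpt-5": True,
    "gpt-4o": False,
    "gpt-4o-mini": False,
}

def build_request_params(model: str, temperature: float) -> dict:
    """Attach temperature only when the chosen model supports it."""
    params = {"model": model}
    if not TEMPERATURE_UNAVAILABLE.get(model, True):
        params["temperature"] = temperature
    return params

print(build_request_params("gpt-4o", 0.7))   # includes temperature
print(build_request_params("o3-mini", 0.7))  # omits temperature
```

Note the defensive default: an unknown model is treated as temperature-unavailable, so a missing dictionary entry can never produce a rejected API call.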
---
## Application Entry Point
### `app.py`
The root-level entry point that:
1. Loads environment variables from `.env` file
2. Checks for `OPENAI_API_KEY` presence
3. Creates the Gradio UI via `ui.app.create_ui()`
4. Launches the web interface at `http://127.0.0.1:7860`
```python
# Workflow:
# load_dotenv() → Check API key → create_ui() → app.launch()
```
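The startup guard can be sketched with stdlib calls only; the dotenv and Gradio steps are shown as comments since their exact invocation here is an approximation, not the project's verbatim code:

```python
import os

def check_api_key() -> bool:
    """Mirror of app.py's startup guard: refuse to launch without a key."""
    key = os.environ.get("OPENAI_API_KEY", "")
    return bool(key.strip())

# In the real entry point (sketch only):
#   load_dotenv()                      # python-dotenv pulls .env into os.environ
#   if not check_api_key():
#       raise SystemExit("OPENAI_API_KEY is not set")
#   create_ui().launch()               # serves at http://127.0.0.1:7860
```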
---
## User Interface Structure
### `ui/app.py` - Gradio Interface
The UI is organized into **3 main tabs**:
#### **Tab 1: Generate Learning Objectives**
**Input Components:**
- File uploader (accepts `.ipynb`, `.vtt`, `.srt`)
- Number of objectives per run (slider: 1-20, default: 3)
- Number of generation runs (dropdown: 1-5, default: 3)
- Model selection (dropdown, default: "gpt-5")
- Incorrect answer model selection (dropdown, default: "gpt-5")
- Temperature setting (dropdown: 0.0-1.0, default: 1.0)
- Generate button
- Feedback input textbox
- Regenerate button
**Output Components:**
- Status textbox
- Best-in-Group Learning Objectives (JSON)
- All Grouped Learning Objectives (JSON)
- Raw Ungrouped Learning Objectives (JSON) - for debugging
**Event Handler:** `process_files()` from `objective_handlers.py`
#### **Tab 2: Generate Questions**
**Input Components:**
- Learning Objectives JSON (auto-populated from Tab 1)
- Model selection
- Temperature setting
- Number of question generation runs (slider: 1-5, default: 1)
- Generate Questions button
**Output Components:**
- Status textbox
- Ranked Best-in-Group Questions (JSON)
- All Grouped Questions (JSON)
- Formatted Quiz (human-readable format)
**Event Handler:** `generate_questions()` from `question_handlers.py`
#### **Tab 3: Propose/Edit Question**
**Input Components:**
- Question guidance/feedback textbox
- Model selection
- Temperature setting
- Generate Question button
**Output Components:**
- Status textbox
- Generated Question (JSON)
**Event Handler:** `propose_question_handler()` from `feedback_handlers.py`
---
## Complete Workflow
### Phase 1: File Upload and Content Processing
#### Step 1.1: File Upload
User uploads one or more files (`.vtt`, `.srt`, `.ipynb`) through the Gradio interface.
#### Step 1.2: File Path Extraction (`objective_handlers._extract_file_paths()`)
```python
# Handles different input formats:
- List of file paths
- Single file path string
- File objects with .name attribute
```
#### Step 1.3: Content Processing (`ui/content_processor.py`)
**For Subtitle Files (`.vtt`, `.srt`):**
```python
1. Read file with UTF-8 encoding
2. Split into lines
3. Filter out:
- Empty lines
- Numeric timestamp indicators
- Lines containing '-->' (timestamps)
- 'WEBVTT' header lines
4. Combine remaining text lines
5. Wrap in XML tags: <source_filename>content</source_filename>
```
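The five filtering steps above reduce to plain string handling; a rough approximation of `ui/content_processor.py` (the tag naming follows the report's source-tracking convention):

```python
import os

def process_subtitle_text(text: str, filename: str) -> str:
    """Strip WEBVTT headers, cue numbers, and timestamp lines; keep spoken text."""
    kept = []
    for line in text.splitlines():
        line = line.strip()
        if not line:                   # empty lines
            continue
        if line.isdigit():             # numeric cue indicators
            continue
        if "-->" in line:              # timestamp lines
            continue
        if line.startswith("WEBVTT"):  # file header
            continue
        kept.append(line)
    name = os.path.basename(filename)
    return f"<{name}>{' '.join(kept)}</{name}>"

sample = "WEBVTT\n\n1\n00:00:01.000 --> 00:00:04.000\nWelcome to the course.\n"
print(process_subtitle_text(sample, "intro.vtt"))
# <intro.vtt>Welcome to the course.</intro.vtt>
```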
**For Jupyter Notebooks (`.ipynb`):**
```python
1. Validate JSON format
2. Parse with nbformat.read()
3. Extract from cells:
- Markdown cells: [Markdown]\n{content}
- Code cells: [Code]\n```python\n{content}\n```
4. Combine all cell content
5. Wrap in XML tags: <source_filename>content</source_filename>
```
**Error Handling:**
- Invalid JSON: Wraps raw content in code blocks
- Parsing failures: Falls back to plain text extraction
- All errors logged to console
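A rough stdlib equivalent of the notebook path (the real code uses `nbformat.read()`; this json-only sketch just shows the cell-to-text mapping described above):

```python
import json

def extract_notebook_text(raw: str) -> str:
    """Turn notebook JSON into the [Markdown]/[Code] text layout."""
    nb = json.loads(raw)
    parts = []
    for cell in nb.get("cells", []):
        src = "".join(cell.get("source", []))
        if cell.get("cell_type") == "markdown":
            parts.append(f"[Markdown]\n{src}")
        elif cell.get("cell_type") == "code":
            parts.append(f"[Code]\n```python\n{src}\n```")
    return "\n\n".join(parts)

raw = json.dumps({"cells": [
    {"cell_type": "markdown", "source": ["# Intro"]},
    {"cell_type": "code", "source": ["print('hi')"]},
]})
text = extract_notebook_text(raw)
```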
#### Step 1.4: State Storage
Processed content stored in global state (`ui/state.py`):
```python
processed_file_contents = [tagged_content_1, tagged_content_2, ...]
```
### Phase 2: Learning Objective Generation
#### Step 2.1: Multi-Run Base Generation
**Process:** `objective_handlers._generate_multiple_runs()`
For each run (user-specified, typically 3 runs):
1. **Call:** `QuizGenerator.generate_base_learning_objectives()`
2. **Workflow:**
```
generate_base_learning_objectives()
↓
generate_base_learning_objectives_without_correct_answers()
→ Creates prompt with:
- BASE_LEARNING_OBJECTIVES_PROMPT
- BLOOMS_TAXONOMY_LEVELS
- LEARNING_OBJECTIVE_EXAMPLES_WITHOUT_ANSWERS
- Combined file contents
→ Calls OpenAI API with structured output
→ Returns List[BaseLearningObjectiveWithoutCorrectAnswer]
↓
generate_correct_answers_for_objectives()
→ For each objective:
- Creates prompt with objective and course content
- Calls OpenAI API (unstructured text response)
- Extracts correct answer
→ Returns List[BaseLearningObjective]
```
3. **ID Assignment:**
```python
# Temporary IDs by run:
Run 1: 1001, 1002, 1003
Run 2: 2001, 2002, 2003
Run 3: 3001, 3002, 3003
```
4. **Aggregation:**
All objectives from all runs combined into single list.
**Example:** 3 runs × 3 objectives = 9 total base objectives
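The temporary-ID scheme is simply `run_number * 1000 + position`; a sketch (dicts stand in for the objective models):

```python
def assign_temporary_ids(runs):
    """Give each objective a run-scoped ID: run 1 → 1001.., run 2 → 2001.., etc."""
    flat = []
    for run_number, objectives in enumerate(runs, start=1):
        for position, objective in enumerate(objectives, start=1):
            flat.append({**objective, "id": run_number * 1000 + position})
    return flat

runs = [[{"learning_objective": f"run{r} obj{i}"} for i in range(3)] for r in range(1, 4)]
ids = [o["id"] for o in assign_temporary_ids(runs)]
print(ids)  # [1001, 1002, 1003, 2001, 2002, 2003, 3001, 3002, 3003]
```

The run prefix is what later lets the grouping step find "all objectives with IDs ending in 1" across runs.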
#### Step 2.2: Grouping and Ranking
**Process:** `objective_handlers._group_base_objectives_add_incorrect_answers()`
**Step 2.2.1: Group Base Objectives**
```python
QuizGenerator.group_base_learning_objectives()
↓
learning_objective_generator/grouping_and_ranking.py
→ group_base_learning_objectives()
```
**Grouping Logic:**
1. Creates prompt containing:
- Original generation criteria
- All base objectives with IDs
- Course content for context
- Grouping instructions
2. **Special Rule:** All objectives with IDs ending in 1 (1001, 2001, 3001) are grouped together and ONE is marked as best-in-group (this becomes the primary/first objective)
3. **LLM Call:**
- Model: `gpt-5-mini`
- Response format: `GroupedBaseLearningObjectivesResponse`
- Returns: Grouped objectives with metadata
4. **Output Structure:**
```python
{
"all_grouped": [all objectives with group metadata],
"best_in_group": [objectives marked as best in their groups]
}
```
**Step 2.2.2: ID Reassignment** (`_reassign_objective_ids()`)
```python
1. Find best objective from the 001 group
2. Assign it ID = 1
3. Assign remaining objectives IDs starting from 2
```
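The reassignment rule can be sketched as follows (a hypothetical stand-in for `_reassign_objective_ids()`, assuming objectives carry `id` and `best_in_group` fields):

```python
def reassign_objective_ids(objectives):
    """Promote the best objective from the x001 group to ID 1; renumber the rest."""
    primary = next(
        o for o in objectives
        if o["id"] % 1000 == 1 and o.get("best_in_group")
    )
    primary["id"] = 1
    next_id = 2
    for o in objectives:
        if o is not primary:
            o["id"] = next_id
            next_id += 1
    return objectives

objs = [
    {"id": 1001, "best_in_group": True},
    {"id": 2001, "best_in_group": False},
    {"id": 1002, "best_in_group": True},
]
print([o["id"] for o in reassign_objective_ids(objs)])  # [1, 2, 3]
```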
**Step 2.2.3: Generate Incorrect Answer Options**
Only for **best-in-group** objectives:
```python
QuizGenerator.generate_lo_incorrect_answer_options()
↓
learning_objective_generator/enhancement.py
→ generate_incorrect_answer_options()
```
**Process:**
1. For each best-in-group objective:
- Creates prompt with:
- Objective and correct answer
- INCORRECT_ANSWER_PROMPT guidelines
- INCORRECT_ANSWER_EXAMPLES
- Course content
- Calls OpenAI API (with optional model override)
- Generates 5 plausible incorrect answer options
2. **Returns:** `List[LearningObjective]` with incorrect_answer_options populated
**Step 2.2.4: Improve Incorrect Answers**
```python
learning_objective_generator.regenerate_incorrect_answers()
↓
learning_objective_generator/suggestion_improvement.py
```
**Quality Check Process:**
1. For each objective's incorrect answers:
- Checks for red flags (contradictory phrases, absolute terms)
- Examples of red flags:
- "but not necessarily"
- "at the expense of"
- "rather than"
- "always", "never", "exclusively"
2. If problems found:
- Logs issue to `incorrect_suggestion_debug/` directory
- Regenerates incorrect answers with additional constraints
- Updates objective with improved answers
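The project's actual check is LLM-based, but the red-flag criterion amounts to a phrase scan; a heuristic approximation using the phrases listed above:

```python
# Phrase list taken from the red-flag examples in this report (illustrative subset)
IMMEDIATE_RED_FLAGS = (
    "but not necessarily", "at the expense of", "rather than",
    "always", "never", "exclusively",
)

def has_red_flags(option: str) -> bool:
    """Heuristic pre-screen: flag distractors containing known problem phrases."""
    lowered = option.lower()
    return any(flag in lowered for flag in IMMEDIATE_RED_FLAGS)

print(has_red_flags("Models always converge to a global optimum"))    # True
print(has_red_flags("Gradient descent updates weights iteratively"))  # False
```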
**Step 2.2.5: Final Assembly**
Creates final list where:
- Best-in-group objectives have enhanced incorrect answers
- Non-best-in-group objectives have empty `incorrect_answer_options: []`
#### Step 2.3: Display Results
**Three output formats:**
1. **Best-in-Group Objectives** (primary output):
- Only objectives marked as best_in_group
- Includes incorrect answer options
- Sorted by ID
- Formatted as JSON
2. **All Grouped Objectives**:
- All objectives with grouping metadata
- Shows group_members arrays
- Best-in-group flags visible
3. **Raw Ungrouped** (debug):
- Original objectives from all runs
- No grouping metadata
- Original temporary IDs
#### Step 2.4: State Update
```python
set_learning_objectives(grouped_result["all_grouped"])
set_processed_contents(file_contents) # Already set, but persisted
```
### Phase 3: Question Generation
#### Step 3.1: Parse Learning Objectives
**Process:** `question_handlers._parse_learning_objectives()`
```python
1. Parse JSON from Tab 1 output
2. Create LearningObjective objects from dictionaries
3. Validate required fields
4. Return List[LearningObjective]
```
#### Step 3.2: Multi-Run Question Generation
**Process:** `question_handlers._generate_questions_multiple_runs()`
For each run (user-specified, typically 1 run):
```python
QuizGenerator.generate_questions_in_parallel()
↓
quiz_generator/assessment.py
→ generate_questions_in_parallel()
```
**Parallel Generation Process:**
1. **Thread Pool Setup:**
```python
max_workers = min(len(learning_objectives), 5)
ThreadPoolExecutor(max_workers=max_workers)
```
2. **For Each Learning Objective (in parallel):**
**Step 3.2.1: Question Generation** (`quiz_generator/question_generation.py`)
```python
generate_multiple_choice_question()
```
**a) Source Content Matching:**
```python
- Extract source_reference from objective
- Search file_contents for matching XML tags
   - Exact match: <source_filename> tag present in file content
- Fallback: Partial filename match
- Last resort: Use all file contents combined
```
**b) Multi-Source Handling:**
```python
if len(source_references) > 1:
Add special instruction:
"Question should synthesize information across sources"
```
**c) Prompt Construction:**
```python
Combines:
- Learning objective
- Correct answer
- Incorrect answer options from objective
- GENERAL_QUALITY_STANDARDS
- MULTIPLE_CHOICE_STANDARDS
- EXAMPLE_QUESTIONS
- QUESTION_SPECIFIC_QUALITY_STANDARDS
- CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS
- INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION
- ANSWER_FEEDBACK_QUALITY_STANDARDS
- Matched course content
```
**d) API Call:**
```python
- Model: User-selected (default: gpt-5)
- Temperature: User-selected (if supported by model)
- Response format: MultipleChoiceQuestion
- Returns: Question with 4 options, each with feedback
```
**e) Post-Processing:**
```python
- Set question ID = learning_objective ID
- Verify all options have feedback
- Add default feedback if missing
```
**Step 3.2.2: Quality Assessment** (`quiz_generator/question_improvement.py`)
```python
judge_question_quality()
```
**Quality Judging Process:**
```python
1. Creates evaluation prompt with:
- Question text and all options
- Quality criteria from prompts
- Evaluation instructions
2. LLM evaluates question for:
- Clarity and unambiguity
- Alignment with learning objective
- Quality of incorrect options
- Feedback quality
- Appropriate difficulty
3. Returns:
- approved: bool
- feedback: str (reasoning for judgment)
4. Updates question:
question.approved = approved
question.judge_feedback = feedback
```
3. **Results Collection:**
```python
- Questions collected as futures complete
- IDs assigned sequentially across runs
- All questions aggregated into single list
```
**Example:** 3 objectives × 1 run = 3 questions generated in parallel
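The fan-out/fan-in shape of this phase can be sketched with `concurrent.futures`; a dummy worker stands in for the real question generator and judge:

```python
import concurrent.futures

def generate_question_for_objective(objective: dict, idx: int) -> dict:
    """Stand-in worker: the real one calls the LLM, then the quality judge."""
    return {"id": objective["id"], "question_text": f"Q for LO {objective['id']}"}

def generate_questions_in_parallel(learning_objectives: list) -> list:
    max_workers = min(len(learning_objectives), 5)  # cap at 5 concurrent threads
    questions = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_idx = {
            executor.submit(generate_question_for_objective, obj, i): i
            for i, obj in enumerate(learning_objectives)
        }
        for future in concurrent.futures.as_completed(future_to_idx):
            questions.append(future.result())  # collected as each thread finishes
    return sorted(questions, key=lambda q: q["id"])

qs = generate_questions_in_parallel([{"id": i} for i in range(1, 4)])
print([q["id"] for q in qs])  # [1, 2, 3]
```

Because `as_completed()` yields results in finish order, the sketch sorts by ID afterward; the real pipeline similarly reassigns IDs sequentially across runs.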
#### Step 3.3: Grouping Questions
**Process:** `quiz_generator/question_ranking.py → group_questions()`
```python
1. Creates prompt with:
- All generated questions
- Grouping instructions
- Example format
2. LLM identifies:
- Questions testing same concept (same learning_objective_id)
- Groups of similar questions
- Best question in each group
3. Model: gpt-5-mini
Response format: GroupedMultipleChoiceQuestionsResponse
4. Returns:
{
"grouped": [all questions with group metadata],
"best_in_group": [best questions from each group]
}
```
#### Step 3.4: Ranking Questions
**Process:** `quiz_generator/question_ranking.py → rank_questions()`
**Only ranks best-in-group questions:**
```python
1. Creates prompt with:
- RANK_QUESTIONS_PROMPT
- All quality standards
- Best-in-group questions only
- Course content for context
2. Ranking Criteria:
- Question clarity and unambiguity
- Alignment with learning objective
- Quality of incorrect options
- Feedback quality
- Appropriate difficulty (prefers simple English)
- Adherence to all guidelines
- Avoidance of absolute terms
3. Special Instructions:
- NEVER change question with ID=1
- Each question gets unique rank (2, 3, 4, ...)
- Rank 1 is reserved
- All questions must be returned
4. Model: User-selected
Response format: RankedMultipleChoiceQuestionsResponse
5. Returns:
{
"ranked": [questions with rank and ranking_reasoning]
}
```
#### Step 3.5: Format Results
**Process:** `question_handlers._format_question_results()`
**Three outputs:**
1. **Best-in-Group Ranked Questions:**
```python
- Sorted by rank
- Includes all question data
- Includes rank and ranking_reasoning
- Includes group metadata
- Formatted as JSON
```
2. **All Grouped Questions:**
```python
- All questions with group metadata
- No ranking information
- Shows which questions are in groups
- Formatted as JSON
```
3. **Formatted Quiz:**
```python
format_quiz_for_ui() creates human-readable format:
**Question 1 [Rank: 2]:** What is...
Ranking Reasoning: ...
• A [Correct]: Option text
◦ Feedback: Correct feedback
• B: Option text
◦ Feedback: Incorrect feedback
[continues for all questions]
```
### Phase 4: Custom Question Generation (Optional)
**Tab 3 Workflow:**
#### Step 4.1: User Input
User provides:
- Free-form guidance/feedback text
- Model selection
- Temperature setting
#### Step 4.2: Generation
**Process:** `feedback_handlers.propose_question_handler()`
```python
QuizGenerator.generate_multiple_choice_question_from_feedback()
↓
quiz_generator/feedback_questions.py
```
**Workflow:**
```python
1. Retrieves processed file contents from state
2. Creates prompt combining:
- User feedback/guidance
- All quality standards
- Course content
- Generation criteria
3. Model generates:
- Single question
- With learning objective inferred from guidance
- 4 options with feedback
- Source references
4. Returns: MultipleChoiceQuestionFromFeedback object
(includes user feedback as metadata)
5. Formatted as JSON for display
```
### Phase 5: Assessment Export (Automated)
The final assessment can be saved using:
```python
QuizGenerator.save_assessment_to_json()
↓
quiz_generator/assessment.py → save_assessment_to_json()
```
**Process:**
```python
1. Convert Assessment object to dictionary
assessment_dict = assessment.model_dump()
2. Write to JSON file with indent=2
Default filename: "assessment.json"
3. Contains:
- All learning objectives (best-in-group)
- All ranked questions
- Complete metadata
```
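The export step reduces to `model_dump()` plus `json.dump`; a stdlib sketch in which plain dicts stand in for the Pydantic `Assessment`:

```python
import json
import os
import tempfile

def save_assessment_to_json(assessment: dict, filename: str = "assessment.json") -> str:
    """Write the assessment dictionary to disk, pretty-printed with indent=2."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(assessment, f, indent=2)
    return filename

assessment = {"learning_objectives": [{"id": 1}], "questions": [{"id": 1, "rank": 2}]}
path = os.path.join(tempfile.gettempdir(), "assessment.json")
save_assessment_to_json(assessment, path)

with open(path, encoding="utf-8") as f:
    restored = json.load(f)  # round-trips back to the original structure
```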
---
## Detailed Component Functionality
### Content Processor (`ui/content_processor.py`)
**Class: `ContentProcessor`**
**Methods:**
1. **`process_files(file_paths: List[str]) -> List[str]`**
- Main entry point for processing multiple files
- Returns list of XML-tagged content strings
- Stores results in `self.file_contents`
2. **`process_file(file_path: str) -> List[str]`**
- Routes to appropriate handler based on file extension
- Returns single-item list with tagged content
3. **`_process_subtitle_file(file_path: str) -> List[str]`**
- Filters out timestamps and metadata
- Preserves actual subtitle text
   - Wraps in `<source_filename>` tags
4. **`_process_notebook_file(file_path: str) -> List[str]`**
- Validates JSON structure
- Parses with nbformat
- Extracts markdown and code cells
- Falls back to raw text on parsing errors
   - Wraps in `<source_filename>` tags
### Learning Objective Generator (`learning_objective_generator/`)
#### **generator.py - LearningObjectiveGenerator Class**
**Orchestrator that delegates to specialized modules:**
**Methods:**
1. **`generate_base_learning_objectives()`**
- Delegates to `base_generation.py`
- Returns base objectives with correct answers
2. **`group_base_learning_objectives()`**
- Delegates to `grouping_and_ranking.py`
- Groups similar objectives
- Identifies best in each group
3. **`generate_incorrect_answer_options()`**
- Delegates to `enhancement.py`
- Adds 5 incorrect answer suggestions per objective
4. **`regenerate_incorrect_answers()`**
- Delegates to `suggestion_improvement.py`
- Quality-checks and improves incorrect answers
5. **`generate_and_group_learning_objectives()`**
- Complete workflow method
- Combines: base generation → grouping → incorrect answers
- Returns dict with all_grouped and best_in_group
#### **base_generation.py**
**Key Functions:**
**`generate_base_learning_objectives()`**
- Wrapper that calls two separate functions
- First: Generate objectives without correct answers
- Second: Generate correct answers for those objectives
**`generate_base_learning_objectives_without_correct_answers()`**
**Process:**
```python
1. Extract source filenames from XML tags
2. Combine all file contents
3. Create prompt with:
- BASE_LEARNING_OBJECTIVES_PROMPT
- BLOOMS_TAXONOMY_LEVELS
- LEARNING_OBJECTIVE_EXAMPLES_WITHOUT_ANSWERS
- Course content
4. API call:
- Model: User-selected
- Temperature: User-selected (if supported)
- Response format: BaseLearningObjectivesWithoutCorrectAnswerResponse
5. Post-process:
- Assign sequential IDs
- Normalize source_reference (extract basenames)
6. Returns: List[BaseLearningObjectiveWithoutCorrectAnswer]
```
**`generate_correct_answers_for_objectives()`**
**Process:**
```python
1. For each objective without answer:
- Create prompt with objective + course content
- Call OpenAI API (text response, not structured)
- Extract correct answer
- Create BaseLearningObjective with answer
2. Error handling: Add "[Error generating correct answer]" on failure
3. Returns: List[BaseLearningObjective]
```
**Quality Guidelines in Prompt:**
- Objectives must be assessable via multiple-choice
- Start with action verbs (identify, describe, define, list, compare)
- One goal per objective
- Derived directly from course content
- Tool/framework agnostic (focus on principles, not specific implementations)
- First objective should be relatively easy recall question
- Avoid objectives about "building" or "creating" (not MC-assessable)
#### **grouping_and_ranking.py**
**Key Functions:**
**`group_base_learning_objectives()`**
**Process:**
```python
1. Format objectives for display in prompt
2. Create grouping prompt with:
- Original generation criteria
- All base objectives
- Course content
- Grouping instructions
3. Special rule:
- All objectives with IDs ending in 1 grouped together
- Best one selected from this group
- Will become primary objective (ID=1)
4. API call:
- Model: "gpt-5-mini" (hardcoded for efficiency)
- Response format: GroupedBaseLearningObjectivesResponse
5. Post-process:
- Normalize best_in_group to Python bool
- Filter for best-in-group objectives
6. Returns:
{
"all_grouped": List[GroupedBaseLearningObjective],
"best_in_group": List[GroupedBaseLearningObjective]
}
```
**Grouping Criteria:**
- Topic overlap
- Similarity of concepts
- Quality based on original generation criteria
- Clarity and specificity
- Alignment with course content
#### **enhancement.py**
**Key Function: `generate_incorrect_answer_options()`**
**Process:**
```python
1. For each base objective:
- Create prompt with:
- Learning objective and correct answer
- INCORRECT_ANSWER_PROMPT (detailed guidelines)
- INCORRECT_ANSWER_EXAMPLES
- Course content
- Request 5 plausible incorrect options
2. API call:
- Model: model_override or default
- Temperature: User-selected (if supported)
- Response format: LearningObjective (includes incorrect_answer_options)
3. Returns: List[LearningObjective] with all fields populated
```
**Incorrect Answer Quality Principles:**
- Create common misunderstandings
- Maintain identical structure to correct answer
- Use course terminology correctly but in wrong contexts
- Include partially correct information
- Avoid obviously wrong answers
- Mirror detail level and style of correct answer
- Avoid absolute terms ("always", "never", "exclusively")
- Avoid contradictory second clauses
#### **suggestion_improvement.py**
**Key Function: `regenerate_incorrect_answers()`**
**Process:**
```python
1. For each learning objective:
- Call should_regenerate_incorrect_answers()
2. should_regenerate_incorrect_answers():
- Creates evaluation prompt with:
- Objective and all incorrect options
- IMMEDIATE_RED_FLAGS checklist
- RULES_FOR_SECOND_CLAUSES
- LLM evaluates each option
- Returns: needs_regeneration: bool
3. If regeneration needed:
- Logs to incorrect_suggestion_debug/{id}.txt
- Creates new prompt with additional constraints
- Regenerates incorrect answers
- Validates again
4. Returns: List[LearningObjective] with improved incorrect answers
```
**Red Flags Checked:**
- Contradictory second clauses ("but not necessarily")
- Explicit negations ("without automating")
- Opposite descriptions ("fixed steps" for flexible systems)
- Absolute/comparative terms
- Hedging that creates limitations
- Trade-off language creating false dichotomies
### Quiz Generator (`quiz_generator/`)
#### **generator.py - QuizGenerator Class**
**Orchestrator with LearningObjectiveGenerator embedded:**
**Initialization:**
```python
def __init__(self, api_key, model="gpt-5", temperature=1.0):
self.client = OpenAI(api_key=api_key)
self.model = model
self.temperature = temperature
self.learning_objective_generator = LearningObjectiveGenerator(
api_key=api_key, model=model, temperature=temperature
)
```
**Methods (delegates to specialized modules):**
1. **`generate_base_learning_objectives()`** → delegates to LearningObjectiveGenerator
2. **`generate_lo_incorrect_answer_options()`** → delegates to LearningObjectiveGenerator
3. **`group_base_learning_objectives()`** → delegates to grouping_and_ranking.py
4. **`generate_multiple_choice_question()`** → delegates to question_generation.py
5. **`generate_questions_in_parallel()`** → delegates to assessment.py
6. **`group_questions()`** → delegates to question_ranking.py
7. **`rank_questions()`** → delegates to question_ranking.py
8. **`judge_question_quality()`** → delegates to question_improvement.py
9. **`regenerate_incorrect_answers()`** → delegates to question_improvement.py
10. **`generate_multiple_choice_question_from_feedback()`** → delegates to feedback_questions.py
11. **`save_assessment_to_json()`** → delegates to assessment.py
#### **question_generation.py**
**Key Function: `generate_multiple_choice_question()`**
**Detailed Process:**
**1. Source Content Matching:**
```python
source_references = learning_objective.source_reference
if isinstance(source_references, str):
    source_references = [source_references]

combined_content = ""
for source_file in source_references:
    found = False
    # Try exact match on the source's XML tag:
    for file_content in file_contents:
        if f"<{source_file}>" in file_content:
            combined_content += file_content
            found = True
            break
    # Fallback: partial filename match
    if not found:
        for file_content in file_contents:
            if source_file in file_content:
                combined_content += file_content
                break

# Last resort: use all content combined
if not combined_content:
    combined_content = "\n\n".join(file_contents)
```
**2. Multi-Source Instruction:**
```python
if len(source_references) > 1:
Add special instruction:
"This learning objective spans multiple sources.
Your question should:
1. Synthesize information across these sources
2. Test understanding of overarching themes
3. Require knowledge from multiple sources"
```
**3. Prompt Construction:**
Combines extensive quality standards:
```python
- Learning objective
- Correct answer
- Incorrect answer options from objective
- GENERAL_QUALITY_STANDARDS
- MULTIPLE_CHOICE_STANDARDS
- EXAMPLE_QUESTIONS
- QUESTION_SPECIFIC_QUALITY_STANDARDS
- CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS
- INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION
- ANSWER_FEEDBACK_QUALITY_STANDARDS
- Multi-source instruction (if applicable)
- Matched course content
```
**4. API Call:**
```python
params = {
"model": model,
"messages": [
{"role": "system", "content": "Expert educational assessment creator"},
{"role": "user", "content": prompt}
],
"response_format": MultipleChoiceQuestion
}
if not TEMPERATURE_UNAVAILABLE.get(model, True):
params["temperature"] = temperature
response = client.beta.chat.completions.parse(**params)
```
**5. Post-Processing:**
```python
- Set response.id = learning_objective.id
- Set response.learning_objective_id = learning_objective.id
- Set response.learning_objective = learning_objective.learning_objective
- Set response.source_reference = learning_objective.source_reference
- Verify all options have feedback
- Add default feedback if missing
```
**6. Error Handling:**
```python
On exception:
- Create fallback question with 4 generic options
- Include error message in question_text
- Mark as questionable quality
```
#### **question_ranking.py**
**Key Functions:**
**`group_questions(questions, file_contents)`**
**Process:**
```python
1. Create prompt with:
- GROUP_QUESTIONS_PROMPT
- All questions with complete data
- Grouping instructions
2. Grouping Logic:
- Questions with same learning_objective_id are similar
- Group by topic overlap
- Mark best_in_group within each group
- Single-member groups: best_in_group = true by default
3. API call:
- Model: User-selected
- Response format: GroupedMultipleChoiceQuestionsResponse
4. Critical Instructions:
- MUST return ALL questions
- Each question must have group metadata
- best_in_group set appropriately
5. Returns:
{
"grouped": List[GroupedMultipleChoiceQuestion],
"best_in_group": [questions where best_in_group=true]
}
```
**`rank_questions(questions, file_contents)`**
**Process:**
```python
1. Create prompt with:
- RANK_QUESTIONS_PROMPT
- ALL quality standards (comprehensive)
- Best-in-group questions only
- Course content
2. Ranking Criteria (from prompt):
- Question clarity and unambiguity
- Alignment with learning objective
- Quality of incorrect options
- Feedback quality
- Appropriate difficulty (simple English preferred)
- Adherence to all guidelines
- Avoidance of problematic words/phrases
3. Special Instructions:
- DO NOT change question with ID=1
- Rank starting from 2 (rank 1 reserved)
- Each question gets unique rank
- Must return ALL questions
4. API call:
- Model: User-selected
- Response format: RankedMultipleChoiceQuestionsResponse
5. Returns:
{
"ranked": List[RankedMultipleChoiceQuestion]
(includes rank and ranking_reasoning for each)
}
```
**Simple vs Complex English Examples (from ranking criteria):**
```
Simple: "AI engineers create computer programs that can learn from data"
Complex: "AI engineering practitioners architect computational paradigms
exhibiting autonomous erudition capabilities"
```
#### **question_improvement.py**
**Key Functions:**
**`judge_question_quality(client, model, temperature, question)`**
**Process:**
```python
1. Create evaluation prompt with:
- Question text
- All options with feedback
- Quality criteria
- Evaluation instructions
2. LLM evaluates:
- Clarity and lack of ambiguity
- Alignment with learning objective
- Quality of distractors (incorrect options)
- Feedback quality and helpfulness
- Appropriate difficulty level
- Adherence to all standards
3. API call:
- Unstructured text response
- LLM returns: APPROVED or NOT APPROVED + reasoning
4. Parsing:
approved = "APPROVED" in response.upper()
feedback = full response text
5. Returns: (approved: bool, feedback: str)
```
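Note that a bare substring check (`"APPROVED" in response.upper()`) also matches "NOT APPROVED". A stricter parse (a sketch, not the project's code) disambiguates the verdict:

```python
import re
from typing import Tuple

def parse_verdict(response: str) -> Tuple[bool, str]:
    """Return (approved, feedback); 'NOT APPROVED' must not count as approval."""
    upper = response.upper()
    approved = bool(re.search(r"\bAPPROVED\b", upper)) and "NOT APPROVED" not in upper
    return approved, response

print(parse_verdict("APPROVED - clear and well aligned")[0])      # True
print(parse_verdict("NOT APPROVED: distractors too obvious")[0])  # False
```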
**`should_regenerate_incorrect_answers(client, question, file_contents, model_name)`**
**Process:**
```python
1. Extract incorrect options from question
2. Create evaluation prompt with:
- Each incorrect option
- IMMEDIATE_RED_FLAGS checklist
- Course content for context
3. LLM checks each option for:
- Contradictory second clauses
- Explicit negations
- Absolute terms
- Opposite descriptions
- Trade-off language
4. Returns: needs_regeneration: bool
5. If true:
- Log to wrong_answer_debug/ directory
- Provides detailed feedback on issues
```
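The report describes this check as LLM-based, but many of the red flags are literal phrases, so a deterministic pre-filter could catch the obvious cases before (or alongside) an LLM call. A minimal sketch, with the phrase list abridged from the IMMEDIATE_RED_FLAGS checklist later in this report:

```python
# Abridged from the IMMEDIATE_RED_FLAGS checklist; hypothetical helper,
# not the project's actual implementation.
IMMEDIATE_RED_FLAGS = [
    "but not necessarily", "at the expense of", "rather than",
    "without necessarily", "but has no impact on", "but cannot",
    "but prevents", "but limits", "without automating", "manual intervention",
]

def has_red_flag(option_text: str) -> bool:
    """Return True if an incorrect option contains a known red-flag phrase."""
    lowered = option_text.lower()
    return any(flag in lowered for flag in IMMEDIATE_RED_FLAGS)
```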
**`regenerate_incorrect_answers(client, model, temperature, questions, file_contents)`**
**Process:**
```python
1. For each question:
- Check if regeneration needed
- If yes:
a. Create new prompt with stricter constraints
b. Include original question for context
c. Add specific rules about avoiding red flags
d. Regenerate options
e. Validate again
- If no: keep original
2. Returns: List of questions with improved incorrect answers
```
#### **feedback_questions.py**
**Key Function: `generate_multiple_choice_question_from_feedback()`**
**Process:**
```python
1. Accept user feedback/guidance as free-form text
2. Create prompt combining:
- User feedback
- All quality standards
- Course content
- Standard generation criteria
3. LLM infers:
- Learning objective from feedback
- Appropriate question
- 4 options with feedback
- Source references
4. API call:
- Model: User-selected
- Response format: MultipleChoiceQuestionFromFeedback
5. Includes user feedback as metadata in response
6. Returns: Single question object
```
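Step 5 (attaching the user's guidance as metadata) can be illustrated with the shape of the response object. The sketch below uses dataclasses as a stand-in for the project's Pydantic models; the real field names may differ:

```python
from dataclasses import dataclass

# Hypothetical shapes for illustration only (dataclasses standing in for
# the project's Pydantic models in models/).
@dataclass
class Option:
    text: str
    is_correct: bool
    feedback: str

@dataclass
class MultipleChoiceQuestionFromFeedback:
    learning_objective: str            # inferred by the LLM from the feedback
    question: str
    options: list[Option]
    source_references: list[str]
    user_feedback: str = ""            # step 5: original guidance kept as metadata

# After the structured-output call returns `q`, the handler attaches the
# original guidance before returning it, e.g.:
#     q.user_feedback = guidance
```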
#### **assessment.py**
**Key Functions:**
**`generate_questions_in_parallel()`**
**Parallel Processing Details:**
```python
1. Setup:
max_workers = min(len(learning_objectives), 5)
# Limits to 5 concurrent threads
2. Thread Pool Executor:
with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
3. For each objective (in separate thread):
Worker function:
def generate_question_for_objective(objective, idx):
- Generate question
- Judge quality
- Update with approval and feedback
- Handle errors gracefully
- Return complete question
4. Submit all tasks:
future_to_idx = {
executor.submit(generate_question_for_objective, obj, i): i
for i, obj in enumerate(learning_objectives)
}
5. Collect results as completed:
for future in concurrent.futures.as_completed(future_to_idx):
question = future.result()
questions.append(question)
print progress
6. Error handling:
- Individual failures don't stop other threads
- Placeholder questions created on error
- All errors logged
7. Returns: List[MultipleChoiceQuestion] with quality judgments
```
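The fan-out/fan-in pattern above can be sketched end to end. This is a minimal runnable version with a stub in place of the real LLM-backed generate-and-judge worker:

```python
import concurrent.futures

# Stub worker: the real version generates a question, judges its quality,
# and attaches approval + feedback before returning.
def generate_question_for_objective(objective: str, idx: int) -> dict:
    try:
        return {"objective": objective, "idx": idx, "approved": True}
    except Exception as exc:  # individual failures must not kill the pool
        return {"objective": objective, "idx": idx, "error": str(exc)}

def generate_questions_in_parallel(learning_objectives: list[str]) -> list[dict]:
    max_workers = min(len(learning_objectives), 5)  # cap at 5 threads
    questions = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_idx = {
            executor.submit(generate_question_for_objective, obj, i): i
            for i, obj in enumerate(learning_objectives)
        }
        for future in concurrent.futures.as_completed(future_to_idx):
            questions.append(future.result())  # collected as threads finish
    return questions
```

Note that results arrive in completion order, not submission order, which is why each result carries its own index.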
**`save_assessment_to_json(assessment, output_path)`**
```python
1. Convert Pydantic model to dict:
assessment_dict = assessment.model_dump()
2. Write to JSON file:
with open(output_path, "w") as f:
json.dump(assessment_dict, f, indent=2)
3. File contains:
{
"learning_objectives": [...],
"questions": [...]
}
```
### State Management (`ui/state.py`)
**Global State Variables:**
```python
processed_file_contents = [] # List of XML-tagged content strings
generated_learning_objectives = [] # List of learning objective objects
```
**Functions:**
- `get_processed_contents()` → retrieves file contents
- `set_processed_contents(contents)` → stores file contents
- `get_learning_objectives()` → retrieves objectives
- `set_learning_objectives(objectives)` → stores objectives
- `clear_state()` → resets both variables
**Purpose:**
- Persists data between UI tabs
- Allows Tab 2 to access content processed in Tab 1
- Allows Tab 3 to access content for custom questions
- Enables regeneration with feedback
### UI Handlers
#### **objective_handlers.py**
**`process_files(files, num_objectives, num_runs, model_name, incorrect_answer_model_name, temperature)`**
**Complete Workflow:**
```python
1. Validate inputs (files exist, API key present)
2. Extract file paths from Gradio file objects
3. Process files → get XML-tagged content
4. Store in state
5. Create QuizGenerator
6. Generate multiple runs of base objectives
7. Group and rank objectives
8. Generate incorrect answers for best-in-group
9. Improve incorrect answers
10. Reassign IDs (best from 001 group → ID=1)
11. Format results for display
12. Store in state
13. Return 4 outputs: status, best-in-group, all-grouped, raw
```
**`regenerate_objectives(objectives_json, feedback, num_objectives, num_runs, model_name, temperature)`**
**Workflow:**
```python
1. Retrieve processed contents from state
2. Append feedback to content:
file_contents_with_feedback.append(f"FEEDBACK: {feedback}")
3. Generate new objectives with feedback context
4. Group and rank
5. Return regenerated objectives
```
**`_reassign_objective_ids(grouped_objectives)`**
**ID Assignment Logic:**
```python
1. Find all objectives with IDs ending in 001 (1001, 2001, etc.)
2. Identify their groups
3. Find best_in_group objective from these groups
4. Assign it ID = 1
5. Assign all other objectives sequential IDs starting from 2
```
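The five steps above can be sketched concretely. Here each objective is a plain dict with `id`, `group_id`, and `best_in_group` keys (the real code operates on Pydantic objects, and this helper is a hypothetical reconstruction):

```python
def reassign_objective_ids(objectives: list[dict]) -> list[dict]:
    # Steps 1-3: find the group(s) whose member IDs end in 001
    # (1001, 2001, ...) and take the best_in_group member from them.
    first_groups = {o["group_id"] for o in objectives if o["id"] % 1000 == 1}
    best_first = next(
        o for o in objectives
        if o["group_id"] in first_groups and o["best_in_group"]
    )
    # Step 4: the best objective from the "001" group becomes ID 1.
    best_first["id"] = 1
    # Step 5: everything else gets sequential IDs starting from 2.
    next_id = 2
    for o in objectives:
        if o is not best_first:
            o["id"] = next_id
            next_id += 1
    return objectives
```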
**`_format_objective_results(grouped_result, all_learning_objectives)`**
**Formatting:**
```python
1. Sort by ID
2. Create dictionaries from Pydantic objects
3. Include all metadata fields
4. Convert to JSON with indent=2
5. Return 3 formatted outputs + status message
```
#### **question_handlers.py**
**`generate_questions(objectives_json, model_name, temperature, num_runs)`**
**Complete Workflow:**
```python
1. Validate inputs
2. Parse objectives JSON → create LearningObjective objects
3. Retrieve processed contents from state
4. Create QuizGenerator
5. Generate questions (multiple runs in parallel)
6. Group questions by similarity
7. Rank best-in-group questions
8. Optionally improve incorrect answers (currently commented out)
9. Format results
10. Return 4 outputs: status, best-ranked, all-grouped, formatted
```
**`_generate_questions_multiple_runs()`**
```python
For each run:
1. Call generate_questions_in_parallel()
2. Assign unique IDs across runs:
start_id = len(all_questions) + 1
for i, q in enumerate(run_questions):
q.id = start_id + i
3. Aggregate all questions
```
**`_group_and_rank_questions()`**
```python
1. Group all questions → get grouped and best_in_group
2. Rank only best_in_group questions
3. Return:
{
"grouped": all with group metadata,
"best_in_group_ranked": best with ranks
}
```
#### **feedback_handlers.py**
**`propose_question_handler(guidance, model_name, temperature)`**
**Workflow:**
```python
1. Validate state (processed contents available)
2. Create QuizGenerator
3. Call generate_multiple_choice_question_from_feedback()
- Passes user guidance and course content
- LLM infers learning objective
- Generates complete question
4. Format as JSON
5. Return status and question JSON
```
### Formatting Utilities (`ui/formatting.py`)
**`format_quiz_for_ui(questions_json)`**
**Process:**
```python
1. Parse JSON to list of question dictionaries
2. Sort by rank if available
3. For each question:
- Add header: "**Question N [Rank: X]:** {question_text}"
- Add ranking reasoning if available
- For each option:
- Add letter (A, B, C, D)
- Mark correct option
- Include option text
- Include feedback indented
4. Return formatted string with markdown
```
**Output Example:**
```
**Question 1 [Rank: 2]:** What is the primary purpose of AI agents?
Ranking Reasoning: Clear question that tests fundamental understanding...
• A [Correct]: To automate tasks and make decisions
◦ Feedback: Correct! AI agents are designed to automate tasks...
• B: To replace human workers entirely
◦ Feedback: While AI agents can automate tasks, they are not...
[continues...]
```
---
## Quality Standards and Prompts
### Learning Objectives Quality Standards
**From `prompts/learning_objectives.py`:**
**BASE_LEARNING_OBJECTIVES_PROMPT - Key Requirements:**
1. **Assessability:**
- Must be testable via multiple-choice questions
- Cannot be about "building", "creating", "developing"
- Should use verbs like: identify, list, describe, define, compare
2. **Specificity:**
- One goal per objective
- Don't combine multiple action verbs
- Example of what NOT to do: "identify X and explain Y"
3. **Source Alignment:**
- Derived DIRECTLY from course content
- No topics not covered in content
- Appropriate difficulty level for course
4. **Independence:**
- Each objective stands alone
- No dependencies on other objectives
- No context required from other objectives
5. **Focus:**
- Address "why" over "what" when possible
- Critical knowledge over trivial facts
- Principles over specific implementation details
6. **Tool/Framework Agnosticism:**
- Don't mention specific tools/frameworks
- Focus on underlying principles
- Example: Don't ask about "Pandas DataFrame methods",
ask about "data filtering concepts"
7. **First Objective Rule:**
- Should be relatively easy recall question
- Address main topic/concept of course
- Format: "Identify what X is" or "Explain why X is important"
8. **Answer Length:**
- Aim for ≤20 words in correct answer
- Avoid unnecessary elaboration
- No compound sentences with extra consequences
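The ≤20-word guideline is easy to enforce deterministically. A quick check (a sketch; whitespace splitting is a rough proxy for word count):

```python
def within_length_guideline(answer: str, max_words: int = 20) -> bool:
    """Return True if the correct answer meets the <=20-word guideline."""
    return len(answer.split()) <= max_words
```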
**BLOOMS_TAXONOMY_LEVELS:**
Levels from lowest to highest:
- **Recall:** Retention of key concepts (not trivialities)
- **Comprehension:** Connect ideas, demonstrate understanding
- **Application:** Apply concept to new but similar scenario
- **Analysis:** Examine parts, determine relationships, make inferences
- **Evaluation:** Make judgments requiring critical thinking
**LEARNING_OBJECTIVE_EXAMPLES:**
Includes 7 high-quality examples with:
- Appropriate action verbs
- Clear learning objectives
- Concise correct answers (mostly <20 words)
- Multiple source references
- Framework-agnostic language
### Question Quality Standards
**From `prompts/questions.py`:**
**GENERAL_QUALITY_STANDARDS:**
- Overall goal: Set learner up for success
- Perfect score attainable for thoughtful students
- Aligned with course content
- Aligned with learning objective and correct answer
- No references to manual intervention (software/AI course)
**MULTIPLE_CHOICE_STANDARDS:**
- **EXACTLY ONE** correct answer per question
- Clear, unambiguous correct answer
- Plausible distractors representing common misconceptions
- Not obviously wrong distractors
- All options similar length and detail
- Mutually exclusive options
- Avoid "all/none of the above"
- Typically 4 options (A, B, C, D)
- Don't start feedback with "Correct" or "Incorrect"
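Several of these standards are mechanically checkable. A deterministic lint pass could look like this sketch (a hypothetical helper, not part of the project; each option is a `(text, is_correct)` pair):

```python
def lint_multiple_choice(options: list[tuple[str, bool]]) -> list[str]:
    """Return a list of standards violations (empty list means clean)."""
    problems = []
    if len(options) != 4:
        problems.append(f"expected 4 options, got {len(options)}")
    correct = sum(1 for _, is_correct in options if is_correct)
    if correct != 1:
        problems.append(f"expected exactly 1 correct answer, got {correct}")
    for text, _ in options:
        if "of the above" in text.lower():
            problems.append(f"avoid 'all/none of the above': {text!r}")
    return problems
```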
**QUESTION_SPECIFIC_QUALITY_STANDARDS:**
Questions must:
- Match language and tone of course
- Match difficulty level of course
- Assess only course information
- Not teach as part of quiz
- Use clear, concise language
- Not induce confusion
- Provide slight (not major) challenge
- Be easily interpreted and unambiguous
- Have proper grammar and sentence structure
- Be thoughtful and specific (not broad and ambiguous)
- Be complete in wording (understanding the question shouldn't be part of the assessment)
**CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS:**
Correct answers must:
- Be factually correct and unambiguous
- Match course language and tone
- Be complete sentences
- Match course difficulty level
- Contain only course information
- Not teach during quiz
- Use clear, concise language
- Be thoughtful and specific
- Be complete (identifying the correct answer shouldn't require interpretation)
**INCORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS:**
Incorrect answers should:
- Represent reasonable potential misconceptions
- Sound plausible to non-experts
- Require thought even from diligent learners
- Not be obviously wrong
- Use incorrect_answer_suggestions from the objective (as a starting point)
**Avoid:**
- Obviously wrong options anyone can eliminate
- Absolute terms: "always", "never", "only", "exclusively"
- Phrases like "used exclusively for scenarios where..."
**ANSWER_FEEDBACK_QUALITY_STANDARDS:**
**For Incorrect Answers:**
- Be informational and encouraging (not punitive)
- Single sentence, concise
- Do NOT say "Incorrect" or "Wrong"
**For Correct Answers:**
- Be informational and encouraging
- Single sentence, concise
- Do NOT say "Correct!" (redundant after "Correct: " prefix)
### Incorrect Answer Generation Guidelines
**From `prompts/incorrect_answers.py`:**
**Core Principles:**
1. **Create Common Misunderstandings:**
- Represent how students actually misunderstand
- Confuse related concepts
- Mix up terminology
2. **Maintain Identical Structure:**
- Match grammatical pattern of correct answer
- Same length and complexity
- Same formatting style
3. **Use Course Terminology Correctly but in Wrong Contexts:**
- Apply correct terms incorrectly
- Confuse with related concepts
- Example: an option framed as describing backpropagation that actually describes forward propagation
4. **Include Partially Correct Information:**
- First part correct, second part wrong
- Correct process but wrong application
- Correct concept but incomplete
5. **Avoid Obviously Wrong Answers:**
- No contradictions with basic knowledge
- Not immediately eliminable
- Require course knowledge to reject
6. **Mirror Detail Level and Style:**
- Match technical depth
- Match tone
- Same level of specificity
7. **For Lists, Maintain Consistency:**
- Same number of items
- Same format
- Mix some correct with incorrect items
8. **AVOID ABSOLUTE TERMS:**
- "always", "never", "exclusively", "primarily"
- "all", "every", "none", "nothing", "only"
- "must", "required", "impossible"
- "rather than", "as opposed to", "instead of"
**IMMEDIATE_RED_FLAGS** (triggers regeneration):
**Contradictory Second Clauses:**
- "but not necessarily"
- "at the expense of"
- "rather than [core concept]"
- "ensuring X rather than Y"
- "without necessarily"
- "but has no impact on"
- "but cannot", "but prevents", "but limits"
**Explicit Negations:**
- "without automating", "without incorporating"
- "preventing [main benefit]"
- "limiting [main capability]"
**Opposite Descriptions:**
- "fixed steps" (for flexible systems)
- "manual intervention" (for automation)
- "simple question answering" (for complex processing)
**Hedging Creating Limitations:**
- "sometimes", "occasionally", "might"
- "to some extent", "partially", "somewhat"
**INCORRECT_ANSWER_EXAMPLES:**
Includes 10 detailed examples showing:
- Learning objective
- Correct answer
- 3 plausible incorrect suggestions
- Explanation of why each is plausible but wrong
- Consistent formatting across all options
### Ranking and Grouping
**RANK_QUESTIONS_PROMPT:**
**Criteria:**
1. Question clarity and unambiguity
2. Alignment with learning objective
3. Quality of incorrect options
4. Quality of feedback
5. Appropriate difficulty (simple English preferred)
6. Adherence to all guidelines
**Critical Instructions:**
- DO NOT change question with ID=1
- Rank starting from 2
- Each question gets a unique rank
- Must return ALL questions
- No omissions
- No duplicate ranks
**Simple vs Complex English:**
```
Simple: "AI engineers create computer programs that learn from data"
Complex: "AI engineering practitioners architect computational paradigms
exhibiting autonomous erudition capabilities"
```
**GROUP_QUESTIONS_PROMPT:**
**Grouping Logic:**
- Questions with same learning_objective_id are similar
- Identify topic overlap
- Mark best_in_group within each group
- Single-member groups: best_in_group = true
**Critical Instructions:**
- Must return ALL questions
- Each question needs group metadata
- No omissions
- Best in group marked appropriately
---
## Summary of Data Flow
### Complete End-to-End Flow
```
User Uploads Files
↓
ContentProcessor extracts and tags content
↓
[Stored in global state]
↓
Generate Base Objectives (multiple runs)
↓
Group Base Objectives (by similarity)
↓
Generate Incorrect Answers (for best-in-group only)
↓
Improve Incorrect Answers (quality check)
↓
Reassign IDs (best from 001 group → ID=1)
↓
[Objectives displayed in UI, stored in state]
↓
Generate Questions (parallel, multiple runs)
↓
Judge Question Quality (parallel)
↓
Group Questions (by similarity)
↓
Rank Questions (best-in-group only)
↓
[Questions displayed in UI]
↓
Format for Display
↓
Export to JSON (optional)
```
### Key Optimization Strategies
1. **Multiple Generation Runs:**
- Generates variety of objectives/questions
- Grouping identifies best versions
- Reduces risk of poor quality individual outputs
2. **Hierarchical Processing:**
- Generate base → Group → Enhance → Improve
- Only enhances best candidates (saves API calls)
- Progressive refinement
3. **Parallel Processing:**
- Questions generated concurrently (up to 5 threads)
- Significant time savings for multiple objectives
- Independent evaluations
4. **Quality Gating:**
- LLM judges question quality
- Checks for red flags in incorrect answers
- Regenerates problematic content
5. **Source Tracking:**
- XML tags preserve origin
- Questions link back to source materials
- Enables accurate content matching
6. **Modular Prompts:**
- Reusable quality standards
- Consistent across all generations
- Easy to update centrally
---
## Configuration and Customization
### Available Models
**Configured in `models/config.py`:**
```python
MODELS = [
"o3-mini", "o1", # Reasoning models (no temperature)
"gpt-4.1", "gpt-4o", # GPT-4 variants
"gpt-4o-mini", "gpt-4",
"gpt-3.5-turbo", # Legacy
"gpt-5", # Latest (no temperature)
"gpt-5-mini", # Efficient (no temperature)
"gpt-5-nano" # Ultra-efficient (no temperature)
]
```
**Temperature Support:**
- Models with reasoning (o1, o3-mini, gpt-5 variants): No temperature
- Other models: Temperature 0.0 to 1.0
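The temperature split implies conditional request construction: reasoning models reject the `temperature` parameter, so it should only be sent to models that support it. A minimal sketch (model names abridged from the MODELS list; the helper is hypothetical):

```python
# Models that do not accept a temperature parameter, per the list above.
NO_TEMPERATURE_MODELS = {"o1", "o3-mini", "gpt-5", "gpt-5-mini", "gpt-5-nano"}

def build_completion_kwargs(model: str, temperature: float) -> dict:
    """Build API kwargs, omitting temperature for reasoning models."""
    kwargs = {"model": model}
    if model not in NO_TEMPERATURE_MODELS:
        kwargs["temperature"] = temperature
    return kwargs
```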
**Model Selection Strategy:**
- **Base objectives:** User-selected (default: gpt-5)
- **Grouping:** Hardcoded gpt-5-mini (efficiency)
- **Incorrect answers:** Separate user selection (default: gpt-5)
- **Questions:** User-selected (default: gpt-5)
- **Quality judging:** User-selected or gpt-5-mini
### Environment Variables
**Required:**
```
OPENAI_API_KEY=your_api_key_here
```
**Configured via `.env` file in project root**
### Customization Points
1. **Quality Standards:**
- Edit `prompts/learning_objectives.py`
- Edit `prompts/questions.py`
- Edit `prompts/incorrect_answers.py`
- Changes apply to all future generations
2. **Example Questions/Objectives:**
- Modify LEARNING_OBJECTIVE_EXAMPLES
- Modify EXAMPLE_QUESTIONS
- Modify INCORRECT_ANSWER_EXAMPLES
- LLM learns from these examples
3. **Generation Parameters:**
- Number of objectives per run
- Number of runs (variety)
- Temperature (creativity vs consistency)
- Model selection (quality vs cost/speed)
4. **Parallel Processing:**
- `max_workers` in assessment.py
- Currently: min(len(objectives), 5)
- Adjust for your rate limits
5. **Output Formats:**
- Modify `formatting.py` for display
- Assessment JSON structure in `models/assessment.py`
---
## Error Handling and Resilience
### Content Processing Errors
- **Invalid JSON notebooks:** Falls back to raw text
- **Parse failures:** Wraps in code blocks, continues
- **Missing files:** Logged, skipped
- **Encoding issues:** UTF-8 fallback
### Generation Errors
- **API failures:** Logged with traceback
- **Structured output parse errors:** Fallback responses created
- **Missing required fields:** Default values assigned
- **Validation errors:** Caught and logged
### Parallel Processing Errors
- **Individual thread failures:** Don't stop other threads
- **Placeholder questions:** Created on error
- **Complete error details:** Logged for debugging
- **Graceful degradation:** Partial results returned
### Quality Check Failures
- **Regeneration failures:** Original kept with warning
- **Judge unavailable:** Questions marked unapproved
- **Validation failures:** Detailed logs in debug directories
---
## Debug and Logging
### Debug Directories
1. **`incorrect_suggestion_debug/`**
- Created during objective enhancement
- Contains logs of problematic incorrect answers
- Format: `{objective_id}.txt`
- Includes: Original suggestions, identified issues, regeneration attempts
2. **`wrong_answer_debug/`**
- Created during question improvement
- Logs question-level incorrect answer issues
- Regeneration history
### Console Logging
**Extensive logging throughout:**
- File processing status
- Generation progress (run numbers)
- Parallel thread activity (thread IDs)
- API call results
- Error messages with tracebacks
- Timing information (start/end times)
**Example Log Output:**
```
DEBUG - Processing 3 files: ['file1.vtt', 'file2.ipynb', 'file3.srt']
DEBUG - Found source file: file1.vtt
Generating 3 learning objectives from 3 files
Successfully generated 3 learning objectives without correct answers
Generated correct answer for objective 1
Grouping 9 base learning objectives
Received 9 grouped results
Generating incorrect answer options only for best-in-group objectives...
PARALLEL: Starting ThreadPoolExecutor with 3 workers
PARALLEL: Worker 1 (Thread ID: 12345): Starting work on objective...
Question generation completed in 45.23 seconds
```
---
## Performance Considerations
### API Call Optimization
**Calls per Workflow:**
For 3 objectives × 3 runs = 9 base objectives:
1. **Learning Objectives:**
- Base generation: 3 calls (one per run)
- Correct answers: 9 calls (one per objective)
- Grouping: 1 call
- Incorrect answers: ~3 calls (best-in-group only)
- Improvement checks: ~3 calls
- **Total: ~19 calls**
2. **Questions (for 3 objectives × 1 run):**
- Question generation: 3 calls (parallel)
- Quality judging: 3 calls (parallel)
- Grouping: 1 call
- Ranking: 1 call
- **Total: ~8 calls**
**Total for complete workflow: ~27 API calls**
### Time Estimates
**Typical Execution Times:**
- File processing: <1 second
- Objective generation (3×3): 30-60 seconds
- Question generation (3×1): 20-40 seconds (with parallelization)
- **Total: 1-2 minutes for a small course**
**Factors Affecting Speed:**
- Model selection (gpt-5 slower than gpt-5-mini)
- Number of runs
- Number of objectives/questions
- API rate limits
- Network latency
- Parallel worker count
### Cost Optimization
**Strategies:**
1. Use gpt-5-mini for grouping/ranking (hardcoded)
2. Reduce number of runs (trade-off: variety)
3. Generate fewer objectives initially
4. Use faster models for initial exploration
5. Use premium models for final production
---
## Conclusion
The AI Course Assessment Generator is a sophisticated, multi-stage system that transforms raw course materials into high-quality educational assessments. It employs:
- **Modular architecture** for maintainability
- **Structured output generation** for reliability
- **Quality-driven iterative refinement** for excellence
- **Parallel processing** for efficiency
- **Comprehensive error handling** for resilience
The system successfully balances automation with quality control, producing assessments that align with educational best practices and Bloom's Taxonomy while maintaining complete traceability to source materials.