# AI Course Assessment Generator - Functionality Report

## Table of Contents

1. [Overview](#overview)
2. [System Architecture](#system-architecture)
3. [Data Models](#data-models)
4. [Application Entry Point](#application-entry-point)
5. [User Interface Structure](#user-interface-structure)
6. [Complete Workflow](#complete-workflow)
7. [Detailed Component Functionality](#detailed-component-functionality)
8. [Quality Standards and Prompts](#quality-standards-and-prompts)

---

## Overview

The AI Course Assessment Generator is a sophisticated educational tool that automates the creation of learning objectives and multiple-choice questions from course materials. It leverages OpenAI's language models with structured output generation to produce high-quality educational assessments that adhere to specified quality standards and Bloom's Taxonomy levels.

### Key Capabilities

- **Multi-format Content Processing**: Accepts `.vtt`, `.srt` (subtitle files), and `.ipynb` (Jupyter notebooks)
- **AI-Powered Generation**: Uses OpenAI's GPT models with configurable parameters
- **Quality Assurance**: Implements LLM-based quality assessment and ranking
- **Source Tracking**: Maintains XML-tagged references from source materials to generated content
- **Iterative Improvement**: Supports feedback-based regeneration and enhancement
- **Parallel Processing**: Generates questions concurrently for improved performance

---

## System Architecture

### Architectural Patterns

#### 1. **Orchestrator Pattern**

Both `LearningObjectiveGenerator` and `QuizGenerator` act as orchestrators that coordinate calls to specialized generation functions rather than implementing generation logic directly.

#### 2. **Modular Prompt System**

The `prompts/` directory contains reusable prompt components that are imported and combined in generation modules, allowing for consistent quality standards across different generation tasks.

#### 3. **Structured Output Generation**

All LLM interactions use Pydantic models with the `instructor` library to ensure consistent, validated output formats using OpenAI's structured output API.

#### 4. **Source Tracking via XML Tags**

Content is wrapped in XML tags (e.g., `<source file="example.ipynb">content</source>`) throughout the pipeline to maintain traceability from source files to generated questions.
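The tagging step can be sketched as a small helper (a hedged sketch; the function name `wrap_in_source_tag` is illustrative, not from the codebase):

```python
import os

def wrap_in_source_tag(file_path: str, content: str) -> str:
    """Wrap processed content in an XML source tag so downstream
    prompts can trace generated questions back to their file."""
    filename = os.path.basename(file_path)
    return f"<source file='{filename}'>{content}</source>"
```

For example, `wrap_in_source_tag("data/intro.vtt", "Welcome")` yields `<source file='intro.vtt'>Welcome</source>`, the same tag format the question generator later matches against.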
### Technology Stack

- **Python 3.8+**
- **Gradio 5.29.0+**: Web-based UI framework
- **Pydantic 2.8.0+**: Data validation and schema management
- **OpenAI 1.52.0+**: LLM API integration
- **Instructor 1.7.9+**: Structured output generation
- **nbformat 5.9.2**: Jupyter notebook parsing
- **python-dotenv 1.0.0**: Environment variable management

---

## Data Models

### Learning Objectives Progression

The system uses a hierarchical progression of learning objective models:

#### 1. **BaseLearningObjectiveWithoutCorrectAnswer**

```python
- id: int
- learning_objective: str
- source_reference: Union[List[str], str]
```

Initial generation without correct answers.

#### 2. **BaseLearningObjective**

```python
- id: int
- learning_objective: str
- source_reference: Union[List[str], str]
- correct_answer: str
```

Base objectives with correct answers added.

#### 3. **LearningObjective**

```python
- id: int
- learning_objective: str
- source_reference: Union[List[str], str]
- correct_answer: str
- incorrect_answer_options: Union[List[str], str]
- in_group: Optional[bool]
- group_members: Optional[List[int]]
- best_in_group: Optional[bool]
```

Enhanced with incorrect answer suggestions and grouping metadata.

#### 4. **GroupedLearningObjective**

```python
(All fields from LearningObjective)
- in_group: bool (required)
- group_members: List[int] (required)
- best_in_group: bool (required)
```

Fully grouped and ranked objectives.
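This progression maps naturally onto Pydantic inheritance; a minimal sketch with the field names from the report (defaults and class bodies are assumptions, not copied from the codebase):

```python
from typing import List, Optional, Union
from pydantic import BaseModel

class BaseLearningObjectiveWithoutCorrectAnswer(BaseModel):
    id: int
    learning_objective: str
    source_reference: Union[List[str], str]

class BaseLearningObjective(BaseLearningObjectiveWithoutCorrectAnswer):
    correct_answer: str

class LearningObjective(BaseLearningObjective):
    incorrect_answer_options: Union[List[str], str] = []
    in_group: Optional[bool] = None
    group_members: Optional[List[int]] = None
    best_in_group: Optional[bool] = None

class GroupedLearningObjective(LearningObjective):
    # Grouping metadata becomes required once grouping has run
    in_group: bool
    group_members: List[int]
    best_in_group: bool
```

Each stage only adds fields, so any later model validates anywhere an earlier one is expected.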
### Question Models Progression

#### 1. **MultipleChoiceOption**

```python
- option_text: str
- is_correct: bool
- feedback: str
```

#### 2. **MultipleChoiceQuestion**

```python
- id: int
- question_text: str
- options: List[MultipleChoiceOption]
- learning_objective_id: int
- learning_objective: str
- correct_answer: str
- source_reference: Union[List[str], str]
- judge_feedback: Optional[str]
- approved: Optional[bool]
```

#### 3. **RankedMultipleChoiceQuestion**

```python
(All fields from MultipleChoiceQuestion)
- rank: int
- ranking_reasoning: str
- in_group: bool
- group_members: List[int]
- best_in_group: bool
```

#### 4. **Assessment**

```python
- learning_objectives: List[LearningObjective]
- questions: List[RankedMultipleChoiceQuestion]
```

Final output containing both objectives and questions.

### Configuration Models

#### **MODELS**

Available OpenAI models: `["o3-mini", "o1", "gpt-4.1", "gpt-4o", "gpt-4o-mini", "gpt-4", "gpt-3.5-turbo", "gpt-5", "gpt-5-mini", "gpt-5-nano"]`

#### **TEMPERATURE_UNAVAILABLE**

Dictionary mapping models to temperature availability (some models like o1, o3-mini, and gpt-5 variants don't support temperature settings).
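The temperature gate can be expressed as a lookup applied before each API call; a sketch (the exact dictionary contents are an assumption inferred from the model list above):

```python
# True means the model rejects a temperature parameter (assumed values)
TEMPERATURE_UNAVAILABLE = {
    "o1": True, "o3-mini": True,
    "gpt-5": True, "gpt-5-mini": True, "gpt-5-nano": True,
    "gpt-4.1": False, "gpt-4o": False, "gpt-4o-mini": False,
    "gpt-4": False, "gpt-3.5-turbo": False,
}

def build_params(model: str, temperature: float) -> dict:
    """Only attach temperature when the model supports it;
    unknown models default to 'unavailable' so the call fails safe."""
    params = {"model": model}
    if not TEMPERATURE_UNAVAILABLE.get(model, True):
        params["temperature"] = temperature
    return params
```

This matches the guard used later in the API-call listings (`if not TEMPERATURE_UNAVAILABLE.get(model, True)`).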
---

## Application Entry Point

### `app.py`

The root-level entry point that:

1. Loads environment variables from `.env` file
2. Checks for `OPENAI_API_KEY` presence
3. Creates the Gradio UI via `ui.app.create_ui()`
4. Launches the web interface at `http://127.0.0.1:7860`

```python
# Workflow:
load_dotenv() → Check API key → create_ui() → app.launch()
```
---

## User Interface Structure

### `ui/app.py` - Gradio Interface

The UI is organized into **3 main tabs**:

#### **Tab 1: Generate Learning Objectives**

**Input Components:**

- File uploader (accepts `.ipynb`, `.vtt`, `.srt`)
- Number of objectives per run (slider: 1-20, default: 3)
- Number of generation runs (dropdown: 1-5, default: 3)
- Model selection (dropdown, default: "gpt-5")
- Incorrect answer model selection (dropdown, default: "gpt-5")
- Temperature setting (dropdown: 0.0-1.0, default: 1.0)
- Generate button
- Feedback input textbox
- Regenerate button

**Output Components:**

- Status textbox
- Best-in-Group Learning Objectives (JSON)
- All Grouped Learning Objectives (JSON)
- Raw Ungrouped Learning Objectives (JSON) - for debugging

**Event Handler:** `process_files()` from `objective_handlers.py`

#### **Tab 2: Generate Questions**

**Input Components:**

- Learning Objectives JSON (auto-populated from Tab 1)
- Model selection
- Temperature setting
- Number of question generation runs (slider: 1-5, default: 1)
- Generate Questions button

**Output Components:**

- Status textbox
- Ranked Best-in-Group Questions (JSON)
- All Grouped Questions (JSON)
- Formatted Quiz (human-readable format)

**Event Handler:** `generate_questions()` from `question_handlers.py`

#### **Tab 3: Propose/Edit Question**

**Input Components:**

- Question guidance/feedback textbox
- Model selection
- Temperature setting
- Generate Question button

**Output Components:**

- Status textbox
- Generated Question (JSON)

**Event Handler:** `propose_question_handler()` from `feedback_handlers.py`

---

## Complete Workflow

### Phase 1: File Upload and Content Processing

#### Step 1.1: File Upload

User uploads one or more files (`.vtt`, `.srt`, `.ipynb`) through the Gradio interface.

#### Step 1.2: File Path Extraction (`objective_handlers._extract_file_paths()`)

```python
# Handles different input formats:
- List of file paths
- Single file path string
- File objects with .name attribute
```

#### Step 1.3: Content Processing (`ui/content_processor.py`)

**For Subtitle Files (`.vtt`, `.srt`):**

```python
1. Read file with UTF-8 encoding
2. Split into lines
3. Filter out:
   - Empty lines
   - Numeric timestamp indicators
   - Lines containing '-->' (timestamps)
   - 'WEBVTT' header lines
4. Combine remaining text lines
5. Wrap in XML tags: <source file='filename.vtt'>content</source>
```
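The filtering steps above can be sketched as a standalone function (a hedged sketch, not the exact implementation; the name `clean_subtitles` is illustrative):

```python
import os

def clean_subtitles(file_path: str) -> str:
    """Strip WEBVTT headers, cue sequence numbers, and timestamp lines,
    keeping only the spoken text, wrapped in a <source> tag."""
    with open(file_path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line == "WEBVTT":
            continue
        if line.isdigit():   # cue sequence number
            continue
        if "-->" in line:    # timestamp line
            continue
        kept.append(line)
    text = " ".join(kept)
    return f"<source file='{os.path.basename(file_path)}'>{text}</source>"
```

A two-cue `.srt` file reduces to a single tagged line of dialogue, ready to be concatenated with other sources in the prompt.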
**For Jupyter Notebooks (`.ipynb`):**

```python
1. Validate JSON format
2. Parse with nbformat.read()
3. Extract from cells:
   - Markdown cells: [Markdown]\n{content}
   - Code cells: [Code]\n```python\n{content}\n```
4. Combine all cell content
5. Wrap in XML tags: <source file='filename.ipynb'>content</source>
```
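The notebook path follows the same shape; here is a dependency-free sketch that reads the notebook JSON directly (the real pipeline uses `nbformat.read()`; the function name is illustrative):

```python
import json
import os

def extract_notebook_text(file_path: str) -> str:
    """Flatten notebook cells into labeled sections and wrap the
    result in a <source> tag (the real code parses via nbformat)."""
    with open(file_path, encoding="utf-8") as f:
        nb = json.load(f)
    fence = "`" * 3  # avoid a literal fence inside this listing
    parts = []
    for cell in nb.get("cells", []):
        source = "".join(cell.get("source", []))
        if cell.get("cell_type") == "markdown":
            parts.append(f"[Markdown]\n{source}")
        elif cell.get("cell_type") == "code":
            parts.append(f"[Code]\n{fence}python\n{source}\n{fence}")
    body = "\n\n".join(parts)
    return f"<source file='{os.path.basename(file_path)}'>{body}</source>"
```

Markdown cells keep their prose; code cells are re-fenced so the LLM can distinguish explanation from executable examples.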
**Error Handling:**

- Invalid JSON: Wraps raw content in code blocks
- Parsing failures: Falls back to plain text extraction
- All errors logged to console

#### Step 1.4: State Storage

Processed content is stored in global state (`ui/state.py`):

```python
processed_file_contents = [tagged_content_1, tagged_content_2, ...]
```

### Phase 2: Learning Objective Generation

#### Step 2.1: Multi-Run Base Generation

**Process:** `objective_handlers._generate_multiple_runs()`

For each run (user-specified, typically 3 runs):

1. **Call:** `QuizGenerator.generate_base_learning_objectives()`
2. **Workflow:**

   ```
   generate_base_learning_objectives()
       ↓
   generate_base_learning_objectives_without_correct_answers()
       → Creates prompt with:
         - BASE_LEARNING_OBJECTIVES_PROMPT
         - BLOOMS_TAXONOMY_LEVELS
         - LEARNING_OBJECTIVE_EXAMPLES_WITHOUT_ANSWERS
         - Combined file contents
       → Calls OpenAI API with structured output
       → Returns List[BaseLearningObjectiveWithoutCorrectAnswer]
       ↓
   generate_correct_answers_for_objectives()
       → For each objective:
         - Creates prompt with objective and course content
         - Calls OpenAI API (unstructured text response)
         - Extracts correct answer
       → Returns List[BaseLearningObjective]
   ```

3. **ID Assignment:**

   ```python
   # Temporary IDs by run:
   Run 1: 1001, 1002, 1003
   Run 2: 2001, 2002, 2003
   Run 3: 3001, 3002, 3003
   ```

4. **Aggregation:** All objectives from all runs are combined into a single list.

**Example:** 3 runs × 3 objectives = 9 total base objectives
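The temporary ID scheme (run number × 1000 + position) can be sketched as a small helper; the real handler operates on `BaseLearningObjective` objects, but dicts show the arithmetic:

```python
def assign_run_ids(objectives_per_run: list) -> list:
    """Give each objective a temporary ID of run*1000 + position,
    so the grouping step can tell which run and slot it came from."""
    combined = []
    for run_index, run in enumerate(objectives_per_run, start=1):
        for position, objective in enumerate(run, start=1):
            objective["id"] = run_index * 1000 + position
            combined.append(objective)
    return combined
```

Three runs of three objectives each come out as 1001–1003, 2001–2003, 3001–3003, matching the table above.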
#### Step 2.2: Grouping and Ranking

**Process:** `objective_handlers._group_base_objectives_add_incorrect_answers()`

**Step 2.2.1: Group Base Objectives**

```python
QuizGenerator.group_base_learning_objectives()
    ↓
learning_objective_generator/grouping_and_ranking.py
    → group_base_learning_objectives()
```

**Grouping Logic:**

1. Creates prompt containing:
   - Original generation criteria
   - All base objectives with IDs
   - Course content for context
   - Grouping instructions
2. **Special Rule:** All objectives with IDs ending in 1 (1001, 2001, 3001) are grouped together and ONE is marked as best-in-group (this becomes the primary/first objective)
3. **LLM Call:**
   - Model: `gpt-5-mini`
   - Response format: `GroupedBaseLearningObjectivesResponse`
   - Returns: Grouped objectives with metadata
4. **Output Structure:**

   ```python
   {
       "all_grouped": [all objectives with group metadata],
       "best_in_group": [objectives marked as best in their groups]
   }
   ```

**Step 2.2.2: ID Reassignment** (`_reassign_objective_ids()`)

```python
1. Find best objective from the 001 group
2. Assign it ID = 1
3. Assign remaining objectives IDs starting from 2
```
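A sketch of that reassignment, assuming objectives are dicts carrying the `id` and `best_in_group` keys described earlier (the real function works on model objects):

```python
def reassign_objective_ids(objectives: list) -> list:
    """Promote the best objective from the x001 group to ID 1,
    then number the remaining objectives sequentially from 2."""
    def is_primary(o: dict) -> bool:
        return o["id"] % 1000 == 1 and o.get("best_in_group", False)

    primary = next(o for o in objectives if is_primary(o))
    primary["id"] = 1
    next_id = 2
    for o in objectives:
        if o is not primary:
            o["id"] = next_id
            next_id += 1
    return objectives
```

After this step the temporary run-scoped IDs disappear and the primary objective always carries ID 1, which the later ranking step treats as reserved.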
**Step 2.2.3: Generate Incorrect Answer Options**

Only for **best-in-group** objectives:

```python
QuizGenerator.generate_lo_incorrect_answer_options()
    ↓
learning_objective_generator/enhancement.py
    → generate_incorrect_answer_options()
```

**Process:**

1. For each best-in-group objective:
   - Creates prompt with:
     - Objective and correct answer
     - INCORRECT_ANSWER_PROMPT guidelines
     - INCORRECT_ANSWER_EXAMPLES
     - Course content
   - Calls OpenAI API (with optional model override)
   - Generates 5 plausible incorrect answer options
2. **Returns:** `List[LearningObjective]` with `incorrect_answer_options` populated

**Step 2.2.4: Improve Incorrect Answers**

```python
learning_objective_generator.regenerate_incorrect_answers()
    ↓
learning_objective_generator/suggestion_improvement.py
```

**Quality Check Process:**

1. For each objective's incorrect answers:
   - Checks for red flags (contradictory phrases, absolute terms)
   - Examples of red flags:
     - "but not necessarily"
     - "at the expense of"
     - "rather than"
     - "always", "never", "exclusively"
2. If problems are found:
   - Logs the issue to the `incorrect_suggestion_debug/` directory
   - Regenerates incorrect answers with additional constraints
   - Updates the objective with improved answers

**Step 2.2.5: Final Assembly**

Creates the final list where:

- Best-in-group objectives have enhanced incorrect answers
- Non-best-in-group objectives have empty `incorrect_answer_options: []`

#### Step 2.3: Display Results

**Three output formats:**

1. **Best-in-Group Objectives** (primary output):
   - Only objectives marked as best_in_group
   - Includes incorrect answer options
   - Sorted by ID
   - Formatted as JSON
2. **All Grouped Objectives**:
   - All objectives with grouping metadata
   - Shows group_members arrays
   - Best-in-group flags visible
3. **Raw Ungrouped** (debug):
   - Original objectives from all runs
   - No grouping metadata
   - Original temporary IDs

#### Step 2.4: State Update

```python
set_learning_objectives(grouped_result["all_grouped"])
set_processed_contents(file_contents)  # Already set, but persisted
```
### Phase 3: Question Generation

#### Step 3.1: Parse Learning Objectives

**Process:** `question_handlers._parse_learning_objectives()`

```python
1. Parse JSON from Tab 1 output
2. Create LearningObjective objects from dictionaries
3. Validate required fields
4. Return List[LearningObjective]
```

#### Step 3.2: Multi-Run Question Generation

**Process:** `question_handlers._generate_questions_multiple_runs()`

For each run (user-specified, typically 1 run):

```python
QuizGenerator.generate_questions_in_parallel()
    ↓
quiz_generator/assessment.py
    → generate_questions_in_parallel()
```

**Parallel Generation Process:**

1. **Thread Pool Setup:**

   ```python
   max_workers = min(len(learning_objectives), 5)
   ThreadPoolExecutor(max_workers=max_workers)
   ```

2. **For Each Learning Objective (in parallel):**

   **Step 3.2.1: Question Generation** (`quiz_generator/question_generation.py`)

   ```python
   generate_multiple_choice_question()
   ```

   **a) Source Content Matching:**

   ```python
   - Extract source_reference from objective
   - Search file_contents for matching XML tags
   - Exact match: <source file='filename.vtt'>
   - Fallback: Partial filename match
   - Last resort: Use all file contents combined
   ```

   **b) Multi-Source Handling:**

   ```python
   if len(source_references) > 1:
       Add special instruction:
       "Question should synthesize information across sources"
   ```

   **c) Prompt Construction:**

   ```python
   Combines:
   - Learning objective
   - Correct answer
   - Incorrect answer options from objective
   - GENERAL_QUALITY_STANDARDS
   - MULTIPLE_CHOICE_STANDARDS
   - EXAMPLE_QUESTIONS
   - QUESTION_SPECIFIC_QUALITY_STANDARDS
   - CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS
   - INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION
   - ANSWER_FEEDBACK_QUALITY_STANDARDS
   - Matched course content
   ```

   **d) API Call:**

   ```python
   - Model: User-selected (default: gpt-5)
   - Temperature: User-selected (if supported by model)
   - Response format: MultipleChoiceQuestion
   - Returns: Question with 4 options, each with feedback
   ```

   **e) Post-Processing:**

   ```python
   - Set question ID = learning_objective ID
   - Verify all options have feedback
   - Add default feedback if missing
   ```

   **Step 3.2.2: Quality Assessment** (`quiz_generator/question_improvement.py`)

   ```python
   judge_question_quality()
   ```

   **Quality Judging Process:**

   ```python
   1. Creates evaluation prompt with:
      - Question text and all options
      - Quality criteria from prompts
      - Evaluation instructions
   2. LLM evaluates question for:
      - Clarity and unambiguity
      - Alignment with learning objective
      - Quality of incorrect options
      - Feedback quality
      - Appropriate difficulty
   3. Returns:
      - approved: bool
      - feedback: str (reasoning for judgment)
   4. Updates question:
      question.approved = approved
      question.judge_feedback = feedback
   ```

3. **Results Collection:**

   ```python
   - Questions collected as futures complete
   - IDs assigned sequentially across runs
   - All questions aggregated into single list
   ```

**Example:** 3 objectives × 1 run = 3 questions generated in parallel
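The fan-out pattern above can be sketched with a thread pool; the hypothetical `generate_question` callable stands in for the real per-objective generation-plus-judging call:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def generate_questions_in_parallel(objectives, generate_question):
    """Run one generation call per objective, capped at 5 workers,
    collecting results as their futures complete."""
    max_workers = min(len(objectives), 5)
    questions = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(generate_question, obj) for obj in objectives]
        for future in as_completed(futures):
            questions.append(future.result())
    return questions
```

Because `as_completed` yields in finish order, results arrive unordered; that is why the handler reassigns IDs sequentially after collection.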
#### Step 3.3: Grouping Questions

**Process:** `quiz_generator/question_ranking.py → group_questions()`

```python
1. Creates prompt with:
   - All generated questions
   - Grouping instructions
   - Example format
2. LLM identifies:
   - Questions testing the same concept (same learning_objective_id)
   - Groups of similar questions
   - Best question in each group
3. Model: gpt-5-mini
   Response format: GroupedMultipleChoiceQuestionsResponse
4. Returns:
   {
       "grouped": [all questions with group metadata],
       "best_in_group": [best questions from each group]
   }
```

#### Step 3.4: Ranking Questions

**Process:** `quiz_generator/question_ranking.py → rank_questions()`

**Only ranks best-in-group questions:**

```python
1. Creates prompt with:
   - RANK_QUESTIONS_PROMPT
   - All quality standards
   - Best-in-group questions only
   - Course content for context
2. Ranking Criteria:
   - Question clarity and unambiguity
   - Alignment with learning objective
   - Quality of incorrect options
   - Feedback quality
   - Appropriate difficulty (prefers simple English)
   - Adherence to all guidelines
   - Avoidance of absolute terms
3. Special Instructions:
   - NEVER change the question with ID=1
   - Each question gets a unique rank (2, 3, 4, ...)
   - Rank 1 is reserved
   - All questions must be returned
4. Model: User-selected
   Response format: RankedMultipleChoiceQuestionsResponse
5. Returns:
   {
       "ranked": [questions with rank and ranking_reasoning]
   }
```

#### Step 3.5: Format Results

**Process:** `question_handlers._format_question_results()`

**Three outputs:**

1. **Best-in-Group Ranked Questions:**

   ```python
   - Sorted by rank
   - Includes all question data
   - Includes rank and ranking_reasoning
   - Includes group metadata
   - Formatted as JSON
   ```

2. **All Grouped Questions:**

   ```python
   - All questions with group metadata
   - No ranking information
   - Shows which questions are in groups
   - Formatted as JSON
   ```

3. **Formatted Quiz:**

   ```python
   format_quiz_for_ui() creates a human-readable format:

   **Question 1 [Rank: 2]:** What is...
   Ranking Reasoning: ...
   • A [Correct]: Option text
     ◦ Feedback: Correct feedback
   • B: Option text
     ◦ Feedback: Incorrect feedback
   [continues for all questions]
   ```
### Phase 4: Custom Question Generation (Optional)

**Tab 3 Workflow:**

#### Step 4.1: User Input

User provides:

- Free-form guidance/feedback text
- Model selection
- Temperature setting

#### Step 4.2: Generation

**Process:** `feedback_handlers.propose_question_handler()`

```python
QuizGenerator.generate_multiple_choice_question_from_feedback()
    ↓
quiz_generator/feedback_questions.py
```

**Workflow:**

```python
1. Retrieves processed file contents from state
2. Creates prompt combining:
   - User feedback/guidance
   - All quality standards
   - Course content
   - Generation criteria
3. Model generates:
   - A single question
   - With a learning objective inferred from the guidance
   - 4 options with feedback
   - Source references
4. Returns: MultipleChoiceQuestionFromFeedback object
   (includes user feedback as metadata)
5. Formatted as JSON for display
```

### Phase 5: Assessment Export (Automated)

The final assessment can be saved using:

```python
QuizGenerator.save_assessment_to_json()
    ↓
quiz_generator/assessment.py → save_assessment_to_json()
```

**Process:**

```python
1. Convert Assessment object to dictionary:
   assessment_dict = assessment.model_dump()
2. Write to JSON file with indent=2
   Default filename: "assessment.json"
3. Contains:
   - All learning objectives (best-in-group)
   - All ranked questions
   - Complete metadata
```
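The export step is a standard Pydantic dump followed by a JSON write; a minimal sketch, with the `Assessment` fields simplified to plain lists for illustration:

```python
import json
from pydantic import BaseModel

class Assessment(BaseModel):
    # Simplified stand-in for the Assessment model described above
    learning_objectives: list = []
    questions: list = []

def save_assessment_to_json(assessment: Assessment,
                            filename: str = "assessment.json") -> None:
    """Serialize the full assessment with 2-space indentation."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(assessment.model_dump(), f, indent=2)
```

`model_dump()` recurses through nested models, so the ranked questions and their options serialize without extra handling.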
---

## Detailed Component Functionality

### Content Processor (`ui/content_processor.py`)

**Class: `ContentProcessor`**

**Methods:**

1. **`process_files(file_paths: List[str]) -> List[str]`**
   - Main entry point for processing multiple files
   - Returns list of XML-tagged content strings
   - Stores results in `self.file_contents`
2. **`process_file(file_path: str) -> List[str]`**
   - Routes to appropriate handler based on file extension
   - Returns single-item list with tagged content
3. **`_process_subtitle_file(file_path: str) -> List[str]`**
   - Filters out timestamps and metadata
   - Preserves actual subtitle text
   - Wraps in `<source file='...'>` tags
4. **`_process_notebook_file(file_path: str) -> List[str]`**
   - Validates JSON structure
   - Parses with nbformat
   - Extracts markdown and code cells
   - Falls back to raw text on parsing errors
   - Wraps in `<source file='...'>` tags

### Learning Objective Generator (`learning_objective_generator/`)

#### **generator.py - LearningObjectiveGenerator Class**

**Orchestrator that delegates to specialized modules:**

**Methods:**

1. **`generate_base_learning_objectives()`**
   - Delegates to `base_generation.py`
   - Returns base objectives with correct answers
2. **`group_base_learning_objectives()`**
   - Delegates to `grouping_and_ranking.py`
   - Groups similar objectives
   - Identifies the best in each group
3. **`generate_incorrect_answer_options()`**
   - Delegates to `enhancement.py`
   - Adds 5 incorrect answer suggestions per objective
4. **`regenerate_incorrect_answers()`**
   - Delegates to `suggestion_improvement.py`
   - Quality-checks and improves incorrect answers
5. **`generate_and_group_learning_objectives()`**
   - Complete workflow method
   - Combines: base generation → grouping → incorrect answers
   - Returns dict with `all_grouped` and `best_in_group`

#### **base_generation.py**

**Key Functions:**

**`generate_base_learning_objectives()`**

- Wrapper that calls two separate functions
- First: Generate objectives without correct answers
- Second: Generate correct answers for those objectives

**`generate_base_learning_objectives_without_correct_answers()`**

**Process:**

```python
1. Extract source filenames from XML tags
2. Combine all file contents
3. Create prompt with:
   - BASE_LEARNING_OBJECTIVES_PROMPT
   - BLOOMS_TAXONOMY_LEVELS
   - LEARNING_OBJECTIVE_EXAMPLES_WITHOUT_ANSWERS
   - Course content
4. API call:
   - Model: User-selected
   - Temperature: User-selected (if supported)
   - Response format: BaseLearningObjectivesWithoutCorrectAnswerResponse
5. Post-process:
   - Assign sequential IDs
   - Normalize source_reference (extract basenames)
6. Returns: List[BaseLearningObjectiveWithoutCorrectAnswer]
```

**`generate_correct_answers_for_objectives()`**

**Process:**

```python
1. For each objective without an answer:
   - Create prompt with objective + course content
   - Call OpenAI API (text response, not structured)
   - Extract correct answer
   - Create BaseLearningObjective with answer
2. Error handling: Add "[Error generating correct answer]" on failure
3. Returns: List[BaseLearningObjective]
```

**Quality Guidelines in Prompt:**

- Objectives must be assessable via multiple choice
- Start with action verbs (identify, describe, define, list, compare)
- One goal per objective
- Derived directly from course content
- Tool/framework agnostic (focus on principles, not specific implementations)
- The first objective should be a relatively easy recall question
- Avoid objectives about "building" or "creating" (not MC-assessable)
#### **grouping_and_ranking.py**

**Key Functions:**

**`group_base_learning_objectives()`**

**Process:**

```python
1. Format objectives for display in prompt
2. Create grouping prompt with:
   - Original generation criteria
   - All base objectives
   - Course content
   - Grouping instructions
3. Special rule:
   - All objectives with IDs ending in 1 grouped together
   - Best one selected from this group
   - Will become primary objective (ID=1)
4. API call:
   - Model: "gpt-5-mini" (hardcoded for efficiency)
   - Response format: GroupedBaseLearningObjectivesResponse
5. Post-process:
   - Normalize best_in_group to Python bool
   - Filter for best-in-group objectives
6. Returns:
   {
       "all_grouped": List[GroupedBaseLearningObjective],
       "best_in_group": List[GroupedBaseLearningObjective]
   }
```

**Grouping Criteria:**

- Topic overlap
- Similarity of concepts
- Quality based on original generation criteria
- Clarity and specificity
- Alignment with course content

#### **enhancement.py**

**Key Function: `generate_incorrect_answer_options()`**

**Process:**

```python
1. For each base objective:
   - Create prompt with:
     - Learning objective and correct answer
     - INCORRECT_ANSWER_PROMPT (detailed guidelines)
     - INCORRECT_ANSWER_EXAMPLES
     - Course content
   - Request 5 plausible incorrect options
2. API call:
   - Model: model_override or default
   - Temperature: User-selected (if supported)
   - Response format: LearningObjective (includes incorrect_answer_options)
3. Returns: List[LearningObjective] with all fields populated
```

**Incorrect Answer Quality Principles:**

- Reflect common misunderstandings
- Maintain identical structure to the correct answer
- Use course terminology correctly but in wrong contexts
- Include partially correct information
- Avoid obviously wrong answers
- Mirror the detail level and style of the correct answer
- Avoid absolute terms ("always", "never", "exclusively")
- Avoid contradictory second clauses

#### **suggestion_improvement.py**

**Key Function: `regenerate_incorrect_answers()`**

**Process:**

```python
1. For each learning objective:
   - Call should_regenerate_incorrect_answers()
2. should_regenerate_incorrect_answers():
   - Creates evaluation prompt with:
     - Objective and all incorrect options
     - IMMEDIATE_RED_FLAGS checklist
     - RULES_FOR_SECOND_CLAUSES
   - LLM evaluates each option
   - Returns: needs_regeneration: bool
3. If regeneration is needed:
   - Logs to incorrect_suggestion_debug/{id}.txt
   - Creates new prompt with additional constraints
   - Regenerates incorrect answers
   - Validates again
4. Returns: List[LearningObjective] with improved incorrect answers
```

**Red Flags Checked:**

- Contradictory second clauses ("but not necessarily")
- Explicit negations ("without automating")
- Opposite descriptions ("fixed steps" for flexible systems)
- Absolute/comparative terms
- Hedging that creates limitations
- Trade-off language creating false dichotomies
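A first-pass lexical screen for those red flags could look like the following (the real check is LLM-based and context-aware; this sketch only catches literal phrase matches):

```python
# Assumed subset of the red-flag phrases listed above
RED_FLAG_PHRASES = [
    "but not necessarily",
    "at the expense of",
    "rather than",
    "always",
    "never",
    "exclusively",
]

def has_red_flags(option_text: str) -> bool:
    """Return True if an incorrect-answer option contains a phrase
    known to make distractors easy for test-takers to eliminate."""
    lowered = option_text.lower()
    return any(phrase in lowered for phrase in RED_FLAG_PHRASES)
```

A cheap check like this could short-circuit obvious cases before spending an LLM call, though only the LLM judge can catch semantic problems such as opposite descriptions.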
### Quiz Generator (`quiz_generator/`)

#### **generator.py - QuizGenerator Class**

**Orchestrator with an embedded LearningObjectiveGenerator:**

**Initialization:**

```python
def __init__(self, api_key, model="gpt-5", temperature=1.0):
    self.client = OpenAI(api_key=api_key)
    self.model = model
    self.temperature = temperature
    self.learning_objective_generator = LearningObjectiveGenerator(
        api_key=api_key, model=model, temperature=temperature
    )
```

**Methods (delegate to specialized modules):**

1. **`generate_base_learning_objectives()`** → delegates to LearningObjectiveGenerator
2. **`generate_lo_incorrect_answer_options()`** → delegates to LearningObjectiveGenerator
3. **`group_base_learning_objectives()`** → delegates to grouping_and_ranking.py
4. **`generate_multiple_choice_question()`** → delegates to question_generation.py
5. **`generate_questions_in_parallel()`** → delegates to assessment.py
6. **`group_questions()`** → delegates to question_ranking.py
7. **`rank_questions()`** → delegates to question_ranking.py
8. **`judge_question_quality()`** → delegates to question_improvement.py
9. **`regenerate_incorrect_answers()`** → delegates to question_improvement.py
10. **`generate_multiple_choice_question_from_feedback()`** → delegates to feedback_questions.py
11. **`save_assessment_to_json()`** → delegates to assessment.py

#### **question_generation.py**

**Key Function: `generate_multiple_choice_question()`**

**Detailed Process:**

**1. Source Content Matching:**
```python
source_references = learning_objective.source_reference
if isinstance(source_references, str):
    source_references = [source_references]

combined_content = ""
for source_file in source_references:
    found = False
    # Try exact match on the XML tag: <source file='filename'>
    for file_content in file_contents:
        if f"<source file='{source_file}'>" in file_content:
            combined_content += file_content
            found = True
            break
    # Fallback: partial filename match
    if not found:
        for file_content in file_contents:
            if source_file in file_content:
                combined_content += file_content
                break

# Last resort: use all content
if not combined_content:
    combined_content = "\n\n".join(file_contents)
```
| **2. Multi-Source Instruction:** | |
```python
if len(source_references) > 1:
    # Append a special instruction to the prompt:
    prompt += (
        "\nThis learning objective spans multiple sources. "
        "Your question should:\n"
        "1. Synthesize information across these sources\n"
        "2. Test understanding of overarching themes\n"
        "3. Require knowledge from multiple sources"
    )
```
| **3. Prompt Construction:** | |
| Combines extensive quality standards: | |
| ```python | |
| - Learning objective | |
| - Correct answer | |
| - Incorrect answer options from objective | |
| - GENERAL_QUALITY_STANDARDS | |
| - MULTIPLE_CHOICE_STANDARDS | |
| - EXAMPLE_QUESTIONS | |
| - QUESTION_SPECIFIC_QUALITY_STANDARDS | |
| - CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS | |
| - INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION | |
| - ANSWER_FEEDBACK_QUALITY_STANDARDS | |
| - Multi-source instruction (if applicable) | |
| - Matched course content | |
| ``` | |
| **4. API Call:** | |
```python
params = {
    "model": model,
    "messages": [
        {"role": "system", "content": "Expert educational assessment creator"},
        {"role": "user", "content": prompt},
    ],
    "response_format": MultipleChoiceQuestion,
}
# Reasoning models (and, by default, unknown models) reject the
# temperature parameter, so it is set only when known to be supported
if not TEMPERATURE_UNAVAILABLE.get(model, True):
    params["temperature"] = temperature
response = client.beta.chat.completions.parse(**params)
```
| **5. Post-Processing:** | |
| ```python | |
| - Set response.id = learning_objective.id | |
| - Set response.learning_objective_id = learning_objective.id | |
| - Set response.learning_objective = learning_objective.learning_objective | |
| - Set response.source_reference = learning_objective.source_reference | |
| - Verify all options have feedback | |
| - Add default feedback if missing | |
| ``` | |
| **6. Error Handling:** | |
| ```python | |
| On exception: | |
| - Create fallback question with 4 generic options | |
| - Include error message in question_text | |
| - Mark as questionable quality | |
| ``` | |
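The fallback behavior might look roughly like this; `Option`, `FallbackQuestion`, and the field names are simplified stand-ins for the project's Pydantic models:

```python
from dataclasses import dataclass, field

@dataclass
class Option:
    text: str
    is_correct: bool = False
    feedback: str = ""

@dataclass
class FallbackQuestion:
    question_text: str
    options: list = field(default_factory=list)
    approved: bool = False  # marked as questionable quality

def build_fallback_question(error: Exception) -> FallbackQuestion:
    """Create a placeholder question when generation fails."""
    return FallbackQuestion(
        question_text=f"[GENERATION ERROR: {error}] Placeholder question.",
        options=[
            Option("Option A", is_correct=True, feedback="Placeholder feedback."),
            Option("Option B", feedback="Placeholder feedback."),
            Option("Option C", feedback="Placeholder feedback."),
            Option("Option D", feedback="Placeholder feedback."),
        ],
    )
```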
| #### **question_ranking.py** | |
| **Key Functions:** | |
| **`group_questions(questions, file_contents)`** | |
| **Process:** | |
| ```python | |
| 1. Create prompt with: | |
| - GROUP_QUESTIONS_PROMPT | |
| - All questions with complete data | |
| - Grouping instructions | |
| 2. Grouping Logic: | |
| - Questions with same learning_objective_id are similar | |
| - Group by topic overlap | |
| - Mark best_in_group within each group | |
| - Single-member groups: best_in_group = true by default | |
| 3. API call: | |
| - Model: User-selected | |
| - Response format: GroupedMultipleChoiceQuestionsResponse | |
| 4. Critical Instructions: | |
| - MUST return ALL questions | |
| - Each question must have group metadata | |
| - best_in_group set appropriately | |
| 5. Returns: | |
| { | |
| "grouped": List[GroupedMultipleChoiceQuestion], | |
| "best_in_group": [questions where best_in_group=true] | |
| } | |
| ``` | |
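The return structure can be illustrated with plain dicts (`group_id` and `best_in_group` are assumed field names):

```python
from collections import Counter

def split_grouped_questions(grouped):
    """Split grouped questions into the structure described above:
    all questions with group metadata, plus the best-in-group subset.
    Single-member groups default to best_in_group = True."""
    sizes = Counter(q["group_id"] for q in grouped)
    for q in grouped:
        if sizes[q["group_id"]] == 1:
            q["best_in_group"] = True
    return {
        "grouped": grouped,
        "best_in_group": [q for q in grouped if q["best_in_group"]],
    }
```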
| **`rank_questions(questions, file_contents)`** | |
| **Process:** | |
| ```python | |
| 1. Create prompt with: | |
| - RANK_QUESTIONS_PROMPT | |
| - ALL quality standards (comprehensive) | |
| - Best-in-group questions only | |
| - Course content | |
| 2. Ranking Criteria (from prompt): | |
| - Question clarity and unambiguity | |
| - Alignment with learning objective | |
| - Quality of incorrect options | |
| - Feedback quality | |
| - Appropriate difficulty (simple English preferred) | |
| - Adherence to all guidelines | |
| - Avoidance of problematic words/phrases | |
| 3. Special Instructions: | |
| - DO NOT change question with ID=1 | |
| - Rank starting from 2 (rank 1 reserved) | |
| - Each question gets unique rank | |
| - Must return ALL questions | |
| 4. API call: | |
| - Model: User-selected | |
| - Response format: RankedMultipleChoiceQuestionsResponse | |
| 5. Returns: | |
| { | |
| "ranked": List[RankedMultipleChoiceQuestion] | |
| (includes rank and ranking_reasoning for each) | |
| } | |
| ``` | |
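A small helper illustrating how the critical ranking instructions could be validated downstream (an assumption for illustration, not code from the repository):

```python
def validate_ranking(ranked):
    """Check the critical instructions: ranks unique, ranks for all
    questions other than ID=1 start at 2 (rank 1 is reserved), then
    return questions ordered by rank."""
    ranks = [q["rank"] for q in ranked if q["id"] != 1]
    if len(ranks) != len(set(ranks)):
        raise ValueError("duplicate ranks")
    if ranks and min(ranks) < 2:
        raise ValueError("rank 1 is reserved for the ID=1 question")
    return sorted(ranked, key=lambda q: q["rank"])
```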
| **Simple vs Complex English Examples (from ranking criteria):** | |
| ``` | |
| Simple: "AI engineers create computer programs that can learn from data" | |
| Complex: "AI engineering practitioners architect computational paradigms | |
| exhibiting autonomous erudition capabilities" | |
| ``` | |
| #### **question_improvement.py** | |
| **Key Functions:** | |
| **`judge_question_quality(client, model, temperature, question)`** | |
| **Process:** | |
| ```python | |
| 1. Create evaluation prompt with: | |
| - Question text | |
| - All options with feedback | |
| - Quality criteria | |
| - Evaluation instructions | |
| 2. LLM evaluates: | |
| - Clarity and lack of ambiguity | |
| - Alignment with learning objective | |
| - Quality of distractors (incorrect options) | |
| - Feedback quality and helpfulness | |
| - Appropriate difficulty level | |
| - Adherence to all standards | |
| 3. API call: | |
| - Unstructured text response | |
| - LLM returns: APPROVED or NOT APPROVED + reasoning | |
| 4. Parsing: | |
| approved = "APPROVED" in response.upper() | |
| feedback = full response text | |
| 5. Returns: (approved: bool, feedback: str) | |
| ``` | |
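The verdict parsing can be sketched as below. Note that the substring check shown above would also match "NOT APPROVED", so this variant (an assumption, not necessarily the shipped code) tests the negative form first:

```python
def parse_verdict(response_text: str):
    """Parse an APPROVED / NOT APPROVED verdict from free-form LLM text.
    The negative form is checked first because the substring "APPROVED"
    also appears inside "NOT APPROVED"."""
    upper = response_text.upper()
    if "NOT APPROVED" in upper:
        approved = False
    else:
        approved = "APPROVED" in upper
    return approved, response_text
```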
| **`should_regenerate_incorrect_answers(client, question, file_contents, model_name)`** | |
| **Process:** | |
| ```python | |
| 1. Extract incorrect options from question | |
| 2. Create evaluation prompt with: | |
| - Each incorrect option | |
| - IMMEDIATE_RED_FLAGS checklist | |
| - Course content for context | |
| 3. LLM checks each option for: | |
| - Contradictory second clauses | |
| - Explicit negations | |
| - Absolute terms | |
| - Opposite descriptions | |
| - Trade-off language | |
| 4. Returns: needs_regeneration: bool | |
| 5. If true: | |
| - Log to wrong_answer_debug/ directory | |
| - Provides detailed feedback on issues | |
| ``` | |
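The actual check is LLM-based; as a purely illustrative stand-in, a deterministic substring pre-filter over a subset of the red-flag phrases listed later in this report might look like:

```python
# Illustrative subset of the IMMEDIATE_RED_FLAGS phrases; the real
# checklist lives in prompts/incorrect_answers.py.
RED_FLAG_PHRASES = [
    "but not necessarily", "at the expense of", "rather than",
    "without necessarily", "but has no impact on", "but cannot",
    "but prevents", "but limits",
]

def find_red_flags(option_text: str):
    """Return the red-flag phrases present in an incorrect option."""
    lowered = option_text.lower()
    return [p for p in RED_FLAG_PHRASES if p in lowered]

def needs_regeneration(options):
    """True if any incorrect option trips the checklist."""
    return any(find_red_flags(text) for text in options)
```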
| **`regenerate_incorrect_answers(client, model, temperature, questions, file_contents)`** | |
| **Process:** | |
| ```python | |
| 1. For each question: | |
| - Check if regeneration needed | |
| - If yes: | |
| a. Create new prompt with stricter constraints | |
| b. Include original question for context | |
| c. Add specific rules about avoiding red flags | |
| d. Regenerate options | |
| e. Validate again | |
| - If no: keep original | |
| 2. Returns: List of questions with improved incorrect answers | |
| ``` | |
| #### **feedback_questions.py** | |
| **Key Function: `generate_multiple_choice_question_from_feedback()`** | |
| **Process:** | |
| ```python | |
| 1. Accept user feedback/guidance as free-form text | |
| 2. Create prompt combining: | |
| - User feedback | |
| - All quality standards | |
| - Course content | |
| - Standard generation criteria | |
| 3. LLM infers: | |
| - Learning objective from feedback | |
| - Appropriate question | |
| - 4 options with feedback | |
| - Source references | |
| 4. API call: | |
| - Model: User-selected | |
| - Response format: MultipleChoiceQuestionFromFeedback | |
| 5. Includes user feedback as metadata in response | |
| 6. Returns: Single question object | |
| ``` | |
| #### **assessment.py** | |
| **Key Functions:** | |
| **`generate_questions_in_parallel()`** | |
| **Parallel Processing Details:** | |
| ```python | |
| 1. Setup: | |
| max_workers = min(len(learning_objectives), 5) | |
| # Limits to 5 concurrent threads | |
| 2. Thread Pool Executor: | |
| with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor: | |
| 3. For each objective (in separate thread): | |
| Worker function: | |
| def generate_question_for_objective(objective, idx): | |
| - Generate question | |
| - Judge quality | |
| - Update with approval and feedback | |
| - Handle errors gracefully | |
| - Return complete question | |
| 4. Submit all tasks: | |
| future_to_idx = { | |
| executor.submit(generate_question_for_objective, obj, i): i | |
| for i, obj in enumerate(learning_objectives) | |
| } | |
| 5. Collect results as completed: | |
| for future in concurrent.futures.as_completed(future_to_idx): | |
| question = future.result() | |
| questions.append(question) | |
| print progress | |
| 6. Error handling: | |
| - Individual failures don't stop other threads | |
| - Placeholder questions created on error | |
| - All errors logged | |
| 7. Returns: List[MultipleChoiceQuestion] with quality judgments | |
| ``` | |
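Steps 1-6 above follow the standard `ThreadPoolExecutor` pattern, sketched here with a generic `generate_fn` standing in for the per-objective worker:

```python
import concurrent.futures

def generate_questions_in_parallel(objectives, generate_fn, max_threads=5):
    """Run generate_fn over objectives concurrently: bounded worker
    pool, per-task error handling, results collected as they complete."""
    max_workers = min(len(objectives), max_threads)
    questions = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_idx = {
            executor.submit(generate_fn, obj): i
            for i, obj in enumerate(objectives)
        }
        for future in concurrent.futures.as_completed(future_to_idx):
            try:
                questions.append(future.result())
            except Exception as exc:
                # One failed task must not stop the others:
                # record a placeholder instead
                questions.append({"error": str(exc)})
    return questions
```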
| **`save_assessment_to_json(assessment, output_path)`** | |
| ```python | |
| 1. Convert Pydantic model to dict: | |
| assessment_dict = assessment.model_dump() | |
| 2. Write to JSON file: | |
| with open(output_path, "w") as f: | |
| json.dump(assessment_dict, f, indent=2) | |
| 3. File contains: | |
| { | |
| "learning_objectives": [...], | |
| "questions": [...] | |
| } | |
| ``` | |
| ### State Management (`ui/state.py`) | |
| **Global State Variables:** | |
| ```python | |
| processed_file_contents = [] # List of XML-tagged content strings | |
| generated_learning_objectives = [] # List of learning objective objects | |
| ``` | |
| **Functions:** | |
| - `get_processed_contents()` → retrieves file contents | |
| - `set_processed_contents(contents)` → stores file contents | |
| - `get_learning_objectives()` → retrieves objectives | |
| - `set_learning_objectives(objectives)` → stores objectives | |
| - `clear_state()` → resets both variables | |
| **Purpose:** | |
| - Persists data between UI tabs | |
| - Allows Tab 2 to access content processed in Tab 1 | |
| - Allows Tab 3 to access content for custom questions | |
| - Enables regeneration with feedback | |
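A minimal sketch of this module-level state pattern (names simplified from `ui/state.py`):

```python
# Module-level state shared across UI tabs
_processed_file_contents = []
_generated_learning_objectives = []

def set_processed_contents(contents):
    global _processed_file_contents
    _processed_file_contents = list(contents)

def get_processed_contents():
    return _processed_file_contents

def set_learning_objectives(objectives):
    global _generated_learning_objectives
    _generated_learning_objectives = list(objectives)

def get_learning_objectives():
    return _generated_learning_objectives

def clear_state():
    set_processed_contents([])
    set_learning_objectives([])
```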
| ### UI Handlers | |
| #### **objective_handlers.py** | |
| **`process_files(files, num_objectives, num_runs, model_name, incorrect_answer_model_name, temperature)`** | |
| **Complete Workflow:** | |
| ```python | |
| 1. Validate inputs (files exist, API key present) | |
| 2. Extract file paths from Gradio file objects | |
| 3. Process files → get XML-tagged content | |
| 4. Store in state | |
| 5. Create QuizGenerator | |
| 6. Generate multiple runs of base objectives | |
| 7. Group and rank objectives | |
| 8. Generate incorrect answers for best-in-group | |
| 9. Improve incorrect answers | |
| 10. Reassign IDs (best from 001 group → ID=1) | |
| 11. Format results for display | |
| 12. Store in state | |
| 13. Return 4 outputs: status, best-in-group, all-grouped, raw | |
| ``` | |
| **`regenerate_objectives(objectives_json, feedback, num_objectives, num_runs, model_name, temperature)`** | |
| **Workflow:** | |
| ```python | |
| 1. Retrieve processed contents from state | |
| 2. Append feedback to content: | |
| file_contents_with_feedback.append(f"FEEDBACK: {feedback}") | |
| 3. Generate new objectives with feedback context | |
| 4. Group and rank | |
| 5. Return regenerated objectives | |
| ``` | |
| **`_reassign_objective_ids(grouped_objectives)`** | |
| **ID Assignment Logic:** | |
| ```python | |
| 1. Find all objectives with IDs ending in 001 (1001, 2001, etc.) | |
| 2. Identify their groups | |
| 3. Find best_in_group objective from these groups | |
| 4. Assign it ID = 1 | |
| 5. Assign all other objectives sequential IDs starting from 2 | |
| ``` | |
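A rough sketch of that logic, assuming each objective is a dict with `id`, `group_id`, and `best_in_group` fields (the real code operates on Pydantic objects):

```python
def reassign_objective_ids(objectives):
    """The best-in-group objective from a group containing an ...001 ID
    becomes ID 1; every other objective gets a sequential ID from 2."""
    groups_with_001 = {o["group_id"] for o in objectives if o["id"] % 1000 == 1}
    first = next(
        (o for o in objectives
         if o["group_id"] in groups_with_001 and o["best_in_group"]),
        None,
    )
    next_id = 2
    for o in objectives:
        if o is first:
            o["id"] = 1
        else:
            o["id"] = next_id
            next_id += 1
    return objectives
```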
| **`_format_objective_results(grouped_result, all_learning_objectives)`** | |
| **Formatting:** | |
| ```python | |
| 1. Sort by ID | |
| 2. Create dictionaries from Pydantic objects | |
| 3. Include all metadata fields | |
| 4. Convert to JSON with indent=2 | |
| 5. Return 3 formatted outputs + status message | |
| ``` | |
| #### **question_handlers.py** | |
| **`generate_questions(objectives_json, model_name, temperature, num_runs)`** | |
| **Complete Workflow:** | |
| ```python | |
| 1. Validate inputs | |
| 2. Parse objectives JSON → create LearningObjective objects | |
| 3. Retrieve processed contents from state | |
| 4. Create QuizGenerator | |
| 5. Generate questions (multiple runs in parallel) | |
| 6. Group questions by similarity | |
| 7. Rank best-in-group questions | |
| 8. Optionally improve incorrect answers (currently commented out) | |
| 9. Format results | |
| 10. Return 4 outputs: status, best-ranked, all-grouped, formatted | |
| ``` | |
| **`_generate_questions_multiple_runs()`** | |
| ```python | |
| For each run: | |
| 1. Call generate_questions_in_parallel() | |
| 2. Assign unique IDs across runs: | |
| start_id = len(all_questions) + 1 | |
| for i, q in enumerate(run_questions): | |
| q.id = start_id + i | |
| 3. Aggregate all questions | |
| ``` | |
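The ID-assignment loop above, as a self-contained sketch over dict-based questions:

```python
def assign_unique_ids(runs):
    """Give questions unique IDs across multiple generation runs,
    mirroring _generate_questions_multiple_runs()."""
    all_questions = []
    for run_questions in runs:
        start_id = len(all_questions) + 1
        for i, q in enumerate(run_questions):
            q["id"] = start_id + i
        all_questions.extend(run_questions)
    return all_questions
```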
| **`_group_and_rank_questions()`** | |
| ```python | |
| 1. Group all questions → get grouped and best_in_group | |
| 2. Rank only best_in_group questions | |
| 3. Return: | |
| { | |
| "grouped": all with group metadata, | |
| "best_in_group_ranked": best with ranks | |
| } | |
| ``` | |
| #### **feedback_handlers.py** | |
| **`propose_question_handler(guidance, model_name, temperature)`** | |
| **Workflow:** | |
| ```python | |
| 1. Validate state (processed contents available) | |
| 2. Create QuizGenerator | |
| 3. Call generate_multiple_choice_question_from_feedback() | |
| - Passes user guidance and course content | |
| - LLM infers learning objective | |
| - Generates complete question | |
| 4. Format as JSON | |
| 5. Return status and question JSON | |
| ``` | |
| ### Formatting Utilities (`ui/formatting.py`) | |
| **`format_quiz_for_ui(questions_json)`** | |
| **Process:** | |
| ```python | |
| 1. Parse JSON to list of question dictionaries | |
| 2. Sort by rank if available | |
| 3. For each question: | |
| - Add header: "**Question N [Rank: X]:** {question_text}" | |
| - Add ranking reasoning if available | |
| - For each option: | |
| - Add letter (A, B, C, D) | |
| - Mark correct option | |
| - Include option text | |
| - Include feedback indented | |
| 4. Return formatted string with markdown | |
| ``` | |
| **Output Example:** | |
| ``` | |
| **Question 1 [Rank: 2]:** What is the primary purpose of AI agents? | |
| Ranking Reasoning: Clear question that tests fundamental understanding... | |
| • A [Correct]: To automate tasks and make decisions | |
| ◦ Feedback: Correct! AI agents are designed to automate tasks... | |
| • B: To replace human workers entirely | |
| ◦ Feedback: While AI agents can automate tasks, they are not... | |
| [continues...] | |
| ``` | |
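A simplified formatter that produces this layout (dict-based for illustration, whereas the real code consumes question objects):

```python
def format_quiz_for_ui(questions):
    """Render questions as markdown: rank-sorted headers, lettered
    options with correct-answer markers, and indented feedback."""
    lines = []
    questions = sorted(questions, key=lambda q: q.get("rank", 0))
    for n, q in enumerate(questions, start=1):
        rank = q.get("rank")
        header = f"**Question {n}"
        if rank is not None:
            header += f" [Rank: {rank}]"
        lines.append(f"{header}:** {q['question_text']}")
        for letter, opt in zip("ABCD", q["options"]):
            marker = " [Correct]" if opt["is_correct"] else ""
            lines.append(f"• {letter}{marker}: {opt['text']}")
            lines.append(f"  ◦ Feedback: {opt['feedback']}")
    return "\n".join(lines)
```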
| --- | |
| ## Quality Standards and Prompts | |
| ### Learning Objectives Quality Standards | |
| **From `prompts/learning_objectives.py`:** | |
| **BASE_LEARNING_OBJECTIVES_PROMPT - Key Requirements:** | |
| 1. **Assessability:** | |
| - Must be testable via multiple-choice questions | |
| - Cannot be about "building", "creating", "developing" | |
| - Should use verbs like: identify, list, describe, define, compare | |
| 2. **Specificity:** | |
| - One goal per objective | |
| - Don't combine multiple action verbs | |
| - Example of what NOT to do: "identify X and explain Y" | |
| 3. **Source Alignment:** | |
| - Derived DIRECTLY from course content | |
| - No topics not covered in content | |
| - Appropriate difficulty level for course | |
| 4. **Independence:** | |
| - Each objective stands alone | |
| - No dependencies on other objectives | |
| - No context required from other objectives | |
| 5. **Focus:** | |
| - Address "why" over "what" when possible | |
| - Critical knowledge over trivial facts | |
| - Principles over specific implementation details | |
| 6. **Tool/Framework Agnosticism:** | |
| - Don't mention specific tools/frameworks | |
| - Focus on underlying principles | |
| - Example: Don't ask about "Pandas DataFrame methods", | |
| ask about "data filtering concepts" | |
| 7. **First Objective Rule:** | |
| - Should be relatively easy recall question | |
| - Address main topic/concept of course | |
| - Format: "Identify what X is" or "Explain why X is important" | |
| 8. **Answer Length:** | |
| - Aim for ≤20 words in correct answer | |
| - Avoid unnecessary elaboration | |
| - No compound sentences with extra consequences | |
| **BLOOMS_TAXONOMY_LEVELS:** | |
| Levels from lowest to highest: | |
| - **Recall:** Retention of key concepts (not trivialities) | |
| - **Comprehension:** Connect ideas, demonstrate understanding | |
| - **Application:** Apply concept to new but similar scenario | |
| - **Analysis:** Examine parts, determine relationships, make inferences | |
| - **Evaluation:** Make judgments requiring critical thinking | |
| **LEARNING_OBJECTIVE_EXAMPLES:** | |
| Includes 7 high-quality examples with: | |
| - Appropriate action verbs | |
| - Clear learning objectives | |
| - Concise correct answers (mostly <20 words) | |
| - Multiple source references | |
| - Framework-agnostic language | |
| ### Question Quality Standards | |
| **From `prompts/questions.py`:** | |
| **GENERAL_QUALITY_STANDARDS:** | |
| - Overall goal: Set learner up for success | |
| - Perfect score attainable for thoughtful students | |
| - Aligned with course content | |
| - Aligned with learning objective and correct answer | |
- No references to manual intervention (inappropriate for a software/AI course)
| **MULTIPLE_CHOICE_STANDARDS:** | |
| - **EXACTLY ONE** correct answer per question | |
| - Clear, unambiguous correct answer | |
| - Plausible distractors representing common misconceptions | |
| - Not obviously wrong distractors | |
| - All options similar length and detail | |
| - Mutually exclusive options | |
| - Avoid "all/none of the above" | |
| - Typically 4 options (A, B, C, D) | |
| - Don't start feedback with "Correct" or "Incorrect" | |
| **QUESTION_SPECIFIC_QUALITY_STANDARDS:** | |
| Questions must: | |
| - Match language and tone of course | |
| - Match difficulty level of course | |
| - Assess only course information | |
| - Not teach as part of quiz | |
| - Use clear, concise language | |
| - Not induce confusion | |
| - Provide slight (not major) challenge | |
| - Be easily interpreted and unambiguous | |
| - Have proper grammar and sentence structure | |
| - Be thoughtful and specific (not broad and ambiguous) | |
- Be complete in wording (understanding the question shouldn't be part of the assessment)
| **CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS:** | |
| Correct answers must: | |
| - Be factually correct and unambiguous | |
| - Match course language and tone | |
| - Be complete sentences | |
| - Match course difficulty level | |
| - Contain only course information | |
| - Not teach during quiz | |
| - Use clear, concise language | |
| - Be thoughtful and specific | |
- Be complete (identifying the correct answer shouldn't require interpretation)
| **INCORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS:** | |
| Incorrect answers should: | |
| - Represent reasonable potential misconceptions | |
| - Sound plausible to non-experts | |
| - Require thought even from diligent learners | |
| - Not be obviously wrong | |
| - Use incorrect_answer_suggestions from objective (as starting point) | |
| **Avoid:** | |
| - Obviously wrong options anyone can eliminate | |
| - Absolute terms: "always", "never", "only", "exclusively" | |
| - Phrases like "used exclusively for scenarios where..." | |
| **ANSWER_FEEDBACK_QUALITY_STANDARDS:** | |
| **For Incorrect Answers:** | |
| - Be informational and encouraging (not punitive) | |
| - Single sentence, concise | |
| - Do NOT say "Incorrect" or "Wrong" | |
| **For Correct Answers:** | |
| - Be informational and encouraging | |
| - Single sentence, concise | |
| - Do NOT say "Correct!" (redundant after "Correct: " prefix) | |
| ### Incorrect Answer Generation Guidelines | |
| **From `prompts/incorrect_answers.py`:** | |
| **Core Principles:** | |
| 1. **Create Common Misunderstandings:** | |
| - Represent how students actually misunderstand | |
| - Confuse related concepts | |
| - Mix up terminology | |
| 2. **Maintain Identical Structure:** | |
| - Match grammatical pattern of correct answer | |
| - Same length and complexity | |
| - Same formatting style | |
| 3. **Use Course Terminology Correctly but in Wrong Contexts:** | |
| - Apply correct terms incorrectly | |
| - Confuse with related concepts | |
| - Example: Describe backpropagation but actually describe forward propagation | |
| 4. **Include Partially Correct Information:** | |
| - First part correct, second part wrong | |
| - Correct process but wrong application | |
| - Correct concept but incomplete | |
| 5. **Avoid Obviously Wrong Answers:** | |
| - No contradictions with basic knowledge | |
| - Not immediately eliminable | |
| - Require course knowledge to reject | |
| 6. **Mirror Detail Level and Style:** | |
| - Match technical depth | |
| - Match tone | |
| - Same level of specificity | |
| 7. **For Lists, Maintain Consistency:** | |
| - Same number of items | |
| - Same format | |
| - Mix some correct with incorrect items | |
| 8. **AVOID ABSOLUTE TERMS:** | |
| - "always", "never", "exclusively", "primarily" | |
| - "all", "every", "none", "nothing", "only" | |
| - "must", "required", "impossible" | |
| - "rather than", "as opposed to", "instead of" | |
**IMMEDIATE_RED_FLAGS** (any of these triggers regeneration):
| **Contradictory Second Clauses:** | |
| - "but not necessarily" | |
| - "at the expense of" | |
| - "rather than [core concept]" | |
| - "ensuring X rather than Y" | |
| - "without necessarily" | |
| - "but has no impact on" | |
| - "but cannot", "but prevents", "but limits" | |
| **Explicit Negations:** | |
| - "without automating", "without incorporating" | |
| - "preventing [main benefit]" | |
| - "limiting [main capability]" | |
| **Opposite Descriptions:** | |
| - "fixed steps" (for flexible systems) | |
| - "manual intervention" (for automation) | |
| - "simple question answering" (for complex processing) | |
| **Hedging Creating Limitations:** | |
| - "sometimes", "occasionally", "might" | |
| - "to some extent", "partially", "somewhat" | |
| **INCORRECT_ANSWER_EXAMPLES:** | |
| Includes 10 detailed examples showing: | |
| - Learning objective | |
| - Correct answer | |
| - 3 plausible incorrect suggestions | |
| - Explanation of why each is plausible but wrong | |
| - Consistent formatting across all options | |
| ### Ranking and Grouping | |
| **RANK_QUESTIONS_PROMPT:** | |
| **Criteria:** | |
| 1. Question clarity and unambiguity | |
| 2. Alignment with learning objective | |
| 3. Quality of incorrect options | |
| 4. Quality of feedback | |
| 5. Appropriate difficulty (simple English preferred) | |
| 6. Adherence to all guidelines | |
| **Critical Instructions:** | |
| - DO NOT change question with ID=1 | |
| - Rank starting from 2 | |
| - Each question unique rank | |
| - Must return ALL questions | |
| - No omissions | |
| - No duplicate ranks | |
| **Simple vs Complex English:** | |
| ``` | |
| Simple: "AI engineers create computer programs that learn from data" | |
| Complex: "AI engineering practitioners architect computational paradigms | |
| exhibiting autonomous erudition capabilities" | |
| ``` | |
| **GROUP_QUESTIONS_PROMPT:** | |
| **Grouping Logic:** | |
| - Questions with same learning_objective_id are similar | |
| - Identify topic overlap | |
| - Mark best_in_group within each group | |
| - Single-member groups: best_in_group = true | |
| **Critical Instructions:** | |
| - Must return ALL questions | |
| - Each question needs group metadata | |
| - No omissions | |
| - Best in group marked appropriately | |
| --- | |
| ## Summary of Data Flow | |
| ### Complete End-to-End Flow | |
| ``` | |
| User Uploads Files | |
| ↓ | |
| ContentProcessor extracts and tags content | |
| ↓ | |
| [Stored in global state] | |
| ↓ | |
| Generate Base Objectives (multiple runs) | |
| ↓ | |
| Group Base Objectives (by similarity) | |
| ↓ | |
| Generate Incorrect Answers (for best-in-group only) | |
| ↓ | |
| Improve Incorrect Answers (quality check) | |
| ↓ | |
| Reassign IDs (best from 001 group → ID=1) | |
| ↓ | |
| [Objectives displayed in UI, stored in state] | |
| ↓ | |
| Generate Questions (parallel, multiple runs) | |
| ↓ | |
| Judge Question Quality (parallel) | |
| ↓ | |
| Group Questions (by similarity) | |
| ↓ | |
| Rank Questions (best-in-group only) | |
| ↓ | |
| [Questions displayed in UI] | |
| ↓ | |
| Format for Display | |
| ↓ | |
| Export to JSON (optional) | |
| ``` | |
| ### Key Optimization Strategies | |
| 1. **Multiple Generation Runs:** | |
| - Generates variety of objectives/questions | |
| - Grouping identifies best versions | |
| - Reduces risk of poor quality individual outputs | |
| 2. **Hierarchical Processing:** | |
| - Generate base → Group → Enhance → Improve | |
| - Only enhances best candidates (saves API calls) | |
| - Progressive refinement | |
| 3. **Parallel Processing:** | |
| - Questions generated concurrently (up to 5 threads) | |
| - Significant time savings for multiple objectives | |
| - Independent evaluations | |
| 4. **Quality Gating:** | |
| - LLM judges question quality | |
| - Checks for red flags in incorrect answers | |
| - Regenerates problematic content | |
| 5. **Source Tracking:** | |
| - XML tags preserve origin | |
| - Questions link back to source materials | |
| - Enables accurate content matching | |
| 6. **Modular Prompts:** | |
| - Reusable quality standards | |
| - Consistent across all generations | |
| - Easy to update centrally | |
| --- | |
| ## Configuration and Customization | |
| ### Available Models | |
| **Configured in `models/config.py`:** | |
| ```python | |
| MODELS = [ | |
| "o3-mini", "o1", # Reasoning models (no temperature) | |
| "gpt-4.1", "gpt-4o", # GPT-4 variants | |
| "gpt-4o-mini", "gpt-4", | |
| "gpt-3.5-turbo", # Legacy | |
| "gpt-5", # Latest (no temperature) | |
| "gpt-5-mini", # Efficient (no temperature) | |
| "gpt-5-nano" # Ultra-efficient (no temperature) | |
| ] | |
| ``` | |
| **Temperature Support:** | |
| - Models with reasoning (o1, o3-mini, gpt-5 variants): No temperature | |
| - Other models: Temperature 0.0 to 1.0 | |
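The temperature gate used in the API-call snippet earlier can be sketched as follows; the `TEMPERATURE_UNAVAILABLE` mapping here is an assumed reconstruction of `models/config.py`, with unknown models defaulting to "no temperature":

```python
# True means the model rejects the temperature parameter.
TEMPERATURE_UNAVAILABLE = {
    "o1": True, "o3-mini": True,
    "gpt-5": True, "gpt-5-mini": True, "gpt-5-nano": True,
    "gpt-4.1": False, "gpt-4o": False, "gpt-4o-mini": False,
    "gpt-4": False, "gpt-3.5-turbo": False,
}

def build_params(model, temperature):
    """Build request params, omitting temperature for reasoning models
    and (by default) for models not in the lookup table."""
    params = {"model": model}
    if not TEMPERATURE_UNAVAILABLE.get(model, True):
        params["temperature"] = temperature
    return params
```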
| **Model Selection Strategy:** | |
| - **Base objectives:** User-selected (default: gpt-5) | |
| - **Grouping:** Hardcoded gpt-5-mini (efficiency) | |
| - **Incorrect answers:** Separate user selection (default: gpt-5) | |
| - **Questions:** User-selected (default: gpt-5) | |
| - **Quality judging:** User-selected or gpt-5-mini | |
| ### Environment Variables | |
| **Required:** | |
| ``` | |
| OPENAI_API_KEY=your_api_key_here | |
| ``` | |
| **Configured via `.env` file in project root** | |
| ### Customization Points | |
| 1. **Quality Standards:** | |
| - Edit `prompts/learning_objectives.py` | |
| - Edit `prompts/questions.py` | |
| - Edit `prompts/incorrect_answers.py` | |
| - Changes apply to all future generations | |
| 2. **Example Questions/Objectives:** | |
| - Modify LEARNING_OBJECTIVE_EXAMPLES | |
| - Modify EXAMPLE_QUESTIONS | |
| - Modify INCORRECT_ANSWER_EXAMPLES | |
| - LLM learns from these examples | |
| 3. **Generation Parameters:** | |
| - Number of objectives per run | |
| - Number of runs (variety) | |
| - Temperature (creativity vs consistency) | |
| - Model selection (quality vs cost/speed) | |
| 4. **Parallel Processing:** | |
| - `max_workers` in assessment.py | |
| - Currently: min(len(objectives), 5) | |
| - Adjust for your rate limits | |
| 5. **Output Formats:** | |
| - Modify `formatting.py` for display | |
| - Assessment JSON structure in `models/assessment.py` | |
| --- | |
| ## Error Handling and Resilience | |
| ### Content Processing Errors | |
| - **Invalid JSON notebooks:** Falls back to raw text | |
| - **Parse failures:** Wraps in code blocks, continues | |
| - **Missing files:** Logged, skipped | |
| - **Encoding issues:** UTF-8 fallback | |
| ### Generation Errors | |
| - **API failures:** Logged with traceback | |
| - **Structured output parse errors:** Fallback responses created | |
| - **Missing required fields:** Default values assigned | |
| - **Validation errors:** Caught and logged | |
| ### Parallel Processing Errors | |
| - **Individual thread failures:** Don't stop other threads | |
| - **Placeholder questions:** Created on error | |
| - **Complete error details:** Logged for debugging | |
| - **Graceful degradation:** Partial results returned | |
| ### Quality Check Failures | |
| - **Regeneration failures:** Original kept with warning | |
| - **Judge unavailable:** Questions marked unapproved | |
| - **Validation failures:** Detailed logs in debug directories | |
| --- | |
| ## Debug and Logging | |
| ### Debug Directories | |
| 1. **`incorrect_suggestion_debug/`** | |
| - Created during objective enhancement | |
| - Contains logs of problematic incorrect answers | |
| - Format: `{objective_id}.txt` | |
| - Includes: Original suggestions, identified issues, regeneration attempts | |
| 2. **`wrong_answer_debug/`** | |
| - Created during question improvement | |
| - Logs question-level incorrect answer issues | |
| - Regeneration history | |
| ### Console Logging | |
| **Extensive logging throughout:** | |
| - File processing status | |
| - Generation progress (run numbers) | |
| - Parallel thread activity (thread IDs) | |
| - API call results | |
| - Error messages with tracebacks | |
| - Timing information (start/end times) | |
| **Example Log Output:** | |
| ``` | |
| DEBUG - Processing 3 files: ['file1.vtt', 'file2.ipynb', 'file3.srt'] | |
| DEBUG - Found source file: file1.vtt | |
| Generating 3 learning objectives from 3 files | |
| Successfully generated 3 learning objectives without correct answers | |
| Generated correct answer for objective 1 | |
| Grouping 9 base learning objectives | |
| Received 9 grouped results | |
| Generating incorrect answer options only for best-in-group objectives... | |
| PARALLEL: Starting ThreadPoolExecutor with 3 workers | |
| PARALLEL: Worker 1 (Thread ID: 12345): Starting work on objective... | |
| Question generation completed in 45.23 seconds | |
| ``` | |
| --- | |
| ## Performance Considerations | |
| ### API Call Optimization | |
| **Calls per Workflow:** | |
| For 3 objectives × 3 runs = 9 base objectives: | |
| 1. **Learning Objectives:** | |
| - Base generation: 3 calls (one per run) | |
| - Correct answers: 9 calls (one per objective) | |
| - Grouping: 1 call | |
| - Incorrect answers: ~3 calls (best-in-group only) | |
| - Improvement checks: ~3 calls | |
| - **Total: ~19 calls** | |
| 2. **Questions (for 3 objectives × 1 run):** | |
| - Question generation: 3 calls (parallel) | |
| - Quality judging: 3 calls (parallel) | |
| - Grouping: 1 call | |
| - Ranking: 1 call | |
| - **Total: ~8 calls** | |
| **Total for complete workflow: ~27 API calls** | |
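These counts can be reproduced with a little arithmetic (a sanity-check sketch, assuming one best-in-group objective per original objective):

```python
def objective_calls(num_objectives, num_runs):
    """API calls for the learning-objectives phase."""
    base = num_runs                       # one base-generation call per run
    answers = num_objectives * num_runs   # one correct-answer call per objective
    grouping = 1
    best_in_group = num_objectives        # assumes one best per objective group
    incorrect = best_in_group             # incorrect-answer generation
    improvement = best_in_group           # improvement checks
    return base + answers + grouping + incorrect + improvement

def question_calls(num_objectives, num_runs=1):
    """API calls for the question phase."""
    generation = num_objectives * num_runs
    judging = num_objectives * num_runs
    return generation + judging + 1 + 1   # + grouping + ranking
```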
| ### Time Estimates | |
| **Typical Execution Times:** | |
| - File processing: <1 second | |
| - Objective generation (3×3): 30-60 seconds | |
| - Question generation (3×1): 20-40 seconds (with parallelization) | |
| - **Total: 1-2 minutes for small course** | |
| **Factors Affecting Speed:** | |
| - Model selection (gpt-5 slower than gpt-5-mini) | |
| - Number of runs | |
| - Number of objectives/questions | |
| - API rate limits | |
| - Network latency | |
| - Parallel worker count | |
| ### Cost Optimization | |
| **Strategies:** | |
| 1. Use gpt-5-mini for grouping/ranking (hardcoded) | |
| 2. Reduce number of runs (trade-off: variety) | |
| 3. Generate fewer objectives initially | |
| 4. Use faster models for initial exploration | |
| 5. Use premium models for final production | |
| --- | |
| ## Conclusion | |
| The AI Course Assessment Generator is a sophisticated, multi-stage system that transforms raw course materials into high-quality educational assessments. It employs: | |
| - **Modular architecture** for maintainability | |
| - **Structured output generation** for reliability | |
| - **Quality-driven iterative refinement** for excellence | |
| - **Parallel processing** for efficiency | |
| - **Comprehensive error handling** for resilience | |
| The system successfully balances automation with quality control, producing assessments that align with educational best practices and Bloom's Taxonomy while maintaining complete traceability to source materials. | |