DeepLearningAI/quiz-generator-v3
ecuartasm Claude Opus 4.6 committed on Feb 23

Commit

217abc3

0 Parent(s):

Initial commit: AI Course Assessment Generator

Quiz generator application that creates learning objectives and multiple-choice
questions from course materials using OpenAI models with structured output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Files changed (40)

.gitignore +46 -0
APP_FUNCTIONALITY_REPORT.md +2035 -0
CLAUDE.md +91 -0
README.md +232 -0
app.py +16 -0
diagram.mmd +106 -0
learning_objective_generator/__init__.py +3 -0
learning_objective_generator/base_generation.py +201 -0
learning_objective_generator/enhancement.py +121 -0
learning_objective_generator/generator.py +57 -0
learning_objective_generator/grouping_and_ranking.py +328 -0
learning_objective_generator/suggestion_improvement.py +393 -0
models/__init__.py +60 -0
models/assessment.py +10 -0
models/config.py +4 -0
models/learning_objectives.py +59 -0
models/questions.py +67 -0
prompts/__init__.py +0 -0
prompts/all_quality_standards.py +0 -0
prompts/incorrect_answers.py +184 -0
prompts/learning_objectives.py +216 -0
prompts/questions.py +886 -0
quiz_generator/__init__.py +3 -0
quiz_generator/assessment.py +190 -0
quiz_generator/feedback_questions.py +210 -0
quiz_generator/generator.py +89 -0
quiz_generator/question_generation.py +217 -0
quiz_generator/question_improvement.py +578 -0
quiz_generator/question_ranking.py +474 -0
requirements.txt +5 -0
ui/__init__.py +3 -0
ui/app.py +182 -0
ui/content_processor.py +186 -0
ui/edit_handlers.py +197 -0
ui/feedback_handlers.py +37 -0
ui/formatting.py +46 -0
ui/objective_handlers.py +403 -0
ui/question_handlers.py +245 -0
ui/run_manager.py +323 -0
ui/state.py +29 -0

.gitignore ADDED

	@@ -0,0 +1,46 @@

+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+env/
+venv/
+.venv/
+ENV/
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+results/
+incorrect_suggestion_debug/
+Data/
+# VS Code
+.vscode/
+*.code-workspace
+# Environment variables
+.env
+.env.local
+# Claude Code
+.claude/
+# OS
+.DS_Store
+Thumbs.db
+# Logs
+*.log

APP_FUNCTIONALITY_REPORT.md ADDED

	@@ -0,0 +1,2035 @@

+# AI Course Assessment Generator - Functionality Report
+## Table of Contents
+1. [Overview](#overview)
+2. [System Architecture](#system-architecture)
+3. [Data Models](#data-models)
+4. [Application Entry Point](#application-entry-point)
+5. [User Interface Structure](#user-interface-structure)
+6. [Complete Workflow](#complete-workflow)
+7. [Detailed Component Functionality](#detailed-component-functionality)
+8. [Quality Standards and Prompts](#quality-standards-and-prompts)
+---
+## Overview
+The AI Course Assessment Generator is a sophisticated educational tool that automates the creation of learning objectives and multiple-choice questions from course materials. It leverages OpenAI's language models with structured output generation to produce high-quality educational assessments that adhere to specified quality standards and Bloom's Taxonomy levels.
+### Key Capabilities
+- **Multi-format Content Processing**: Accepts `.vtt`, `.srt` (subtitle files), and `.ipynb` (Jupyter notebooks)
+- **AI-Powered Generation**: Uses OpenAI's GPT models with configurable parameters
+- **Quality Assurance**: Implements LLM-based quality assessment and ranking
+- **Source Tracking**: Maintains XML-tagged references from source materials to generated content
+- **Iterative Improvement**: Supports feedback-based regeneration and enhancement
+- **Parallel Processing**: Generates questions concurrently for improved performance
+---
+## System Architecture
+### Architectural Patterns
+#### 1. **Orchestrator Pattern**
+Both `LearningObjectiveGenerator` and `QuizGenerator` act as orchestrators that coordinate calls to specialized generation functions rather than implementing generation logic directly.
+#### 2. **Modular Prompt System**
+The `prompts/` directory contains reusable prompt components that are imported and combined in generation modules, allowing for consistent quality standards across different generation tasks.
+#### 3. **Structured Output Generation**
+All LLM interactions use Pydantic models with the `instructor` library to ensure consistent, validated output formats using OpenAI's structured output API.
+#### 4. **Source Tracking via XML Tags**
+Content is wrapped in XML tags (e.g., `<source file="example.ipynb">content</source>`) throughout the pipeline to maintain traceability from source files to generated questions.
+### Technology Stack
+- **Python 3.8+**
+- **Gradio 5.29.0+**: Web-based UI framework
+- **Pydantic 2.8.0+**: Data validation and schema management
+- **OpenAI 1.52.0+**: LLM API integration
+- **Instructor 1.7.9+**: Structured output generation
+- **nbformat 5.9.2**: Jupyter notebook parsing
+- **python-dotenv 1.0.0**: Environment variable management
+---
+## Data Models
+### Learning Objectives Progression
+The system uses a hierarchical progression of learning objective models:
+#### 1. **BaseLearningObjectiveWithoutCorrectAnswer**
+```python
+- id: int
+- learning_objective: str
+- source_reference: Union[List[str], str]
+```
+Initial generation without correct answers.
+#### 2. **BaseLearningObjective**
+```python
+- id: int
+- learning_objective: str
+- source_reference: Union[List[str], str]
+- correct_answer: str
+```
+Base objectives with correct answers added.
+#### 3. **LearningObjective**
+```python
+- id: int
+- learning_objective: str
+- source_reference: Union[List[str], str]
+- correct_answer: str
+- incorrect_answer_options: Union[List[str], str]
+- in_group: Optional[bool]
+- group_members: Optional[List[int]]
+- best_in_group: Optional[bool]
+```
+Enhanced with incorrect answer suggestions and grouping metadata.
+#### 4. **GroupedLearningObjective**
+```python
+(All fields from LearningObjective)
+- in_group: bool (required)
+- group_members: List[int] (required)
+- best_in_group: bool (required)
+```
+Fully grouped and ranked objectives.
+### Question Models Progression
+#### 1. **MultipleChoiceOption**
+```python
+- option_text: str
+- is_correct: bool
+- feedback: str
+```
+#### 2. **MultipleChoiceQuestion**
+```python
+- id: int
+- question_text: str
+- options: List[MultipleChoiceOption]
+- learning_objective_id: int
+- learning_objective: str
+- correct_answer: str
+- source_reference: Union[List[str], str]
+- judge_feedback: Optional[str]
+- approved: Optional[bool]
+```
+#### 3. **RankedMultipleChoiceQuestion**
+```python
+(All fields from MultipleChoiceQuestion)
+- rank: int
+- ranking_reasoning: str
+- in_group: bool
+- group_members: List[int]
+- best_in_group: bool
+```
+#### 4. **Assessment**
+```python
+- learning_objectives: List[LearningObjective]
+- questions: List[RankedMultipleChoiceQuestion]
+```
+Final output containing both objectives and questions.
+### Configuration Models
+#### **MODELS**
+Available OpenAI models: `["o3-mini", "o1", "gpt-4.1", "gpt-4o", "gpt-4o-mini", "gpt-4", "gpt-3.5-turbo", "gpt-5", "gpt-5-mini", "gpt-5-nano"]`
+#### **TEMPERATURE_UNAVAILABLE**
+Dictionary mapping models to temperature availability (some models like o1, o3-mini, and gpt-5 variants don't support temperature settings).
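Reviewer sketch (not part of the commit): the temperature gating described above could look roughly like the following. The dictionary values and the helper name `build_request_params` are assumptions for illustration. The report only states that o1, o3-mini, and the gpt-5 variants do not accept a temperature setting, and `True` is assumed here to mean "unavailable".

```python
from typing import Any, Dict, List

# Hypothetical contents: True is assumed to mean "this model does NOT accept temperature".
TEMPERATURE_UNAVAILABLE: Dict[str, bool] = {
    "o1": True,
    "o3-mini": True,
    "gpt-5": True,
    "gpt-5-mini": True,
    "gpt-5-nano": True,
    "gpt-4o": False,
    "gpt-4o-mini": False,
    "gpt-4.1": False,
}


def build_request_params(
    model: str, messages: List[Dict[str, str]], temperature: float
) -> Dict[str, Any]:
    """Build chat-completion keyword arguments, adding temperature only when supported."""
    params: Dict[str, Any] = {"model": model, "messages": messages}
    # Unknown models default to "unavailable", so temperature is left out for them.
    if not TEMPERATURE_UNAVAILABLE.get(model, True):
        params["temperature"] = temperature
    return params
```

Defaulting unknown models to "unavailable" is the conservative choice in this sketch: a request without a temperature is generally accepted, while one with an unsupported temperature may be rejected.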
+---
+## Application Entry Point
+### `app.py`
+The root-level entry point that:
+1. Loads environment variables from `.env` file
+2. Checks for `OPENAI_API_KEY` presence
+3. Creates the Gradio UI via `ui.app.create_ui()`
+4. Launches the web interface at `http://127.0.0.1:7860`
+```python
+# Workflow:
+load_dotenv() → Check API key → create_ui() → app.launch()
+```
+---
+## User Interface Structure
+### `ui/app.py` - Gradio Interface
+The UI is organized into **3 main tabs**:
+#### **Tab 1: Generate Learning Objectives**
+**Input Components:**
+- File uploader (accepts `.ipynb`, `.vtt`, `.srt`)
+- Number of objectives per run (slider: 1-20, default: 3)
+- Number of generation runs (dropdown: 1-5, default: 3)
+- Model selection (dropdown, default: "gpt-5")
+- Incorrect answer model selection (dropdown, default: "gpt-5")
+- Temperature setting (dropdown: 0.0-1.0, default: 1.0)
+- Generate button
+- Feedback input textbox
+- Regenerate button
+**Output Components:**
+- Status textbox
+- Best-in-Group Learning Objectives (JSON)
+- All Grouped Learning Objectives (JSON)
+- Raw Ungrouped Learning Objectives (JSON) - for debugging
+**Event Handler:** `process_files()` from `objective_handlers.py`
+#### **Tab 2: Generate Questions**
+**Input Components:**
+- Learning Objectives JSON (auto-populated from Tab 1)
+- Model selection
+- Temperature setting
+- Number of question generation runs (slider: 1-5, default: 1)
+- Generate Questions button
+**Output Components:**
+- Status textbox
+- Ranked Best-in-Group Questions (JSON)
+- All Grouped Questions (JSON)
+- Formatted Quiz (human-readable format)
+**Event Handler:** `generate_questions()` from `question_handlers.py`
+#### **Tab 3: Propose/Edit Question**
+**Input Components:**
+- Question guidance/feedback textbox
+- Model selection
+- Temperature setting
+- Generate Question button
+**Output Components:**
+- Status textbox
+- Generated Question (JSON)
+**Event Handler:** `propose_question_handler()` from `feedback_handlers.py`
+---
+## Complete Workflow
+### Phase 1: File Upload and Content Processing
+#### Step 1.1: File Upload
+User uploads one or more files (`.vtt`, `.srt`, `.ipynb`) through the Gradio interface.
+#### Step 1.2: File Path Extraction (`objective_handlers._extract_file_paths()`)
+```python
+# Handles different input formats:
+- List of file paths
+- Single file path string
+- File objects with .name attribute
+```
+#### Step 1.3: Content Processing (`ui/content_processor.py`)
+**For Subtitle Files (`.vtt`, `.srt`):**
+```python
+1. Read file with UTF-8 encoding
+2. Split into lines
+3. Filter out:
+   - Empty lines
+   - Numeric timestamp indicators
+   - Lines containing '-->' (timestamps)
+   - 'WEBVTT' header lines
+4. Combine remaining text lines
+5. Wrap in XML tags: <source file='filename.vtt'>content</source>
+```
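Reviewer sketch (not part of the commit): the subtitle filtering steps listed above can be approximated as follows. The function name `extract_subtitle_text` is hypothetical, and joining the kept lines with newlines is an assumption, since the report does not say how the remaining lines are combined.

```python
def extract_subtitle_text(raw: str, filename: str) -> str:
    """Strip cue numbers, timestamps, and the WEBVTT header, then wrap the text in a source tag."""
    kept = []
    for line in raw.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # empty line
        if stripped.isdigit():
            continue  # numeric cue index
        if "-->" in stripped:
            continue  # timestamp range
        if stripped.upper().startswith("WEBVTT"):
            continue  # WebVTT header line
        kept.append(stripped)
    content = "\n".join(kept)  # assumption: lines are joined with newlines
    return f"<source file='{filename}'>{content}</source>"
```

Note that a caption consisting only of digits (for example a spoken "42" on its own line) would also be dropped by the numeric filter, which is a known trade-off of this kind of line-based cleanup.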
+**For Jupyter Notebooks (`.ipynb`):**
+```python
+1. Validate JSON format
+2. Parse with nbformat.read()
+3. Extract from cells:
+   - Markdown cells: [Markdown]\n{content}
+   - Code cells: [Code]\n```python\n{content}\n```
+4. Combine all cell content
+5. Wrap in XML tags: <source file='filename.ipynb'>content</source>
+```
+**Error Handling:**
+- Invalid JSON: Wraps raw content in code blocks
+- Parsing failures: Falls back to plain text extraction
+- All errors logged to console
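Reviewer sketch (not part of the commit): a minimal version of the notebook extraction and its invalid-JSON fallback. To stay dependency-free it reads the notebook with the standard `json` module instead of `nbformat.read()`, which is a substitution made for this example only. The function name `extract_notebook_text` and the blank-line separator between cells are assumptions.

```python
import json

FENCE = "`" * 3  # built at runtime so the example holds no literal code fence


def extract_notebook_text(raw: str, filename: str) -> str:
    """Turn notebook JSON into tagged text; wrap the raw content in a code block if it is not valid JSON."""
    try:
        notebook = json.loads(raw)
    except json.JSONDecodeError:
        # Invalid JSON: fall back to the raw content inside a code block.
        body = f"{FENCE}\n{raw}\n{FENCE}"
        return f"<source file='{filename}'>{body}</source>"

    parts = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # On disk, "source" may be a single string or a list of line strings.
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "markdown":
            parts.append(f"[Markdown]\n{text}")
        elif cell.get("cell_type") == "code":
            parts.append(f"[Code]\n{FENCE}python\n{text}\n{FENCE}")
    content = "\n\n".join(parts)  # assumption: cells are separated by a blank line
    return f"<source file='{filename}'>{content}</source>"
```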
+#### Step 1.4: State Storage
+Processed content stored in global state (`ui/state.py`):
+```python
+processed_file_contents = [tagged_content_1, tagged_content_2, ...]
+```
+### Phase 2: Learning Objective Generation
+#### Step 2.1: Multi-Run Base Generation
+**Process:** `objective_handlers._generate_multiple_runs()`
+For each run (user-specified, typically 3 runs):
+1. **Call:** `QuizGenerator.generate_base_learning_objectives()`
+2. **Workflow:**
+   ```
+   generate_base_learning_objectives()
+     ↓
+   generate_base_learning_objectives_without_correct_answers()
+     → Creates prompt with:
+        - BASE_LEARNING_OBJECTIVES_PROMPT
+        - BLOOMS_TAXONOMY_LEVELS
+        - LEARNING_OBJECTIVE_EXAMPLES_WITHOUT_ANSWERS
+        - Combined file contents
+     → Calls OpenAI API with structured output
+     → Returns List[BaseLearningObjectiveWithoutCorrectAnswer]
+     ↓
+   generate_correct_answers_for_objectives()
+     → For each objective:
+        - Creates prompt with objective and course content
+        - Calls OpenAI API (unstructured text response)
+        - Extracts correct answer
+     → Returns List[BaseLearningObjective]
+   ```
+3. **ID Assignment:**
+   ```python
+   # Temporary IDs by run:
+   Run 1: 1001, 1002, 1003
+   Run 2: 2001, 2002, 2003
+   Run 3: 3001, 3002, 3003
+   ```
+4. **Aggregation:**
+   All objectives from all runs combined into single list.
+**Example:** 3 runs × 3 objectives = 9 total base objectives
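Reviewer sketch (not part of the commit): the temporary ID scheme above amounts to `run_number * 1000 + position`. The helper name `assign_temporary_ids` and the use of plain dictionaries in place of the Pydantic models are assumptions for illustration.

```python
from typing import Dict, List


def assign_temporary_ids(runs: List[List[Dict]]) -> List[Dict]:
    """Flatten per-run objective lists, giving each objective an ID of run * 1000 + position."""
    combined = []
    for run_number, objectives in enumerate(runs, start=1):
        for position, objective in enumerate(objectives, start=1):
            # Copy the objective so the caller's dictionaries are left untouched.
            combined.append({**objective, "id": run_number * 1000 + position})
    return combined
```

With this scheme the first objective of every run has an ID ending in 001, which is what lets the later grouping step recognise the "first objective" candidates across runs.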
+#### Step 2.2: Grouping and Ranking
+**Process:** `objective_handlers._group_base_objectives_add_incorrect_answers()`
+**Step 2.2.1: Group Base Objectives**
+```python
+QuizGenerator.group_base_learning_objectives()
+  ↓
+learning_objective_generator/grouping_and_ranking.py
+  → group_base_learning_objectives()
+```
+**Grouping Logic:**
+1. Creates prompt containing:
+   - Original generation criteria
+   - All base objectives with IDs
+   - Course content for context
+   - Grouping instructions
+2. **Special Rule:** All objectives with IDs ending in 1 (1001, 2001, 3001) are grouped together and ONE is marked as best-in-group (this becomes the primary/first objective)
+3. **LLM Call:**
+   - Model: `gpt-5-mini`
+   - Response format: `GroupedBaseLearningObjectivesResponse`
+   - Returns: Grouped objectives with metadata
+4. **Output Structure:**
+   ```python
+   {
+     "all_grouped": [all objectives with group metadata],
+     "best_in_group": [objectives marked as best in their groups]
+   }
+   ```
+**Step 2.2.2: ID Reassignment** (`_reassign_objective_ids()`)
+```python
+1. Find best objective from the 001 group
+2. Assign it ID = 1
+3. Assign remaining objectives IDs starting from 2
+```
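Reviewer sketch (not part of the commit): one way the reassignment above could work. It assumes the input is the list of best-in-group objectives as plain dictionaries, that the "001 group" winner is the one whose temporary ID ends in 001, and that the remaining objectives keep their incoming order. The function name mirrors `_reassign_objective_ids()`, but the body is a guess at the behaviour, not the committed code.

```python
from typing import Dict, List


def reassign_objective_ids(best_in_group: List[Dict]) -> List[Dict]:
    """Give the best objective from the 001 group ID 1, then number the rest from 2."""
    # The 001-group winner is the best-in-group objective whose temporary ID ends in 001.
    primary = next((o for o in best_in_group if o["id"] % 1000 == 1), None)
    rest = [o for o in best_in_group if o is not primary]
    ordered = ([primary] if primary is not None else []) + rest
    # Number sequentially; the primary objective (if any) comes first and receives ID 1.
    return [{**o, "id": new_id} for new_id, o in enumerate(ordered, start=1)]
```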
+**Step 2.2.3: Generate Incorrect Answer Options**
+Only for **best-in-group** objectives:
+```python
+QuizGenerator.generate_lo_incorrect_answer_options()
+  ↓
+learning_objective_generator/enhancement.py
+  → generate_incorrect_answer_options()
+```
+**Process:**
+1. For each best-in-group objective:
+   - Creates prompt with:
+     - Objective and correct answer
+     - INCORRECT_ANSWER_PROMPT guidelines
+     - INCORRECT_ANSWER_EXAMPLES
+     - Course content
+   - Calls OpenAI API (with optional model override)
+   - Generates 5 plausible incorrect answer options
+2. **Returns:** `List[LearningObjective]` with incorrect_answer_options populated
+**Step 2.2.4: Improve Incorrect Answers**
+```python
+learning_objective_generator.regenerate_incorrect_answers()
+  ↓
+learning_objective_generator/suggestion_improvement.py
+```
+**Quality Check Process:**
+1. For each objective's incorrect answers:
+   - Checks for red flags (contradictory phrases, absolute terms)
+   - Examples of red flags:
+     - "but not necessarily"
+     - "at the expense of"
+     - "rather than"
+     - "always", "never", "exclusively"
+2. If problems found:
+   - Logs issue to `incorrect_suggestion_debug/` directory
+   - Regenerates incorrect answers with additional constraints
+   - Updates objective with improved answers
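Reviewer sketch (not part of the commit): elsewhere the report describes the red-flag check as an LLM evaluation. The snippet below is a much simpler keyword scan, shown only to make the listed red flags concrete. The constant and function names are hypothetical.

```python
import re
from typing import List

# Phrases and absolute terms taken from the red-flag examples listed above.
RED_FLAG_PHRASES = ["but not necessarily", "at the expense of", "rather than"]
ABSOLUTE_TERMS = ["always", "never", "exclusively"]


def find_red_flags(option: str) -> List[str]:
    """Return the red-flag phrases and absolute terms found in one incorrect-answer option."""
    lowered = option.lower()
    hits = [phrase for phrase in RED_FLAG_PHRASES if phrase in lowered]
    # Match absolute terms as whole words so that "nevertheless" does not trigger "never".
    hits += [term for term in ABSOLUTE_TERMS if re.search(rf"\b{term}\b", lowered)]
    return hits
```

A scan like this could serve as a cheap pre-filter before the LLM check, though it would miss the subtler cases the report mentions, such as opposite descriptions or hedging that creates limitations.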
+**Step 2.2.5: Final Assembly**
+Creates final list where:
+- Best-in-group objectives have enhanced incorrect answers
+- Non-best-in-group objectives have empty `incorrect_answer_options: []`
+#### Step 2.3: Display Results
+**Three output formats:**
+1. **Best-in-Group Objectives** (primary output):
+   - Only objectives marked as best_in_group
+   - Includes incorrect answer options
+   - Sorted by ID
+   - Formatted as JSON
+2. **All Grouped Objectives**:
+   - All objectives with grouping metadata
+   - Shows group_members arrays
+   - Best-in-group flags visible
+3. **Raw Ungrouped** (debug):
+   - Original objectives from all runs
+   - No grouping metadata
+   - Original temporary IDs
+#### Step 2.4: State Update
+```python
+set_learning_objectives(grouped_result["all_grouped"])
+set_processed_contents(file_contents)  # Already set, but persisted
+```
+### Phase 3: Question Generation
+#### Step 3.1: Parse Learning Objectives
+**Process:** `question_handlers._parse_learning_objectives()`
+```python
+1. Parse JSON from Tab 1 output
+2. Create LearningObjective objects from dictionaries
+3. Validate required fields
+4. Return List[LearningObjective]
+```
+#### Step 3.2: Multi-Run Question Generation
+**Process:** `question_handlers._generate_questions_multiple_runs()`
+For each run (user-specified, typically 1 run):
+```python
+QuizGenerator.generate_questions_in_parallel()
+  ↓
+quiz_generator/assessment.py
+  → generate_questions_in_parallel()
+```
+**Parallel Generation Process:**
+1. **Thread Pool Setup:**
+   ```python
+   max_workers = min(len(learning_objectives), 5)
+   ThreadPoolExecutor(max_workers=max_workers)
+   ```
+2. **For Each Learning Objective (in parallel):**
+   **Step 3.2.1: Question Generation** (`quiz_generator/question_generation.py`)
+   ```python
+   generate_multiple_choice_question()
+   ```
+   **a) Source Content Matching:**
+   ```python
+   - Extract source_reference from objective
+   - Search file_contents for matching XML tags
+   - Exact match: <source file='filename.vtt'>
+   - Fallback: Partial filename match
+   - Last resort: Use all file contents combined
+   ```
+   **b) Multi-Source Handling:**
+   ```python
+   if len(source_references) > 1:
+       Add special instruction:
+       "Question should synthesize information across sources"
+   ```
+   **c) Prompt Construction:**
+   ```python
+   Combines:
+   - Learning objective
+   - Correct answer
+   - Incorrect answer options from objective
+   - GENERAL_QUALITY_STANDARDS
+   - MULTIPLE_CHOICE_STANDARDS
+   - EXAMPLE_QUESTIONS
+   - QUESTION_SPECIFIC_QUALITY_STANDARDS
+   - CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS
+   - INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION
+   - ANSWER_FEEDBACK_QUALITY_STANDARDS
+   - Matched course content
+   ```
+   **d) API Call:**
+   ```python
+   - Model: User-selected (default: gpt-5)
+   - Temperature: User-selected (if supported by model)
+   - Response format: MultipleChoiceQuestion
+   - Returns: Question with 4 options, each with feedback
+   ```
+   **e) Post-Processing:**
+   ```python
+   - Set question ID = learning_objective ID
+   - Verify all options have feedback
+   - Add default feedback if missing
+   ```
+   **Step 3.2.2: Quality Assessment** (`quiz_generator/question_improvement.py`)
+   ```python
+   judge_question_quality()
+   ```
+   **Quality Judging Process:**
+   ```python
+   1. Creates evaluation prompt with:
+      - Question text and all options
+      - Quality criteria from prompts
+      - Evaluation instructions
+   2. LLM evaluates question for:
+      - Clarity and unambiguity
+      - Alignment with learning objective
+      - Quality of incorrect options
+      - Feedback quality
+      - Appropriate difficulty
+   3. Returns:
+      - approved: bool
+      - feedback: str (reasoning for judgment)
+   4. Updates question:
+      question.approved = approved
+      question.judge_feedback = feedback
+   ```
+3. **Results Collection:**
+   ```python
+   - Questions collected as futures complete
+   - IDs assigned sequentially across runs
+   - All questions aggregated into single list
+   ```
+**Example:** 3 objectives × 1 run = 3 questions generated in parallel
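Reviewer sketch (not part of the commit): the thread-pool pattern described above, with a stand-in `generate_question` in place of the real LLM call and quality judging. Sorting the results by ID is an assumption added here, because `as_completed` yields futures in completion order rather than submission order.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List


def generate_question(objective: Dict) -> Dict:
    """Stand-in for the real generation and judging step (hypothetical)."""
    return {
        "id": objective["id"],
        "question_text": f"Question for: {objective['learning_objective']}",
    }


def generate_questions_in_parallel(objectives: List[Dict]) -> List[Dict]:
    """Generate one question per objective using at most 5 worker threads."""
    if not objectives:
        return []  # ThreadPoolExecutor rejects max_workers=0
    max_workers = min(len(objectives), 5)
    questions = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(generate_question, o) for o in objectives]
        for future in as_completed(futures):
            questions.append(future.result())
    # Restore a stable order, since completion order is not deterministic.
    return sorted(questions, key=lambda q: q["id"])
```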
+#### Step 3.3: Grouping Questions
+**Process:** `quiz_generator/question_ranking.py → group_questions()`
+```python
+1. Creates prompt with:
+   - All generated questions
+   - Grouping instructions
+   - Example format
+2. LLM identifies:
+   - Questions testing same concept (same learning_objective_id)
+   - Groups of similar questions
+   - Best question in each group
+3. Model: gpt-5-mini
+   Response format: GroupedMultipleChoiceQuestionsResponse
+4. Returns:
+   {
+     "grouped": [all questions with group metadata],
+     "best_in_group": [best questions from each group]
+   }
+```
+#### Step 3.4: Ranking Questions
+**Process:** `quiz_generator/question_ranking.py → rank_questions()`
+**Only ranks best-in-group questions:**
+```python
+1. Creates prompt with:
+   - RANK_QUESTIONS_PROMPT
+   - All quality standards
+   - Best-in-group questions only
+   - Course content for context
+2. Ranking Criteria:
+   - Question clarity and unambiguity
+   - Alignment with learning objective
+   - Quality of incorrect options
+   - Feedback quality
+   - Appropriate difficulty (prefers simple English)
+   - Adherence to all guidelines
+   - Avoidance of absolute terms
+3. Special Instructions:
+   - NEVER change question with ID=1
+   - Each question gets unique rank (2, 3, 4, ...)
+   - Rank 1 is reserved
+   - All questions must be returned
+4. Model: User-selected
+   Response format: RankedMultipleChoiceQuestionsResponse
+5. Returns:
+   {
+     "ranked": [questions with rank and ranking_reasoning]
+   }
+```
+#### Step 3.5: Format Results
+**Process:** `question_handlers._format_question_results()`
+**Three outputs:**
+1. **Best-in-Group Ranked Questions:**
+   ```python
+   - Sorted by rank
+   - Includes all question data
+   - Includes rank and ranking_reasoning
+   - Includes group metadata
+   - Formatted as JSON
+   ```
+2. **All Grouped Questions:**
+   ```python
+   - All questions with group metadata
+   - No ranking information
+   - Shows which questions are in groups
+   - Formatted as JSON
+   ```
+3. **Formatted Quiz:**
+   ```python
+   format_quiz_for_ui() creates human-readable format:
+   **Question 1 [Rank: 2]:** What is...
+   Ranking Reasoning: ...
+   • A [Correct]: Option text
+     ◦ Feedback: Correct feedback
+   • B: Option text
+     ◦ Feedback: Incorrect feedback
+   [continues for all questions]
+   ```
+### Phase 4: Custom Question Generation (Optional)
+**Tab 3 Workflow:**
+#### Step 4.1: User Input
+User provides:
+- Free-form guidance/feedback text
+- Model selection
+- Temperature setting
+#### Step 4.2: Generation
+**Process:** `feedback_handlers.propose_question_handler()`
+```python
+QuizGenerator.generate_multiple_choice_question_from_feedback()
+  ↓
+quiz_generator/feedback_questions.py
+```
+**Workflow:**
+```python
+1. Retrieves processed file contents from state
+2. Creates prompt combining:
+   - User feedback/guidance
+   - All quality standards
+   - Course content
+   - Generation criteria
+3. Model generates:
+   - Single question
+   - With learning objective inferred from guidance
+   - 4 options with feedback
+   - Source references
+4. Returns: MultipleChoiceQuestionFromFeedback object
+   (includes user feedback as metadata)
+5. Formatted as JSON for display
+```
+### Phase 5: Assessment Export (Automated)
+The final assessment can be saved using:
+```python
+QuizGenerator.save_assessment_to_json()
+  ↓
+quiz_generator/assessment.py → save_assessment_to_json()
+```
+**Process:**
+```python
+1. Convert Assessment object to dictionary
+   assessment_dict = assessment.model_dump()
+2. Write to JSON file with indent=2
+   Default filename: "assessment.json"
+3. Contains:
+   - All learning objectives (best-in-group)
+   - All ranked questions
+   - Complete metadata
+```
+---
+## Detailed Component Functionality
+### Content Processor (`ui/content_processor.py`)
+**Class: `ContentProcessor`**
+**Methods:**
+1. **`process_files(file_paths: List[str]) -> List[str]`**
+   - Main entry point for processing multiple files
+   - Returns list of XML-tagged content strings
+   - Stores results in `self.file_contents`
+2. **`process_file(file_path: str) -> List[str]`**
+   - Routes to appropriate handler based on file extension
+   - Returns single-item list with tagged content
+3. **`_process_subtitle_file(file_path: str) -> List[str]`**
+   - Filters out timestamps and metadata
+   - Preserves actual subtitle text
+   - Wraps in `<source file='...'>` tags
+4. **`_process_notebook_file(file_path: str) -> List[str]`**
+   - Validates JSON structure
+   - Parses with nbformat
+   - Extracts markdown and code cells
+   - Falls back to raw text on parsing errors
+   - Wraps in `<source file='...'>` tags
+### Learning Objective Generator (`learning_objective_generator/`)
+#### **generator.py - LearningObjectiveGenerator Class**
+**Orchestrator that delegates to specialized modules:**
+**Methods:**
+1. **`generate_base_learning_objectives()`**
+   - Delegates to `base_generation.py`
+   - Returns base objectives with correct answers
+2. **`group_base_learning_objectives()`**
+   - Delegates to `grouping_and_ranking.py`
+   - Groups similar objectives
+   - Identifies best in each group
+3. **`generate_incorrect_answer_options()`**
+   - Delegates to `enhancement.py`
+   - Adds 5 incorrect answer suggestions per objective
+4. **`regenerate_incorrect_answers()`**
+   - Delegates to `suggestion_improvement.py`
+   - Quality-checks and improves incorrect answers
+5. **`generate_and_group_learning_objectives()`**
+   - Complete workflow method
+   - Combines: base generation → grouping → incorrect answers
+   - Returns dict with all_grouped and best_in_group
+#### **base_generation.py**
+**Key Functions:**
+**`generate_base_learning_objectives()`**
+- Wrapper that calls two separate functions
+- First: Generate objectives without correct answers
+- Second: Generate correct answers for those objectives
+**`generate_base_learning_objectives_without_correct_answers()`**
+**Process:**
+```python
+1. Extract source filenames from XML tags
+2. Combine all file contents
+3. Create prompt with:
+   - BASE_LEARNING_OBJECTIVES_PROMPT
+   - BLOOMS_TAXONOMY_LEVELS
+   - LEARNING_OBJECTIVE_EXAMPLES_WITHOUT_ANSWERS
+   - Course content
+4. API call:
+   - Model: User-selected
+   - Temperature: User-selected (if supported)
+   - Response format: BaseLearningObjectivesWithoutCorrectAnswerResponse
+5. Post-process:
+   - Assign sequential IDs
+   - Normalize source_reference (extract basenames)
+6. Returns: List[BaseLearningObjectiveWithoutCorrectAnswer]
+```
+**`generate_correct_answers_for_objectives()`**
+**Process:**
+```python
+1. For each objective without answer:
+   - Create prompt with objective + course content
+   - Call OpenAI API (text response, not structured)
+   - Extract correct answer
+   - Create BaseLearningObjective with answer
+2. Error handling: Add "[Error generating correct answer]" on failure
+3. Returns: List[BaseLearningObjective]
+```
+**Quality Guidelines in Prompt:**
+- Objectives must be assessable via multiple-choice
+- Start with action verbs (identify, describe, define, list, compare)
+- One goal per objective
+- Derived directly from course content
+- Tool/framework agnostic (focus on principles, not specific implementations)
+- First objective should be relatively easy recall question
+- Avoid objectives about "building" or "creating" (not MC-assessable)
+#### **grouping_and_ranking.py**
+**Key Functions:**
+**`group_base_learning_objectives()`**
+**Process:**
+```python
+1. Format objectives for display in prompt
+2. Create grouping prompt with:
+   - Original generation criteria
+   - All base objectives
+   - Course content
+   - Grouping instructions
+3. Special rule:
+   - All objectives with IDs ending in 1 grouped together
+   - Best one selected from this group
+   - Will become primary objective (ID=1)
+4. API call:
+   - Model: "gpt-5-mini" (hardcoded for efficiency)
+   - Response format: GroupedBaseLearningObjectivesResponse
+5. Post-process:
+   - Normalize best_in_group to Python bool
+   - Filter for best-in-group objectives
+6. Returns:
+   {
+     "all_grouped": List[GroupedBaseLearningObjective],
+     "best_in_group": List[GroupedBaseLearningObjective]
+   }
+```
+**Grouping Criteria:**
+- Topic overlap
+- Similarity of concepts
+- Quality based on original generation criteria
+- Clarity and specificity
+- Alignment with course content
+#### **enhancement.py**
+**Key Function: `generate_incorrect_answer_options()`**
+**Process:**
+```python
+1. For each base objective:
+   - Create prompt with:
+     - Learning objective and correct answer
+     - INCORRECT_ANSWER_PROMPT (detailed guidelines)
+     - INCORRECT_ANSWER_EXAMPLES
+     - Course content
+   - Request 5 plausible incorrect options
+2. API call:
+   - Model: model_override or default
+   - Temperature: User-selected (if supported)
+   - Response format: LearningObjective (includes incorrect_answer_options)
+3. Returns: List[LearningObjective] with all fields populated
+```
+**Incorrect Answer Quality Principles:**
+- Create common misunderstandings
+- Maintain identical structure to correct answer
+- Use course terminology correctly but in wrong contexts
+- Include partially correct information
+- Avoid obviously wrong answers
+- Mirror detail level and style of correct answer
+- Avoid absolute terms ("always", "never", "exclusively")
+- Avoid contradictory second clauses
+#### **suggestion_improvement.py**
+**Key Function: `regenerate_incorrect_answers()`**
+**Process:**
+```python
+1. For each learning objective:
+   - Call should_regenerate_incorrect_answers()
+2. should_regenerate_incorrect_answers():
+   - Creates evaluation prompt with:
+     - Objective and all incorrect options
+     - IMMEDIATE_RED_FLAGS checklist
+     - RULES_FOR_SECOND_CLAUSES
+   - LLM evaluates each option
+   - Returns: needs_regeneration: bool
+3. If regeneration needed:
+   - Logs to incorrect_suggestion_debug/{id}.txt
+   - Creates new prompt with additional constraints
+   - Regenerates incorrect answers
+   - Validates again
+4. Returns: List[LearningObjective] with improved incorrect answers
+```
+**Red Flags Checked:**
+- Contradictory second clauses ("but not necessarily")
+- Explicit negations ("without automating")
+- Opposite descriptions ("fixed steps" for flexible systems)
+- Absolute/comparative terms
+- Hedging that creates limitations
+- Trade-off language creating false dichotomies
+### Quiz Generator (`quiz_generator/`)
+#### **generator.py - QuizGenerator Class**
+**Orchestrator with LearningObjectiveGenerator embedded:**
+**Initialization:**
+```python
+def __init__(self, api_key, model="gpt-5", temperature=1.0):
+    self.client = OpenAI(api_key=api_key)
+    self.model = model
+    self.temperature = temperature
+    self.learning_objective_generator = LearningObjectiveGenerator(
+        api_key=api_key, model=model, temperature=temperature
+    )
+```
+**Methods (delegates to specialized modules):**
+1. **`generate_base_learning_objectives()`** → delegates to LearningObjectiveGenerator
+2. **`generate_lo_incorrect_answer_options()`** → delegates to LearningObjectiveGenerator
+3. **`group_base_learning_objectives()`** → delegates to grouping_and_ranking.py
+4. **`generate_multiple_choice_question()`** → delegates to question_generation.py
+5. **`generate_questions_in_parallel()`** → delegates to assessment.py
+6. **`group_questions()`** → delegates to question_ranking.py
+7. **`rank_questions()`** → delegates to question_ranking.py
+8. **`judge_question_quality()`** → delegates to question_improvement.py
+9. **`regenerate_incorrect_answers()`** → delegates to question_improvement.py
+10. **`generate_multiple_choice_question_from_feedback()`** → delegates to feedback_questions.py
+11. **`save_assessment_to_json()`** → delegates to assessment.py
+#### **question_generation.py**
+**Key Function: `generate_multiple_choice_question()`**
+**Detailed Process:**
+**1. Source Content Matching:**
+```python
+source_references = learning_objective.source_reference
+if isinstance(source_references, str):
+    source_references = [source_references]
+combined_content = ""
+for source_file in source_references:
+    found = False
+    # Try exact match: <source file='filename'>
+    for file_content in file_contents:
+        if f"<source file='{source_file}'>" in file_content:
+            combined_content += file_content
+            found = True
+            break
+    # Fallback: partial match
+    if not found:
+        for file_content in file_contents:
+            if source_file in file_content:
+                combined_content += file_content
+                break
+# Last resort: use all content
+if not combined_content:
+    combined_content = "\n\n".join(file_contents)
+```
+**2. Multi-Source Instruction:**
+```python
+if len(source_references) > 1:
+    Add special instruction:
+    "This learning objective spans multiple sources.
+     Your question should:
+     1. Synthesize information across these sources
+     2. Test understanding of overarching themes
+     3. Require knowledge from multiple sources"
+```
+**3. Prompt Construction:**
+Combines extensive quality standards:
+```python
+- Learning objective
+- Correct answer
+- Incorrect answer options from objective
+- GENERAL_QUALITY_STANDARDS
+- MULTIPLE_CHOICE_STANDARDS
+- EXAMPLE_QUESTIONS
+- QUESTION_SPECIFIC_QUALITY_STANDARDS
+- CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS
+- INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION
+- ANSWER_FEEDBACK_QUALITY_STANDARDS
+- Multi-source instruction (if applicable)
+- Matched course content
+```
+**4. API Call:**
+```python
+params = {
+    "model": model,
+    "messages": [
+        {"role": "system", "content": "Expert educational assessment creator"},
+        {"role": "user", "content": prompt}
+    ],
+    "response_format": MultipleChoiceQuestion
+}
+if not TEMPERATURE_UNAVAILABLE.get(model, True):
+    params["temperature"] = temperature
+response = client.beta.chat.completions.parse(**params)
+```
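The temperature guard in step 4 can be exercised without calling the API. The sketch below is a minimal, self-contained version; the contents of `TEMPERATURE_UNAVAILABLE` and the helper name `build_params` are assumptions for illustration, since the report shows only the lookup. Note that `.get(model, True)` treats an unlisted model as not supporting temperature, so the parameter is omitted by default.

```python
# Hypothetical mapping; the real one lives in the application's configuration.
TEMPERATURE_UNAVAILABLE = {
    "o1": True,
    "o3-mini": True,
    "gpt-5": True,
    "gpt-4o": False,
}


def build_params(model: str, prompt: str, temperature: float) -> dict:
    """Build chat-completion keyword arguments, adding temperature only when supported."""
    params = {
        "model": model,
        "messages": [
            {"role": "system", "content": "Expert educational assessment creator"},
            {"role": "user", "content": prompt},
        ],
    }
    # Unknown models default to True (temperature unavailable), so it is omitted.
    if not TEMPERATURE_UNAVAILABLE.get(model, True):
        params["temperature"] = temperature
    return params
```

Defaulting to "unavailable" is the conservative choice: sending an unsupported parameter would fail the request, while omitting it only falls back to the model's default sampling.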
+**5. Post-Processing:**
+```python
+- Set response.id = learning_objective.id
+- Set response.learning_objective_id = learning_objective.id
+- Set response.learning_objective = learning_objective.learning_objective
+- Set response.source_reference = learning_objective.source_reference
+- Verify all options have feedback
+- Add default feedback if missing
+```
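The last two post-processing steps (verify feedback, add a default if missing) might look roughly like the sketch below. It uses plain dataclasses instead of the application's Pydantic models, and the field names and default text are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical default text; the report does not state the actual wording.
DEFAULT_FEEDBACK = "No feedback was generated for this option."


@dataclass
class Option:
    option_text: str
    is_correct: bool
    feedback: str = ""


@dataclass
class Question:
    id: int
    question_text: str
    options: List[Option] = field(default_factory=list)


def ensure_feedback(question: Question) -> Question:
    """Give every option with empty or whitespace-only feedback a default message."""
    for option in question.options:
        if not option.feedback or not option.feedback.strip():
            option.feedback = DEFAULT_FEEDBACK
    return question
```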
+**6. Error Handling:**
+```python
+On exception:
+- Create fallback question with 4 generic options
+- Include error message in question_text
+- Mark as questionable quality
+```
+#### **question_ranking.py**
+**Key Functions:**
+**`group_questions(questions, file_contents)`**
+**Process:**
+```python
+1. Create prompt with:
+   - GROUP_QUESTIONS_PROMPT
+   - All questions with complete data
+   - Grouping instructions
+2. Grouping Logic:
+   - Questions with same learning_objective_id are similar
+   - Group by topic overlap
+   - Mark best_in_group within each group
+   - Single-member groups: best_in_group = true by default
+3. API call:
+   - Model: User-selected
+   - Response format: GroupedMultipleChoiceQuestionsResponse
+4. Critical Instructions:
+   - MUST return ALL questions
+   - Each question must have group metadata
+   - best_in_group set appropriately
+5. Returns:
+   {
+     "grouped": List[GroupedMultipleChoiceQuestion],
+     "best_in_group": [questions where best_in_group=true]
+   }
+```
+**`rank_questions(questions, file_contents)`**
+**Process:**
+```python
+1. Create prompt with:
+   - RANK_QUESTIONS_PROMPT
+   - ALL quality standards (comprehensive)
+   - Best-in-group questions only
+   - Course content
+2. Ranking Criteria (from prompt):
+   - Question clarity and unambiguity
+   - Alignment with learning objective
+   - Quality of incorrect options
+   - Feedback quality
+   - Appropriate difficulty (simple English preferred)
+   - Adherence to all guidelines
+   - Avoidance of problematic words/phrases
+3. Special Instructions:
+   - DO NOT change question with ID=1
+   - Rank starting from 2 (rank 1 reserved)
+   - Each question gets unique rank
+   - Must return ALL questions
+4. API call:
+   - Model: User-selected
+   - Response format: RankedMultipleChoiceQuestionsResponse
+5. Returns:
+   {
+     "ranked": List[RankedMultipleChoiceQuestion]
+              (includes rank and ranking_reasoning for each)
+   }
+```
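Because the ranking instructions rely on the model returning every question with a unique rank, a caller may want to verify the response before using it. The check below is a sketch, not part of the described code; it works on plain dictionaries with hypothetical `id` and `rank` keys and assumes that rank 1 is held only by the question with ID=1.

```python
from typing import Dict, List


def validate_ranking(input_ids: List[int], ranked: List[Dict]) -> List[str]:
    """Return a list of problems found in a ranking response (empty if none)."""
    problems = []
    returned_ids = [q["id"] for q in ranked]
    missing = sorted(set(input_ids) - set(returned_ids))
    if missing:
        problems.append(f"missing question ids: {missing}")
    ranks = [q["rank"] for q in ranked]
    if len(ranks) != len(set(ranks)):
        problems.append("duplicate ranks")
    # Rank 1 is reserved; only the question with ID=1 may hold it (assumption).
    if any(q["rank"] < 2 and q["id"] != 1 for q in ranked):
        problems.append("rank below 2 on a question other than ID=1")
    return problems
```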
+**Simple vs Complex English Examples (from ranking criteria):**
+```
+Simple: "AI engineers create computer programs that can learn from data"
+Complex: "AI engineering practitioners architect computational paradigms
+          exhibiting autonomous erudition capabilities"
+```
+#### **question_improvement.py**
+**Key Functions:**
+**`judge_question_quality(client, model, temperature, question)`**
+**Process:**
+```python
+1. Create evaluation prompt with:
+   - Question text
+   - All options with feedback
+   - Quality criteria
+   - Evaluation instructions
+2. LLM evaluates:
+   - Clarity and lack of ambiguity
+   - Alignment with learning objective
+   - Quality of distractors (incorrect options)
+   - Feedback quality and helpfulness
+   - Appropriate difficulty level
+   - Adherence to all standards
+3. API call:
+   - Unstructured text response
+   - LLM returns: APPROVED or NOT APPROVED + reasoning
+4. Parsing:
+   approved = "APPROVED" in response.upper()
+   feedback = full response text
+5. Returns: (approved: bool, feedback: str)
+```
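One caveat about the substring check in step 4: the text "NOT APPROVED" also contains "APPROVED", so a plain `in` test yields `True` for both verdicts. A stricter parser, sketched here as a hypothetical alternative rather than the application's code, checks for the negative verdict first.

```python
import re
from typing import Tuple


def parse_verdict(response_text: str) -> Tuple[bool, str]:
    """Return (approved, feedback) from a judge response."""
    upper = response_text.upper()
    # Check the negative verdict first because it contains the positive one.
    if re.search(r"\bNOT\s+APPROVED\b", upper):
        return False, response_text
    approved = re.search(r"\bAPPROVED\b", upper) is not None
    return approved, response_text
```

A response with no verdict at all is treated as not approved, which matches the "Judge unavailable: Questions marked unapproved" behavior described under error handling.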
+**`should_regenerate_incorrect_answers(client, question, file_contents, model_name)`**
+**Process:**
+```python
+1. Extract incorrect options from question
+2. Create evaluation prompt with:
+   - Each incorrect option
+   - IMMEDIATE_RED_FLAGS checklist
+   - Course content for context
+3. LLM checks each option for:
+   - Contradictory second clauses
+   - Explicit negations
+   - Absolute terms
+   - Opposite descriptions
+   - Trade-off language
+4. Returns: needs_regeneration: bool
+5. If true:
+   - Log to wrong_answer_debug/ directory
+   - Provides detailed feedback on issues
+```
+**`regenerate_incorrect_answers(client, model, temperature, questions, file_contents)`**
+**Process:**
+```python
+1. For each question:
+   - Check if regeneration needed
+   - If yes:
+     a. Create new prompt with stricter constraints
+     b. Include original question for context
+     c. Add specific rules about avoiding red flags
+     d. Regenerate options
+     e. Validate again
+   - If no: keep original
+2. Returns: List of questions with improved incorrect answers
+```
+#### **feedback_questions.py**
+**Key Function: `generate_multiple_choice_question_from_feedback()`**
+**Process:**
+```python
+1. Accept user feedback/guidance as free-form text
+2. Create prompt combining:
+   - User feedback
+   - All quality standards
+   - Course content
+   - Standard generation criteria
+3. LLM infers:
+   - Learning objective from feedback
+   - Appropriate question
+   - 4 options with feedback
+   - Source references
+4. API call:
+   - Model: User-selected
+   - Response format: MultipleChoiceQuestionFromFeedback
+5. Includes user feedback as metadata in response
+6. Returns: Single question object
+```
+#### **assessment.py**
+**Key Functions:**
+**`generate_questions_in_parallel()`**
+**Parallel Processing Details:**
+```python
+1. Setup:
+   max_workers = min(len(learning_objectives), 5)
+   # Limits to 5 concurrent threads
+2. Thread Pool Executor:
+   with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
+3. For each objective (in separate thread):
+   Worker function:
+   def generate_question_for_objective(objective, idx):
+       - Generate question
+       - Judge quality
+       - Update with approval and feedback
+       - Handle errors gracefully
+       - Return complete question
+4. Submit all tasks:
+   future_to_idx = {
+       executor.submit(generate_question_for_objective, obj, i): i
+       for i, obj in enumerate(learning_objectives)
+   }
+5. Collect results as completed:
+   for future in concurrent.futures.as_completed(future_to_idx):
+       question = future.result()
+       questions.append(question)
+       print progress
+6. Error handling:
+   - Individual failures don't stop other threads
+   - Placeholder questions created on error
+   - All errors logged
+7. Returns: List[MultipleChoiceQuestion] with quality judgments
+```
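The numbered steps above can be condensed into a runnable sketch. The worker here is a stub standing in for generation plus judging, the dictionary fields are hypothetical, and the final sort by index is an addition of this sketch (the report does not say whether the original restores input order).

```python
import concurrent.futures
from typing import Dict, List


def generate_question_for_objective(objective: str, idx: int) -> Dict:
    """Stub worker: stands in for question generation plus quality judging."""
    if not objective:
        raise ValueError("empty objective")
    return {"idx": idx, "question_text": f"Question about {objective}", "approved": True}


def generate_in_parallel(objectives: List[str]) -> List[Dict]:
    if not objectives:
        return []  # ThreadPoolExecutor rejects max_workers=0
    max_workers = min(len(objectives), 5)  # cap at 5 concurrent threads
    results: List[Dict] = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_idx = {
            executor.submit(generate_question_for_objective, obj, i): i
            for i, obj in enumerate(objectives)
        }
        for future in concurrent.futures.as_completed(future_to_idx):
            idx = future_to_idx[future]
            try:
                results.append(future.result())
            except Exception as exc:
                # One failure must not stop the others: record a placeholder.
                results.append({"idx": idx, "question_text": f"Error: {exc}", "approved": False})
    # as_completed yields in completion order, so restore input order.
    return sorted(results, key=lambda q: q["idx"])
```

Catching the exception around `future.result()` is what keeps a single failed objective from aborting the whole batch; the placeholder keeps the output list the same length as the input.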
+**`save_assessment_to_json(assessment, output_path)`**
+```python
+1. Convert Pydantic model to dict:
+   assessment_dict = assessment.model_dump()
+2. Write to JSON file:
+   with open(output_path, "w") as f:
+       json.dump(assessment_dict, f, indent=2)
+3. File contains:
+   {
+     "learning_objectives": [...],
+     "questions": [...]
+   }
+```
+### State Management (`ui/state.py`)
+**Global State Variables:**
+```python
+processed_file_contents = []  # List of XML-tagged content strings
+generated_learning_objectives = []  # List of learning objective objects
+```
+**Functions:**
+- `get_processed_contents()` → retrieves file contents
+- `set_processed_contents(contents)` → stores file contents
+- `get_learning_objectives()` → retrieves objectives
+- `set_learning_objectives(objectives)` → stores objectives
+- `clear_state()` → resets both variables
+**Purpose:**
+- Persists data between UI tabs
+- Allows Tab 2 to access content processed in Tab 1
+- Allows Tab 3 to access content for custom questions
+- Enables regeneration with feedback
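A module with this interface could be as small as the following sketch. The function and variable names come from the list above; the bodies are an assumption about how such accessors are typically written, not a copy of `ui/state.py`.

```python
from typing import Any, List

# Module-level state shared by the UI handlers.
processed_file_contents: List[str] = []
generated_learning_objectives: List[Any] = []


def get_processed_contents() -> List[str]:
    return processed_file_contents


def set_processed_contents(contents: List[str]) -> None:
    global processed_file_contents
    processed_file_contents = contents


def get_learning_objectives() -> List[Any]:
    return generated_learning_objectives


def set_learning_objectives(objectives: List[Any]) -> None:
    global generated_learning_objectives
    generated_learning_objectives = objectives


def clear_state() -> None:
    """Reset both variables to empty lists."""
    global processed_file_contents, generated_learning_objectives
    processed_file_contents = []
    generated_learning_objectives = []
```

Module-level variables live once per Python process, so with this design every browser session served by the same process would read and write the same data. That is fine for single-user local use but worth keeping in mind for a shared deployment.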
+### UI Handlers
+#### **objective_handlers.py**
+**`process_files(files, num_objectives, num_runs, model_name, incorrect_answer_model_name, temperature)`**
+**Complete Workflow:**
+```python
+1. Validate inputs (files exist, API key present)
+2. Extract file paths from Gradio file objects
+3. Process files → get XML-tagged content
+4. Store in state
+5. Create QuizGenerator
+6. Generate multiple runs of base objectives
+7. Group and rank objectives
+8. Generate incorrect answers for best-in-group
+9. Improve incorrect answers
+10. Reassign IDs (best from 001 group → ID=1)
+11. Format results for display
+12. Store in state
+13. Return 4 outputs: status, best-in-group, all-grouped, raw
+```
+**`regenerate_objectives(objectives_json, feedback, num_objectives, num_runs, model_name, temperature)`**
+**Workflow:**
+```python
+1. Retrieve processed contents from state
+2. Append feedback to content:
+   file_contents_with_feedback.append(f"FEEDBACK: {feedback}")
+3. Generate new objectives with feedback context
+4. Group and rank
+5. Return regenerated objectives
+```
+**`_reassign_objective_ids(grouped_objectives)`**
+**ID Assignment Logic:**
+```python
+1. Find all objectives with IDs ending in 001 (1001, 2001, etc.)
+2. Identify their groups
+3. Find best_in_group objective from these groups
+4. Assign it ID = 1
+5. Assign all other objectives sequential IDs starting from 2
+```
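As an illustration of the ID logic, the sketch below operates on plain dictionaries with hypothetical `id`, `group`, and `best_in_group` keys. Where several groups contain a `*001` objective, it picks the first best-in-group objective it encounters; the report does not say how the original resolves that case.

```python
from typing import Dict, List


def reassign_objective_ids(objectives: List[Dict]) -> List[Dict]:
    """Give ID 1 to the best objective of a group containing a *001 ID; number the rest from 2."""
    # Groups that contain an objective whose original ID ends in 001.
    first_groups = {o["group"] for o in objectives if o["id"] % 1000 == 1}
    primary = next(
        (o for o in objectives if o["group"] in first_groups and o["best_in_group"]),
        None,
    )
    next_id = 2
    for obj in objectives:
        if obj is primary:
            obj["id"] = 1
        else:
            obj["id"] = next_id
            next_id += 1
    return objectives
```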
+**`_format_objective_results(grouped_result, all_learning_objectives)`**
+**Formatting:**
+```python
+1. Sort by ID
+2. Create dictionaries from Pydantic objects
+3. Include all metadata fields
+4. Convert to JSON with indent=2
+5. Return 3 formatted outputs + status message
+```
+#### **question_handlers.py**
+**`generate_questions(objectives_json, model_name, temperature, num_runs)`**
+**Complete Workflow:**
+```python
+1. Validate inputs
+2. Parse objectives JSON → create LearningObjective objects
+3. Retrieve processed contents from state
+4. Create QuizGenerator
+5. Generate questions (multiple runs in parallel)
+6. Group questions by similarity
+7. Rank best-in-group questions
+8. Optionally improve incorrect answers (currently commented out)
+9. Format results
+10. Return 4 outputs: status, best-ranked, all-grouped, formatted
+```
+**`_generate_questions_multiple_runs()`**
+```python
+For each run:
+1. Call generate_questions_in_parallel()
+2. Assign unique IDs across runs:
+   start_id = len(all_questions) + 1
+   for i, q in enumerate(run_questions):
+       q.id = start_id + i
+3. Aggregate all questions
+```
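The ID arithmetic is easiest to see in a self-contained form. In this sketch `run_once` is a hypothetical stand-in for `generate_questions_in_parallel()`, and questions are plain dictionaries.

```python
from typing import Callable, Dict, List


def generate_questions_multiple_runs(
    run_once: Callable[[], List[Dict]], num_runs: int
) -> List[Dict]:
    """Call run_once num_runs times and give every question a unique, sequential ID."""
    all_questions: List[Dict] = []
    for _ in range(num_runs):
        run_questions = run_once()
        # IDs continue where the previous run stopped.
        start_id = len(all_questions) + 1
        for i, q in enumerate(run_questions):
            q["id"] = start_id + i
        all_questions.extend(run_questions)
    return all_questions
```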
+**`_group_and_rank_questions()`**
+```python
+1. Group all questions → get grouped and best_in_group
+2. Rank only best_in_group questions
+3. Return:
+   {
+     "grouped": all with group metadata,
+     "best_in_group_ranked": best with ranks
+   }
+```
+#### **feedback_handlers.py**
+**`propose_question_handler(guidance, model_name, temperature)`**
+**Workflow:**
+```python
+1. Validate state (processed contents available)
+2. Create QuizGenerator
+3. Call generate_multiple_choice_question_from_feedback()
+   - Passes user guidance and course content
+   - LLM infers learning objective
+   - Generates complete question
+4. Format as JSON
+5. Return status and question JSON
+```
+### Formatting Utilities (`ui/formatting.py`)
+**`format_quiz_for_ui(questions_json)`**
+**Process:**
+```python
+1. Parse JSON to list of question dictionaries
+2. Sort by rank if available
+3. For each question:
+   - Add header: "**Question N [Rank: X]:** {question_text}"
+   - Add ranking reasoning if available
+   - For each option:
+     - Add letter (A, B, C, D)
+     - Mark correct option
+     - Include option text
+     - Include feedback indented
+4. Return formatted string with markdown
+```
+**Output Example:**
+```
+**Question 1 [Rank: 2]:** What is the primary purpose of AI agents?
+Ranking Reasoning: Clear question that tests fundamental understanding...
+	• A [Correct]: To automate tasks and make decisions
+	  ◦ Feedback: AI agents are designed to automate tasks...
+	• B: To replace human workers entirely
+	  ◦ Feedback: While AI agents can automate tasks, they are not...
+[continues...]
+```
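A formatter that produces output in this shape can be sketched as follows. The function name `format_quiz_sketch` and the JSON field names (`question_text`, `rank`, `ranking_reasoning`, `options`, `option_text`, `is_correct`, `feedback`) are assumptions for illustration, not the actual schema.

```python
import json
from typing import Optional


def _rank_key(question: dict) -> float:
    """Sort key that places unranked questions last."""
    rank: Optional[int] = question.get("rank")
    return float("inf") if rank is None else rank


def format_quiz_sketch(questions_json: str) -> str:
    questions = sorted(json.loads(questions_json), key=_rank_key)
    lines = []
    for number, q in enumerate(questions, start=1):
        rank = q.get("rank")
        rank_label = "" if rank is None else f" [Rank: {rank}]"
        lines.append(f"**Question {number}{rank_label}:** {q['question_text']}")
        if q.get("ranking_reasoning"):
            lines.append(f"Ranking Reasoning: {q['ranking_reasoning']}")
        for letter, option in zip("ABCD", q["options"]):
            marker = " [Correct]" if option.get("is_correct") else ""
            lines.append(f"\t• {letter}{marker}: {option['option_text']}")
            lines.append(f"\t  ◦ Feedback: {option['feedback']}")
    return "\n".join(lines)
```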
+---
+## Quality Standards and Prompts
+### Learning Objectives Quality Standards
+**From `prompts/learning_objectives.py`:**
+**BASE_LEARNING_OBJECTIVES_PROMPT - Key Requirements:**
+1. **Assessability:**
+   - Must be testable via multiple-choice questions
+   - Cannot be about "building", "creating", "developing"
+   - Should use verbs like: identify, list, describe, define, compare
+2. **Specificity:**
+   - One goal per objective
+   - Don't combine multiple action verbs
+   - Example of what NOT to do: "identify X and explain Y"
+3. **Source Alignment:**
+   - Derived DIRECTLY from course content
+   - No topics not covered in content
+   - Appropriate difficulty level for course
+4. **Independence:**
+   - Each objective stands alone
+   - No dependencies on other objectives
+   - No context required from other objectives
+5. **Focus:**
+   - Address "why" over "what" when possible
+   - Critical knowledge over trivial facts
+   - Principles over specific implementation details
+6. **Tool/Framework Agnosticism:**
+   - Don't mention specific tools/frameworks
+   - Focus on underlying principles
+   - Example: Don't ask about "Pandas DataFrame methods",
+     ask about "data filtering concepts"
+7. **First Objective Rule:**
+   - Should be a relatively easy recall question
+   - Address main topic/concept of course
+   - Format: "Identify what X is" or "Explain why X is important"
+8. **Answer Length:**
+   - Aim for ≤20 words in correct answer
+   - Avoid unnecessary elaboration
+   - No compound sentences with extra consequences
+**BLOOMS_TAXONOMY_LEVELS:**
+Levels from lowest to highest:
+- **Recall:** Retention of key concepts (not trivialities)
+- **Comprehension:** Connect ideas, demonstrate understanding
+- **Application:** Apply concept to new but similar scenario
+- **Analysis:** Examine parts, determine relationships, make inferences
+- **Evaluation:** Make judgments requiring critical thinking
+**LEARNING_OBJECTIVE_EXAMPLES:**
+Includes 7 high-quality examples with:
+- Appropriate action verbs
+- Clear learning objectives
+- Concise correct answers (mostly <20 words)
+- Multiple source references
+- Framework-agnostic language
+### Question Quality Standards
+**From `prompts/questions.py`:**
+**GENERAL_QUALITY_STANDARDS:**
+- Overall goal: Set learner up for success
+- Perfect score attainable for thoughtful students
+- Aligned with course content
+- Aligned with learning objective and correct answer
+- No references to manual intervention (software/AI course)
+**MULTIPLE_CHOICE_STANDARDS:**
+- **EXACTLY ONE** correct answer per question
+- Clear, unambiguous correct answer
+- Plausible distractors representing common misconceptions
+- Not obviously wrong distractors
+- All options similar length and detail
+- Mutually exclusive options
+- Avoid "all/none of the above"
+- Typically 4 options (A, B, C, D)
+- Don't start feedback with "Correct" or "Incorrect"
+**QUESTION_SPECIFIC_QUALITY_STANDARDS:**
+Questions must:
+- Match language and tone of course
+- Match difficulty level of course
+- Assess only course information
+- Not teach as part of quiz
+- Use clear, concise language
+- Not induce confusion
+- Provide slight (not major) challenge
+- Be easily interpreted and unambiguous
+- Have proper grammar and sentence structure
+- Be thoughtful and specific (not broad and ambiguous)
+- Be complete in wording (understanding question shouldn't be part of assessment)
+**CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS:**
+Correct answers must:
+- Be factually correct and unambiguous
+- Match course language and tone
+- Be complete sentences
+- Match course difficulty level
+- Contain only course information
+- Not teach during quiz
+- Use clear, concise language
+- Be thoughtful and specific
+- Be complete (identifying correct answer shouldn't require interpretation)
+**INCORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS:**
+Incorrect answers should:
+- Represent reasonable potential misconceptions
+- Sound plausible to non-experts
+- Require thought even from diligent learners
+- Not be obviously wrong
+- Use incorrect_answer_suggestions from objective (as starting point)
+**Avoid:**
+- Obviously wrong options anyone can eliminate
+- Absolute terms: "always", "never", "only", "exclusively"
+- Phrases like "used exclusively for scenarios where..."
+**ANSWER_FEEDBACK_QUALITY_STANDARDS:**
+**For Incorrect Answers:**
+- Be informational and encouraging (not punitive)
+- Single sentence, concise
+- Do NOT say "Incorrect" or "Wrong"
+**For Correct Answers:**
+- Be informational and encouraging
+- Single sentence, concise
+- Do NOT say "Correct!" (redundant after "Correct: " prefix)
+### Incorrect Answer Generation Guidelines
+**From `prompts/incorrect_answers.py`:**
+**Core Principles:**
+1. **Create Common Misunderstandings:**
+   - Represent how students actually misunderstand
+   - Confuse related concepts
+   - Mix up terminology
+2. **Maintain Identical Structure:**
+   - Match grammatical pattern of correct answer
+   - Same length and complexity
+   - Same formatting style
+3. **Use Course Terminology Correctly but in Wrong Contexts:**
+   - Apply correct terms incorrectly
+   - Confuse with related concepts
+   - Example: Describe backpropagation but actually describe forward propagation
+4. **Include Partially Correct Information:**
+   - First part correct, second part wrong
+   - Correct process but wrong application
+   - Correct concept but incomplete
+5. **Avoid Obviously Wrong Answers:**
+   - No contradictions with basic knowledge
+   - Not immediately eliminable
+   - Require course knowledge to reject
+6. **Mirror Detail Level and Style:**
+   - Match technical depth
+   - Match tone
+   - Same level of specificity
+7. **For Lists, Maintain Consistency:**
+   - Same number of items
+   - Same format
+   - Mix some correct with incorrect items
+8. **AVOID ABSOLUTE TERMS:**
+   - "always", "never", "exclusively", "primarily"
+   - "all", "every", "none", "nothing", "only"
+   - "must", "required", "impossible"
+   - "rather than", "as opposed to", "instead of"
+**IMMEDIATE_RED_FLAGS** (triggers regeneration):
+**Contradictory Second Clauses:**
+- "but not necessarily"
+- "at the expense of"
+- "rather than [core concept]"
+- "ensuring X rather than Y"
+- "without necessarily"
+- "but has no impact on"
+- "but cannot", "but prevents", "but limits"
+**Explicit Negations:**
+- "without automating", "without incorporating"
+- "preventing [main benefit]"
+- "limiting [main capability]"
+**Opposite Descriptions:**
+- "fixed steps" (for flexible systems)
+- "manual intervention" (for automation)
+- "simple question answering" (for complex processing)
+**Hedging Creating Limitations:**
+- "sometimes", "occasionally", "might"
+- "to some extent", "partially", "somewhat"
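In the application these red flags are checked by an LLM. As a cheap complement, the literal phrases can also be caught with a deterministic string scan before any API call is made. The sketch below is that swapped-in technique, not the application's method; `RED_FLAG_PHRASES` and `find_red_flags` are hypothetical names, and only a subset of the phrases is included.

```python
from typing import List

# Phrases copied from the red-flag lists above (literal phrases only;
# bracketed patterns such as "rather than [core concept]" need the LLM check).
RED_FLAG_PHRASES = [
    "but not necessarily",
    "at the expense of",
    "without necessarily",
    "but has no impact on",
    "but cannot",
    "but prevents",
    "but limits",
    "to some extent",
]


def find_red_flags(option_text: str) -> List[str]:
    """Return the red-flag phrases that occur in an incorrect answer option."""
    lowered = option_text.lower()
    return [phrase for phrase in RED_FLAG_PHRASES if phrase in lowered]
```

An option that returns a non-empty list could be sent straight to regeneration, saving one evaluation call.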
+**INCORRECT_ANSWER_EXAMPLES:**
+Includes 10 detailed examples showing:
+- Learning objective
+- Correct answer
+- 3 plausible incorrect suggestions
+- Explanation of why each is plausible but wrong
+- Consistent formatting across all options
+### Ranking and Grouping
+**RANK_QUESTIONS_PROMPT:**
+**Criteria:**
+1. Question clarity and unambiguity
+2. Alignment with learning objective
+3. Quality of incorrect options
+4. Quality of feedback
+5. Appropriate difficulty (simple English preferred)
+6. Adherence to all guidelines
+**Critical Instructions:**
+- DO NOT change question with ID=1
+- Rank starting from 2
+- Each question gets a unique rank
+- Must return ALL questions
+- No omissions
+- No duplicate ranks
+**Simple vs Complex English:**
+```
+Simple: "AI engineers create computer programs that learn from data"
+Complex: "AI engineering practitioners architect computational paradigms
+          exhibiting autonomous erudition capabilities"
+```
+**GROUP_QUESTIONS_PROMPT:**
+**Grouping Logic:**
+- Questions with same learning_objective_id are similar
+- Identify topic overlap
+- Mark best_in_group within each group
+- Single-member groups: best_in_group = true
+**Critical Instructions:**
+- Must return ALL questions
+- Each question needs group metadata
+- No omissions
+- Best in group marked appropriately
+---
+## Summary of Data Flow
+### Complete End-to-End Flow
+```
+User Uploads Files
+      ↓
+ContentProcessor extracts and tags content
+      ↓
+[Stored in global state]
+      ↓
+Generate Base Objectives (multiple runs)
+      ↓
+Group Base Objectives (by similarity)
+      ↓
+Generate Incorrect Answers (for best-in-group only)
+      ↓
+Improve Incorrect Answers (quality check)
+      ↓
+Reassign IDs (best from 001 group → ID=1)
+      ↓
+[Objectives displayed in UI, stored in state]
+      ↓
+Generate Questions (parallel, multiple runs)
+      ↓
+Judge Question Quality (parallel)
+      ↓
+Group Questions (by similarity)
+      ↓
+Rank Questions (best-in-group only)
+      ↓
+[Questions displayed in UI]
+      ↓
+Format for Display
+      ↓
+Export to JSON (optional)
+```
+### Key Optimization Strategies
+1. **Multiple Generation Runs:**
+   - Generates variety of objectives/questions
+   - Grouping identifies best versions
+   - Reduces risk of poor quality individual outputs
+2. **Hierarchical Processing:**
+   - Generate base → Group → Enhance → Improve
+   - Only enhances best candidates (saves API calls)
+   - Progressive refinement
+3. **Parallel Processing:**
+   - Questions generated concurrently (up to 5 threads)
+   - Significant time savings for multiple objectives
+   - Independent evaluations
+4. **Quality Gating:**
+   - LLM judges question quality
+   - Checks for red flags in incorrect answers
+   - Regenerates problematic content
+5. **Source Tracking:**
+   - XML tags preserve origin
+   - Questions link back to source materials
+   - Enables accurate content matching
+6. **Modular Prompts:**
+   - Reusable quality standards
+   - Consistent across all generations
+   - Easy to update centrally
+---
+## Configuration and Customization
+### Available Models
+**Configured in `models/config.py`:**
+```python
+MODELS = [
+    "o3-mini", "o1",           # Reasoning models (no temperature)
+    "gpt-4.1", "gpt-4o",       # GPT-4 variants
+    "gpt-4o-mini", "gpt-4",
+    "gpt-3.5-turbo",           # Legacy
+    "gpt-5",                   # Latest (no temperature)
+    "gpt-5-mini",              # Efficient (no temperature)
+    "gpt-5-nano"               # Ultra-efficient (no temperature)
+]
+```
+**Temperature Support:**
+- Models with reasoning (o1, o3-mini, gpt-5 variants): No temperature
+- Other models: Temperature 0.0 to 1.0
+**Model Selection Strategy:**
+- **Base objectives:** User-selected (default: gpt-5)
+- **Grouping:** Hardcoded gpt-5-mini (efficiency)
+- **Incorrect answers:** Separate user selection (default: gpt-5)
+- **Questions:** User-selected (default: gpt-5)
+- **Quality judging:** User-selected or gpt-5-mini
+### Environment Variables
+**Required:**
+```
+OPENAI_API_KEY=your_api_key_here
+```
+**Configured via `.env` file in project root**
+### Customization Points
+1. **Quality Standards:**
+   - Edit `prompts/learning_objectives.py`
+   - Edit `prompts/questions.py`
+   - Edit `prompts/incorrect_answers.py`
+   - Changes apply to all future generations
+2. **Example Questions/Objectives:**
+   - Modify LEARNING_OBJECTIVE_EXAMPLES
+   - Modify EXAMPLE_QUESTIONS
+   - Modify INCORRECT_ANSWER_EXAMPLES
+   - LLM learns from these examples
+3. **Generation Parameters:**
+   - Number of objectives per run
+   - Number of runs (variety)
+   - Temperature (creativity vs consistency)
+   - Model selection (quality vs cost/speed)
+4. **Parallel Processing:**
+   - `max_workers` in assessment.py
+   - Currently: min(len(objectives), 5)
+   - Adjust for your rate limits
+5. **Output Formats:**
+   - Modify `formatting.py` for display
+   - Assessment JSON structure in `models/assessment.py`
+---
+## Error Handling and Resilience
+### Content Processing Errors
+- **Invalid JSON notebooks:** Falls back to raw text
+- **Parse failures:** Wraps in code blocks, continues
+- **Missing files:** Logged, skipped
+- **Encoding issues:** UTF-8 fallback
+### Generation Errors
+- **API failures:** Logged with traceback
+- **Structured output parse errors:** Fallback responses created
+- **Missing required fields:** Default values assigned
+- **Validation errors:** Caught and logged
+### Parallel Processing Errors
+- **Individual thread failures:** Don't stop other threads
+- **Placeholder questions:** Created on error
+- **Complete error details:** Logged for debugging
+- **Graceful degradation:** Partial results returned
+### Quality Check Failures
+- **Regeneration failures:** Original kept with warning
+- **Judge unavailable:** Questions marked unapproved
+- **Validation failures:** Detailed logs in debug directories
+---
+## Debug and Logging
+### Debug Directories
+1. **`incorrect_suggestion_debug/`**
+   - Created during objective enhancement
+   - Contains logs of problematic incorrect answers
+   - Format: `{objective_id}.txt`
+   - Includes: Original suggestions, identified issues, regeneration attempts
+2. **`wrong_answer_debug/`**
+   - Created during question improvement
+   - Logs question-level incorrect answer issues
+   - Regeneration history
+### Console Logging
+**Extensive logging throughout:**
+- File processing status
+- Generation progress (run numbers)
+- Parallel thread activity (thread IDs)
+- API call results
+- Error messages with tracebacks
+- Timing information (start/end times)
+**Example Log Output:**
+```
+DEBUG - Processing 3 files: ['file1.vtt', 'file2.ipynb', 'file3.srt']
+DEBUG - Found source file: file1.vtt
+Generating 3 learning objectives from 3 files
+Successfully generated 3 learning objectives without correct answers
+Generated correct answer for objective 1
+Grouping 9 base learning objectives
+Received 9 grouped results
+Generating incorrect answer options only for best-in-group objectives...
+PARALLEL: Starting ThreadPoolExecutor with 3 workers
+PARALLEL: Worker 1 (Thread ID: 12345): Starting work on objective...
+Question generation completed in 45.23 seconds
+```
+---
+## Performance Considerations
+### API Call Optimization
+**Calls per Workflow:**
+For 3 objectives × 3 runs = 9 base objectives:
+1. **Learning Objectives:**
+   - Base generation: 3 calls (one per run)
+   - Correct answers: 9 calls (one per objective)
+   - Grouping: 1 call
+   - Incorrect answers: ~3 calls (best-in-group only)
+   - Improvement checks: ~3 calls
+   - **Total: ~19 calls**
+2. **Questions (for 3 objectives × 1 run):**
+   - Question generation: 3 calls (parallel)
+   - Quality judging: 3 calls (parallel)
+   - Grouping: 1 call
+   - Ranking: 1 call
+   - **Total: ~8 calls**
+**Total for complete workflow: ~27 API calls**
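The totals above follow from simple arithmetic, which the sketch below reproduces so the estimate can be rerun for other settings. The function names are hypothetical, and `num_groups` (the number of best-in-group objectives) is taken to be 3 in the worked example, matching the "~3 calls" figures.

```python
def estimate_objective_calls(num_objectives: int, num_runs: int, num_groups: int) -> int:
    """Estimate API calls for the learning-objective stage."""
    base_generation = num_runs                    # one call per run
    correct_answers = num_objectives * num_runs   # one call per base objective
    grouping = 1
    incorrect_answers = num_groups                # best-in-group only
    improvement_checks = num_groups
    return base_generation + correct_answers + grouping + incorrect_answers + improvement_checks


def estimate_question_calls(num_questions: int) -> int:
    """Generation and judging per question, plus one grouping and one ranking call."""
    return 2 * num_questions + 2


objective_calls = estimate_objective_calls(num_objectives=3, num_runs=3, num_groups=3)
question_calls = estimate_question_calls(num_questions=3)
print(objective_calls, question_calls, objective_calls + question_calls)
```

These are lower bounds: any regeneration triggered by a failed quality check adds further calls.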
+### Time Estimates
+**Typical Execution Times:**
+- File processing: <1 second
+- Objective generation (3×3): 30-60 seconds
+- Question generation (3×1): 20-40 seconds (with parallelization)
+- **Total: 1-2 minutes for small course**
+**Factors Affecting Speed:**
+- Model selection (gpt-5 slower than gpt-5-mini)
+- Number of runs
+- Number of objectives/questions
+- API rate limits
+- Network latency
+- Parallel worker count
+### Cost Optimization
+**Strategies:**
+1. Use gpt-5-mini for grouping/ranking (hardcoded)
+2. Reduce number of runs (trade-off: variety)
+3. Generate fewer objectives initially
+4. Use faster models for initial exploration
+5. Use premium models for final production
+---
+## Conclusion
+The AI Course Assessment Generator is a multi-stage system that turns raw course materials into multiple-choice assessments. It relies on:
+- **Modular architecture** for maintainability
+- **Structured output generation** for reliability
+- **Iterative refinement** (grouping, judging, and regeneration) for output quality
+- **Parallel processing** for efficiency
+- **Per-stage error handling with fallbacks** for resilience
+The system pairs automation with quality control: its prompts encode the quality standards and Bloom's Taxonomy levels described above, and every objective and question carries a reference to its source material.

CLAUDE.md ADDED

	@@ -0,0 +1,91 @@

+# CLAUDE.md
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+## Development Commands
+### Running the Application
+```bash
+python app.py
+```
+The application will start a Gradio web interface at http://127.0.0.1:7860
+### Environment Setup
+```bash
+# Using uv (recommended)
+uv venv -p 3.12
+source .venv/bin/activate  # On Windows: .venv\Scripts\activate
+uv pip install -r requirements.txt
+# Using pip
+pip install -r requirements.txt
+```
+### Environment Variables
+Create a `.env` file with:
+```
+OPENAI_API_KEY=your_api_key_here
+```
+## Architecture Overview
+This is an AI Course Assessment Generator that creates learning objectives and multiple-choice questions from course materials. The system uses OpenAI's language models with structured output generation via the `instructor` library.
+### Core Workflow
+1. **Content Processing**: Upload course materials (.vtt, .srt, .ipynb) → Extract and tag content with XML source references
+2. **Learning Objective Generation**: Generate base objectives → Group and rank → Enhance with incorrect answer suggestions
+3. **Question Generation**: Create multiple-choice questions from objectives → Quality assessment → Ranking and grouping
+4. **Assessment Export**: Save final assessment to JSON format
+### Key Architecture Patterns
+**Modular Prompt System**: The `prompts/` directory contains reusable prompt components that are imported and combined in generation modules. This allows for consistent quality standards across different generation tasks.
+**Orchestrator Pattern**: Both `LearningObjectiveGenerator` and `QuizGenerator` act as orchestrators that coordinate calls to specialized generation functions rather than implementing generation logic directly.
+**Structured Output**: All LLM interactions use Pydantic models with the `instructor` library to ensure consistent, validated output formats.
+**Source Tracking**: Content is wrapped in XML tags (e.g., `<source file="example.ipynb">content</source>`) throughout the pipeline to maintain traceability from source files to generated questions.
+## Key Components
+### Main Generators
+- `LearningObjectiveGenerator` (`learning_objective_generator/generator.py`): Orchestrates learning objective generation, grouping, and enhancement
+- `QuizGenerator` (`quiz_generator/generator.py`): Orchestrates question generation, quality assessment, and ranking
+### Data Models (`models/`)
+- Learning objectives progress from `BaseLearningObjective` → `LearningObjective` (with incorrect answers) → `GroupedLearningObjective`
+- Questions progress from `MultipleChoiceQuestion` → `RankedMultipleChoiceQuestion` → `GroupedMultipleChoiceQuestion`
+- Final output is an `Assessment` containing both objectives and questions
+### Generation Pipeline
+1. **Base Generation**: Create initial learning objectives from content
+2. **Grouping & Ranking**: Group similar objectives and select best in each group
+3. **Enhancement**: Add incorrect answer suggestions to selected objectives
+4. **Question Generation**: Create multiple-choice questions with feedback
+5. **Quality Assessment**: Use LLM judge to evaluate question quality
+6. **Final Ranking**: Rank and group questions for output
+### UI Structure (`ui/`)
+- `app.py`: Gradio interface with tabs for objectives, questions, and export
+- Handler modules process user interactions and coordinate with generators
+- State management tracks data between UI components
+## Development Notes
+### Model Configuration
+- Default model: `gpt-5` with temperature `1.0`
+- Separate model selection for incorrect answer generation (typically `o1`)
+- Quality assessment often uses `gpt-5-mini` for cost efficiency
+### Content Processing
+- Supports `.vtt/.srt` subtitle files and `.ipynb` Jupyter notebooks
+- All content is tagged with XML source references for traceability
+- Content processor handles multiple file formats uniformly
+### Quality Standards
+The system enforces educational quality through modular prompt components:
+- General quality standards apply to all generated content
+- Specific standards for questions, correct answers, and incorrect answers
+- Bloom's taxonomy integration for appropriate learning levels
+- Example-based prompting for consistency
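
The source-tracking convention described in CLAUDE.md above can be illustrated with a minimal sketch. The helper names `wrap_with_source` and `extract_sources` are hypothetical and are not taken from the repository. CLAUDE.md shows the tag with double quotes, while the regex in `base_generation.py` matches single quotes, so this sketch assumes either style may occur and accepts both.

```python
import re
from typing import Iterable, Set

# Matches <source file='name'> or <source file="name">; the backreference \1
# requires the closing quote to be the same character as the opening one.
SOURCE_TAG = re.compile(r"""<source file=(['"])(.+?)\1>""")


def wrap_with_source(filename: str, content: str) -> str:
    """Wrap extracted content in a source tag (hypothetical helper)."""
    return f"<source file='{filename}'>{content}</source>"


def extract_sources(file_contents: Iterable[str]) -> Set[str]:
    """Collect the filename from the first source tag found in each content string."""
    sources = set()
    for text in file_contents:
        match = SOURCE_TAG.search(text)
        if match:
            sources.add(match.group(2))
    return sources
```

Accepting both quote styles keeps the extraction step independent of how the content processor happens to format the tag.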

README.md ADDED Viewed

	@@ -0,0 +1,232 @@

+# AI Course Assessment Generator
+This application generates learning objectives and multiple-choice questions for AI course materials based on uploaded content files. It uses OpenAI's language models to create high-quality educational assessments that adhere to specified quality standards.
+## Features
+- Upload course materials in various formats (.vtt, .srt, .ipynb)
+- Generate a customizable number of learning objectives
+- Create multiple-choice questions based on learning objectives
+- Evaluate question quality using an LLM judge
+- Save assessments to JSON format
+- Track source references for each learning objective and question
+## Setup
+1. Clone this repository
+2. Install the required dependencies:
+   ```
+   pip install -r requirements.txt
+   ```
+3. Create a `.env` file in the project root with your OpenAI API key:
+   ```
+   OPENAI_API_KEY=your_api_key_here
+   ```
+## Usage
+1. Run the application:
+   ```
+   python app.py
+   ```
+2. Open the Gradio interface in your web browser (typically at http://127.0.0.1:7860)
+3. Upload your course materials (.vtt, .srt, .ipynb files)
+4. Specify the number of learning objectives to generate
+5. Select the OpenAI model to use
+6. Generate learning objectives
+7. Review and provide feedback on the generated objectives
+8. Generate multiple-choice questions based on the approved objectives
+9. Review the generated questions and their quality assessments
+10. The final assessment will be saved as `assessment.json` in the project directory
+## Project Structure
+- `app.py`: Entry point for the application
+### Modules
+- `models/`: Pydantic data models
+  - `__init__.py`: Exports all models
+  - `learning_objectives.py`: Learning objective data models
+  - `questions.py`: Question and option data models
+  - `assessment.py`: Assessment data models
+- `ui/`: User interface components
+  - `__init__.py`: Package initialization
+  - `app.py`: Gradio UI implementation
+  - `content_processor.py`: Processes uploaded files and extracts content
+  - `objective_handlers.py`: Handlers for learning objective generation
+  - `question_handlers.py`: Handlers for question generation
+  - `feedback_handlers.py`: Handlers for feedback and regeneration
+  - `formatting.py`: Formatting utilities for UI display
+  - `state.py`: State management for the UI
+- `quiz_generator/`: Quiz generation components
+  - `__init__.py`: Package initialization
+  - `generator.py`: Main QuizGenerator class
+  - `assessment.py`: Assessment generation logic
+  - `question_generation.py`: Question generation logic
+  - `question_improvement.py`: Question quality improvement logic
+  - `question_ranking.py`: Question ranking and grouping logic
+  - `feedback_questions.py`: Feedback-based question generation
+- `learning_objective_generator/`: Learning objective generation components
+  - `__init__.py`: Package initialization
+  - `generator.py`: Main generator class
+  - `base_generation.py`: Base generation logic
+  - `enhancement.py`: Enhancement logic
+  - `grouping_and_ranking.py`: Grouping and ranking logic
+- `prompts/`: Prompt templates and components
+  - `questions.py`: Question generation prompts
+  - `incorrect_answers.py`: Incorrect answer generation prompts
+  - `learning_objectives.py`: Learning objective generation prompts
+- `obsolete/`: Deprecated files (not used in current implementation)
+- `specs.md`: Project specifications
+- `project_flow.md`: Detailed description of the project architecture and workflow
+## Requirements
+- Python 3.8+
+- Gradio 4.19.2+
+- Pydantic 2.8.0+
+- OpenAI 1.52.0+
+- nbformat 5.9.2+
+- instructor 1.7.9+
+- python-dotenv 1.0.0+
+Install dependencies using uv (recommended):
+```
+uv venv -p 3.12
+source .venv/bin/activate  # On Windows use: .venv\Scripts\activate
+uv pip install -r requirements.txt
+```
+## Notes
+- The application uses XML-style source tags to track which file each piece of content comes from
+- Questions are evaluated against quality standards to ensure they meet educational requirements
+- Each question includes feedback for both correct and incorrect answers
+## Prompt Structure
+The application's prompt system lives in the `prompts/` package and is split into modular components for better maintainability:
+- `GENERAL_QUALITY_STANDARDS`: Overall quality standards for all generated content
+- `QUESTION_SPECIFIC_QUALITY_STANDARDS`: Standards specific to question generation
+- `CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS`: Standards for correct answer options
+- `INCORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS`: Standards for creating plausible incorrect answers
+- `EXAMPLE_QUESTIONS`: A collection of high-quality example questions for model guidance
+- `MULTIPLE_CHOICE_STANDARDS`: Standards specific to multiple-choice question format
+- `BLOOMS_TAXONOMY_LEVELS`: Educational taxonomy for different levels of learning
+- `ANSWER_FEEDBACK_QUALITY_STANDARDS`: Standards for providing helpful feedback
+- `LEARNING_OBJECTIVES_PROMPT`: Template for generating learning objectives
+- `LEARNING_OBJECTIVE_EXAMPLES`: Examples of well-formulated learning objectives
+These components are imported and combined in the generation modules (`quiz_generator/` and `learning_objective_generator/`) to create comprehensive prompts for different generation tasks. This modular approach makes it easier to:
+1. Update individual aspects of the prompt without affecting others
+2. Reuse common standards across different generation tasks
+3. Maintain consistent quality across all generated content
+## Detailed Project Flow
+### Overview
+This section provides a more detailed look at how the various components of the system work together to generate educational assessments.
+### Core Components
+1. **Content Processing**: Handles ingestion of course materials from different file formats
+2. **Learning Objective Generation**: Creates learning objectives from the processed content
+3. **Question Generation**: Produces multiple-choice questions for each learning objective
+4. **Quality Assessment**: Evaluates the generated questions for quality
+5. **UI Interface**: Provides a Gradio-based web interface for user interaction
+### Application Entry Point (`app.py`)
+- Serves as the entry point for the application
+- Loads environment variables (including OpenAI API key)
+- Creates and launches the Gradio UI
+### User Interface (`ui/` module)
+- Creates the Gradio interface for user interaction
+- Organizes functionality into tabs:
+  - File upload and learning objective generation
+  - Question generation
+  - Preview and export
+- Key components:
+  - `app.py`: Creates the Gradio interface and defines the UI layout
+  - `objective_handlers.py`: Handles learning objective generation and regeneration
+  - `question_handlers.py`: Handles question generation and regeneration
+  - `feedback_handlers.py`: Handles user feedback and custom question generation
+  - `formatting.py`: Formats quiz data for UI display
+  - `state.py`: Manages state between UI components
+### Content Processing (`ui/content_processor.py`)
+- `ContentProcessor` class processes different file types:
+  - `.vtt` and `.srt` subtitle files
+  - `.ipynb` Jupyter notebook files
+- For each file, adds XML source tags to track the origin of content
+- Returns structured content for further processing
+### Quiz Generation (`quiz_generator/` module)
+- `QuizGenerator` class is the central component that:
+  - Generates learning objectives from processed content
+  - Creates multiple-choice questions for each objective
+  - Judges question quality
+  - Saves assessments to JSON
+#### Learning Objective Generation
+1. Takes processed file contents as input
+2. Combines content and creates a prompt (utilizing modular components from `prompts/`)
+3. Uses OpenAI's API with instructor to generate learning objectives
+4. Returns structured `LearningObjective` objects
+#### Question Generation
+1. For each learning objective:
+   - Retrieves relevant content from source files
+   - Creates a prompt by combining modular components from `prompts/`
+   - Generates a multiple-choice question with feedback for each option
+   - Returns a structured `MultipleChoiceQuestion` object
+### Data Models (`models/` module)
+Defines the data structures used throughout the application:
+- `LearningObjective`: Represents a learning objective with ID, text, and source references
+- `MultipleChoiceOption`: Represents an answer option with text, correctness flag, and feedback
+- `MultipleChoiceQuestion`: Represents a complete question with options, linked to learning objectives
+- `RankedMultipleChoiceQuestion`: Extends MultipleChoiceQuestion with ranking information
+- `GroupedMultipleChoiceQuestion`: Extends RankedMultipleChoiceQuestion with grouping information
+- `Assessment`: Collection of learning objectives and questions
+### Prompt Component Integration
+The modular prompt components in the `prompts/` directory are imported into the quiz generation modules and assembled into complete prompts as needed:
+1. **Learning Objective Generation**:
+   - Components like `LEARNING_OBJECTIVES_PROMPT`, `LEARNING_OBJECTIVE_EXAMPLES`, and `BLOOMS_TAXONOMY_LEVELS` are combined with course content
+   - This creates a comprehensive prompt that guides the LLM in generating relevant and well-structured learning objectives
+2. **Question Generation**:
+   - Components like `GENERAL_QUALITY_STANDARDS`, `MULTIPLE_CHOICE_STANDARDS`, `QUESTION_SPECIFIC_QUALITY_STANDARDS`, etc. are combined
+   - Along with the learning objective and course content, these form a detailed prompt that ensures high-quality question generation
+### Workflow Summary
+1. User uploads content files (notebooks, subtitles) through the UI
+2. System processes files and extracts content with source references
+3. LLM generates learning objectives based on content
+4. User reviews and approves learning objectives
+5. System generates multiple-choice questions for each approved objective
+6. Questions are presented to the user for review and export
+This modular approach makes it easier to maintain, update, and experiment with different prompt components without disrupting the overall system. Any change to a component in `prompts/` affects how learning objectives and questions are generated, and can change the style, format, and quality of the output.
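
The prompt assembly the README describes can be sketched roughly as follows. The constant names come from the README, but their contents here are placeholder strings, and `build_question_prompt` is a hypothetical function used only for illustration; the actual assembly in the generation modules may differ.

```python
# Placeholder text only; the real standards live in the prompts/ package.
GENERAL_QUALITY_STANDARDS = "General quality standards (placeholder)."
MULTIPLE_CHOICE_STANDARDS = "Multiple-choice standards (placeholder)."
QUESTION_SPECIFIC_QUALITY_STANDARDS = "Question-specific standards (placeholder)."


def build_question_prompt(learning_objective: str, course_content: str) -> str:
    """Join reusable standards with per-question context (hypothetical helper)."""
    sections = [
        GENERAL_QUALITY_STANDARDS,
        MULTIPLE_CHOICE_STANDARDS,
        QUESTION_SPECIFIC_QUALITY_STANDARDS,
        f"<learning_objective>\n{learning_objective}\n</learning_objective>",
        f"<course_content>\n{course_content}\n</course_content>",
    ]
    # Blank lines between sections keep each component visually distinct.
    return "\n\n".join(sections)
```

Because each standard is a separate constant, editing one of them changes every prompt that includes it without touching the assembly code.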

app.py ADDED Viewed

	@@ -0,0 +1,16 @@

+import os
+from dotenv import load_dotenv
+from ui.app import create_ui
+# Load environment variables
+load_dotenv()
+# Check if API key is set
+if not os.getenv("OPENAI_API_KEY"):
+    print("Warning: OPENAI_API_KEY environment variable not set.")
+    print("Please set it in a .env file or in your environment variables.")
+if __name__ == "__main__":
+    # Create and launch the Gradio UI
+    app = create_ui()
+    app.launch(share=False)

diagram.mmd ADDED Viewed

	@@ -0,0 +1,106 @@

+sequenceDiagram
+    participant U as User
+    participant App as app.py
+    participant UI as ui/app.py
+    participant OH as objective_handlers.py
+    participant QH as question_handlers.py
+    participant FH as feedback_handlers.py
+    participant CP as ContentProcessor
+    participant QG as QuizGenerator
+    participant State as state.py
+    participant OpenAI as OpenAI API
+    Note over U, OpenAI: Application Startup
+    U->>App: python app.py
+    App->>App: load_dotenv()
+    App->>App: Check OPENAI_API_KEY
+    App->>UI: create_ui()
+    UI->>UI: Create Gradio interface with 3 tabs
+    UI->>U: Launch web interface at http://127.0.0.1:7860
+    Note over U, OpenAI: Tab 1: Generate Learning Objectives
+    U->>UI: Upload files (.vtt, .srt, .ipynb)
+    U->>UI: Set parameters (objectives, runs, model, temperature)
+    U->>UI: Click "Generate Learning Objectives"
+    UI->>OH: process_files(files, params)
+    OH->>OH: _extract_file_paths(files)
+    OH->>CP: process_files(file_paths)
+    CP->>OH: file_contents (with XML tags)
+    OH->>State: set_processed_contents(file_contents)
+    OH->>QG: QuizGenerator(api_key, model, temperature)
+    OH->>OH: _generate_multiple_runs()
+    loop For each run
+        OH->>QG: generate_base_learning_objectives()
+        QG->>OpenAI: API call for objectives
+        OpenAI->>QG: Base learning objectives
+    end
+    OH->>OH: _group_base_objectives_add_incorrect_answers()
+    OH->>QG: group_base_learning_objectives()
+    QG->>OpenAI: API call for grouping/ranking
+    OpenAI->>QG: Grouped objectives
+    OH->>QG: generate_lo_incorrect_answer_options()
+    QG->>OpenAI: API call for incorrect answers
+    OpenAI->>QG: Enhanced objectives
+    OH->>State: set_learning_objectives(grouped_result)
+    OH->>OH: _format_objective_results()
+    OH->>UI: Return formatted results
+    UI->>U: Display objectives in 3 text boxes
+    Note over U, OpenAI: Tab 2: Generate Questions
+    U->>UI: Review objectives JSON (auto-populated)
+    U->>UI: Set question generation parameters
+    U->>UI: Click "Generate Questions"
+    UI->>QH: generate_questions(objectives_json, params)
+    QH->>QH: _parse_learning_objectives(objectives_json)
+    QH->>State: get_processed_contents()
+    QH->>QG: QuizGenerator(api_key, model, temperature)
+    QH->>QH: _generate_questions_multiple_runs()
+    loop For each run
+        QH->>QG: generate_questions_in_parallel()
+        QG->>OpenAI: API calls for questions
+        OpenAI->>QG: Multiple choice questions
+    end
+    QH->>QH: _group_and_rank_questions()
+    QH->>QG: group_questions()
+    QG->>OpenAI: API call for grouping
+    OpenAI->>QG: Grouped questions
+    QH->>QG: rank_questions()
+    QG->>OpenAI: API call for ranking
+    OpenAI->>QG: Ranked questions
+    QH->>QH: _format_question_results()
+    QH->>UI: Return formatted quiz results
+    UI->>U: Display questions and formatted quiz
+    Note over U, OpenAI: Tab 3: Propose/Edit Question
+    U->>UI: Enter question guidance/feedback
+    U->>UI: Set model parameters
+    U->>UI: Click "Generate Question"
+    UI->>FH: propose_question_handler(guidance, params)
+    FH->>State: get_processed_contents()
+    FH->>QG: QuizGenerator(api_key, model, temperature)
+    FH->>QG: generate_multiple_choice_question_from_feedback()
+    QG->>OpenAI: API call with feedback
+    OpenAI->>QG: Single question
+    FH->>UI: Return formatted question JSON
+    UI->>U: Display generated question
+    Note over U, OpenAI: Optional: Regenerate Objectives
+    U->>UI: Provide feedback on objectives
+    U->>UI: Click "Regenerate Learning Objectives"
+    UI->>OH: regenerate_objectives(objectives, feedback, params)
+    OH->>State: get_processed_contents()
+    OH->>OH: Add feedback to file_contents
+    OH->>QG: Generate with feedback context
+    QG->>OpenAI: API calls with feedback
+    OpenAI->>QG: Improved objectives
+    OH->>UI: Return regenerated objectives
+    UI->>U: Display updated objectives

learning_objective_generator/__init__.py ADDED Viewed

	@@ -0,0 +1,3 @@


+from .generator import LearningObjectiveGenerator
+
+__all__ = ['LearningObjectiveGenerator']

learning_objective_generator/base_generation.py ADDED Viewed

	@@ -0,0 +1,201 @@

+from typing import List
+from openai import OpenAI
+import re
+import os
+from models import BaseLearningObjective, BaseLearningObjectiveWithoutCorrectAnswer, BaseLearningObjectivesWithoutCorrectAnswerResponse, TEMPERATURE_UNAVAILABLE
+from prompts.learning_objectives import BASE_LEARNING_OBJECTIVES_PROMPT, BLOOMS_TAXONOMY_LEVELS, LEARNING_OBJECTIVE_EXAMPLES, LEARNING_OBJECTIVE_EXAMPLES_WITHOUT_ANSWERS
+def _get_run_manager():
+    """Get run manager if available, otherwise return None."""
+    try:
+        from ui.run_manager import get_run_manager
+        return get_run_manager()
+    except Exception:
+        return None
+def generate_base_learning_objectives(client: OpenAI, model: str, temperature: float, file_contents: List[str], num_objectives: int) -> List[BaseLearningObjective]:
+    """
+    Generate learning objectives with correct answers by first generating the objectives and then adding correct answers.
+    This is a wrapper function that calls the two separate functions for better separation of concerns.
+    """
+    print(f"Generating {num_objectives} learning objectives from {len(file_contents)} files")
+    # First, generate the learning objectives without correct answers
+    objectives_without_answers = generate_base_learning_objectives_without_correct_answers(
+        client, model, temperature, file_contents, num_objectives
+    )
+    # Then, generate correct answers for those objectives
+    objectives_with_answers = generate_correct_answers_for_objectives(
+        client, model, temperature, file_contents, objectives_without_answers
+    )
+    return objectives_with_answers
+def generate_base_learning_objectives_without_correct_answers(client: OpenAI, model: str, temperature: float, file_contents: List[str], num_objectives: int) -> List[BaseLearningObjectiveWithoutCorrectAnswer]:
+    """Generate learning objectives without correct answers from course content."""
+    # Extract the source filenames for reference
+    sources = set()
+    for file_content in file_contents:
+        source_match = re.search(r"<source file='([^']+)'>", file_content)
+        if source_match:
+            source = source_match.group(1)
+            sources.add(source)
+            print(f"DEBUG - Found source file: {source}")
+    print(f"DEBUG - Found {len(sources)} source files: {sources}")
+    print(f"DEBUG - Using {len(file_contents)} files for learning objectives")
+    combined_content = "\n\n".join(file_contents)
+    prompt = f"""
+    You are an expert educational content creator specializing in creating precise, relevant learning objectives from course materials. Based on the following course content, generate {num_objectives} clear and concise learning objectives.
+    {BASE_LEARNING_OBJECTIVES_PROMPT}
+    Consider Bloom's taxonomy in the context of the learning objective you are writing and choose the appropriate framing for the question
+    and answer options in the context of Bloom's taxonomy.
+    <BloomsTaxonomyLevels>
+    {BLOOMS_TAXONOMY_LEVELS}
+    </BloomsTaxonomyLevels>
+    Format your response like this, according to the data model provided for each objective:
+    ```json
+    {{
+    id: int = Unique identifier for the learning objective,
+    learning_objective: str = the learning objective text,
+    source_reference: Union[List[str], str] = List of paths to the files from which this learning objective was extracted,
+    }}
+    ```
+    Here is an example of high quality learning objectives:
+    <learning objectives>
+    {LEARNING_OBJECTIVE_EXAMPLES_WITHOUT_ANSWERS}
+    </learning objectives>
+    Below is the course content. The source references are embedded in xml tags within the context.
+    <course content>
+    {combined_content}
+    </course content>
+    """
+    try:
+        # Use OpenAI beta API for structured output
+        params = {
+            "model": model,
+            "messages": [
+                {"role": "system", "content": "You are an expert educational content creator specializing in creating precise, relevant learning objectives from course materials."},
+                {"role": "user", "content": prompt}
+            ],
+            "response_format": BaseLearningObjectivesWithoutCorrectAnswerResponse
+        }
+        if not TEMPERATURE_UNAVAILABLE.get(model, True):
+            params["temperature"] = temperature
+        completion = client.beta.chat.completions.parse(**params)
+        response = completion.choices[0].message.parsed.objectives
+        # Assign IDs and format source_reference
+        for i, objective in enumerate(response):
+            objective.id = i + 1
+            if isinstance(objective.source_reference, str):
+                if "," in objective.source_reference:
+                    source_refs = [os.path.basename(src.strip()) for src in objective.source_reference.split(",")]
+                    objective.source_reference = source_refs
+                else:
+                    objective.source_reference = os.path.basename(objective.source_reference)
+            elif isinstance(objective.source_reference, list):
+                objective.source_reference = [os.path.basename(src) for src in objective.source_reference]
+        print(f"Successfully generated {len(response)} learning objectives without correct answers")
+        return response
+    except Exception as e:
+        print(f"Error generating learning objectives without correct answers: {e}")
+        # Re-raise the exception instead of generating fallbacks
+        raise
+def generate_correct_answers_for_objectives(client: OpenAI, model: str, temperature: float, file_contents: List[str], objectives_without_answers: List[BaseLearningObjectiveWithoutCorrectAnswer]) -> List[BaseLearningObjective]:
+    """Generate correct answers for the given learning objectives."""
+    combined_content = "\n\n".join(file_contents)
+    run_manager = _get_run_manager()
+    # Create a list to store the objectives with answers
+    objectives_with_answers = []
+    # Process each objective to generate a correct answer
+    for objective in objectives_without_answers:
+        prompt = f"""
+        You are an expert educational content creator specializing in creating precise, relevant, and concise correct answers for learning objectives.
+        Use the below learning objective to generate the correct answer:
+        <learning_objective>
+        "id": {objective.id},
+        "learning_objective": "{objective.learning_objective}",
+        "source_reference": "{objective.source_reference}"
+        </learning_objective>
+        Use the below course content to generate the correct answer:
+        <course_content>
+        {combined_content}
+        </course_content>
+        Please provide a clear, concise, and accurate correct answer for this learning objective. The answer should be:
+        1. Directly answering the learning objective
+        2. Concise (preferably under 20 words). Avoids unnecessary length. See example below on avoiding unnecessary length.
+        3. Focused on the core concept without unnecessary elaboration
+        4. Based on the course content provided
+        Format your response as a plain text answer only, without any additional explanation or formatting.
+        Here are high-quality examples of learning objectives with correct answers:
+        <learning_objective_examples>
+        {LEARNING_OBJECTIVE_EXAMPLES}
+        </learning_objective_examples>
+        """
+        try:
+            params = {
+                "model": model,
+                "messages": [
+                    {"role": "system", "content": "You are an expert educational content creator specializing in creating precise, relevant correct answers for learning objectives."},
+                    {"role": "user", "content": prompt}
+                ]
+            }
+            if not TEMPERATURE_UNAVAILABLE.get(model, True):
+                params["temperature"] = temperature
+            completion = client.chat.completions.create(**params)
+            correct_answer = completion.choices[0].message.content.strip()
+            # Create a new BaseLearningObjective with the correct answer
+            objective_with_answer = BaseLearningObjective(
+                id=objective.id,
+                learning_objective=objective.learning_objective,
+                source_reference=objective.source_reference,
+                correct_answer=correct_answer
+            )
+            objectives_with_answers.append(objective_with_answer)
+            if run_manager:
+                run_manager.log(f"Generated correct answer for objective {objective.id}", level="INFO")
+        except Exception as e:
+            if run_manager:
+                run_manager.log(f"Error generating correct answer for objective {objective.id}: {e}", level="ERROR")
+            # Create an objective with an error message as the correct answer
+            objective_with_answer = BaseLearningObjective(
+                id=objective.id,
+                learning_objective=objective.learning_objective,
+                source_reference=objective.source_reference,
+                correct_answer="[Error generating correct answer]"
+            )
+            objectives_with_answers.append(objective_with_answer)
+    print(f"Successfully generated correct answers for {len(objectives_with_answers)} learning objectives")
+    return objectives_with_answers
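
The `source_reference` clean-up performed inside `generate_base_learning_objectives_without_correct_answers` can be read as a small pure function. The sketch below mirrors that logic; the name `normalize_source_reference` is hypothetical and does not exist in the repository.

```python
import os
from typing import List, Union


def normalize_source_reference(ref: Union[str, List[str]]) -> Union[str, List[str]]:
    """Reduce model-provided paths to bare filenames (hypothetical helper)."""
    if isinstance(ref, str):
        if "," in ref:
            # A comma-separated string becomes a list of filenames.
            return [os.path.basename(part.strip()) for part in ref.split(",")]
        # A single path stays a string.
        return os.path.basename(ref)
    # A list is normalized element by element.
    return [os.path.basename(item) for item in ref]
```

Note that a single string stays a string while a comma-separated string becomes a list, so downstream code has to handle both shapes.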

learning_objective_generator/enhancement.py ADDED Viewed

	@@ -0,0 +1,121 @@

+from typing import List
+from openai import OpenAI
+from models import BaseLearningObjective, LearningObjective, TEMPERATURE_UNAVAILABLE
+from prompts.incorrect_answers import INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION
+def generate_incorrect_answer_options(client: OpenAI, model: str, temperature: float, file_contents: List[str], base_objectives: List[BaseLearningObjective], model_override: str = None) -> List[LearningObjective]:
+    """
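+    Generate incorrect answer options for each base learning objective.
+    Args:
+        client: OpenAI client used for the structured-output calls
+        model: Default model name
+        temperature: Sampling temperature, applied only when the model supports it
+        file_contents: List of file contents with source tags
+        base_objectives: List of base learning objectives to enhance
+        model_override: Optional model name used instead of `model` when provided
+    Returns:
+        List of learning objectives with incorrect answer suggestions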
+    """
+    print(f"Generating incorrect answer options for {len(base_objectives)} learning objectives")
+    # Create combined content for context
+    combined_content = "\n\n".join(file_contents)
+    enhanced_objectives = []
+    for i, objective in enumerate(base_objectives):
+        print(f"Processing objective {i+1}/{len(base_objectives)}: {objective.learning_objective[:50]}...")
+        print(f"Learning objective: {objective.learning_objective}")
+        print(f"Correct answer: {objective.correct_answer}")
+        # Create the prompt for generating incorrect answer options
+        prompt = f"""
+Based on the learning objective and correct answer provided below.
+Learning Objective: {objective.learning_objective}
+Correct Answer: {objective.correct_answer}
+Generate 3 incorrect answer options.
+Use the examples with explanations below to guide you in generating incorrect answer options:
+<examples_with_explanation>
+    {INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION}
+</examples_with_explanation>
+    Here's the course content that the student has been exposed to:
+    <course_content>
+    {combined_content}
+    </course_content>
+    Here's the learning objective that was identified:
+    <learning_objective>
+    "id": {objective.id},
+    "learning_objective": "{objective.learning_objective}",
+    "source_reference": "{objective.source_reference}",
+    "correct_answer": "{objective.correct_answer}"
+    </learning_objective>
+        When creating incorrect answers, refer to the correct answer <correct_answer>{objective.correct_answer}</correct_answer>.
+    Make sure incorrect answers match the correct answer in terms of length, complexity, phrasing, style, and subject matter.
+    Incorrect answers should be of approximate equal length to the correct answer, preferably one sentence and 20 words long. Pay attention to the
+    example in <examples_with_explanation> about avoiding unnecessary length.
+    """
+        try:
+            model_to_use = model_override if model_override else model
+            # Use OpenAI beta API for structured output
+            system_prompt = "You are an expert in designing effective multiple-choice questions that assess higher-order thinking skills while following established educational best practices."
+            params = {
+                "model": model_to_use,
+                "messages": [
+                    {"role": "system", "content": system_prompt},
+                    {"role": "user", "content": prompt}
+                ],
+                "response_format": LearningObjective
+            }
+            if not TEMPERATURE_UNAVAILABLE.get(model_to_use, True):
+                params["temperature"] = temperature  # Only sent when the model accepts a temperature
+            print(f"DEBUG - Using model {model_to_use} for incorrect answer options with temperature {params.get('temperature', 'N/A')}")
+            completion = client.beta.chat.completions.parse(**params)
+            enhanced_obj = completion.choices[0].message.parsed
+            # Simple debugging for incorrect answer suggestions
+            if enhanced_obj.incorrect_answer_options:
+                print(f"  → Got {len(enhanced_obj.incorrect_answer_options)} incorrect answers")
+                # Truncate long options, and keep the label for short ones as well
+                first_option = enhanced_obj.incorrect_answer_options[0]
+                preview = f"{first_option[:100]}..." if len(first_option) > 100 else first_option
+                print(f"  → First option: {preview}")
+            else:
+                print("  → No incorrect answer options received!")
+            # Preserve grouping metadata from the original objective
+            enhanced_obj.in_group = getattr(objective, 'in_group', None)
+            enhanced_obj.group_members = getattr(objective, 'group_members', None)
+            enhanced_obj.best_in_group = getattr(objective, 'best_in_group', None)
+            enhanced_objectives.append(enhanced_obj)
+        except Exception as e:
+            print(f"Error generating incorrect answer options for objective {objective.id}: {e}")
+            # If there's an error, create a learning objective without suggestions
+            enhanced_obj = LearningObjective(
+                id=objective.id,
+                learning_objective=objective.learning_objective,
+                source_reference=objective.source_reference,
+                correct_answer=objective.correct_answer,
+                incorrect_answer_options=None,
+                in_group=getattr(objective, 'in_group', None),
+                group_members=getattr(objective, 'group_members', None),
+                best_in_group=getattr(objective, 'best_in_group', None)
+            )
+            enhanced_objectives.append(enhanced_obj)
+    print(f"Generated incorrect answer options for {len(enhanced_objectives)} learning objectives")
+    return enhanced_objectives
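
Both generation modules gate the `temperature` parameter on `TEMPERATURE_UNAVAILABLE.get(model, True)`. The sketch below isolates that pattern. The mapping entries and the `build_params` helper are hypothetical, since the real mapping is defined in `models/` and its contents are not shown in this part of the diff.

```python
from typing import Any, Dict, List

# Hypothetical entries: True means the model does not accept a temperature.
TEMPERATURE_UNAVAILABLE: Dict[str, bool] = {
    "model-with-temperature": False,
    "model-without-temperature": True,
}


def build_params(model: str, messages: List[Dict[str, str]], temperature: float) -> Dict[str, Any]:
    """Build request parameters, adding temperature only when supported."""
    params: Dict[str, Any] = {"model": model, "messages": messages}
    # The default of True means unknown models are treated as not supporting temperature.
    if not TEMPERATURE_UNAVAILABLE.get(model, True):
        params["temperature"] = temperature
    return params
```

One consequence of the `True` default is that a newly added model silently ignores the configured temperature until it is registered in the mapping.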

learning_objective_generator/generator.py ADDED Viewed

	@@ -0,0 +1,57 @@

+from typing import List, Dict, Any
+from openai import OpenAI
+from models import BaseLearningObjective, LearningObjective
+from .base_generation import generate_base_learning_objectives
+from .enhancement import generate_incorrect_answer_options
+from .grouping_and_ranking import group_base_learning_objectives, get_best_in_group_objectives
+from .suggestion_improvement import regenerate_incorrect_answers
+class LearningObjectiveGenerator:
+    """Simple orchestrator for learning objective generation."""
+    def __init__(self, api_key: str, model: str = "gpt-5", temperature: float = 1.0):
+        self.client = OpenAI(api_key=api_key)
+        self.model = model
+        self.temperature = temperature
+    def generate_base_learning_objectives(self, file_contents: List[str], num_objectives: int) -> List[BaseLearningObjective]:
+        """Generate base learning objectives without incorrect answer suggestions."""
+        return generate_base_learning_objectives(
+            self.client, self.model, self.temperature, file_contents, num_objectives
+        )
+    def group_base_learning_objectives(self, base_objectives: List[BaseLearningObjective], file_contents: List[str]) -> Dict[str, List]:
+        """Group base learning objectives and identify the best in each group."""
+        return group_base_learning_objectives(
+            self.client, self.model, self.temperature, base_objectives, file_contents
+        )
+    def generate_incorrect_answer_options(self, file_contents: List[str], base_objectives: List[BaseLearningObjective], model_override: str = None) -> List[LearningObjective]:
+        """Generate incorrect answer options for the given base learning objectives."""
+        return generate_incorrect_answer_options(
+            self.client, self.model, self.temperature, file_contents, base_objectives, model_override
+        )
+    def generate_and_group_learning_objectives(self, file_contents: List[str], num_objectives: int, model_override: str = None) -> Dict[str, List]:
+        """Complete workflow: generate base objectives, group them, and generate incorrect answers only for best in group."""
+        # Step 1: Generate base learning objectives
+        base_objectives = self.generate_base_learning_objectives(file_contents, num_objectives)
+        # Step 2: Group base learning objectives and get best in group
+        grouped_result = self.group_base_learning_objectives(base_objectives, file_contents)
+        best_in_group_base = grouped_result["best_in_group"]
+        # Step 3: Generate incorrect answer suggestions only for best in group objectives
+        enhanced_best_objectives = self.generate_incorrect_answer_options(file_contents, best_in_group_base, model_override)
+        # Return both the full grouped list (without enhancements) and the enhanced best-in-group list
+        return {
+            "all_grouped": grouped_result["all_grouped"],
+            "best_in_group": enhanced_best_objectives
+        }
+    def regenerate_incorrect_answers(self, learning_objectives: List[LearningObjective], file_contents: List[str]) -> List[LearningObjective]:
+        """Regenerate incorrect answer suggestions for learning objectives that need improvement."""
+        return regenerate_incorrect_answers(
+            self.client, self.model, self.temperature, learning_objectives, file_contents
+        )
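
Review note: `generate_and_group_learning_objectives` runs three steps (generate, group, then enhance only the best-in-group objectives) and returns both lists. The sketch below shows that control flow with no network calls. `run_pipeline` and the three `fake_*` functions are hypothetical stand-ins for the module-level helpers, and plain dicts stand in for the Pydantic models.

```python
from typing import Callable, Dict, List


def run_pipeline(file_contents: List[str], num_objectives: int,
                 generate: Callable, group: Callable, enhance: Callable) -> Dict[str, List]:
    """Generate base objectives, group them, and enhance only the best of each group."""
    base = generate(file_contents, num_objectives)            # Step 1
    grouped = group(base, file_contents)                      # Step 2
    enhanced_best = enhance(file_contents, grouped["best_in_group"])  # Step 3
    return {"all_grouped": grouped["all_grouped"], "best_in_group": enhanced_best}


# Hypothetical stubs that replace the model-backed functions.
def fake_generate(files, n):
    return [{"id": i, "text": f"objective {i}"} for i in range(1, n + 1)]


def fake_group(base, files):
    # Pretend the odd-numbered objectives are the best in their groups.
    return {"all_grouped": base, "best_in_group": [o for o in base if o["id"] % 2 == 1]}


def fake_enhance(files, best):
    # Build new dicts so the entries in "all_grouped" stay unenhanced.
    return [{**o, "incorrect_answer_options": ["distractor A", "distractor B", "distractor C"]}
            for o in best]


result = run_pipeline(["<source file='a.vtt'>...</source>"], 4,
                      fake_generate, fake_group, fake_enhance)
```

Enhancing only the best-in-group subset keeps the number of distractor-generation calls proportional to the number of groups rather than to the number of raw objectives.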

learning_objective_generator/grouping_and_ranking.py ADDED

	@@ -0,0 +1,328 @@

+from typing import List, Dict, Any
+from openai import OpenAI
+import json
+from models import LearningObjective, BaseLearningObjective, GroupedLearningObjectivesResponse, GroupedBaseLearningObjectivesResponse
+from prompts.learning_objectives import BASE_LEARNING_OBJECTIVES_PROMPT, BLOOMS_TAXONOMY_LEVELS, LEARNING_OBJECTIVE_EXAMPLES
+def group_learning_objectives(client: OpenAI, model: str, temperature: float, learning_objectives: List[LearningObjective], file_contents: List[str]) -> dict:
+    """Group learning objectives and return both the full ranked list and the best-in-group list as Python objects."""
+    try:
+        print(f"Grouping {len(learning_objectives)} learning objectives")
+        objectives_to_rank = learning_objectives
+        if not objectives_to_rank:
+            return {"all_grouped": learning_objectives, "best_in_group": learning_objectives}  # Nothing to rank
+        # Create combined content for context
+        combined_content = "\n\n".join(file_contents)
+        # Format the objectives for display in the prompt
+        objectives_display = "\n".join([f"ID: {obj.id}\nLearning Objective: {obj.learning_objective}\nSource: {obj.source_reference}\nCorrect Answer: {getattr(obj, 'correct_answer', '')}\nIncorrect Answer Options: {json.dumps(getattr(obj, 'incorrect_answer_options', []))}\n" for obj in objectives_to_rank])
+        # Create prompt for ranking using the same context as generation but without duplicating content
+        ranking_prompt = f"""
+        The generation prompt below was used to generate the learning objectives and now your job is to group and determine the best in the group. Group according
+        to topic overlap, and select the best in the group according to the criteria in the generation prompt.
+        Here's the generation prompt:
+        <generation prompt>
+        You are an expert educational content creator specializing in creating precise, relevant learning objectives from course materials.
+        {BASE_LEARNING_OBJECTIVES_PROMPT}
+        <BloomsTaxonomyLevels>
+        {BLOOMS_TAXONOMY_LEVELS}
+        </BloomsTaxonomyLevels>
+        Here is an example of high quality learning objectives:
+        <learning objectives>
+        {LEARNING_OBJECTIVE_EXAMPLES}
+        </learning objectives>
+        Use the below course content to assess topic overlap. The source references are embedded in xml tags within the context.
+        <course content>
+        {combined_content}
+        </course content>
+        </generation prompt>
+        The learning objectives below were generated based on the content and criteria in the generation prompt above. Now your task is to group these learning objectives
+        based on how well they meet the criteria described in the generation prompt above.
+        IMPORTANT GROUPING INSTRUCTIONS:
+        1. Group learning objectives by similarity, including those that cover the same foundational concept.
+        2. Return a JSON array with each objective's original ID and its group information ("in_group": bool, "group_members": list[int], "best_in_group": bool). See example below.
+        3. Consider clarity, specificity, alignment with the course content, and how well each objective follows the criteria in the generation prompt.
+        4. Identify groups of similar learning objectives that cover essentially the same concept or knowledge area.
+        5. For each objective, indicate whether it belongs to a group of similar objectives by setting "in_group" to true or false.
+        6. For objectives that are part of a group, include a "group_members" field with a list of all IDs in that group (including the objective itself). If an objective is not part of a group, set "group_members" to a list containing only the objective's ID.
+        7. For each objective, add a boolean field "best_in_group": set this to true for the highest-quality objective in each group, and false for all others in the group. For objectives not in a group, set "best_in_group" to true by default.
+        8. SPECIAL INSTRUCTION: All objectives with IDs ending in 1 (like 1001, 2001, etc.) are the first objectives from different generation runs. Group ALL of these together and mark the best one as "best_in_group": true. This is critical for ensuring one of these objectives is selected as the primary objective:
+           a. Group ALL objectives with IDs ending in 1 together in the SAME group.
+           b. Evaluate these objectives carefully and select the SINGLE best one based on clarity, specificity, and alignment with course content.
+           c. Mark ONLY the best one with "best_in_group": true and all others with "best_in_group": false.
+           d. This objective will later be assigned ID=1 and will serve as the primary objective, so choose the highest quality one.
+           e. If you find other objectives that cover the same concept but don't have IDs ending in 1, include them in this group but do NOT mark them as best_in_group.
+        Here are the learning objectives to group:
+        <learning objectives>
+        {objectives_display}
+        </learning objectives>
+        Return your grouped learning objectives as a JSON array in this format. Each objective must include ALL of the following fields:
+        [
+            {{
+            "id": int,
+            "learning_objective": str,
+            "source_reference": list[str] or str,
+            "correct_answer": str,
+            "incorrect_answer_suggestions": list[str],
+            "in_group": bool,
+            "group_members": list[int],
+            "best_in_group": bool
+            }},
+            ...
+        ]
+        Example:
+        [
+            {{
+            "id": 3,
+            "learning_objective": "Describe the main applications of AI agents.",
+            "source_reference": ["sc-Arize-C1-L3-eng.vtt"],
+            "correct_answer": "AI agents are used for automation, decision-making, and information retrieval.",
+            "incorrect_answer_suggestions": [
+            "AI agents are used for automation and data analysis",
+            "AI agents are designed for information retrieval and prediction",
+            "AI agents are specialized for either automation or decision-making"
+            ],
+            "in_group": true,
+            "group_members": [3, 5, 7],
+            "best_in_group": true
+            }}
+        ]
+        """
+        # Use OpenAI beta API for structured output
+        try:
+            params = {
+                "model": "gpt-5-mini",
+                "messages": [
+                    {"role": "system", "content": "You are an expert educational content evaluator."},
+                    {"role": "user", "content": ranking_prompt}
+                ],
+                "response_format": GroupedLearningObjectivesResponse
+            }
+            completion = client.beta.chat.completions.parse(**params)
+            grouped_results = completion.choices[0].message.parsed.grouped_objectives
+            print(f"Received {len(grouped_results)} grouped results")
+            # Normalize best_in_group to Python bool
+            for obj in grouped_results:
+                val = getattr(obj, "best_in_group", False)
+                if isinstance(val, str):
+                    obj.best_in_group = val.lower() == "true"
+                elif isinstance(val, (bool, int)):
+                    obj.best_in_group = bool(val)
+                else:
+                    obj.best_in_group = False
+            # if id_one_objective:
+            #     final_objectives[0].best_in_group = True
+            # Initialize final_objectives with the grouped results
+            final_objectives = []
+            for obj in grouped_results:
+                final_objectives.append(obj)
+            # Filter for best-in-group objectives (only the best_in_group flag is checked here)
+            best_in_group_objectives = [obj for obj in final_objectives if getattr(obj, "best_in_group", False) is True]
+            return {
+                "all_grouped": final_objectives,
+                "best_in_group": best_in_group_objectives
+            }
+        except Exception as e:
+            print(f"Error ranking learning objectives: {e}")
+            return {"all_grouped": learning_objectives, "best_in_group": get_best_in_group_objectives(learning_objectives)}
+    except Exception as e:
+        print(f"Error ranking learning objectives: {e}")
+        return {"all_grouped": learning_objectives, "best_in_group": get_best_in_group_objectives(learning_objectives)}
+def get_best_in_group_objectives(grouped_objectives: list) -> list:
+    """Return only objectives whose best_in_group flag is true, normalizing the flag to a Python bool."""
+    best_in_group_objectives = []
+    for obj in grouped_objectives:
+        val = getattr(obj, "best_in_group", False)
+        if isinstance(val, str):
+            obj.best_in_group = val.lower() == "true"
+        elif isinstance(val, (bool, int)):
+            obj.best_in_group = bool(val)
+        else:
+            obj.best_in_group = False
+        if obj.best_in_group is True:
+            best_in_group_objectives.append(obj)
+    return best_in_group_objectives
+def group_base_learning_objectives(client: OpenAI, model: str, temperature: float, base_objectives: List[BaseLearningObjective], file_contents: List[str]) -> Dict[str, List]:
+    """Group base learning objectives (without incorrect answer options) and return both the full grouped list and the best-in-group list."""
+    try:
+        print(f"Grouping {len(base_objectives)} base learning objectives")
+        objectives_to_group = base_objectives
+        if not objectives_to_group:
+            return {"all_grouped": base_objectives, "best_in_group": base_objectives}  # Nothing to group
+        # Create combined content for context
+        combined_content = "\n\n".join(file_contents)
+        # Format the objectives for display in the prompt
+        objectives_display = "\n".join([f"ID: {obj.id}\nLearning Objective: {obj.learning_objective}\nSource: {obj.source_reference}\nCorrect Answer: {getattr(obj, 'correct_answer', '')}\n" for obj in objectives_to_group])
+        # Create prompt for grouping using the same context as generation but without duplicating content
+        grouping_prompt = f"""
+        The generation prompt below was used to generate the learning objectives and now your job is to group and determine the best in the group. Group according
+        to topic overlap, and select the best in the group according to the criteria in the generation prompt.
+        Here's the generation prompt:
+        <generation prompt>
+        You are an expert educational content creator specializing in creating precise, relevant learning objectives from course materials.
+        {BASE_LEARNING_OBJECTIVES_PROMPT}
+        <BloomsTaxonomyLevels>
+        {BLOOMS_TAXONOMY_LEVELS}
+        </BloomsTaxonomyLevels>
+        Here is an example of high quality learning objectives:
+        <learning objectives>
+        {LEARNING_OBJECTIVE_EXAMPLES}
+        </learning objectives>
+        Below is the course content. The source references are embedded in xml tags within the context.
+        <course content>
+        {combined_content}
+        </course content>
+        </generation prompt>
+        The learning objectives below were generated based on the content and criteria in the generation prompt above. Now your task is to group these learning objectives
+        based on how well they meet the criteria described in the generation prompt above.
+        IMPORTANT GROUPING INSTRUCTIONS:
+        1. Group learning objectives by similarity, including those that cover the same foundational concept.
+        2. Return a JSON array with each objective's original ID and its group information ("in_group": bool, "group_members": list[int], "best_in_group": bool). See example below.
+        3. Consider clarity, specificity, alignment with the course content, and how well each objective follows the criteria in the generation prompt.
+        4. Identify groups of similar learning objectives that cover essentially the same concept or knowledge area.
+        5. For each objective, indicate whether it belongs to a group of similar objectives by setting "in_group" to true or false.
+        6. For objectives that are part of a group, include a "group_members" field with a list of all IDs in that group (including the objective itself). If an objective is not part of a group, set "group_members" to a list containing only the objective's ID.
+        7. For each objective, add a boolean field "best_in_group": set this to true for the highest-quality objective in each group, and false for all others in the group. For objectives not in a group, set "best_in_group" to true by default.
+        8. SPECIAL INSTRUCTION: All objectives with IDs ending in 1 (like 1001, 2001, etc.) are the first objectives from different generation runs. Group ALL of these together and mark the best one as "best_in_group": true. This is critical for ensuring one of these objectives is selected as the primary objective:
+           a. Group ALL objectives with IDs ending in 1 together in the SAME group.
+           b. Evaluate these objectives carefully and select the SINGLE best one based on clarity, specificity, and alignment with course content.
+           c. Mark ONLY the best one with "best_in_group": true and all others with "best_in_group": false.
+           d. This objective will later be assigned ID=1 and will serve as the primary objective, so choose the highest quality one.
+           e. If you find other objectives that cover the same concept but don't have IDs ending in 1, include them in this group but do NOT mark them as best_in_group.
+        Here are the learning objectives to group:
+        <learning_objectives>
+        {objectives_display}
+        </learning_objectives>
+        Your response should be a JSON array of objects with this structure:
+        [
+            {{
+            "id": int,
+            "learning_objective": str,
+            "source_reference": Union[List[str], str],
+            "correct_answer": str,
+            "in_group": bool,
+            "group_members": list[int],
+            "best_in_group": bool
+            }},
+            ...
+        ]
+        Example:
+        [
+            {{
+            "id": 3,
+            "learning_objective": "Describe the main applications of AI agents.",
+            "source_reference": ["sc-Arize-C1-L3-eng.vtt"],
+            "correct_answer": "AI agents are used for automation, decision-making, and information retrieval.",
+            "in_group": true,
+            "group_members": [3, 5, 7],
+            "best_in_group": true
+            }}
+        ]
+        """
+        # Use OpenAI beta API for structured output
+        try:
+            params = {
+                "model": "gpt-5-mini",
+                "messages": [
+                    {"role": "system", "content": "You are an expert educational content evaluator."},
+                    {"role": "user", "content": grouping_prompt}
+                ],
+                "response_format": GroupedBaseLearningObjectivesResponse
+            }
+            completion = client.beta.chat.completions.parse(**params)
+            grouped_results = completion.choices[0].message.parsed.grouped_objectives
+            print(f"Received {len(grouped_results)} grouped results")
+            # Normalize best_in_group to Python bool
+            for obj in grouped_results:
+                val = getattr(obj, "best_in_group", False)
+                if isinstance(val, str):
+                    obj.best_in_group = val.lower() == "true"
+                elif isinstance(val, (bool, int)):
+                    obj.best_in_group = bool(val)
+                else:
+                    obj.best_in_group = False
+            # Initialize final_objectives with the grouped results
+            final_objectives = []
+            for obj in grouped_results:
+                final_objectives.append(obj)
+            # Filter for best-in-group objectives (only the best_in_group flag is checked here)
+            best_in_group_objectives = [obj for obj in final_objectives if getattr(obj, "best_in_group", False) is True]
+            return {
+                "all_grouped": final_objectives,
+                "best_in_group": best_in_group_objectives
+            }
+        except Exception as e:
+            print(f"Error grouping base learning objectives: {e}")
+            # If there's an error, just mark all objectives as best-in-group
+            for obj in base_objectives:
+                obj.in_group = False
+                obj.group_members = [obj.id]
+                obj.best_in_group = True
+            return {"all_grouped": base_objectives, "best_in_group": base_objectives}
+    except Exception as e:
+        print(f"Error grouping base learning objectives: {e}")
+        # If there's an error, just mark all objectives as best-in-group
+        for obj in base_objectives:
+            obj.in_group = False
+            obj.group_members = [obj.id]
+            obj.best_in_group = True
+        return {"all_grouped": base_objectives, "best_in_group": base_objectives}
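
Review note: the `best_in_group` normalization (string, bool or int, anything else) appears three times in this file. A minimal sketch of how it could be factored into one helper follows. `to_bool` and `select_best` are hypothetical names, not part of this commit, and `SimpleNamespace` stands in for the Pydantic objective models.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def to_bool(value: Any) -> bool:
    """Coerce a best_in_group value to a Python bool using the same rules as the diff."""
    if isinstance(value, str):
        return value.lower() == "true"
    if isinstance(value, (bool, int)):
        return bool(value)
    return False  # None and any other type count as "not best"


def select_best(objectives: Iterable[Any]) -> List[Any]:
    """Normalize the flag in place and return the objectives flagged as best in group."""
    best = []
    for obj in objectives:
        obj.best_in_group = to_bool(getattr(obj, "best_in_group", False))
        if obj.best_in_group:
            best.append(obj)
    return best


objs = [
    SimpleNamespace(id=1, best_in_group="True"),
    SimpleNamespace(id=2, best_in_group=0),
    SimpleNamespace(id=3, best_in_group=None),
    SimpleNamespace(id=4, best_in_group=True),
    SimpleNamespace(id=5),  # flag missing entirely
]
best = select_best(objs)
```

With a helper like this, both grouping functions and `get_best_in_group_objectives` could share one code path, so a change to the coercion rules would only need to be made once.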

learning_objective_generator/suggestion_improvement.py ADDED

	@@ -0,0 +1,393 @@

+from typing import List, Tuple
+import os
+import json
+from openai import OpenAI
+from models import LearningObjective
+from prompts.incorrect_answers import INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION
+def _get_run_manager():
+    """Get run manager if available, otherwise return None."""
+    try:
+        from ui.run_manager import get_run_manager
+        return get_run_manager()
+    except Exception:
+        return None
+def should_regenerate_individual_suggestion(client: OpenAI, model: str, temperature: float,
+                                           learning_objective: LearningObjective,
+                                           option: str,
+                                           file_contents: List[str]) -> Tuple[bool, str]:
+    """
+    Check if an individual incorrect answer option needs regeneration.
+    Args:
+        client: OpenAI client
+        model: Model name (currently unused; the evaluation call is hardcoded to "gpt-5-mini")
+        temperature: Temperature (currently unused; not passed to the evaluation call)
+        learning_objective: Learning objective to check
+        option: The individual option to check
+        file_contents: List of file contents with source tags
+    Returns:
+        Tuple of (needs_regeneration, reason)
+    """
+    # Extract relevant content from file_contents
+    combined_content = ""
+    if hasattr(learning_objective, 'source_reference') and learning_objective.source_reference:
+        source_references = learning_objective.source_reference if isinstance(learning_objective.source_reference, list) else [learning_objective.source_reference]
+        for source_file in source_references:
+            for file_content in file_contents:
+                if f"<source file='{source_file}'>" in file_content:
+                    if combined_content:
+                        combined_content += "\n\n"
+                    combined_content += file_content
+                    break
+    # If no content found, use all content
+    if not combined_content:
+        combined_content = "\n\n".join(file_contents)
+    # Create a prompt to evaluate the individual suggestion
+    prompt = f"""
+You are evaluating the quality of an incorrect answer suggestion for a learning objective. You are going to review the incorrect answer option and determine if it needs to be regenerated.
+Learning Objective: {learning_objective.learning_objective}
+Use the correct answer to help you make informed decisions:
+Correct Answer: {learning_objective.correct_answer}
+Incorrect Answer Option to Evaluate: {option}
+Use the relevant content from the course content to help you make informed decisions:
+COURSE CONTENT:
+{combined_content}
+Here are some examples of high quality incorrect answer suggestions which you should use to make informed decisions about whether regeneration of options is needed:
+<incorrect_answer_examples_with_explanation>
+{INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION}
+</incorrect_answer_examples_with_explanation>
+Based on the above examples, evaluate this incorrect answer suggestion.
+Respond with TRUE if the incorrect answer suggestion needs regeneration, or FALSE if it is good quality.
+If TRUE, briefly explain why regeneration is needed in this format: "true – reason for regeneration". Cite the examples with explanation that you used to make your decision.
+If FALSE, respond with just "false".
+"""
+    # Use a lightweight model for evaluation
+    params = {
+        "model": "gpt-5-mini",
+        "messages": [
+            {"role": "system", "content": "You are an expert in educational assessment design and will determine if an incorrect answer option needs to be regenerated according to a set of quality standards, and examples of good and bad incorrect answer options."},
+            {"role": "user", "content": prompt}
+        ]
+    }
+    try:
+        completion = client.chat.completions.create(**params)
+        response_text = completion.choices[0].message.content.strip().lower()
+        # Check if regeneration is needed and extract reason
+        needs_regeneration = response_text.startswith("true")
+        reason = ""
+        if needs_regeneration and "–" in response_text:
+            parts = response_text.split("–", 1)
+            if len(parts) > 1:
+                reason = "– " + parts[1].strip()
+        # Log the evaluation result
+        run_manager = _get_run_manager()
+        if needs_regeneration:
+            # # Create debug directory if it doesn't exist
+            # debug_dir = os.path.join("incorrect_suggestion_debug")
+            # os.makedirs(debug_dir, exist_ok=True)
+            # suggestion_id = learning_objective.incorrect_answer_options.index(suggestion) if suggestion in learning_objective.incorrect_answer_options else "unknown"
+            # with open(os.path.join(debug_dir, f"lo_{learning_objective.id}_suggestion_{suggestion_id}_evaluation.txt"), "w") as f:
+            #     f.write(f"Learning Objective: {learning_objective.learning_objective}\n")
+            #     f.write(f"Correct Answer: {learning_objective.correct_answer}\n")
+            #     f.write(f"Incorrect Answer Option: {option}\n\n")
+            #     f.write(f"Evaluation Response: {response_text}\n")
+            if run_manager:
+                run_manager.log(f"Option '{option[:50]}...' needs regeneration: True - {reason}", level="DEBUG")
+            else:
+                print(f"Option '{option[:50]}...' needs regeneration: True - {reason}")
+        else:
+            if run_manager:
+                run_manager.log(f"Option '{option[:50]}...' is good quality, keeping as is", level="DEBUG")
+            else:
+                print(f"Option '{option[:50]}...' is good quality, keeping as is")
+        return needs_regeneration, reason
+    except Exception as e:
+        run_manager = _get_run_manager()
+        if run_manager:
+            run_manager.log(f"Error evaluating option '{option[:50]}...': {e}", level="ERROR")
+        else:
+            print(f"Error evaluating option '{option[:50]}...': {e}")
+        # If there's an error, assume regeneration is needed with a generic reason
+        return True, "– error during evaluation"
+def regenerate_individual_suggestion(client: OpenAI, model: str, temperature: float,
+                                    learning_objective: LearningObjective,
+                                    option_to_replace: str,
+                                    file_contents: List[str],
+                                    reason: str = "") -> str:
+    """
+    Regenerate an individual incorrect answer option.
+    Args:
+        client: OpenAI client
+        model: Model name (currently unused; the regeneration call is hardcoded to "gpt-5-mini")
+        temperature: Temperature (currently unused; not passed to the regeneration call)
+        learning_objective: Learning objective containing the option
+        option_to_replace: The incorrect answer option to replace
+        file_contents: List of file contents with source tags
+        reason: The reason for regeneration (optional)
+    Returns:
+        A new incorrect answer option
+    """
+    run_manager = _get_run_manager()
+    if run_manager:
+        run_manager.log(f"Regenerating suggestion for learning objective {learning_objective.id}", level="DEBUG")
+    else:
+        print(f"Regenerating suggestion for learning objective {learning_objective.id}")
+    # Extract relevant content from file_contents
+    combined_content = ""
+    if hasattr(learning_objective, 'source_reference') and learning_objective.source_reference:
+        source_references = learning_objective.source_reference if isinstance(learning_objective.source_reference, list) else [learning_objective.source_reference]
+        for source_file in source_references:
+            for file_content in file_contents:
+                if f"<source file='{source_file}'>" in file_content:
+                    if combined_content:
+                        combined_content += "\n\n"
+                    combined_content += file_content
+                    break
+    # If no content found, use all content
+    if not combined_content:
+        combined_content = "\n\n".join(file_contents)
+    # If no reason provided, use a default one
+    if not reason:
+        reason = "– no reason provided"
+    # Create a prompt to regenerate the suggestion
+    prompt = f"""
+    You are generating a high-quality incorrect answer option for a learning objective.
+    Consider the learning objective and its correct answer to generate an incorrect answer option.
+    Learning Objective: {learning_objective.learning_objective}
+    Correct Answer: {learning_objective.correct_answer}
+    Current Incorrect Answer Options:
+    {json.dumps(learning_objective.incorrect_answer_options, indent=2)}
+    The following option needs improvement: {option_to_replace}
+    Consider the following reason for improvement in order to make the option better: {reason}
+    Use the relevant content from the course content to help you make informed decisions:
+    COURSE CONTENT:
+    {combined_content}
+    Refer to the examples with explanation below to generate a new incorrect answer option:
+    <incorrect_answer_examples_with_explanation>
+    {INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION}
+    </incorrect_answer_examples_with_explanation>
+    Based on the above quality standards and examples, generate a new incorrect answer option.
+    Provide ONLY the new incorrect answer option, with no additional explanation.
+    """
+    # # Use the specified model for regeneration
+    # params = {
+    #     "model": model,
+    #     "messages": [
+    #         {"role": "system", "content": "You are an expert in educational assessment design."},
+    #         {"role": "user", "content": prompt}
+    #     ],
+    #     "temperature": temperature
+    # }
+    params = {
+        "model": "gpt-5-mini",
+        "messages": [
+            {"role": "system", "content": "You are an expert in educational assessment design. You will generate a new incorrect answer option for a learning objective based on a set of quality standards, and examples of good and bad incorrect answer options."},
+            {"role": "user", "content": prompt}
+        ]
+    }
+    try:
+        completion = client.chat.completions.create(**params)
+        new_suggestion = completion.choices[0].message.content.strip()
+        # Only create debug files if the suggestion actually changed
+        run_manager = _get_run_manager()
+        if new_suggestion != option_to_replace:
+            # Create debug directory if it doesn't exist
+            debug_dir = os.path.join("incorrect_suggestion_debug")
+            os.makedirs(debug_dir, exist_ok=True)
+            # Log the regeneration in the question-style format
+            suggestion_id = learning_objective.incorrect_answer_options.index(option_to_replace) if option_to_replace in learning_objective.incorrect_answer_options else "unknown"
+            # Format the log message in the same format as question regeneration
+            log_message = f"""Learning Objective ID: {learning_objective.id}
+Learning Objective: {learning_objective.learning_objective}
+REASON FOR REGENERATION:
+{reason}
+BEFORE:
+Option Text: {option_to_replace}
+Feedback: Incorrect answer representing a common misconception.
+AFTER:
+Option Text: {new_suggestion}
+Feedback: Incorrect answer representing a common misconception.
+"""
+            # Write to the log file
+            log_file = os.path.join(debug_dir, f"lo_{learning_objective.id}_suggestion_{suggestion_id}.txt")
+            with open(log_file, "w") as f:
+                f.write(log_message)
+            # Also log to run manager
+            if run_manager:
+                run_manager.log(f"Regenerated Option for Learning Objective {learning_objective.id}, Option {suggestion_id}", level="DEBUG")
+                run_manager.log(f"BEFORE: {option_to_replace[:80]}...", level="DEBUG")
+                run_manager.log(f"AFTER:  {new_suggestion[:80]}...", level="DEBUG")
+                run_manager.log(f"Log saved to {log_file}", level="DEBUG")
+            else:
+                print(f"\n--- Regenerated Option for Learning Objective {learning_objective.id}, Option {suggestion_id} ---")
+                print(f"BEFORE: {option_to_replace}")
+                print(f"AFTER:  {new_suggestion}")
+                print(f"Log saved to {log_file}")
+        else:
+            if run_manager:
+                run_manager.log("Generated option is identical to original, not saving debug file", level="DEBUG")
+            else:
+                print("Generated option is identical to original, not saving debug file")
+        return new_suggestion
+    except Exception as e:
+        run_manager = _get_run_manager()
+        if run_manager:
+            run_manager.log(f"Error regenerating option: {e}", level="ERROR")
+        else:
+            print(f"Error regenerating option: {e}")
+        # If there's an error, return the original option
+        return option_to_replace
+def regenerate_incorrect_answers(client: OpenAI, model: str, temperature: float,
+                                 learning_objectives: List[LearningObjective],
+                                 file_contents: List[str]) -> List[LearningObjective]:
+    """
+    Regenerate incorrect answer options for all learning objectives.
+    Args:
+        client: OpenAI client
+        model: Model name to use for regeneration
+        temperature: Temperature for generation
+        learning_objectives: List of learning objectives to improve
+        file_contents: List of file contents with source tags
+    Returns:
+        The same list of learning objectives with improved incorrect answer options
+    """
+    run_manager = _get_run_manager()
+    if run_manager:
+        run_manager.log(f"Regenerating incorrect answers for {len(learning_objectives)} learning objectives", level="INFO")
+    else:
+        print(f"Regenerating incorrect answers for {len(learning_objectives)} learning objectives")
+    for i, lo in enumerate(learning_objectives):
+        if run_manager:
+            run_manager.log(f"Processing learning objective {i+1}/{len(learning_objectives)}: {lo.id}", level="INFO")
+        else:
+            print(f"Processing learning objective {i+1}/{len(learning_objectives)}: {lo.id}")
+        # Check each suggestion individually
+        if lo.incorrect_answer_options:
+            new_suggestions = []
+            for j, option in enumerate(lo.incorrect_answer_options):
+                # Check if this specific suggestion needs regeneration
+                needs_regeneration, reason = should_regenerate_individual_suggestion(client, model, temperature, lo, option, file_contents)
+                if needs_regeneration:
+                    # Regenerate this specific suggestion with the reason
+                    if run_manager:
+                        run_manager.log(f"Regenerating option '{option[:50]}...' for learning objective {lo.id}", level="INFO")
+                    else:
+                        print(f"Regenerating option '{option[:50]}...' for learning objective {lo.id}")
+                    # Initialize variables for the regeneration loop
+                    current_option = option
+                    max_iterations = 5
+                    iteration = 0
+                    # Loop until we get a good option or reach max iterations
+                    while needs_regeneration and iteration < max_iterations:
+                        iteration += 1
+                        if run_manager:
+                            run_manager.log(f"  Regeneration attempt {iteration}/{max_iterations}", level="INFO")
+                        else:
+                            print(f"  Regeneration attempt {iteration}/{max_iterations}")
+                        # Regenerate the option
+                        new_option = regenerate_individual_suggestion(client, model, temperature, lo, current_option, file_contents, reason)
+                        # Check if the new option still needs regeneration
+                        if iteration < max_iterations:  # Skip check on last iteration to save API calls
+                            needs_regeneration, new_reason = should_regenerate_individual_suggestion(client, model, temperature, lo, new_option, file_contents)
+                            if needs_regeneration:
+                                if run_manager:
+                                    run_manager.log(f"  Regenerated option still needs improvement: {new_reason}", level="DEBUG")
+                                else:
+                                    print(f"  Regenerated option still needs improvement: {new_reason}")
+                                current_option = new_option
+                                reason = new_reason
+                            else:
+                                if run_manager:
+                                    run_manager.log(f"  Regenerated option passes quality check on attempt {iteration}", level="INFO")
+                                else:
+                                    print(f"  Regenerated option passes quality check on attempt {iteration}")
+                        else:
+                            needs_regeneration = False
+                    # Use the final regenerated option
+                    new_suggestions.append(new_option)
+                else:
+                    # Keep the original suggestion
+                    if run_manager:
+                        run_manager.log(f"Keeping original option '{option[:50]}...' for learning objective {lo.id}", level="INFO")
+                    else:
+                        print(f"Keeping original option '{option[:50]}...' for learning objective {lo.id}")
+                    new_suggestions.append(option)
+            # Update the learning objective with the new suggestions
+            lo.incorrect_answer_options = new_suggestions
+        else:
+            # No suggestions exist; none are generated here (see the note below)
+            if run_manager:
+                run_manager.log(f"No incorrect answer options found for learning objective {lo.id}, leaving them empty for the next generation cycle", level="INFO")
+            else:
+                print(f"No incorrect answer options found for learning objective {lo.id}, leaving them empty for the next generation cycle")
+            # This would typically call back to the enhancement.py function, but to avoid circular imports,
+            # we'll just leave it empty and let the next generation cycle handle it
+            lo.incorrect_answer_options = []
+    return learning_objectives
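
The per-option loop in `regenerate_incorrect_answers` above amounts to a bounded check-and-regenerate cycle. A minimal sketch of that control flow, with the two LLM calls replaced by plain callables, might look like the following. The names `refine_option`, `toy_check`, and `toy_regenerate` are hypothetical and not part of this commit; the toy functions only stand in for `should_regenerate_individual_suggestion` and `regenerate_individual_suggestion`.

```python
from typing import Callable, Tuple

# Hypothetical helper (not in the commit): isolates the retry logic so it can
# be exercised without an OpenAI client.
def refine_option(
    option: str,
    check: Callable[[str], Tuple[bool, str]],
    regenerate: Callable[[str, str], str],
    max_iterations: int = 5,
) -> str:
    """Regenerate `option` until `check` accepts it or attempts run out."""
    needs_regeneration, reason = check(option)
    current = option
    iteration = 0
    while needs_regeneration and iteration < max_iterations:
        iteration += 1
        current = regenerate(current, reason)
        if iteration < max_iterations:
            # Re-check the new option; the final attempt is accepted unchecked,
            # mirroring the "skip check on last iteration" shortcut above.
            needs_regeneration, reason = check(current)
        else:
            needs_regeneration = False
    return current


# Toy stand-ins for the two LLM-backed functions (assumptions for illustration).
def toy_check(text: str) -> Tuple[bool, str]:
    flagged = "always" in text.split()
    return flagged, ("contains the absolute term 'always'" if flagged else "")


def toy_regenerate(text: str, reason: str) -> str:
    return " ".join(word for word in text.split() if word != "always")


if __name__ == "__main__":
    print(refine_option("Partitioning always speeds up queries", toy_check, toy_regenerate))
```

With the default `max_iterations` of 5, the quality check runs at most five times per option (once up front and once after each of the first four regenerations), so the last regenerated option is returned without verification.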

models/__init__.py ADDED Viewed

	@@ -0,0 +1,60 @@

+# Learning objectives
+from .learning_objectives import (
+    BaseLearningObjective,
+    LearningObjective,
+    GroupedLearningObjective,
+    GroupedBaseLearningObjective,
+    LearningObjectivesResponse,
+    GroupedLearningObjectivesResponse,
+    GroupedBaseLearningObjectivesResponse,
+    BaseLearningObjectiveWithoutCorrectAnswer, BaseLearningObjectivesWithoutCorrectAnswerResponse
+)
+# Config
+from .config import (MODELS, TEMPERATURE_UNAVAILABLE)
+# Questions
+from .questions import (
+    MultipleChoiceOption,
+    MultipleChoiceQuestion,
+    RankedNoGroupMultipleChoiceQuestion,
+    RankedMultipleChoiceQuestion,
+    GroupedMultipleChoiceQuestion,
+    MultipleChoiceQuestionFromFeedback,
+    RankedNoGroupMultipleChoiceQuestionsResponse,
+    RankedMultipleChoiceQuestionsResponse,
+    GroupedMultipleChoiceQuestionsResponse
+)
+# Assessment
+from .assessment import Assessment
+__all__ = [
+    # Learning objectives
+    'BaseLearningObjective',
+    'LearningObjective',
+    'GroupedLearningObjective',
+    'GroupedBaseLearningObjective',
+    'LearningObjectivesResponse',
+    'GroupedLearningObjectivesResponse',
+    'GroupedBaseLearningObjectivesResponse',
+    'BaseLearningObjectiveWithoutCorrectAnswer', 'BaseLearningObjectivesWithoutCorrectAnswerResponse',
+    # Config
+    'MODELS',
+    'TEMPERATURE_UNAVAILABLE',
+    # Questions
+    'MultipleChoiceOption',
+    'MultipleChoiceQuestion',
+    'RankedNoGroupMultipleChoiceQuestion',
+    'RankedMultipleChoiceQuestion',
+    'GroupedMultipleChoiceQuestion',
+    'MultipleChoiceQuestionFromFeedback',
+    'RankedNoGroupMultipleChoiceQuestionsResponse',
+    'RankedMultipleChoiceQuestionsResponse',
+    'GroupedMultipleChoiceQuestionsResponse',
+    # Assessment
+    'Assessment'
+]

models/assessment.py ADDED Viewed

	@@ -0,0 +1,10 @@

+from typing import List
+from pydantic import BaseModel, Field
+from .learning_objectives import LearningObjective
+from .questions import RankedMultipleChoiceQuestion
+class Assessment(BaseModel):
+    """Model for an assessment."""
+    learning_objectives: List[LearningObjective] = Field(description="List of learning objectives")
+    questions: List[RankedMultipleChoiceQuestion] = Field(description="List of ranked questions")

models/config.py ADDED Viewed

	@@ -0,0 +1,4 @@


+MODELS = ["o3-mini", "o1", "gpt-4.1", "gpt-4o", "gpt-4o-mini", "gpt-4", "gpt-3.5-turbo", "gpt-5.2", "gpt-5.1", "gpt-5", "gpt-5-mini", "gpt-5-nano"]
+
+TEMPERATURE_UNAVAILABLE = {"o3-mini": True, "o1": True, "gpt-4.1": False, "gpt-4o": False, "gpt-4o-mini": False, "gpt-4": False, "gpt-3.5-turbo": False, "gpt-5.2": True, "gpt-5.1": True, "gpt-5": True, "gpt-5-mini": True, "gpt-5-nano": True}
+
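
One plausible way to consume `TEMPERATURE_UNAVAILABLE` is to leave the `temperature` key out of the request parameters for models flagged `True`. The sketch below is an assumption about usage, not code taken from this commit: `build_chat_params` is a hypothetical helper, the mapping is an excerpt of the one above, and treating unknown models as "temperature unavailable" is a design choice made here for safety.

```python
from typing import Any, Dict, List

# Excerpt of the mapping defined in models/config.py above.
TEMPERATURE_UNAVAILABLE = {"o3-mini": True, "gpt-4o": False, "gpt-5": True}


# Hypothetical helper (not in the commit): builds keyword arguments for a
# chat completion request, adding temperature only when the model accepts it.
def build_chat_params(model: str, messages: List[Dict[str, str]], temperature: float) -> Dict[str, Any]:
    params: Dict[str, Any] = {"model": model, "messages": messages}
    # Assumption: models missing from the mapping are treated as not
    # supporting temperature, so the key is omitted for them too.
    if not TEMPERATURE_UNAVAILABLE.get(model, True):
        params["temperature"] = temperature
    return params


if __name__ == "__main__":
    messages = [{"role": "user", "content": "Hello"}]
    print(build_chat_params("gpt-4o", messages, 0.7))
    print(build_chat_params("o3-mini", messages, 0.7))
```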

models/learning_objectives.py ADDED Viewed

	@@ -0,0 +1,59 @@

+from typing import List, Optional, Union
+from pydantic import BaseModel, Field
+class BaseLearningObjectiveWithoutCorrectAnswer(BaseModel):
+    """Model for a learning objective without a correct answer."""
+    id: int = Field(description="Unique identifier for the learning objective")
+    learning_objective: str = Field(description="Description of the learning objective")
+    source_reference: Union[List[str], str] = Field(description="Paths to the files from which this learning objective was extracted")
+class BaseLearningObjective(BaseModel):
+    """Model for a learning objective."""
+    id: int = Field(description="Unique identifier for the learning objective")
+    learning_objective: str = Field(description="Description of the learning objective")
+    source_reference: Union[List[str], str] = Field(description="Paths to the files from which this learning objective was extracted")
+    correct_answer: str = Field(description="Correct answer to the learning objective")
+class LearningObjective(BaseModel):
+    """Model for a learning objective."""
+    id: int = Field(description="Unique identifier for the learning objective")
+    learning_objective: str = Field(description="Description of the learning objective")
+    source_reference: Union[List[str], str] = Field(description="Paths to the files from which this learning objective was extracted")
+    correct_answer: str = Field(description="Correct answer to the learning objective")
+    incorrect_answer_options: Union[List[str], str] = Field(description="A list of five incorrect answer options")
+    in_group: Optional[bool] = Field(default=None, description="Whether this objective is part of a group")
+    group_members: Optional[List[int]] = Field(default=None, description="List of IDs of objectives in the same group")
+    best_in_group: Optional[bool] = Field(default=None, description="Whether this is the best objective in its group")
+class GroupedLearningObjective(LearningObjective):
+    """Model for a learning objective that has been grouped."""
+    in_group: bool = Field(description="Whether this objective is part of a group of similar objectives")
+    group_members: List[int] = Field(description="List of IDs of all objectives in the same similarity group, including this one")
+    best_in_group: bool = Field(description="True if this objective is the highest ranked in its group")
+class GroupedBaseLearningObjective(BaseLearningObjective):
+    """Model for a base learning objective that has been grouped (without incorrect answer suggestions)."""
+    in_group: bool = Field(description="Whether this objective is part of a group of similar objectives")
+    group_members: List[int] = Field(description="List of IDs of all objectives in the same similarity group, including this one")
+    best_in_group: bool = Field(description="True if this objective is the highest ranked in its group")
+# Response models for learning objectives
+class BaseLearningObjectivesWithoutCorrectAnswerResponse(BaseModel):
+    objectives: List[BaseLearningObjectiveWithoutCorrectAnswer] = Field(description="List of learning objectives without correct answers")
+class LearningObjectivesResponse(BaseModel):
+    objectives: List[LearningObjective] = Field(description="List of learning objectives")
+class GroupedLearningObjectivesResponse(BaseModel):
+    grouped_objectives: List[GroupedLearningObjective] = Field(description="List of grouped learning objectives")
+class GroupedBaseLearningObjectivesResponse(BaseModel):
+    grouped_objectives: List[GroupedBaseLearningObjective] = Field(description="List of grouped base learning objectives")

models/questions.py ADDED Viewed

	@@ -0,0 +1,67 @@

+from typing import List, Optional, Union
+from pydantic import BaseModel, Field
+class MultipleChoiceOption(BaseModel):
+    """Model for a multiple choice option."""
+    option_text: str = Field(description="Text of the option")
+    is_correct: bool = Field(description="Whether this option is correct")
+    feedback: str = Field(description="Feedback for this option")
+class MultipleChoiceQuestion(BaseModel):
+    """Model for a multiple choice question."""
+    id: int = Field(description="Unique identifier for the question")
+    question_text: str = Field(description="Text of the question")
+    options: List[MultipleChoiceOption] = Field(description="List of options for the question")
+    learning_objective_id: int = Field(description="ID of the learning objective this question addresses")
+    learning_objective: str = Field(description="Learning objective this question addresses")
+    correct_answer: str = Field(description="Correct answer to the question")
+    source_reference: Union[List[str], str] = Field(description="Paths to the files from which this question was extracted")
+    judge_feedback: Optional[str] = Field(None, description="Feedback from the LLM judge")
+    approved: Optional[bool] = Field(None, description="Whether this question has been approved by the LLM judge")
+class RankedNoGroupMultipleChoiceQuestion(MultipleChoiceQuestion):
+    """Model for a multiple choice question that has been ranked but not grouped."""
+    rank: int = Field(description="Rank assigned to the question (1 = best)")
+    ranking_reasoning: str = Field(description="Reasoning for the assigned rank")
+class RankedMultipleChoiceQuestion(MultipleChoiceQuestion):
+    """Model for a multiple choice question that has been ranked."""
+    rank: int = Field(description="Rank assigned to the question (1 = best)")
+    ranking_reasoning: str = Field(description="Reasoning for the assigned rank")
+    in_group: bool = Field(description="Whether this question is part of a group of similar questions")
+    group_members: List[int] = Field(description="IDs of questions in the same group")
+    best_in_group: bool = Field(description="Whether this is the best question in its group")
+class GroupedMultipleChoiceQuestion(MultipleChoiceQuestion):
+    """Model for a multiple choice question that has been grouped but not ranked."""
+    in_group: bool = Field(description="Whether this question is part of a group of similar questions")
+    group_members: List[int] = Field(description="IDs of questions in the same group")
+    best_in_group: bool = Field(description="Whether this is the best question in its group")
+class MultipleChoiceQuestionFromFeedback(BaseModel):
+    """Model for a multiple choice question generated from user feedback."""
+    id: int = Field(description="Unique identifier for the question")
+    question_text: str = Field(description="Text of the question")
+    options: List[MultipleChoiceOption] = Field(description="List of options for the question")
+    learning_objective: str = Field(description="Learning objective this question addresses")
+    source_reference: Union[List[str], str] = Field(description="Paths to the files from which this question was extracted")
+    feedback: str = Field(description="User criticism for this question, this will be found at the bottom of <QUESTION FOLLOWED BY USER CRITICISM> and it is a criticism of something which suggests a change.")
+# Response models for questions
+class RankedNoGroupMultipleChoiceQuestionsResponse(BaseModel):
+    ranked_questions: List[RankedNoGroupMultipleChoiceQuestion] = Field(description="List of ranked multiple choice questions without grouping")
+class RankedMultipleChoiceQuestionsResponse(BaseModel):
+    ranked_questions: List[RankedMultipleChoiceQuestion] = Field(description="List of ranked multiple choice questions")
+class GroupedMultipleChoiceQuestionsResponse(BaseModel):
+    grouped_questions: List[GroupedMultipleChoiceQuestion] = Field(description="List of grouped multiple choice questions")

prompts/__init__.py ADDED Viewed

File without changes

prompts/all_quality_standards.py ADDED Viewed

File without changes

prompts/incorrect_answers.py ADDED Viewed

	@@ -0,0 +1,184 @@

+INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION = """
+## 1. Here are examples of inappropriate incorrect answer options that demonstrate the importance of avoiding absolute terms and unnecessary comparisons because these terms and comparisons tend to lead to obviously incorrect options. Specifically, you should avoid using words like "always", "only", "solely", "never", "mainly", "exclusively", "primarily" or phrases like "rather than", "instead of", "regardless of", "as opposed to". These words are absolute or extreme qualifiers and comparative terms that make it too easy to recognize incorrect answer options.
+More words you should avoid are: All, every, entire, complete, none, nothing, no one, only, solely, merely, completely, totally, utterly, always, forever, constantly, never, impossible, must, mandatory, required, instead of, as opposed to, exclusively, purely
+<example>
+Learning Objective: "Explain the purpose of index partitioning in databases."
+Correct Answer: "Index partitioning improves query performance by dividing large indexes into smaller, more manageable segments."
+Inappropriate incorrect answer options to avoid:
+"Index partitioning always guarantees the fastest possible query performance regardless of data size." (Obviously wrong because it uses absolute terms "always" and "regardless of")
+"Index partitioning improves query performance rather than ensuring data integrity or providing backup functionality." (Obviously wrong with the unnecessary comparison using "rather than")
+"Index partitioning exclusively supports distributed database systems that never operate on a single server." (Obviously wrong because it uses absolute terms "exclusively" and "never")
+Appropriate incorrect answer option:
+"Index partitioning improves query performance by distributing data across multiple servers to reduce network latency." (Confuses partitioning with distributed computing but avoids absolute terms)
+</example>
+## 2. Here is an example of inappropriate incorrect answer options that demonstrate what to avoid when creating distractors that use explicit negation patterns with key terms from the learning objective.
+Rule: Flag options that contain explicit negation patterns combined with key terms from the learning objective.
+Detect these specific patterns:
+"[Topic/Concept] without [key term]"
+"[Topic/Concept] by avoiding [key action]"
+"[Topic/Concept] by minimizing [core concept]"
+"[Topic/Concept] that skips [essential process]"
+"[Topic/Concept] by eliminating [key component]"
+Common negation words to watch for: without, avoiding, minimizing, skipping, eliminating, excluding, preventing, reducing, limiting (when used with core concepts)
+Do NOT flag: Options that swap roles/behaviors between related concepts or assign different characteristics to similar methods.
+<example>
+Learning Objective: "Identify the main purpose of evaluation-driven development in the context of AI agents."
+Correct Answer: "The main purpose of evaluation-driven development is to iteratively improve AI agents by using structured evaluations to guide development and ensure consistent performance."
+Inappropriate incorrect answer options with explicit negation patterns:
+"The main purpose of evaluation-driven development is to automate all agent decisions by relying solely on predefined rules and minimizing the need for ongoing evaluation." (Pattern: "[Topic] by minimizing [key term]")
+"The main purpose of evaluation-driven development is to accelerate agent deployment by skipping iterative testing and focusing on rapid feature addition." (Pattern: "[Topic] by skipping [essential process]")
+Appropriate incorrect answer option:
+"The main purpose of evaluation-driven development is to establish standardized benchmarks that ensure consistent performance across different AI agent architectures." (No negation patterns - legitimate misconception)
+</example>
+## 3. Here is an example of inappropriate incorrect answer options that demonstrate what to avoid when using "but" clauses that explicitly negate the core concept in the learning objective, making the options obviously wrong.
+Avoid contradictory second clauses - Don't add qualifying phrases that explicitly negate the main benefit or create obvious limitations
+Keep second clauses supportive - If you include a second clause, it should reinforce the incorrect direction, not contradict it
+* Look for explicit negations using "without," "lacking", "rather than," "instead of," "but not," "but", "except", or "excluding" that directly contradict the core concept
+<example>
+Learning Objective: "Explain why observability and tracing are important when developing and evaluating AI agents."
+Correct Answer: "Observability and tracing provide detailed visibility into every step taken by an agent, making it easier to debug, monitor performance, and evaluate agent behavior systematically."
+Inappropriate incorrect answer options with problematic "but" clauses:
+"Observability and tracing facilitate real-time scaling of agent components in response to usage spikes, improving resource management but not revealing detailed decision logic." (Uses "but not revealing" to explicitly negate a core benefit of observability)
+"Observability and tracing collect aggregated performance metrics and logs for monitoring agent uptime and error trends, making them useful for system health checks but insufficient for diagnosing specific agent decision flows." (Uses "but insufficient for diagnosing" to directly contradict the debugging purpose)
+"Observability and tracing provide high-level summaries of agent outputs, which are useful for presenting results but do not address debugging or understanding the agent's internal processes." (Uses "but do not address debugging" to explicitly exclude the main benefit)
+Appropriate incorrect answer option:
+"Observability and tracing provide comprehensive logging capabilities that help teams maintain detailed audit trails for compliance and regulatory reporting requirements." (No contradictory second clause)
+</example>
+## 4. Here is an example of inappropriate incorrect answer options that demonstrates what to avoid when adding unnecessary clauses that extend beyond the core misconception being tested.
+Rule: Avoid compound sentences where the second clause introduces additional consequences, effects, or elaborations that are not essential to the primary misconception.
+Look for these patterns that indicate unnecessary length:
+- Clauses beginning with "which," "that," "providing," "enabling," "reducing," "ensuring"
+- Phrases connected by commas that add consequences or effects
+- Additional explanations after the main misconception is established
+As you can see in the first example, the sentence in the incorrect answer option has two parts. The second part "reducing the complexity of data transformations in automations." must be discarded and you should regard any such type of sentence as inappropriate. This is because the option could have been considered complete without the second part.
+In the second example the section that says "simplifying tool configuration" is unnecessary and should also be discarded. The first part of the sentence would have been enough.
+<example>
+Learning Objective:
+ "Identify why integrating internal and external systems is important when building multi-agent AI applications",
+Correct Answer: "Integrating internal and external systems enables AI agents to access, process, and act on real-world data.",
+Inappropriate incorrect answer options:
+"Integrating internal and external systems enables AI agents to unify disparate data schemas into a single standardized format, reducing the complexity of data transformations in automations."
+Appropriate incorrect answer option:
+"Integrating internal and external systems enables AI agents to unify disparate data schemas into a single standardized format."
+</example>
+<example>
+"learning_objective": "Describe the importance of integrating AI agents with internal and external systems.",
+"correct_answer": "Integrating with internal and external systems enables AI agents to access and process relevant real-world data during automation.",
+Inappropriate incorrect answer options:
+        "Integrating with internal and external systems enables AI agents to unify diverse APIs into a standard interface, simplifying tool configuration.",
+Appropriate incorrect answer option:
+"Integrating with internal and external systems enables AI agents to unify diverse APIs into a standard interface"
+</example>
+## 5. Here is a learning objective and correct answer with appropriate incorrect answer options that demonstrate structural consistency with the correct answer's grammatical pattern, length, and formatting:
+<example>
+Learning Objective: "Identify the three primary data structures used in machine learning algorithms."
+Correct Answer: "The three primary data structures used in machine learning algorithms are arrays, matrices, and tensors."
+Appropriate incorrect answer options:
+"The three primary data structures used in machine learning algorithms are dictionaries, trees, and queues." (Same structure, different data structures)
+"The three primary data structures used in machine learning algorithms are lists, sets, and databases." (Same structure but mixing concepts)
+"The three primary data structures used in machine learning algorithms are features, labels, and parameters." (Same structure, but confuses data structures with other ML concepts)
+Inappropriate incorrect answer option:
+"Machine learning algorithms first store data in arrays, then process it using functional programming techniques." (Different grammatical structure - doesn't follow the pattern)
+</example>
+## 6. Here is an example of appropriate incorrect answer options for list questions. Here all options give a list of three items:
+<example>
+Learning Objective: "Identify the three key principles of object-oriented programming."
+Correct Answer: "The three key principles of object-oriented programming are encapsulation, inheritance, and polymorphism."
+Appropriate incorrect answer options:
+"The three key principles of object-oriented programming are encapsulation, inheritance, and composition." (Same structure, two correct, one incorrect)
+"The three key principles of object-oriented programming are abstraction, polymorphism, and delegation." (Same structure, mix of correct/incorrect)
+"The three key principles of object-oriented programming are instantiation, implementation, and isolation." (Same structure, all incorrect terms)
+Inappropriate incorrect answer option:
+"Object-oriented programming uses classes and objects to organize code into reusable components." (Different structure and doesn't list three specific principles)
+</example>
+## 7. Here is an example of inappropriate incorrect answer options that demonstrate what to avoid when using unnecessary second clauses with comparative phrases that create obvious contradictions.
+<example>
+Learning Objective: "Explain the primary benefit of incorporating human feedback into AI agent workflows."
+Correct Answer: "The primary benefit of incorporating human feedback is to enable agents to correct errors and improve their decision-making accuracy."
+Inappropriate incorrect answer options with problematic second clauses:
+"The primary benefit of incorporating human feedback is to validate agent outputs rather than actually improving their performance over time." (Uses "rather than" to create false dichotomy that makes the option obviously wrong)
+"The primary benefit of incorporating human feedback is to increase processing speed instead of focusing on accuracy or quality improvements." (Uses "instead of" to unnecessarily contrast with core benefits, making it clearly incorrect)
+"The primary benefit of incorporating human feedback is to maintain consistency across outputs but not necessarily to enhance the agent's learning capabilities." (Uses "but not necessarily" to explicitly negate a key benefit, making it obviously eliminable)
+Appropriate incorrect answer option:
+"The primary benefit of incorporating human feedback is to establish standardized response formats that ensure consistent output structure across different tasks." (Doesn't have a contradictory second clause)
+</example>
+## 8. Here is an example of inappropriate incorrect answer options that demonstrate what to avoid when the question asks for positive aspects but distractors present negative aspects, making them obviously wrong. Same applies for negative aspects and distractors being positive.
+<example>
+Learning Objective: "Identify the main advantages of using cloud computing for business applications."
+Correct Answer: "Cloud computing provides cost savings, scalability, and improved accessibility for business applications."
+Inappropriate incorrect answer options that present disadvantages when asked for advantages:
+"Cloud computing increases security risks, creates vendor dependency, and requires constant internet connectivity." (Obviously wrong - lists disadvantages when the question asks for advantages)
+Appropriate incorrect answer option:
+"Cloud computing provides faster processing speeds, enhanced data encryption, and simplified software licensing." (All purported benefits as the question asks about benefits)
+</example>
+"""
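
Rule 1 of the prompt above (avoid absolute qualifiers and comparative phrases) can be approximated by a plain keyword screen. The sketch below is a hypothetical pre-filter, not the project's method: the commit relies on an LLM judgment for this, and the names `ABSOLUTE_TERMS` and `find_absolute_terms` are invented for illustration, with the term list being a subset of the words listed in the prompt.

```python
import re
from typing import List

# Subset of the terms listed in rule 1 of the prompt (illustrative only).
ABSOLUTE_TERMS = [
    "always", "only", "solely", "never", "exclusively",
    "rather than", "instead of", "regardless of", "as opposed to",
]

# Whole-word, case-insensitive match for any listed term or phrase.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in ABSOLUTE_TERMS) + r")\b",
    re.IGNORECASE,
)


def find_absolute_terms(option: str) -> List[str]:
    """Return the flagged terms found in `option`, in order of appearance."""
    return [match.group(1).lower() for match in _PATTERN.finditer(option)]


if __name__ == "__main__":
    bad = ("Index partitioning always guarantees the fastest possible "
           "query performance regardless of data size.")
    print(find_absolute_terms(bad))
```

A screen like this only catches surface wording. It cannot judge whether an option is a plausible misconception, so it would complement the LLM check rather than replace it.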

prompts/learning_objectives.py ADDED Viewed

	@@ -0,0 +1,216 @@

+BASE_LEARNING_OBJECTIVES_PROMPT = """
+The learning objectives you generate will be assessed through multiple choice
+quiz questions, so the learning objectives cannot be about building, creating,
+developing, etc. Instead, the objectives should be about identifying, listing,
+describing, defining, comparing, e.g., the kinds of things that can be assessed in a multiple choice quiz. For example, a learning objective like: “identify the key reason for providing examples of the response format you’re looking for in an LLM prompt”, where then the correct answer might be “providing an example of the response format you're looking for gives the LLM clear guidance on the expected output”. Limit learning objectives to one goal per objective, i.e., don't say "identify the <key concepts of the course> and explain why they are important in the context of <topic of the course>". Either choose which of these (identify or explain) is most relevant to the learning objective or create two learning objectives.
+INSTRUCTIONS:
+1. Each learning objective must be derived DIRECTLY from the COURSE CONTENT provided
+below. Do not create objectives for topics not covered in the content.
+2. Learning objectives should be specific, measurable, and focused on important
+concepts.
+3. Each objective should start with an action verb that allows for assessment using
+a multiple choice question (e.g., identify, describe, define, list, compare, etc.). Do not include more than one action verb per learning objective, e.g., do not say "identify and explain" or similar.
+4. Make each objective unique and independent, covering different aspects of the content. It is ok if two objectives address different aspects or angles of the same topic.
+5. Learning objectives should not contain part or all of the correct answer associated with them.
+6. No learning objective should depend on context from another learning objective, or in other words, each learning objective should be able to stand alone without knowing anything about what the other learning objectives are.
+7. Ensure objectives are at an appropriate level of difficulty for the course, meaning they are consistent with the difficulty level of the course content.
+8. Write learning objectives that address critical knowledge and skills in the content, not trivial facts or details of a specific use case implementation or coding exercise.
+9. Wherever possible, write learning objectives that address the “why” of the concepts presented in the course rather than the “what”. For example, if the course presents an implementation of a use case using a particular framework or tool, don’t write learning objectives that ask about the details of exactly what was presented in the implementation or how the framework or tool was used. Rather, write a learning objective that addresses the why behind the example.
+10. The course content you are provided is presenting principles or methods in the artificial intelligence space, and the means of presentation is through the use of a particular tool or framework. Do not mention the name of whatever tool or framework is used in the course as part of a learning objective. Instead aim for tool or framework agnostic learning objectives that address the principles or methods as well as topics and concepts being presented.
+11. Do not write learning objectives that address specific tool or framework functionality presented in the course content. Write learning objectives that are completely tool or framework agnostic that get at the core principles or methods being presented.
+12. Because this is a software development course and an AI development course, refrain from any references to manual intervention unless absolutely relevant.
+13. Write the first learning objective to address the “what” or “why” of the main topic of the course in a way that will lead to a relatively easy recall question as the first question in the quiz. To write this first learning objective you should identify the main
+topic, concept or principle that the course is about and form an objective
+something like “identify what <important main topic / concept / principle> is” or
+“explain why <important main topic / concept / principle> is important” or something of similar simplicity.
+"""
+BLOOMS_TAXONOMY_LEVELS = """
+The levels of Bloom's taxonomy from lowest to highest are as follows:
+- Recall: Demonstrates the retention of key concepts and facts, not trivialities.
+Avoid a simple recall structure, where there is an opportunity to ask learners a
+question at a higher level of Bloom's taxonomy, for example, asking the learner
+to apply a concept seen in the course to a new but similar scenario.
+- Comprehension: Connect ideas and concepts to demonstrate a deeper grasp of
+the material beyond simple recall.
+- Application: Apply a concept to a new / different but similar scenario to that
+seen in the course.
+- Analysis: Examine and break information into parts, determine how the parts
+relate, identify motives or causes, make inferences or calculations, and find
+evidence to support generalizations.
+- Evaluation: Make judgments, assessments, or evaluations regarding a scenario,
+statement, or concept. The answer choices should offer different plausible options
+that require critical thinking to discern the most valid or appropriate choice.
+"""
+LEARNING_OBJECTIVE_EXAMPLES = """
+<appropriate_learning_objectives_and_correct_answers>
+[
+  {
+    "id": 1,
+    "learning_objective": "Identify what a code agent is.",
+    "source_reference": [
+      "sc-HuggingFace-C5-L0_v1.vtt",
+      "sc-HuggingFace-C5-L1_v4.vtt",
+      "sc-HuggingFace-C5-L2_v4.vtt"
+    ],
+    "correct_answer": "A code agent is a system that uses an AI model to generate and execute code as its way of performing tasks."
+  },
+  {
+    "id": 2,
+    "learning_objective": "Explain how code agents can be more efficient than traditional tool calling agents.",
+    "source_reference": [
+      "sc-HuggingFace-C5-L2_v4.vtt"
+    ],
+    "correct_answer": "Code agent actions are more compact and can execute loops and variables in a single code snippet, which reduces the number of steps, latency, and risk of errors in comparison to traditional tool calling agents."
+  },
+  {
+    "id": 3,
+    "learning_objective": "Describe how running code agents using a custom Python interpreter, like the one used in the course, can mitigate security concerns associated with executing AI-generated code.",
+    "source_reference": [
+      "sc-HuggingFace-C5-L3_v3.vtt"
+    ],
+    "correct_answer": "Running code in a dedicated interpreter can help mitigate security risks by restricting imports, blocking undefined commands, and limiting resource usage."
+  },
+  {
+    "id": 4,
+    "learning_objective": "Describe how running code agents in a remote sandbox can mitigate security concerns associated with executing AI-generated code.",
+    "source_reference": [
+      "sc-HuggingFace-C5-L3_v3.vtt"
+    ],
+    "correct_answer": "Running code in a remote sandbox environment helps prevent malicious or accidental damage to your local system."
+  },
+  {
+    "id": 5,
+    "learning_objective": "Describe why and how agent performance can be tracked or traced during execution.",
+    "source_reference": [
+      "sc-HuggingFace-C5-L4_v3.vtt"
+    ],
+    "correct_answer": "Tracing captures the agent's reasoning steps and tool usage, making it possible to evaluate the correctness, efficiency, and reliability of its decisions over multiple steps."
+  },
+  {
+    "id": 6,
+    "learning_objective": "Discuss the benefits of using multiple specialized agents that collaborate on complex tasks.",
+    "source_reference": [
+      "sc-HuggingFace-C5-L5_v2.vtt"
+    ],
+    "correct_answer": "Splitting tasks among distinct agents with focused roles, each having its own memory and capabilities, can improve performance, reduce errors, and enable more advanced planning."
+  },
+  {
+    "id": 7,
+    "learning_objective": "Explain what it means to calculate advantages by normalizing reward scores across generated responses and centering them around zero in reinforcement fine-tuning.",
+    "source_reference": [
+      "sc-Predibase-C2-L4.vtt"
+    ],
+    "correct_answer": "Advantages are calculated by normalizing reward scores across generated responses, centering them around zero to highlight which outputs are better or worse than average."
+  }
+]
+</appropriate_learning_objectives_and_correct_answers>
+Avoid adding unnecessary length to the correct answer. Aim for correct answers of 20 words or fewer. Below is an example of unnecessary length, which typically occurs in the last part of a long sentence:
+<inappropriate_learning_objectives_and_correct_answers>
+[
+  {
+    "id": 1,
+    "learning_objective": "Identify why integrating internal and external systems is important when building multi-agent AI applications",
+    "source_reference": [
+      "sc-CrewAI-C2-L2_eng.vtt"
+    ],
+    "correct_answer": "Integrating internal and external systems enables AI agents to access, process, and act on real-world data, expanding the usefulness and applicability of automations."
+  },
+  {
+    "id": 2,
+    "learning_objective": "Identify the main advantage of assigning different AI models to different agents in a multi-agent system, as described in the course",
+    "source_reference": [
+      "sc-CrewAI-C2-L2_eng.vtt"
+    ],
+    "correct_answer": "Assigning different AI models to different agents enables the system to optimize for factors like speed, quality, or task complexity, making the overall workflow more efficient and effective"
+  }
+]
+In id: 1 we should avoid "expanding the usefulness and applicability of automations." and in id: 2 we should avoid "making the overall workflow more efficient and effective". These statements are considered unnecessary length.
+Rule: Avoid compound sentences where the second clause introduces additional consequences, effects, or elaborations that are not essential to the core concept.
+Look for these patterns that indicate unnecessary length:
+- Clauses beginning with "which," "that," "providing," "enabling," "reducing," "ensuring"
+- Phrases connected by commas that add consequences or effects
+- Additional explanations after the main point is established
+</inappropriate_learning_objectives_and_correct_answers>
+"""
+LEARNING_OBJECTIVE_EXAMPLES_WITHOUT_ANSWERS = """
+<learning_objectives>
+[
+  {
+    "id": 1,
+    "learning_objective": "Identify what a code agent is.",
+    "source_reference": [
+      "sc-HuggingFace-C5-L0_v1.vtt",
+      "sc-HuggingFace-C5-L1_v4.vtt",
+      "sc-HuggingFace-C5-L2_v4.vtt"
+    ],
+    ]
+  {
+    "id": 2,
+    "learning_objective": "Explain how code agents can be more efficient than traditional tool calling agents.",
+    "source_reference": [
+      "sc-HuggingFace-C5-L2_v4.vtt"
+    ],
+    ]
+  {
+    "id": 3,
+    "learning_objective": "Describe how running code agents using a custom Python interpreter, like the one used in the course, can mitigate security concerns associated with executing AI-generated code.",
+    "source_reference": [
+      "sc-HuggingFace-C5-L3_v3.vtt"
+    ],
+    ]
+  {
+    "id": 4,
+    "learning_objective": "Describe how running code agents in a remote sandbox can mitigate security concerns associated with executing AI-generated code.",
+    "source_reference": [
+      "sc-HuggingFace-C5-L3_v3.vtt"
+    ],
+    ]
+  {
+    "id": 5,
+    "learning_objective": "Describe why and how agent performance can be tracked or traced during execution.",
+    "source_reference": [
+      "sc-HuggingFace-C5-L4_v3.vtt"
+    ],
+    ]
+  {
+    "id": 6,
+    "learning_objective": "Discuss the benefits of using multiple specialized agents that collaborate on complex tasks.",
+    "source_reference": [
+      "sc-HuggingFace-C5-L5_v2.vtt"
+    ],
+    ]
+  {
+    "id": 7,
+    "learning_objective": "Explain what it means to calculate advantages by normalizing reward scores across generated responses and centering them around zero in reinforcement fine-tuning.",
+    "source_reference": [
+      "sc-Predibase-C2-L4.vtt"
+    ]
+  }
+]
+</learning_objectives>
+"""
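
Several of the rules stated in the prompt text above (start with one assessable action verb, keep correct answers to about 20 words, avoid trailing elaboration clauses) are mechanical enough to check after generation. The following is a minimal sketch of such a check, not code from this commit: `lint_objective`, the verb set, and the comma-based clause heuristic are all hypothetical and would need tuning against real model output.

```python
import re
from typing import List

# Hypothetical lint helper; it is NOT part of the repository.
# The verb set combines the verbs the prompt names (identify, describe, define,
# list, compare) with two used in its examples (explain, discuss).
ASSESSABLE_VERBS = {"identify", "describe", "define", "list", "compare", "explain", "discuss"}

# Assumption: a trailing elaboration shows up as ", <marker word> ...".
# The marker words are the ones listed in the prompt text.
TRAILING_CLAUSE = re.compile(
    r",\s+(which|that|providing|enabling|reducing|ensuring)\b", re.IGNORECASE
)
MAX_ANSWER_WORDS = 20  # the prompt asks for 20 words or fewer


def lint_objective(objective: dict) -> List[str]:
    """Return a list of rule violations found in one objective dict."""
    problems = []
    words = re.findall(r"[a-z']+", objective["learning_objective"].lower())

    # The objective should start with a single assessable action verb.
    if not words or words[0] not in ASSESSABLE_VERBS:
        problems.append("does not start with an assessable action verb")
    elif len(words) >= 3 and words[1] == "and" and words[2] in ASSESSABLE_VERBS:
        problems.append("uses more than one action verb")

    answer = objective.get("correct_answer", "")
    if len(answer.split()) > MAX_ANSWER_WORDS:
        problems.append("correct answer exceeds 20 words")
    if TRAILING_CLAUSE.search(answer):
        problems.append("correct answer has a trailing elaboration clause")
    return problems


if __name__ == "__main__":
    sample = {
        "learning_objective": "Identify and explain the key concepts.",
        "correct_answer": "Tracing records each agent step, enabling later review of decisions.",
    }
    for problem in lint_objective(sample):
        print(problem)
```

A check like this only catches surface patterns; whether an objective is tool agnostic or addresses the "why" still needs a model or human review.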

prompts/questions.py ADDED

	@@ -0,0 +1,886 @@

+GENERAL_QUALITY_STANDARDS = """
+The overall goal of the quiz is to set the learner up for success in answering
+interesting and non-trivial questions (not to give them a painful or discouraging
+experience). A perfect score should be attainable for someone who thoughtfully
+consumed the course content the quiz is based on. The question you write must be
+aligned with the course content and the provided
+learning objective and correct answer.
+Because this is a software development course and an AI development course, refrain from any references to manual intervention unless absolutely relevant.
+"""
+MULTIPLE_CHOICE_STANDARDS = """
+- Each question must have EXACTLY ONE correct answer (no more, no less)
+- Each question should have a clear, unambiguous correct answer
+- Distractors (wrong answer options) should be plausible and represent common misconceptions. Not obviously wrong.
+- All options should be of similar length and detail
+- Options should be mutually exclusive
+- Avoid "all/none of the above" options unless pedagogically necessary
+- Typically include 4 options (A, B, C, D)
+- IMPORTANT NOTE: do not start the answer feedback with “correct” or “incorrect”.
+"""
+EXAMPLE_QUESTIONS = """
+<EXAMPLE_QUESTION_1>
+What is a code agent in the context of an LLM workflow?
+A: An AI model that generates text responses without executing external actions.
+Feedback: Code agents do more than generate text—they generate and run code to perform actions.
+*B: An AI agent that can write and execute code as part of its decision-making process.
+Feedback: Well done! Code agents can write and execute code to handle tasks, rather than just output text or follow a strict script.
+C: A pre-programmed or “coded” AI system that follows a strict decision tree to perform tasks.
+Feedback: Code agents are not fixed, rule-based systems. Code agents can write and execute code, rather than following a single predetermined decision tree.
+D: An AI assistant that can perform calculations and simple tasks without external interaction.
+Feedback: Code agents can write and execute code to perform complex tasks, not just simple calculations.
+</EXAMPLE_QUESTION_1>
+<EXAMPLE_QUESTION_2>
+In the context of agent architectures, what is a key performance trade-off between representing agent actions as code (code agents) versus representing them as JSON tool calls, when it comes to complex multi-step tasks?
+*A: Code agents use fewer tokens, exhibit lower latency, and have reduced error rates, since complex actions can be represented and executed in a single, consolidated code snippet.
+Feedback: Nice work! Code agents can execute loops, reuse variables, and call multiple tools with a single code snippet, so they need far fewer LLM turns. Fewer turns mean lower token usage, shorter round-trip latency, and fewer chances for the model to make mistakes.
+B: JSON-based agents require fewer tokens because each step is more compact, resulting in faster execution and fewer errors for complex tasks.
+Feedback: JSON tool calls actually increase token usage because each micro-action and its context must be sent back to the model repeatedly. This longer chain of steps results in slower execution and an increased chance of errors.
+C: Both code-based and JSON-based representations are equivalent in terms of token usage, latency, and error rates for complex multi-step tasks.
+Feedback: Code agents can chain many actions in a single code execution step, which reduces token usage, latency, and error rates, while tool calling agents execute a chain step-by-step, which typically increases token usage, latency, and the chance of errors.
+D: Code agents have higher latency due to the complexity of parsing code, while JSON action representation avoids errors by breaking down tasks into isolated calls.
+Feedback: Parsing a short code snippet is straightforward and fast, and overall latency tends to be dominated by LLM-token traffic. Code agents can execute complex tasks in one step, so they typically run faster and with a reduced chance of errors. JSON action agents often execute the same complex task step-by-step using many LLM turns, so latency and error rate are higher, not lower.
+</EXAMPLE_QUESTION_2>
+<EXAMPLE_QUESTION_3>
+What is one of the main risks associated with running code agents on your local computer?
+A: They might send spam emails from your email account.
+Feedback: The lesson highlights risks such as file deletion, resource abuse, or network compromise, not sending spam emails.
+*B: They could execute code that deletes critical files or creates many files that bloat your system.
+Feedback: Good work! Letting an agent run code locally can compromise your system in a number of ways, such as deleting vital files, or generating a large number of files.
+C: They might launch a denial-of-service attack on your own computer.
+Feedback: The lesson focuses on more direct threats such as file deletion and creation, or installing malware. While a denial-of-service is theoretically possible, it isn’t emphasized as a primary risk.
+D: They might cause your computer to overheat by overusing the CPU.
+Feedback: Overheating and CPU overuse isn’t one of the primary risks discussed in the lesson.
+</EXAMPLE_QUESTION_3>
+<EXAMPLE_QUESTION_4>
+How does the custom local Python interpreter demonstrated in the course mitigate risks from harmful code execution?
+A: The local interpreter uses the standard Python interpreter but logs all output for manual review.
+Feedback: The custom interpreter presented in the course does not rely on the normal Python interpreter at all. It enforces safeguards such as blocking imports, ignoring shell commands, and capping the number of loop iterations.
+*B: It ignores undefined commands, disallows imports outside an explicit whitelist, and sets a hard cap on loop iterations to prevent infinite loops and resource abuse.
+Feedback: That’s right! The custom Python interpreter presented in the course skips undefined shell-style commands, blocks any import not explicitly approved, and stops loop executions that exceed the cap, all of which help mitigate security and resource risks.
+C: It only allows execution of code that does not require any external packages, preventing all imports regardless of configuration.
+Feedback: The interpreter isn’t a blanket “no-imports” sandbox. It blocks imports by default, but you can pass an explicit whitelist (e.g., LocalPythonExecutor(["numpy", "PIL"])) that lets approved external packages load.
+D: It prevents all code execution by rejecting any code containing loops or function definitions.
+Feedback: The interpreter doesn’t blanket-ban loops or function definitions. It still runs normal Python code—including loops and functions—but adds safeguards: it caps loop iterations, blocks disallowed imports, and skips undefined commands.
+</EXAMPLE_QUESTION_4>
+<EXAMPLE_QUESTION_5>
+Which of the following is a key security advantage of using a remote sandbox environment for executing code agents, as discussed in the lesson?
+A: It ensures faster execution of agents by optimizing code compilation.
+Feedback: Remote sandboxes primarily protect systems from potential harm rather than enhancing code execution speed.
+*B: It allows execution of code without the risk of affecting local systems.
+Feedback: Great job! Running agents in a remote sandbox isolates their code execution so that any errors or malicious actions cannot harm your local system.
+C: It provides detailed real-time monitoring of all code executions.
+Feedback: A remote sandbox protects your local system by isolating code execution, not by providing real-time monitoring.
+D: It guarantees execution of the code with reduced computational cost.
+Feedback: Using a remote sandbox protects your local system from malicious or faulty code. It does not guarantee reduced computational cost.
+</EXAMPLE_QUESTION_5>
+Note that all example questions follow the general quality standards as well as
+the question specific quality standards. The correct answer (marked with a *) and
+incorrect answer options follow the standards specific to correct and incorrect
+answers.
+"""
+QUESTION_SPECIFIC_QUALITY_STANDARDS = """
+The question you write must:
+- be in the language and tone of the course.
+- be at a similar level of difficulty or complexity as encountered in the course.
+- assess only information from the course and not depend on information that was
+not covered in the course.
+- not attempt to teach something as part of the quiz.
+- use clear and concise language
+- not induce confusion
+- provide a slight (not major) challenge.
+- be easily interpreted and unambiguous.
+- be well written in clear and concise language, proper grammar, good sentence
+structure, and consistent formatting
+- be thoughtful and specific rather than broad and ambiguous
+- be complete in its wording such that understanding the question is not part
+of the assessment
+"""
+CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS = """
+The correct answer you write must:
+- be factually correct and unambiguous
+- be in the language and tone of the course and in complete sentence form.
+- be at a similar level of difficulty or complexity as encountered in the course.
+- contain only information from the course and not depend on information that was
+not covered in the course.
+- not attempt to teach something as part of the quiz.
+- use clear and concise language
+- be thoughtful and specific rather than broad and ambiguous
+- be complete in its wording such that understanding which is the correct answer
+is not part of the assessment
+"""
+INCORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS = """
+The incorrect answer options you write should ideally represent reasonable potential misconceptions, but they could also be answers that would sound plausible to someone who has not taken the course, or was not paying close enough attention during the course. In that sense, they should require some thought, even by the learner who has diligently completed the course, to determine they are incorrect.
+When constructing incorrect answer feedback, pay attention to the incorrect_answer_suggestions provided along with the learning objective. These are not your only options for incorrect answers but you can use them directly, or as a starting point or in addition to other plausible incorrect answers.
+Wrong answers should not be so obviously wrong that a learner who has not taken the course can immediately rule them out.
+Here are some examples of poorly written incorrect answer options for a particular
+question:
+<QUESTION>
+Which statement best explains why monitoring an agent's trace can be helpful
+in debugging and establishing the performance of an agent?
+</QUESTION>
+<POORLY_WRITTEN_INCORRECT_ANSWER_OPTION_1>
+The agent's trace is a visual interface for users that adds limited insight into
+the agent's internal processes.
+</POORLY_WRITTEN_INCORRECT_ANSWER_OPTION_1>
+The example above is poorly written because it is obviously wrong. The question
+is asking for why monitoring the agent trace can be helpful and this answer option
+states that it is a visual interface that provides limited insight, which is
+certainly incorrect, but does not represent a reasonable potential misconception
+and even a learner who has not taken the course can immediately rule it out.
+<POORLY_WRITTEN_INCORRECT_ANSWER_OPTION_2>
+The agent trace is used exclusively for scenarios where the agent is underperforming.
+</POORLY_WRITTEN_INCORRECT_ANSWER_OPTION_2>
+This answer option is also poorly written because it is obviously wrong. The use of the word "exclusively" is a tipoff that this is not the right answer. Similarly, formulating incorrect answer options with words like "only", "always", "never", and similar words that in and of themselves make the answer option wrong represent poor word choice for incorrect answer options.
+Below is an example of a well written incorrect answer option for the same question:
+<WELL_WRITTEN_INCORRECT_ANSWER_OPTION>
+The agent trace comprises a list of the error messages generated during an agent's
+execution, which is helpful for debugging.
+</WELL_WRITTEN_INCORRECT_ANSWER_OPTION>
+The example above is well written because it is not obviously wrong and represents
+a reasonable potential misconception. It requires some thought to determine it is
+incorrect and a learner who has not taken the course will not be able to
+immediately rule it out. In fact, if you changed the word "comprises" to "includes" the answer option would be correct, in a sense, just incomplete. But in this case, the learner needs to be paying close attention to identify this as incorrect.
+"""
+ANSWER_FEEDBACK_QUALITY_STANDARDS = """
+Every correct and incorrect answer must include feedback.
+Incorrect answer feedback should:
+- be informational and encouraging, not punitive.
+- be a single sentence, concise and to the point.
+- not say "Incorrect" or "Wrong".
+Correct answer feedback should:
+- be informational and encouraging.
+- be a single sentence, concise and to the point.
+- not say "Correct!" or anything that will sound redundant after the string "Correct: ", e.g. "Correct: Correct!".
+"""
+INCORRECT_ANSWER_PROMPT = """
+# CORE PRINCIPLES WITH EXAMPLES:
+## 1. CREATE COMMON MISUNDERSTANDINGS
+Create incorrect answer suggestions that represent how students actually misunderstand the material:
+<example>
+Learning Objective: "What is version control in software development?"
+Correct Answer: "A system that tracks changes to files over time so specific versions can be recalled later."
+Plausible Incorrect Answer Suggestions:
+- "A testing method that ensures software works correctly across different operating system versions." (Confuses with cross-platform testing)
+- "A project management approach where each team member works on a separate software version." (Misunderstands the concept entirely)
+- "A release strategy that maintains multiple versions of software for different customer needs." (Confuses with product versioning)
+</example>
+## 2. MAINTAIN IDENTICAL STRUCTURE
+All incorrect answer suggestions must match the correct answer's grammatical pattern, length, and formatting:
+<example>
+Learning Objective: "What are the three primary data structures used in machine learning algorithms?"
+Correct Answer: "Arrays, matrices, and graphs."
+Good Incorrect Answer Suggestions:
+- "Dictionaries, trees, and queues." (Same structure, different data structures)
+- "Tensors, vectors, and databases." (Same structure but mixing concepts)
+- "Features, labels, and parameters." (Same structure, but confuses data structures with ML concepts)
+Bad Incorrect Answer Suggestion:
+- "Machine learning algorithms first store data in arrays, then process it using functional programming." (Different structure)
+</example>
+## 3. USE COURSE TERMINOLOGY CORRECTLY BUT IN WRONG CONTEXTS
+Use terms from the course material but apply them incorrectly:
+<example>
+Learning Objective: "What is the purpose of backpropagation in neural networks?"
+Correct Answer: "To calculate gradients used to update weights during training."
+Plausible Incorrect Answer Suggestions:
+- "To normalize input data across layers to prevent gradient explosion." (Uses correct terms but describes batch normalization)
+- "To optimize the activation functions by adjusting their thresholds during inference." (Misapplies neural network terminology)
+- "To propagate inputs forward through the network during the prediction phase." (Confuses with forward propagation)
+</example>
+## 4. INCLUDE PARTIALLY CORRECT INFORMATION
+Create incorrect answer suggestions that contain some correct elements but miss critical aspects:
+<example>
+Learning Objective: "How does transfer learning improve deep neural network training?"
+Correct Answer: "By reusing features learned from a large dataset to initialize a model that can then be fine-tuned on a smaller, task-specific dataset."
+Plausible Incorrect Answer Suggestions:
+- "By transferring trained models between different neural network frameworks to improve compatibility and deployment options." (Misunderstands the concept of knowledge transfer)
+- "By reusing features learned from a large dataset and freezing all weights to prevent any updates during task-specific training." (First part correct, second part wrong)
+- "By combining multiple pre-trained models into a committee that votes on final predictions for improved accuracy." (Confuses with ensemble learning)
+</example>
+## 5. AVOID OBVIOUSLY WRONG ANSWERS
+Don't create incorrect answer suggestions that anyone with basic knowledge could eliminate:
+<example>
+Learning Objective: "What is unit testing in software development?"
+Correct Answer: "Testing individual components in isolation to verify they work as expected."
+Bad Incorrect Answer Suggestions to Avoid:
+- "A process where code is randomly modified to see if it still works." (Too obviously wrong)
+- "Testing that should never be done because it wastes development time." (Contradicts basic principles)
+- "Running the software on different units of hardware like phones and laptops." (Misunderstands the basic concept)
+</example>
+## 6. MIRROR THE DETAIL LEVEL AND STYLE
+Match the technical depth and tone of the correct answer:
+<example>
+Learning Objective: "What is the time complexity of quicksort in the average case?"
+Correct Answer: "O(n log n), where n is the number of elements to be sorted."
+Good Incorrect Answer Suggestions:
+- "O(n^2), where n is the number of elements to be sorted." (Same level of detail)
+- "O(n), where n is the number of elements to be sorted." (Same structure and detail)
+- "O(log n), where n is the number of elements to be sorted." (Same structure but incorrect complexity)
+Bad Incorrect Answer Suggestion:
+- "Quicksort is generally faster than bubble sort but can perform poorly on already sorted arrays." (Different style and not answering the specific objective)
+</example>
+## 7. FOR LIST QUESTIONS, MAINTAIN CONSISTENCY
+If the correct answer lists specific items, all incorrect answer suggestions should list the same number of items:
+<example>
+Learning Objective: "What are the three key principles of object-oriented programming?"
+Correct Answer: "Encapsulation, inheritance, and polymorphism."
+Good Incorrect Answer Suggestions:
+- "Encapsulation, inheritance, and composition." (Same structure, two correct, one incorrect)
+- "Abstraction, polymorphism, and delegation." (Same structure, mix of correct/incorrect)
+- "Instantiation, implementation, and isolation." (Same structure, all incorrect but plausible terms)
+</example>
+## 8. AVOID ABSOLUTE TERMS AND UNNECESSARY COMPARISONS
+Don't use words like "always", "never", "mainly", "exclusively", "primarily", or "rather than".
+These words are absolute or extreme qualifiers and comparative terms that artificially limit or overgeneralize statements, creating false dichotomies or unsubstantiated hierarchies.
+More words you should avoid are: All, every, entire, complete, none, nothing, no one, only, solely, merely, completely, totally, utterly, always, forever, constantly, never, impossible, must, mandatory, required, instead of, as opposed to, exclusively, purely
+<example>
+Learning Objective: "What is the purpose of index partitioning in databases?"
+Correct Answer: "To improve query performance by dividing large indexes into smaller, more manageable segments."
+Bad Incorrect Answer Suggestions to Avoid:
+- "To always guarantee the fastest possible query performance regardless of data size." (Uses "always")
+- "To improve query performance rather than ensuring data integrity or providing backup functionality." (Unnecessary comparison)
+- "To exclusively support distributed database systems that never operate on a single server." (Uses absolute terms)
+</example>
+"""
+INCORRECT_ANSWER_EXAMPLES = """
+<example>
+Learning Objective: "What is the purpose of activation functions in neural networks?"
+Correct Answer: "To introduce non-linearity into the network's output."
+Plausible Incorrect Answer Suggestions:
+- "To normalize input data across different feature scales." (Confuses with data normalization)
+- "To reduce computational complexity during forward propagation." (Misunderstands as performance optimization)
+- "To prevent gradient explosion during backpropagation training." (Confuses with gradient clipping)
+</example>
+Note: All options follow the same grammatical structure ("To [verb] [object]").
+<example>
+Learning Objective: "What is the main function of Git branching?"
+Correct Answer: "To separate work on different features or fixes from the main codebase."
+Plausible Incorrect Answer Suggestions:
+- "To create backup copies of the repository in case of system failure." (Confuses with backup functionality)
+- "To track different versions of files across multiple development environments." (Mixes up with version tracking)
+- "To isolate unstable code until it passes integration testing protocols." (Focuses only on testing aspects)
+</example>
+Note: All options maintain identical sentence structure ("To [verb] [object phrase]") with similar length and complexity.
+<example>
+Learning Objective: "Which category of machine learning algorithms does K-means clustering belong to?"
+Correct Answer: "Unsupervised learning algorithms that identify patterns without labeled training data."
+Plausible Incorrect Answer Suggestions:
+- "Supervised learning algorithms that predict continuous values based on labeled examples." (Confuses with regression)
+- "Reinforcement learning algorithms that optimize decisions through environment interaction." (Misclassifies algorithm type)
+- "Semi-supervised learning algorithms that combine labeled and unlabeled data for training." (Incorrect classification)
+</example>
+Note: All options follow consistent structure: "[Category] algorithms that [what they do]" while using correct ML terminology in wrong contexts.
+<example>
+Learning Objective: "How does feature scaling improve the performance of distance-based machine learning models?"
+Correct Answer: "By ensuring all features contribute equally to distance calculations regardless of their original ranges."
+Plausible Incorrect Answer Suggestions:
+- "By removing redundant features that would otherwise dominate the learning algorithm." (Confuses with feature selection)
+- "By converting categorical variables into numerical representations for mathematical operations." (Mixes up with encoding)
+- "By increasing the dimensionality of the feature space to capture more complex relationships." (Confuses with feature expansion)
+</example>
+Note: All options maintain consistent grammatical structure ("By [verb+ing] [object] [qualification]") while including partially correct concepts.
+<example>
+Learning Objective: "How do NoSQL databases differ from relational databases?"
+Correct Answer: "NoSQL databases use flexible schema designs while relational databases enforce strict predefined schemas."
+Plausible Incorrect Answer Suggestions:
+- "NoSQL databases support ACID transactions while relational databases prioritize eventual consistency." (Reverses actual characteristics)
+- "NoSQL databases require SQL for queries while relational databases support multiple query languages." (Fundamentally incorrect)
+- "NoSQL databases are primarily used for small datasets while relational databases handle big data applications." (Inverts typical use cases)
+</example>
+Note: All options follow identical grammatical structure: "NoSQL databases [characteristic] while relational databases [contrasting characteristic]" with similar technical detail.
+<example>
+Learning Objective: "What are the three primary service models in cloud computing?"
+Correct Answer: "Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS)."
+Plausible Incorrect Answer Suggestions:
+- "Virtual Machines as a Service (VMaaS), Containers as a Service (CaaS), and Functions as a Service (FaaS)." (Confuses with deployment methods)
+- "Storage as a Service (STaaS), Network as a Service (NaaS), and Compute as a Service (CaaS)." (Mixes up with service categories)
+- "Public Cloud, Private Cloud, and Hybrid Cloud." (Confuses with deployment models)
+</example>
+Note: Each option follows the pattern "[Item 1], [Item 2], and [Item 3]" with consistent abbreviation formatting and exactly three items.
+<example>
+Learning Objective: "What is the best practice for conducting effective code reviews?"
+Correct Answer: "Review small, focused changes regularly rather than large batches of code infrequently."
+Plausible Incorrect Answer Suggestions:
+- "Ensure only senior developers conduct reviews to maintain code quality standards." (Overemphasizes seniority)
+- "Focus on identifying bugs rather than architectural or stylistic issues." (Narrows scope too much)
+- "Require code to pass automated tests with 100 percent coverage before human review." (Overstates requirements)
+</example>
+Note: All options follow similar imperative structure with concrete recommendations while avoiding absolute terms like "always" or "never".
+<example>
+Learning Objective: "Which statement accurately describes the role of a Scrum Master in agile development?"
+Correct Answer: "A facilitator who removes impediments and ensures the team follows agile practices."
+Plausible Incorrect Answer Suggestions:
+- "A technical leader who reviews code quality and makes final architectural decisions." (Confuses with tech lead role)
+- "A project manager who assigns tasks and tracks individual team member performance." (Mixes up with traditional PM)
+- "A product owner who prioritizes features and accepts completed work on behalf of stakeholders." (Confuses with Product Owner)
+</example>
+Note: All options follow consistent grammatical structure ("A [role] who [does something specific]") with parallel descriptions.
+<example>
+Learning Objective: "What is the most likely cause of an SQL injection vulnerability?"
+Correct Answer: "Directly incorporating user input into database queries without proper validation or parameterization."
+Plausible Incorrect Answer Suggestions:
+- "Using outdated database management systems that lack modern security features." (Confuses with database vulnerabilities)
+- "Implementing weak password hashing algorithms for user authentication." (Mixes up with authentication issues)
+- "Failing to enable HTTPS for secure data transmission between client and server." (Confuses with transport security)
+</example>
+Note: All options follow consistent structure describing a security issue while focusing on different security domains that students might confuse.
+"""
+RANK_QUESTIONS_PROMPT = """
+Rank the following multiple-choice questions based on their quality as assessment items.
+These questions have been selected as the best in a group of questions already. Your task is to rank them based on their quality as assessment items.
+<RANKING_CRITERIA>
+1. Question clarity and unambiguity
+2. Alignment with the stated learning objective
+3. Quality of incorrect answer options (see the guidelines below)
+4. Quality of feedback for each option
+5. Appropriate difficulty level and use of simple English. See the examples below of simple versus complex English, and treat simple English as better when ranking.
+  <DIFFICULTY_LEVEL_GUIDELINES>
+      <EXAMPLE_1>
+        <SIMPLE_ENGLISH>AI engineers create computer programs that can learn from data and make decisions.</SIMPLE_ENGLISH>
+        <COMPLEX_ENGLISH>AI engineering practitioners architect computational paradigms exhibiting autonomous erudition capabilities via statistical data assimilation and subsequent decisional extrapolation.</COMPLEX_ENGLISH>
+      </EXAMPLE_1>
+      <EXAMPLE_2>
+        <SIMPLE_ENGLISH>Machine learning models need large amounts of good data to work well.</SIMPLE_ENGLISH>
+        <COMPLEX_ENGLISH>Machine learning algorithmic frameworks necessitate voluminous, high-fidelity datasets to achieve optimal efficacy in their inferential capacities.</COMPLEX_ENGLISH>
+      </EXAMPLE_2>
+</DIFFICULTY_LEVEL_GUIDELINES>
+6. Its adherence to the guidelines below:
+      <GUIDELINES>
+                  <General Quality Standards>
+                  {GENERAL_QUALITY_STANDARDS}
+                  </General Quality Standards>
+                  <Multiple Choice Specific Standards>
+                  {MULTIPLE_CHOICE_STANDARDS}
+                  </Multiple Choice Specific Standards>
+                  Follows these example questions:
+                  <Example Questions>
+                  {EXAMPLE_QUESTIONS}
+                  </Example Questions>
+                  Questions followed these instructions:
+                  <Question Specific Quality Standards>
+                  {QUESTION_SPECIFIC_QUALITY_STANDARDS}
+                  </Question Specific Quality Standards>
+                  Correct answers followed these instructions:
+                  <Correct Answer Specific Quality Standards>
+                  {CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS}
+                  </Correct Answer Specific Quality Standards>
+                  Incorrect answers followed these instructions:
+                  <Incorrect Answer Specific Quality Standards>
+                  {INCORRECT_ANSWER_PROMPT}
+                  </Incorrect Answer Specific Quality Standards>
+                  Here are some examples of high quality incorrect answer suggestions:
+                  <incorrect_answer_examples>
+                  {INCORRECT_ANSWER_EXAMPLES}
+                  </incorrect_answer_examples>
+                  Words to avoid:
+                  <Words To Avoid>
+                  AVOID ABSOLUTE TERMS AND UNNECESSARY COMPARISONS
+                  Don't use words like "always", "never", "mainly", "exclusively", "primarily", or "rather than".
+                  These words are absolute or extreme qualifiers and comparative terms that artificially limit or overgeneralize statements, creating false dichotomies or unsubstantiated hierarchies.
+                  More words you should avoid are: All, every, entire, complete, none, nothing, no one, only, solely, merely, completely, totally, utterly, always, forever, constantly, never, impossible, must, mandatory, required, instead of, as opposed to, exclusively, purely
+                  </Words To Avoid>
+                  <Answer Feedback Quality Standards>
+                  {ANSWER_FEEDBACK_QUALITY_STANDARDS}
+                  </Answer Feedback Quality Standards>
+      </GUIDELINES>
+</RANKING_CRITERIA>
+<IMPORTANT RANKING INSTRUCTIONS>
+1. DO NOT change the question with ID=1 (if present).
+2. Rank ONLY the questions listed below.
+3. Return a JSON array with each question's original ID and its rank (2, 3, 4, etc.).
+4. The best question should have rank 2 (since rank 1 is reserved).
+5. Consider clarity, specificity, alignment with the learning objectives, and how well each question follows the criteria above.
+6. CRITICAL: You MUST return ALL questions that were provided for ranking. Do not omit any questions. Each question must be assigned a unique rank.
+7. CRITICAL: Each question must have a UNIQUE rank. No two questions can have the same rank.
+<CRITICAL INSTRUCTION - READ CAREFULLY>
+YOU MUST RETURN ALL QUESTIONS THAT WERE PROVIDED FOR RANKING.
+If you receive 30 questions to rank, you must return all 30 questions in your response.
+DO NOT OMIT ANY QUESTIONS.
+EACH QUESTION MUST HAVE A UNIQUE RANK (2, 3, 4, 5, etc. with no duplicates).
+</CRITICAL INSTRUCTION - READ CAREFULLY>
+Your response must be in the following JSON format. Each question must include ALL of the following fields:
+[
+  {{
+    "id": int,
+    "question_text": str,
+    "options": list[dict],
+    "learning_objective": str,
+    "learning_objective_id": int,
+    "correct_answer": str,
+    "source_reference": list[str] or str,
+    "judge_feedback": str or null,
+    "approved": bool or null,
+    "rank": int,
+    "ranking_reasoning": str,
+    "in_group": bool,
+    "group_members": list[int],
+    "best_in_group": bool
+  }},
+  ...
+]
+<RANKING EXAMPLE>
+{
+  "id": 2,
+  "question_text": "What is the primary purpose of AI agents?",
+  "options": [...],
+  "learning_objective_id": 3,
+  "learning_objective": "Describe the main applications of AI agents.",
+  "correct_answer": "To automate tasks and make decisions",
+  "source_reference": ["sc-Arize-C1-L3-eng.vtt"],
+  "judge_feedback": "This question effectively tests understanding of AI agent applications.",
+  "approved": true,
+  "rank": 3,
+  "ranking_reasoning": "Clear question that tests understanding of AI agents, but could be more specific.",
+  "in_group": false,
+  "group_members": [2],
+  "best_in_group": true
+}
+{
+  "id": 3,
+  "question_text": "Which of the following best describes machine learning?",
+  "options": [...],
+  "learning_objective_id": 2,
+  "learning_objective": "Define machine learning.",
+  "correct_answer": "A subset of AI that enables systems to learn from data",
+  "source_reference": ["sc-Arize-C1-L2-eng.vtt"],
+  "judge_feedback": "Good fundamental question.",
+  "approved": true,
+  "rank": 2,
+  "ranking_reasoning": "Excellent clarity and directly addresses a fundamental concept.",
+  "in_group": true,
+  "group_members": [3, 8],
+  "best_in_group": true
+}
+{
+  "id": 4,
+  "question_text": "What is a neural network?",
+  "options": [...],
+  "learning_objective_id": 4,
+  "learning_objective": "Explain neural networks.",
+  "correct_answer": "A computing system inspired by biological neural networks",
+  "source_reference": ["sc-Arize-C1-L4-eng.vtt"],
+  "judge_feedback": "Basic definition question.",
+  "approved": true,
+  "rank": 4,
+  "ranking_reasoning": "Clear but very basic definition question without application context.",
+  "in_group": false,
+  "group_members": [4],
+  "best_in_group": true
+}
+</RANKING EXAMPLE>
+</IMPORTANT RANKING INSTRUCTIONS>
+"""
+GROUP_QUESTIONS_PROMPT = """
+Group the following multiple-choice questions based on their quality as assessment items.
+<GROUPING_INSTRUCTIONS>
+1. Identify groups of similar questions that test essentially the same concept or knowledge area.
+2. Treat questions as similar when their learning_objective_id is the same. If two questions share a learning_objective_id, assume they are testing the same concept.
+3. For each question, indicate whether it belongs to a group of similar questions by setting "in_group" to true or false.
+4. For questions that are part of a group, include a "group_members" field with a list of all IDs in that group (including the question itself). If a question is not grouped with any other question, set "group_members" to a list containing only the question's own ID.
+5. For each question, add a boolean field "best_in_group": set this to true for the highest-ranked (lowest rank number) question in each group, and false for all others in the group. For questions not in a group, set "best_in_group" to true by default.
+6. CRITICAL: You MUST return ALL questions that were provided for grouping. Do not omit any questions.
+7. CRITICAL: Each question must have a UNIQUE rank. No two questions can have the same rank.
+</GROUPING_INSTRUCTIONS>
+<CRITICAL INSTRUCTION - READ CAREFULLY>
+YOU MUST RETURN ALL QUESTIONS THAT WERE PROVIDED FOR GROUPING.
+If you receive 30 questions to group, you must return all 30 questions in your response.
+DO NOT OMIT ANY QUESTIONS.
+</CRITICAL INSTRUCTION - READ CAREFULLY>
+Your response must be in the following JSON format. Each question must include ALL of the following fields:
+[
+  {{
+    "id": int,
+    "question_text": str,
+    "options": list[dict],
+    "learning_objective_id": int,
+    "learning_objective": str,
+    "correct_answer": str,
+    "source_reference": list[str] or str,
+    "judge_feedback": str or null,
+    "approved": bool or null,
+    "in_group": bool,
+    "group_members": list[int],
+    "best_in_group": bool
+  }},
+  ...
+]
+<Example>
+[
+  {{
+    "id": 2,
+    "question_text": "What is the primary purpose of AI agents?",
+    "options": [
+      {{
+        "option_text": "To automate tasks and make decisions",
+        "is_correct": true,
+        "feedback": "Correct! AI agents are designed to automate tasks and make decisions based on their programming and environment."
+      }},
+      {{
+        "option_text": "To replace human workers entirely",
+        "is_correct": false,
+        "feedback": "Incorrect. While AI agents can automate certain tasks, they are not designed to replace humans entirely."
+      }},
+      {{
+        "option_text": "To process large amounts of data",
+        "is_correct": false,
+        "feedback": "Incorrect. While data processing is a capability of some AI systems, it's not the primary purpose of AI agents specifically."
+      }},
+      {{
+        "option_text": "To simulate human emotions",
+        "is_correct": false,
+        "feedback": "Incorrect. AI agents are not primarily designed to simulate human emotions."
+      }}
+    ],
+    "learning_objective_id": 3,
+    "learning_objective": "Describe the main applications of AI agents.",
+    "correct_answer": "To automate tasks and make decisions",
+    "source_reference": ["sc-Arize-C1-L3-eng.vtt"],
+    "judge_feedback": "This question effectively tests understanding of AI agent applications.",
+    "approved": true,
+    "in_group": true,
+    "group_members": [2, 5, 7],
+    "best_in_group": true
+  }}
+]
+</Example>
+<EXAMPLE OF COMPLETE GROUPING RESPONSE>
+Here's an example of how to properly group a set of 5 questions:
+Input questions with IDs: [2, 3, 4, 5, 6]
+Correct output (all questions returned, each with its grouping fields):
+[
+  {
+    "id": 2,
+    "question_text": "What is the primary purpose of AI agents?",
+    "options": [...],
+    "learning_objective_id": 3,
+    "learning_objective": "Describe the main applications of AI agents.",
+    "correct_answer": "To automate tasks and make decisions",
+    "source_reference": ["sc-Arize-C1-L3-eng.vtt"],
+    "judge_feedback": "This question effectively tests understanding of AI agent applications.",
+    "approved": true,
+    "in_group": true,
+    "group_members": [2, 5],
+    "best_in_group": true
+  },
+  {
+    "id": 3,
+    "question_text": "Which of the following best describes machine learning?",
+    "options": [...],
+    "learning_objective_id": 2,
+    "learning_objective": "Define machine learning.",
+    "correct_answer": "A subset of AI that enables systems to learn from data",
+    "source_reference": ["sc-Arize-C1-L2-eng.vtt"],
+    "judge_feedback": "Good fundamental question.",
+    "approved": true,
+    "in_group": false,
+    "group_members": [3],
+    "best_in_group": true
+  },
+  {
+    "id": 4,
+    "question_text": "What is a neural network?",
+    "options": [...],
+    "learning_objective_id": 4,
+    "learning_objective": "Explain neural networks.",
+    "correct_answer": "A computing system inspired by biological neural networks",
+    "source_reference": ["sc-Arize-C1-L4-eng.vtt"],
+    "judge_feedback": "Basic definition question.",
+    "approved": true,
+    "in_group": false,
+    "group_members": [4],
+    "best_in_group": true
+  },
+  {
+    "id": 5,
+    "question_text": "How do AI agents help in automation?",
+    "options": [...],
+    "learning_objective_id": 3,
+    "learning_objective": "Describe the main applications of AI agents.",
+    "correct_answer": "By performing tasks based on programmed rules or learned patterns",
+    "source_reference": ["sc-Arize-C1-L3-eng.vtt"],
+    "judge_feedback": "Related to question 2 but more specific.",
+    "approved": true,
+    "in_group": true,
+    "group_members": [2, 5],
+    "best_in_group": false
+  },
+  {
+    "id": 6,
+    "question_text": "What is deep learning?",
+    "options": [...],
+    "learning_objective_id": 5,
+    "learning_objective": "Differentiate deep learning from traditional machine learning.",
+    "correct_answer": "A subset of machine learning using multi-layered neural networks",
+    "source_reference": ["sc-Arize-C1-L5-eng.vtt"],
+    "judge_feedback": "Good definition question.",
+    "approved": true,
+    "in_group": false,
+    "group_members": [6],
+    "best_in_group": true
+  }
+]
+Notice that:
+1. ALL 5 input questions are returned in the output
+2. Each question keeps its own UNIQUE id (2, 3, 4, 5, 6)
+3. Questions 2 and 5 are identified as being in the same group
+4. Question 2 is marked as best_in_group=true while question 5 has best_in_group=false
+5. Questions that aren't in groups with other questions have group_members containing only their own ID
+</EXAMPLE OF COMPLETE GROUPING RESPONSE>
+"""
+RULES_FOR_SECOND_CLAUSES = """
+Avoid contradictory second clauses - Don't add qualifying phrases that explicitly negate the main benefit or create obvious limitations
+Bad: "Human feedback enables complex reasoning, allowing workflows to handle cases without any human involvement" (contradicts the premise of human feedback)
+Fixed: "Human feedback enables the agent to develop more sophisticated reasoning patterns for handling complex document structures" (stays positive, just misdirects the benefit)
+Additional guidance:
+Keep second clauses supportive - If you include a second clause, it should reinforce the incorrect direction, not contradict it
+Bad: "Context awareness helps agents understand code, but prevents them from adapting to new situations"
+Good: "Context awareness helps agents understand code by focusing on the most recently modified files and functions"
+Focus on misdirection, not negation - Wrong answers should point toward a plausible but incorrect benefit, not explicitly limit or negate the concept
+Bad: "Version control tracks changes but cannot recall previous versions"
+Good: "Version control tracks changes to ensure compatibility across different development environments"
+Maintain positive framing - All options should sound like genuine benefits, just targeting the wrong aspect
+Bad: "Transfer learning reuses features but freezes all weights, preventing any updates"
+Good: "Transfer learning reuses features to establish consistent baseline performance across different model architectures"
+Better versions of those options:
+B: "Human feedback enables the agent to develop more sophisticated automated reasoning capabilities for handling complex document analysis tasks."
+C: "Human feedback provides the agent with contextual understanding that enhances its decision-making framework for future similar documents."
+D: "Human feedback allows the agent to establish consistent formatting and presentation standards across all processed documents."
+* Look for explicit negations using "without," "rather than," "instead of," "but not," "but," "except," or "excluding" that directly contradict the core concept
+Avoid negating phrases that explicitly exclude the main concept:
+- Bad: "provides simple Q&A without automating structured tasks"
+- Good: "provides simple Q&A and basic document classification capabilities"
+- Bad: "focuses on efficiency rather than handling complex processing"
+- Good: "focuses on optimizing document throughput and processing speed"
+- Bad: "uses pre-defined rules with agents handling only basic tasks"
+- Good: "uses standardized rule frameworks with agents managing document classification"
+It is very important to consider the following:
+<VERY IMPORTANT>
+IMMEDIATE RED FLAGS - Mark as needing regeneration if ANY option contains:
+- "but not necessarily"
+- "at the expense of"
+- "sometimes at the expense"
+- "rather than [core concept]"
+- "ensuring X rather than Y"
+- "without necessarily"
+- "but has no impact on"
+</VERY IMPORTANT>
+"""
+IMMEDIATE_RED_FLAGS = """
+IMMEDIATE RED FLAGS - Mark as needing regeneration if ANY option contains:
+CONTRADICTORY SECOND CLAUSES:
+- "but not necessarily"
+- "at the expense of"
+- "sometimes at the expense"
+- "rather than [core concept]"
+- "ensuring X rather than Y"
+- "without necessarily"
+- "but has no impact on"
+- "but cannot"
+- "but prevents"
+- "but limits"
+- "but reduces"
+EXPLICIT NEGATIONS OF CORE CONCEPTS:
+- "without automating"
+- "without incorporating"
+- "without using"
+- "without supporting"
+- "preventing [main benefit]"
+- "limiting [main capability]"
+- "reducing the need for [core function]"
+OPPOSITE DESCRIPTIONS:
+- "fixed steps" or "rigid sequences" (when describing flexible systems)
+- "manual intervention" (when describing automation)
+- "passive components" (when describing active agents)
+- "simple question answering" (when describing complex processing)
+- "predefined rules" (when describing adaptive systems)
+ABSOLUTE/COMPARATIVE TERMS TO AVOID:
+- "always," "never," "exclusively," "purely," "solely," "only"
+- "primarily," "mainly," "instead of," "as opposed to"
+- "all," "every," "none," "nothing," "must," "required"
+- "completely," "totally," "utterly," "impossible"
+HEDGING THAT CREATES OBVIOUS LIMITATIONS:
+- "sometimes," "occasionally," "might," "could potentially"
+- "generally," "typically," "usually" (when limiting capabilities)
+- "to some extent," "partially," "somewhat"
+TRADE-OFF LANGUAGE THAT CREATES FALSE DICHOTOMIES:
+- "focusing on X instead of Y"
+- "prioritizing X over Y"
+- "emphasizing X rather than Y"
+- "optimizing for X at the cost of Y"
+Check for descriptions of opposite approaches:
+Identify when an answer describes a fundamentally different methodology
+For example, "intuition-based" vs "evaluation-based", "feature-driven" vs "evaluation-driven"
+"""
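The literal phrases listed in `IMMEDIATE_RED_FLAGS` lend themselves to a cheap pre-check before any model call. The following is a minimal sketch, not part of this commit: `RED_FLAG_PHRASES` and `find_red_flags` are hypothetical names, and the phrase list is only an illustrative subset of the prompt above.

```python
import re
from typing import List

# Hypothetical, illustrative subset of the literal phrases in IMMEDIATE_RED_FLAGS.
RED_FLAG_PHRASES = [
    "but not necessarily",
    "at the expense of",
    "without necessarily",
    "but has no impact on",
    "but cannot",
    "but prevents",
    "but limits",
    "but reduces",
    "rather than",
    "instead of",
]


def find_red_flags(option_text: str) -> List[str]:
    """Return the red-flag phrases found in an answer option, in list order."""
    lowered = option_text.lower()
    return [
        phrase
        for phrase in RED_FLAG_PHRASES
        # Word boundaries avoid matching inside longer words.
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    ]
```

A plain phrase scan like this only catches the literal patterns. The context-dependent categories ("OPPOSITE DESCRIPTIONS", for instance) would still need the model's judgment.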

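`RANK_QUESTIONS_PROMPT` asks the model to return every question with a unique rank starting at 2. Because a model may not follow these rules, the parsed response could be checked against them. This is a sketch under assumptions: `validate_ranking` is a hypothetical helper, and it assumes the response has already been parsed into a list of dicts with `id` and `rank` keys.

```python
from typing import Dict, List


def validate_ranking(input_ids: List[int], ranked: List[Dict]) -> List[str]:
    """Return a list of rule violations; an empty list means the ranking is usable."""
    problems = []

    # Rule: every question that was provided must be returned.
    returned_ids = {q["id"] for q in ranked}
    missing = sorted(set(input_ids) - returned_ids)
    if missing:
        problems.append(f"missing ids: {missing}")

    # Rule: ranks must be unique.
    ranks = [q["rank"] for q in ranked]
    if len(ranks) != len(set(ranks)):
        problems.append("duplicate ranks")

    # Rule: rank 1 is reserved, so the best rank is 2.
    if ranks and min(ranks) < 2:
        problems.append("rank below 2 (rank 1 is reserved)")

    return problems
```

A caller might retry the ranking request, or fall back to the input order, when the returned list is non-empty.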
quiz_generator/__init__.py ADDED

	@@ -0,0 +1,3 @@


+from .generator import QuizGenerator
+
+__all__ = ['QuizGenerator']

quiz_generator/assessment.py ADDED

	@@ -0,0 +1,190 @@

+import concurrent.futures
+from typing import List
+from openai import OpenAI
+from models import Assessment, MultipleChoiceQuestion, MultipleChoiceOption
+from .question_generation import generate_multiple_choice_question
+from .question_improvement import judge_question_quality
+from .question_ranking import rank_questions
+import time
+import threading
+import json
+def _get_run_manager():
+    """Get run manager if available, otherwise return None."""
+    try:
+        from ui.run_manager import get_run_manager
+        return get_run_manager()
+    except Exception:
+        return None
+def generate_assessment(client: OpenAI, model: str, temperature: float, learning_objective_generator, file_contents: List[str], num_objectives: int) -> Assessment:
+        """
+        Generate a complete assessment with learning objectives and questions.
+        Args:
+            file_contents: List of file contents with source tags
+            num_objectives: Number of learning objectives to generate
+        Returns:
+            Complete assessment
+        """
+        print(f"Generating assessment with {num_objectives} learning objectives")
+        start_time = time.time()
+        # Generate learning objectives using the new optimized workflow
+        # This generates base objectives, groups them, and generates incorrect answers only for best-in-group
+        result = learning_objective_generator.generate_and_group_learning_objectives(file_contents, num_objectives)
+        # Use the enhanced best-in-group objectives for question generation
+        learning_objectives = result["best_in_group"]
+        # Generate questions for each learning objective in parallel
+        questions = generate_questions_in_parallel(client, model, temperature, learning_objectives, file_contents)
+        # Rank questions based on quality criteria
+        ranked_questions = rank_questions(questions, file_contents)
+        print(f"Ranked {len(ranked_questions)} questions")
+        # Create assessment
+        assessment = Assessment(
+            learning_objectives=learning_objectives,
+            questions=ranked_questions
+        )
+        end_time = time.time()
+        print(f"Assessment generation completed in {end_time - start_time:.2f} seconds")
+        return assessment
+def generate_questions_in_parallel(client: OpenAI, model: str, temperature: float, learning_objectives: List['RankedLearningObjective'], file_contents: List[str]) -> List[MultipleChoiceQuestion]:
+        """
+        Generate multiple choice questions in parallel for each learning objective.
+        Args:
+            learning_objectives: List of learning objectives
+            file_contents: List of file contents with source tags
+        Returns:
+            List of generated questions
+        """
+        run_manager = _get_run_manager()
+        if run_manager:
+            run_manager.log(f"Generating {len(learning_objectives)} questions in parallel", level="INFO")
+        start_time = time.time()
+        questions = []
+        # Function to generate a single question based on a learning objective
+        def generate_question_for_objective(objective, idx):
+            try:
+                thread_id = threading.get_ident()
+                if run_manager:
+                    run_manager.log(f"PARALLEL: Worker {idx} (Thread ID: {thread_id}): Starting work on objective: {objective.learning_objective[:50]}...", level="DEBUG")
+                # Generate the question
+                if run_manager:
+                    run_manager.log(f"PARALLEL: Worker {idx} (Thread ID: {thread_id}): Generating question...", level="DEBUG")
+                question = generate_multiple_choice_question(client, model, temperature, objective, file_contents)
+                # Judge question quality
+                if run_manager:
+                    run_manager.log(f"PARALLEL: Worker {idx} (Thread ID: {thread_id}): Judging question quality...", level="DEBUG")
+                approved, feedback = judge_question_quality(client, model, temperature, question)
+                # Update question with judgment
+                question.approved = approved
+                question.judge_feedback = feedback
+                if run_manager:
+                    run_manager.log(f"PARALLEL: Worker {idx} (Thread ID: {thread_id}): Question completed with approval: {approved}", level="DEBUG")
+                return question
+            except Exception as e:
+                if run_manager:
+                    run_manager.log(f"Worker {idx}: Error generating question: {str(e)}", level="ERROR")
+                # Create a placeholder question on error
+                options = [
+                    MultipleChoiceOption(option_text=f"Option {i}", is_correct=(i==0), feedback="Feedback")
+                    for i in range(4)
+                ]
+                error_question = MultipleChoiceQuestion(
+                    id=idx,
+                    question_text=f"Error generating question: {str(e)}",
+                    options=options,
+                    learning_objective_id=objective.id,
+                    learning_objective=getattr(objective, "learning_objective", "N/A"),
+                    correct_answer="N/A",
+                    source_reference=objective.source_reference,
+                    approved=False,
+                    judge_feedback=f"Error: {str(e)}"
+                )
+                return error_question
+        # Use ThreadPoolExecutor for parallel execution
+        # Cap at 5 concurrent workers; keep at least 1 because ThreadPoolExecutor rejects max_workers=0
+        max_workers = max(1, min(len(learning_objectives), 5))
+        if run_manager:
+            run_manager.log(f"PARALLEL: Starting ThreadPoolExecutor with {max_workers} workers", level="INFO")
+        with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
+            # Submit tasks
+            if run_manager:
+                run_manager.log(f"PARALLEL: Submitting {len(learning_objectives)} tasks to thread pool", level="DEBUG")
+            future_to_idx = {executor.submit(generate_question_for_objective, objective, i): i
+                           for i, objective in enumerate(learning_objectives)}
+            if run_manager:
+                run_manager.log("PARALLEL: All tasks submitted, waiting for completion", level="DEBUG")
+            # Collect results as they complete
+            for future in concurrent.futures.as_completed(future_to_idx):
+                idx = future_to_idx[future]
+                try:
+                    question = future.result()
+                    questions.append(question)
+                    if run_manager:
+                        run_manager.log(f"Completed question {idx+1}/{len(learning_objectives)}", level="INFO")
+                except Exception as e:
+                    if run_manager:
+                        run_manager.log(f"Question {idx+1} generation failed: {str(e)}", level="ERROR")
+                    # Add a placeholder for failed questions
+                    options = [
+                        MultipleChoiceOption(option_text=f"Option {i}", is_correct=(i==0), feedback="Feedback")
+                        for i in range(4)
+                    ]
+                    error_question = MultipleChoiceQuestion(
+                        id=idx,
+                        question_text=f"Failed to generate question: {str(e)}",
+                        options=options,
+                        learning_objective_id=learning_objectives[idx].id,
+                        learning_objective=getattr(learning_objectives[idx], "learning_objective", "N/A"),
+                        correct_answer="N/A",
+                        source_reference=learning_objectives[idx].source_reference,
+                        judge_feedback=f"Error: {str(e)}",
+                        approved=False
+                    )
+                    questions.append(error_question)
+        end_time = time.time()
+        if run_manager:
+            run_manager.log(f"Question generation completed in {end_time - start_time:.2f} seconds", level="INFO")
+        return questions
+def save_assessment_to_json(assessment: Assessment, output_path: str) -> None:
+    """
+    Save assessment to a JSON file.
+    Args:
+        assessment: Assessment to save
+        output_path: Path to save the assessment to
+    """
+    # Convert assessment to dict
+    assessment_dict = assessment.model_dump()
+    # Save to file
+    with open(output_path, "w") as f:
+        json.dump(assessment_dict, f, indent=2)
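
The parallel collection loop above appends questions in completion order, so the returned list may not match the order of `learning_objectives`. A minimal sketch of one way to keep input order, assuming a hypothetical helper `run_in_order` and a generic `worker(item, index)` callable (neither is part of this commit):

```python
import concurrent.futures
from typing import Any, Callable, List, Sequence


def run_in_order(items: Sequence[Any],
                 worker: Callable[[Any, int], Any],
                 max_workers: int = 5) -> List[Any]:
    """Run worker(item, index) in a thread pool; return results in input order.

    Hypothetical helper for illustration only.
    """
    if not items:
        # ThreadPoolExecutor raises ValueError when max_workers is 0
        return []
    results: List[Any] = [None] * len(items)
    workers = min(len(items), max_workers)
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as executor:
        future_to_idx = {executor.submit(worker, item, i): i
                         for i, item in enumerate(items)}
        for future in concurrent.futures.as_completed(future_to_idx):
            idx = future_to_idx[future]
            try:
                # Store by original index instead of appending
                results[idx] = future.result()
            except Exception as exc:
                # Placeholder marking the failed slot
                results[idx] = f"failed: {exc}"
    return results
```

Writing into a preallocated list by index keeps the failure placeholder in the same slot as the objective that caused it, which may make downstream ranking and display easier to follow.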

quiz_generator/feedback_questions.py ADDED

	@@ -0,0 +1,210 @@

+from openai import OpenAI
+from models import MultipleChoiceQuestionFromFeedback, MultipleChoiceOption, TEMPERATURE_UNAVAILABLE
+import os
+import re
+from typing import List
+def generate_multiple_choice_question_from_feedback(client: OpenAI, model: str, temperature: float, feedback: str, file_contents: List[str]) -> MultipleChoiceQuestionFromFeedback:
+    """
+    Generate a multiple choice question based on user feedback.
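+    Args:
+        client: OpenAI client used for both API calls
+        model: Model name used for extraction and generation
+        temperature: Sampling temperature (skipped for models that do not accept it)
+        feedback: User feedback or guidance
+        file_contents: List of file contents with source tags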
+    Returns:
+        Generated multiple choice question
+    """
+    print(f"Processing feedback: {feedback[:100]}...")
+    # Step 1: Extract structured information from feedback using LLM
+    extraction_prompt = f"""
+    Extract the following information from the user's feedback to create a multiple choice question:
+    1. Any source references mentioned
+    2. The learning objective
+    3. The difficulty level
+    4. The original question text (if present)
+    5. Any specific feedback about what to change or improve
+    <QUESTION FOLLOWED BY USER CRITICISM>
+    {feedback}
+    </QUESTION FOLLOWED BY USER CRITICISM>
+    """
+    try:
+        # Extract structured information
+        # Build the request once so the optional temperature is applied consistently
+        params = {
+            "model": model,
+            "response_format": MultipleChoiceQuestionFromFeedback,
+            "messages": [
+                {"role": "system", "content": "You are an expert at extracting structured information from text to prepare for question generation."},
+                {"role": "user", "content": extraction_prompt}
+            ]
+        }
+        # Add temperature parameter only if not using o-series models
+        if not TEMPERATURE_UNAVAILABLE.get(model, True):
+            params["temperature"] = temperature
+        completion = client.beta.chat.completions.parse(**params)
+        extraction = completion.choices[0].message.parsed
+        print("Extracted question structure")
+        # Step 2: Find relevant content based on extracted source references
+        source_references = []
+        if extraction.source_reference:
+            if isinstance(extraction.source_reference, list):
+                source_references = extraction.source_reference
+            else:
+                source_references = [extraction.source_reference]
+        # If no source references extracted, get all sources from file_contents
+        if not source_references:
+            for file_content in file_contents:
+                source_match = re.search(r"<source file='([^']+)'>", file_content)
+                if source_match:
+                    source = source_match.group(1)
+                    source_references.append(source)
+                    print(f"Found source file: {source}")
+        # Find relevant content based on source references
+        combined_content = ""
+        for source_file in source_references:
+            source_found = False
+            for file_content in file_contents:
+                # Look for the XML source tag with the matching filename
+                if f"<source file='{source_file}'>" in file_content:
+                    print(f"Found matching source content for {source_file}")
+                    if combined_content:
+                        combined_content += "\n\n"
+                    combined_content += file_content
+                    source_found = True
+                    break
+            # If no exact match found, try a more flexible match
+            if not source_found:
+                print(f"No exact match for {source_file}, looking for partial matches")
+                for file_content in file_contents:
+                    if source_file in file_content:
+                        print(f"Found partial match for {source_file}")
+                        if combined_content:
+                            combined_content += "\n\n"
+                        combined_content += file_content
+                        source_found = True
+                        break
+        # If still no matching content, use all file contents combined
+        if not combined_content:
+            print(f"No content found for any source files, using all content")
+            combined_content = "\n\n".join(file_contents)
+        # Step 3: Generate new question using extracted information and content
+        generation_prompt = f"""
+        Create a multiple choice question based on the following information:
+        USER CRITICISM:
+        {extraction.feedback}
+        EXTRACTED QUESTION STRUCTURE:
+        Question: {extraction.question_text}
+        Learning Objective: {extraction.learning_objective}
+        COURSE CONTENT:
+        {combined_content}
+        INSTRUCTIONS:
+        1. Create a question that addresses the user's critique or criticism. This is the top priority. Your response should align with the user's critique or criticism.
+        2. Base your question ONLY on the COURSE CONTENT provided above
+        3. The question must be clear, specific, and unambiguous
+        4. Include EXACTLY 4 options labeled A, B, C, and D
+        5. Have EXACTLY 1 correct answer
+        6. All options MUST include detailed feedback explaining why the answer is correct or incorrect
+        7. For the correct answer, include positive feedback that reinforces the concept
+        8. For incorrect answers, provide informative feedback explaining the misconception
+        9. All options should be plausible - no obviously wrong answers
+        10. Make sure the question tests understanding, not just memorization
+        11. Only refer to specific products if absolutely necessary
+        12. Questions should prioritize core Competencies: Identify the most critical knowledge and skills students must master
+        13. Questions should align with Course Purpose: Ensure objectives directly support the overarching goals of the course
+        14. Questions should Consider Long-term Value: Focus on enduring understandings that students will use beyond the course
+        Available source files: {', '.join([os.path.basename(src) for src in source_references])}
+        IMPORTANT: Every option MUST have feedback. This is required.
+        """
+        # Generate new question
+        # Build the request once so the optional temperature is applied consistently
+        params = {
+            "model": model,
+            "response_format": MultipleChoiceQuestionFromFeedback,
+            "messages": [
+                {"role": "system", "content": "You are an expert educational assessment creator specializing in creating high-quality multiple choice questions with detailed feedback for each option."},
+                {"role": "user", "content": generation_prompt}
+            ]
+        }
+        # Add temperature parameter only if not using o-series models
+        if not TEMPERATURE_UNAVAILABLE.get(model, True):
+            params["temperature"] = temperature
+        completion = client.beta.chat.completions.parse(**params)
+        response = completion.choices[0].message.parsed
+        # Set ID and source reference if not already set
+        response.id = 1
+        if not response.source_reference:
+            response.source_reference = [os.path.basename(src) for src in source_references]
+        # Set learning objective if not already set
+        if not response.learning_objective and extraction.learning_objective:
+            response.learning_objective = extraction.learning_objective
+        # Set feedback from the original feedback
+        response.feedback = extraction.feedback
+        # Verify all options have feedback
+        for i, option in enumerate(response.options):
+            if not option.feedback or option.feedback.strip() == "":
+                if option.is_correct:
+                    option.feedback = "Good job! This is the correct answer."
+                else:
+                    option.feedback = "This answer is incorrect. Please review the material again."
+        return response
+    except Exception as e:
+        print(f"Error generating question: {e}")
+        # Create a fallback question
+        options = [
+            MultipleChoiceOption(
+                option_text=f"Option {chr(65+i)}",
+                is_correct=(i==0),
+                feedback=f"{'Correct' if i==0 else 'Incorrect'} answer."
+            ) for i in range(4)
+        ]
+        return MultipleChoiceQuestionFromFeedback(
+            id=1,
+            question_text=f"Question based on feedback: {feedback[:50]}...",
+            options=options,
+            learning_objective="Understanding key concepts from the course material",
+            source_reference=["unknown"],
+            feedback=feedback  # 'extraction' is unbound if the first API call failed
+        )
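
The conditional temperature handling used in this file can be factored into one place. A minimal sketch, assuming a hypothetical helper `build_parse_params` and a stand-in `TEMPERATURE_UNAVAILABLE` mapping (the real mapping lives in `models` and its entries are not shown in this diff; the model names below are illustrative only):

```python
from typing import Any, Dict, List

# Stand-in for models.TEMPERATURE_UNAVAILABLE; entries here are assumptions
TEMPERATURE_UNAVAILABLE: Dict[str, bool] = {
    "model-without-temperature": True,
    "model-with-temperature": False,
}


def build_parse_params(model: str,
                       messages: List[Dict[str, str]],
                       response_format: Any,
                       temperature: float) -> Dict[str, Any]:
    """Build keyword arguments for a structured-output request.

    Hypothetical helper: temperature is included only when the model is
    explicitly marked as accepting it. Unknown models default to "unavailable",
    mirroring the .get(model, True) lookup used in this file.
    """
    params: Dict[str, Any] = {
        "model": model,
        "messages": messages,
        "response_format": response_format,
    }
    if not TEMPERATURE_UNAVAILABLE.get(model, True):
        params["temperature"] = temperature
    return params
```

The resulting dict is meant to be unpacked into the call, as in `client.beta.chat.completions.parse(**params)`, so the same request shape is used whether or not a temperature is set.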

quiz_generator/generator.py ADDED

	@@ -0,0 +1,89 @@

+from typing import List
+from openai import OpenAI
+from learning_objective_generator import LearningObjectiveGenerator
+from learning_objective_generator.grouping_and_ranking import group_base_learning_objectives
+from .question_generation import generate_multiple_choice_question
+from .question_improvement import (
+    should_regenerate_incorrect_answers, regenerate_incorrect_answers, judge_question_quality
+)
+from .question_ranking import rank_questions, group_questions
+from .feedback_questions import generate_multiple_choice_question_from_feedback
+from .assessment import generate_assessment, generate_questions_in_parallel, save_assessment_to_json
+class QuizGenerator:
+    """Simple orchestrator for quiz generation."""
+    def __init__(self, api_key: str, model: str = "gpt-5", temperature: float = 1.0):
+        self.client = OpenAI(api_key=api_key)
+        self.model = model
+        self.temperature = temperature
+        self.learning_objective_generator = LearningObjectiveGenerator(
+            api_key=api_key, model=model, temperature=temperature
+        )
+    def generate_base_learning_objectives(self, file_contents: List[str], num_objectives: int, incorrect_answer_model: str = None):
+        """Generate only base learning objectives (no grouping, no incorrect answers). This allows the UI to collect objectives from multiple runs before grouping."""
+        return self.learning_objective_generator.generate_base_learning_objectives(
+            file_contents, num_objectives
+        )
+    def generate_lo_incorrect_answer_options(self, file_contents, base_objectives, model_override=None):
+        """Generate incorrect answer options for the given base learning objectives (wrapper for LearningObjectiveGenerator)."""
+        return self.learning_objective_generator.generate_incorrect_answer_options(
+            file_contents, base_objectives, model_override
+        )
+    def group_base_learning_objectives(self, base_learning_objectives, file_contents: List[str]):
+        """Group base learning objectives and identify best in group."""
+        return group_base_learning_objectives(
+            self.client, self.model, self.temperature, base_learning_objectives, file_contents
+        )
+    def generate_multiple_choice_question(self, learning_objective, file_contents: List[str]):
+        return generate_multiple_choice_question(
+            self.client, self.model, self.temperature, learning_objective, file_contents
+        )
+    def should_regenerate_incorrect_answers(self, question, file_contents: List[str], model_name: str = "gpt-5-mini"):
+        return should_regenerate_incorrect_answers(
+            self.client, self.model, self.temperature, question, file_contents, model_name
+        )
+    def regenerate_incorrect_answers(self, questions, file_contents: List[str]):
+        return regenerate_incorrect_answers(
+            self.client, self.model, self.temperature, questions, file_contents
+        )
+    def rank_questions(self, questions, file_contents: List[str]):
+        return rank_questions(
+            self.client, self.model, self.temperature, questions, file_contents
+        )
+    def group_questions(self, questions, file_contents: List[str]):
+        return group_questions(
+            self.client, self.model, self.temperature, questions, file_contents
+        )
+    def generate_multiple_choice_question_from_feedback(self, feedback: str, file_contents: List[str]):
+        return generate_multiple_choice_question_from_feedback(
+            self.client, self.model, self.temperature, feedback, file_contents
+        )
+    def judge_question_quality(self, question):
+        return judge_question_quality(
+            self.client, self.model, self.temperature, question
+        )
+    def generate_assessment(self, file_contents: List[str], num_objectives: int):
+        return generate_assessment(
+            self.client, self.model, self.temperature,
+            self.learning_objective_generator, file_contents, num_objectives
+        )
+    def generate_questions_in_parallel(self, learning_objectives, file_contents: List[str]):
+        return generate_questions_in_parallel(
+            self.client, self.model, self.temperature, learning_objectives, file_contents
+        )
+    def save_assessment_to_json(self, assessment, output_path: str):
+        return save_assessment_to_json(assessment, output_path)

quiz_generator/question_generation.py ADDED

	@@ -0,0 +1,217 @@

+from typing import List
+from openai import OpenAI
+from models import MultipleChoiceQuestion, MultipleChoiceOption, TEMPERATURE_UNAVAILABLE
+from prompts.questions import (
+    GENERAL_QUALITY_STANDARDS, MULTIPLE_CHOICE_STANDARDS,
+    EXAMPLE_QUESTIONS, QUESTION_SPECIFIC_QUALITY_STANDARDS,
+    CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS,
+    ANSWER_FEEDBACK_QUALITY_STANDARDS,
+)
+from prompts.incorrect_answers import (
+    INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION
+)
+def _get_run_manager():
+    """Get run manager if available, otherwise return None."""
+    try:
+        from ui.run_manager import get_run_manager
+        return get_run_manager()
+    except Exception:
+        return None
+def generate_multiple_choice_question(client: OpenAI,
+    model: str,
+    temperature: float,
+    learning_objective: 'RankedLearningObjective',
+    file_contents: List[str]) -> MultipleChoiceQuestion:
+        """
+        Generate a multiple choice question for a learning objective.
+        Args:
+            learning_objective: Learning objective to generate a question for
+            file_contents: List of file contents with source tags
+        Returns:
+            Generated multiple choice question
+        """
+        run_manager = _get_run_manager()
+        # Handle source references (could be string or list)
+        source_references = learning_objective.source_reference
+        if isinstance(source_references, str):
+            source_references = [source_references]
+        if run_manager:
+            run_manager.log(f"Looking for content from source files: {source_references}", level="DEBUG")
+        # Simply collect all content that matches any of the source references
+        combined_content = ""
+        for source_file in source_references:
+            source_found = False
+            for file_content in file_contents:
+                # Look for the XML source tag with the matching filename
+                if f"<source file='{source_file}'>" in file_content:
+                    if run_manager:
+                        run_manager.log(f"Found matching source content for {source_file}", level="DEBUG")
+                    if combined_content:
+                        combined_content += "\n\n"
+                    combined_content += file_content
+                    source_found = True
+                    break
+            # If no exact match found, try a more flexible match
+            if not source_found:
+                if run_manager:
+                    run_manager.log(f"No exact match for {source_file}, looking for partial matches", level="DEBUG")
+                for file_content in file_contents:
+                    if source_file in file_content:
+                        if run_manager:
+                            run_manager.log(f"Found partial match for {source_file}", level="DEBUG")
+                        if combined_content:
+                            combined_content += "\n\n"
+                        combined_content += file_content
+                        source_found = True
+                        break
+        # If still no matching content, use all file contents combined
+        if not combined_content:
+            if run_manager:
+                run_manager.log(f"No content found for any source files, using all content", level="DEBUG")
+            combined_content = "\n\n".join(file_contents)
+        # Add multi-source instruction if needed
+        multi_source_instruction = ""
+        if len(source_references) > 1:
+            multi_source_instruction = """
+        <IMPORTANT FOR MULTI-SOURCE QUESTIONS>
+        This learning objective spans multiple sources. Your question should:
+        1. Synthesize information across these sources
+        2. Test understanding of overarching themes or connections
+        3. Require knowledge from multiple sources to answer correctly
+        </IMPORTANT FOR MULTI-SOURCE QUESTIONS>
+        """
+        # Create the prompt
+        prompt = f"""
+        Create a multiple choice question based on the following learning objective:
+        <LEARNING OBJECTIVE>
+        {learning_objective.learning_objective}
+        </LEARNING OBJECTIVE>
+        The correct answer to this is
+        <CORRECT ANSWER>
+        {learning_objective.correct_answer}
+        </CORRECT ANSWER>
+        Follow these important instructions for writing the quiz question:
+        <INSTRUCTIONS>
+        <General Quality Standards>
+        {GENERAL_QUALITY_STANDARDS}
+        </General Quality Standards>
+        <Multiple Choice Specific Standards>
+        {MULTIPLE_CHOICE_STANDARDS}
+        </Multiple Choice Specific Standards>
+        <Example Questions>
+        {EXAMPLE_QUESTIONS}
+        </Example Questions>
+        <Question Specific Quality Standards>
+        {QUESTION_SPECIFIC_QUALITY_STANDARDS}
+        </Question Specific Quality Standards>
+        <Correct Answer Specific Quality Standards>
+        {CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS}
+        </Correct Answer Specific Quality Standards>
+        These are the incorrect answer options:
+        <INCORRECT_ANSWER_OPTIONS>
+        {learning_objective.incorrect_answer_options}
+        </INCORRECT_ANSWER_OPTIONS>
+        Incorrect answers should follow the following examples with explanations:
+        Here are some examples of high quality incorrect answer options for each learning objective:
+        <incorrect_answer_examples>
+        {INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION}
+        </incorrect_answer_examples>
+        IMPORTANT:
+        AVOID ABSOLUTE TERMS AND UNNECESSARY COMPARISONS
+        Don't use words like "always", "never", "mainly", "exclusively", "primarily" or "rather than".
+        These words are absolute or extreme qualifiers and comparative terms that artificially limit or overgeneralize statements, creating false dichotomies or unsubstantiated hierarchies.
+        More words you should avoid are: All, every, entire, complete, none, nothing, no one, only, solely, merely, completely, totally, utterly, always, forever, constantly, never, impossible, must, mandatory, required, instead of, as opposed to, exclusively, purely
+        <Answer Feedback Quality Standards>
+        {ANSWER_FEEDBACK_QUALITY_STANDARDS}
+        </Answer Feedback Quality Standards>
+        </INSTRUCTIONS>
+        {multi_source_instruction}
+        Below is the course content that the quiz question is based on:
+        <COURSE CONTENT>
+        {combined_content}
+        </COURSE CONTENT>
+        """
+        # Generate question using OpenAI structured output parsing
+        try:
+            params = {
+                "model": model,
+                "messages": [
+                    {"role": "system", "content": "You are an expert educational assessment creator specializing in creating high-quality multiple choice questions with detailed feedback for each option."},
+                    {"role": "user", "content": prompt}
+                ],
+                "response_format": MultipleChoiceQuestion
+            }
+            if not TEMPERATURE_UNAVAILABLE.get(model, True):
+                params["temperature"] = temperature
+            completion = client.beta.chat.completions.parse(**params)
+            response = completion.choices[0].message.parsed
+            # Set learning objective ID and source reference
+            response.id = learning_objective.id
+            response.learning_objective_id = learning_objective.id
+            response.learning_objective = learning_objective.learning_objective
+            response.source_reference = learning_objective.source_reference
+            # Verify all options have feedback
+            for i, option in enumerate(response.options):
+                if not option.feedback or option.feedback.strip() == "":
+                    if option.is_correct:
+                        option.feedback = "Good job! This is the correct answer."
+                    else:
+                        option.feedback = "This answer is incorrect. Please review the material again."
+            return response
+        except Exception as e:
+            print(f"Error generating question: {e}")
+            # Create a fallback question
+            options = [
+                MultipleChoiceOption(
+                    option_text=f"Option {chr(65+i)}",
+                    is_correct=(i==0),
+                    feedback=f"{'Correct' if i==0 else 'Incorrect'} answer."
+                ) for i in range(4)
+            ]
+            return MultipleChoiceQuestion(
+                id=learning_objective.id,
+                question_text=f"Question for learning objective: {learning_objective.learning_objective}",
+                options=options,
+                learning_objective_id=learning_objective.id,
+                learning_objective=learning_objective.learning_objective,
+                source_reference=learning_objective.source_reference,
+            )
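
The source lookup above (exact `<source file='...'>` tag match, then substring match, then all content) is repeated in several modules of this commit. A minimal sketch of a shared helper, assuming a hypothetical name `collect_source_content` that does not exist in the repository:

```python
from typing import List


def collect_source_content(source_references: List[str],
                           file_contents: List[str]) -> str:
    """Return the content matching the given source references.

    Hypothetical helper for illustration. For each reference it takes the
    first file with an exact source tag, otherwise the first file containing
    the reference as a substring. If nothing matches at all, every file is
    returned joined together.
    """
    matched: List[str] = []
    for source_file in source_references:
        tag = f"<source file='{source_file}'>"
        # Exact match on the XML source tag
        hit = next((c for c in file_contents if tag in c), None)
        if hit is None:
            # Looser fallback: the reference appears anywhere in the file
            hit = next((c for c in file_contents if source_file in c), None)
        if hit is not None:
            matched.append(hit)
    if not matched:
        return "\n\n".join(file_contents)
    return "\n\n".join(matched)
```

Keeping the three-step fallback in one function would let the generation, improvement, and feedback paths stay consistent if the tag format ever changes.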

quiz_generator/question_improvement.py ADDED

	@@ -0,0 +1,578 @@

+from typing import List, Tuple, Optional
+from openai import OpenAI
+from models import MultipleChoiceQuestion, MultipleChoiceOption, TEMPERATURE_UNAVAILABLE
+from prompts.incorrect_answers import INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION
+from prompts.questions import GENERAL_QUALITY_STANDARDS, MULTIPLE_CHOICE_STANDARDS, EXAMPLE_QUESTIONS
+import json
+import os
+def _get_run_manager():
+    """Get run manager if available, otherwise return None."""
+    try:
+        from ui.run_manager import get_run_manager
+        return get_run_manager()
+    except Exception:
+        return None
+def should_regenerate_incorrect_answers(client: OpenAI,model: str, temperature: float, question: MultipleChoiceQuestion, file_contents: List[str], model_name: str = "gpt-5-mini") -> bool:
+    """
+    Check if a question needs regeneration of incorrect answer options using a lightweight model.
+    Args:
+        question: Question to check
+        file_contents: List of file contents with source tags
+        model_name: Model to use for checking (default: gpt-5-mini)
+    Returns:
+        Boolean indicating whether the question needs regeneration
+    """
+    print(f"Checking if question ID {question.id} needs incorrect answer regeneration using {model_name}")
+    # Format the question for display in the prompt
+    question_display = (
+        f"ID: {question.id}\n"
+        f"Question: {question.question_text}\n"
+        f"Options: {json.dumps([{'text': o.option_text, 'is_correct': o.is_correct, 'feedback': o.feedback} for o in question.options])}\n"
+        f"Learning Objective: {question.learning_objective}\n"
+        f"Learning Objective ID: {question.learning_objective_id}\n"
+        f"Correct Answer: {question.correct_answer}\n"
+        f"Source Reference: {question.source_reference}"
+    )
+    # Extract relevant content based on source references (simplified version)
+    combined_content = ""
+    if question.source_reference:
+        source_references = question.source_reference if isinstance(question.source_reference, list) else [question.source_reference]
+        for source_file in source_references:
+            for file_content in file_contents:
+                if f"<source file='{source_file}'>" in file_content:
+                    if combined_content:
+                        combined_content += "\n\n"
+                    combined_content += file_content
+                    break
+    # If no content found, fall back to all content
+    if not combined_content:
+        combined_content = "\n\n".join(file_contents)
+    # Create a simplified prompt focused just on checking if regeneration is needed
+    check_prompt = f"""
+    Below is a multiple choice question. Evaluate ONLY the INCORRECT answer options against the guidelines below. Respond
+    with only TRUE or FALSE: TRUE if the incorrect answer options need regeneration, FALSE otherwise.
+    {question_display}
+    Consider the course content to help you make informed decisions:
+    COURSE CONTENT:
+    {combined_content}
+    Here are some examples of high quality incorrect answer suggestions which you should follow:
+    <incorrect_answer_examples>
+    {INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION}
+    </incorrect_answer_examples>
+    Refer to the correct answer <correct_answer>{question.correct_answer}</correct_answer>.
+    Make sure incorrect answers match the correct answer in terms of length, complexity, phrasing, style, and subject matter.
+    Incorrect answers should be of approximate equal length to the correct answer, preferably one sentence long
+    {IMMEDIATE_RED_FLAGS}
+    """
+    # Call OpenAI API with the lightweight model
+    try:
+        params = {
+            "model": model_name,
+            "messages": [
+                {"role": "system", "content": "You are an expert educational assessment evaluator. You determine if incorrect answer options meet quality standards."},
+                {"role": "user", "content": check_prompt}
+            ],
+            #"temperature": 0.7
+        }
+        completion = client.chat.completions.create(**params)
+        response_text = completion.choices[0].message.content.strip().lower()
+        print(f"Checking response text output: {response_text}")
+        # Check if regeneration is needed
+        needs_regeneration = "true" in response_text
+        #needs_regeneration = True
+        print(f"Question ID {question.id} needs regeneration: {needs_regeneration} ({response_text})")
+        return needs_regeneration
+    except Exception as e:
+        print(f"Error checking regeneration need for question ID {question.id}: {str(e)}")
+        # If the check itself fails, skip regeneration and keep the existing options
+        return False
+def regenerate_incorrect_answers(client: OpenAI, model: str, temperature: float, questions: List[MultipleChoiceQuestion], file_contents: List[str]) -> List[MultipleChoiceQuestion]:
+    """
+    Regenerate incorrect answer options for questions.
+    Args:
+        questions: List of questions to improve
+        file_contents: List of file contents with source tags
+    Returns:
+        The same list of questions with improved incorrect answer options
+    """
+    print(f"Regenerating incorrect answer options for {len(questions)} questions")
+    for i, question in enumerate(questions):
+        # Check if this question needs regeneration
+        # if not self.should_regenerate_incorrect_answers(question, file_contents):
+        #     print(f"Question ID {question.id} does not need regeneration. Skipping.")
+        #     continue
+        # Extract relevant content based on source references
+        combined_content = ""
+        if question.source_reference:
+            source_references = question.source_reference if isinstance(question.source_reference, list) else [question.source_reference]
+            for source_file in source_references:
+                source_found = False
+                for file_content in file_contents:
+                    # Look for the XML source tag with the matching filename
+                    if f"<source file='{source_file}'>" in file_content:
+                        print(f"Found matching source content for {source_file}")
+                        if combined_content:
+                            combined_content += "\n\n"
+                        combined_content += file_content
+                        source_found = True
+                        break
+                # If no exact match found, try a more flexible match
+                if not source_found:
+                    print(f"No exact match for {source_file}, looking for partial matches")
+                    for file_content in file_contents:
+                        if source_file in file_content:
+                            print(f"Found partial match for {source_file}")
+                            if combined_content:
+                                combined_content += "\n\n"
+                            combined_content += file_content
+                            source_found = True
+                            break
+            # If still no matching content, use all file contents combined
+            if not combined_content:
+                print(f"No content found for any source files, using all content")
+                combined_content = "\n\n".join(file_contents)
+        else:
+            # If no source references, use all content
+            combined_content = "\n\n".join(file_contents)
+        # Find the correct option
+        correct_option = None
+        for opt in question.options:
+            if opt.is_correct:
+                correct_option = opt
+                break
+        if not correct_option:
+            print(f"Warning: No correct option found in question ID {question.id}. Skipping.")
+            continue
+        # Process each incorrect option individually
+        updated_options = [correct_option]  # Start with the correct option
+        options_regenerated = 0
+        for opt in question.options:
+            if opt.is_correct:
+                continue  # Skip the correct option, already added
+            # Check if this specific option needs regeneration
+            needs_regeneration, reason = should_regenerate_individual_option(client, model, temperature, question, opt, combined_content)
+            if needs_regeneration:
+                # Regenerate this specific option
+                print(f"Regenerating option '{opt.option_text}' for question ID {question.id}")
+                new_option = regenerate_individual_option(client, model, temperature, question, opt, correct_option, combined_content, reason)
+                if new_option:
+                    updated_options.append(new_option)
+                    options_regenerated += 1
+                else:
+                    # If regeneration failed, keep the original
+                    updated_options.append(opt)
+            else:
+                # Option doesn't need regeneration, keep as is
+                print(f"Option '{opt.option_text}' for question ID {question.id} does not need regeneration")
+                updated_options.append(opt)
+        # Update the question with the new options
+        questions[i].options = updated_options
+        print(f"Regenerated {options_regenerated} options for question ID {question.id}")
+    return questions
+def should_regenerate_individual_option(client: OpenAI, model: str, temperature: float, question: MultipleChoiceQuestion, option: MultipleChoiceOption, content: str) -> Tuple[bool, str]:
+    """
+    Check if a specific incorrect option needs regeneration.
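+    Args:
+        client: OpenAI client used for the evaluation call
+        model: Unused here; the check always runs on the lightweight "gpt-5-mini" model
+        temperature: Unused here; no temperature is sent with the check
+        question: The question containing the option
+        option: The specific option to check
+        content: The relevant content for context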
+    Returns:
+        Tuple of (Boolean indicating whether the option needs regeneration, Reason for the decision)
+    """
+    print(f"Checking if option '{option.option_text}' needs regeneration")
+    # Format the question and option for display
+    question_display = (
+        f"Question: {question.question_text}\n"
+        f"Learning Objective: {question.learning_objective}\n"
+        f"Correct Answer: {question.correct_answer}\n"
+    )
+    option_display = (
+        f"Option Text: {option.option_text}\n"
+        f"Feedback: {option.feedback}\n"
+    )
+    # Create a simplified prompt focused just on checking this option
+    check_prompt = f"""
+    Below is a multiple choice question and ONE incorrect answer option. Evaluate ONLY THIS OPTION against the quality guidelines.
+    {question_display}
+    INCORRECT OPTION TO EVALUATE:
+    {option_display}
+    Consider the course content to help you make informed decisions:
+    COURSE CONTENT:
+    {content}
+    Here are some examples of high quality incorrect answer suggestions which you should follow:
+    <incorrect_answer_examples>
+    {INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION}
+    </incorrect_answer_examples>
+    Refer to the correct answer <correct_answer>{question.correct_answer}</correct_answer>.
+    Make sure incorrect answers match the correct answer in terms of length, complexity, phrasing, style, and subject matter.
+    Incorrect answers should be approximately equal in length to the correct answer, preferably one sentence long.
+    {IMMEDIATE_RED_FLAGS}
+    TASK: Determine if this specific incorrect option needs improvement based on the guidelines.
+    Begin your answer with "true" if improvements are needed or "false" if no improvements are needed, followed by a one-sentence explanation of the decision.
+    """
+    # Call OpenAI API with the lightweight model
+    try:
+        params = {
+            "model": "gpt-5-mini",
+            "messages": [
+                {"role": "system", "content": "You are an expert educational assessment evaluator. You determine if incorrect answer options meet quality standards."},
+                {"role": "user", "content": check_prompt}
+            ]
+        }
+        completion = client.chat.completions.create(**params)
+        response_text = completion.choices[0].message.content.strip().lower()
+        print(f"Checking option response: {response_text}")
+        # Check if regeneration is needed: the verdict is the first word of the reply,
+        # so a "true" that appears later in the explanation is not misread as the verdict
+        verdict = response_text.lstrip("\"'`* ")
+        needs_regeneration = verdict.startswith("true")
+        # Extract reason if available (everything after true/false)
+        reason = "No specific reason provided"
+        parts = verdict.split(None, 1)
+        if len(parts) > 1:
+            reason = parts[1].strip()
+        print(f"Option '{option.option_text}' needs regeneration: {needs_regeneration}")
+        return needs_regeneration, reason
+    except Exception as e:
+        print(f"Error checking option regeneration need: {str(e)}")
+        # If there's an error, assume regeneration is not needed
+        return False, f"Error during evaluation: {str(e)}"
+def regenerate_individual_option(client: OpenAI, model: str, temperature: float, question: MultipleChoiceQuestion, option: MultipleChoiceOption,
+                                correct_option: MultipleChoiceOption, content: str, reason: str) -> Optional[MultipleChoiceOption]:
+    """
+    Regenerate a specific incorrect option.
+    Args:
+        question: The question containing the option
+        option: The specific option to regenerate
+        correct_option: The correct option for context
+        content: The relevant content for context
+        reason: Reason why the option needs regeneration
+    Returns:
+        A new MultipleChoiceOption or None if regeneration failed
+    """
+    print(f"Regenerating option '{option.option_text}'")
+    # Format the question and options for display
+    question_display = (
+        f"Question: {question.question_text}\n"
+        f"Learning Objective: {question.learning_objective}\n"
+        f"Correct Answer: {question.correct_answer}\n"
+        f"Correct Option: {correct_option.option_text}\n"
+        f"Correct Option Feedback: {correct_option.feedback}\n"
+    )
+    option_display = (
+        f"Incorrect Option to Improve: {option.option_text}\n"
+        f"Current Feedback: {option.feedback}\n"
+    )
+    # Create a prompt focused on regenerating this specific option
+    regeneration_prompt = f"""
+    Below is a multiple choice question with the CORRECT option and ONE INCORRECT option that needs improvement.
+    {question_display}
+    INCORRECT OPTION TO IMPROVE:
+    {option_display}
+    Consider the course content to help you make informed decisions:
+    COURSE CONTENT:
+    {content}
+    Here are some examples of high quality incorrect answer suggestions which you should follow:
+    <incorrect_answer_examples>
+    {INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION}
+    </incorrect_answer_examples>
+    Consider also the quality standards for writing options and feedback:
+    <General Quality Standards>
+    {GENERAL_QUALITY_STANDARDS}
+    </General Quality Standards>
+    These are some examples of questions and their answer options along with their feedback which you should follow:
+    <Example Questions>
+    {EXAMPLE_QUESTIONS}
+    </Example Questions>
+    <Additional Guidelines>
+        - be in the language and tone of the course.
+        - be at a similar level of difficulty or complexity as encountered in the course.
+        - assess only information from the course and not depend on information that was
+        not covered in the course.
+        - not attempt to teach something as part of the quiz.
+        - use clear and concise language
+        - not induce confusion
+        - provide a slight (not major) challenge.
+        - be easily interpreted and unambiguous.
+        - be well written in clear and concise language, proper grammar, good sentence
+        structure, and consistent formatting
+        - be thoughtful and specific rather than broad and ambiguous
+        - be complete in its wording such that understanding the question is not part
+        of the assessment
+        Incorrect answer feedback should:
+        - be informational and encouraging, not punitive.
+        - be a single sentence, concise and to the point.
+        - Do not say "Incorrect" or "Wrong".
+    </Additional Guidelines>
+    {IMMEDIATE_RED_FLAGS}
+    Return ONLY the improved incorrect option and its feedback in this exact JSON format:
+    {{"option_text": "Your improved incorrect option text here", "feedback": "Your improved feedback explaining why this option is incorrect"}}
+    The option_text must be factually incorrect but plausible, and the feedback must explain why it's incorrect.
+    """
+    # Call OpenAI API
+    try:
+        params = {
+            "model": model,
+            "messages": [
+                {"role": "system", "content": "You are an expert educational assessment creator specializing in creating high-quality multiple choice questions with detailed feedback for each option."},
+                {"role": "user", "content": regeneration_prompt}
+            ],
+            "response_format": {"type": "json_object"}
+        }
+        if not TEMPERATURE_UNAVAILABLE.get(model, True):
+            params["temperature"] = temperature
+        completion = client.chat.completions.create(**params)
+        response_text = completion.choices[0].message.content
+        # Parse the JSON response
+        try:
+            response_data = json.loads(response_text)
+            new_option_text = response_data.get("option_text", "")
+            new_feedback = response_data.get("feedback", "")
+            if not new_option_text or not new_feedback:
+                print(f"Error: Missing option_text or feedback in response")
+                return None
+            # Create a new option with the regenerated text and feedback
+            new_option = MultipleChoiceOption(
+                option_text=new_option_text,
+                is_correct=False,
+                feedback=new_feedback
+            )
+            #print(f"Successfully regenerated option: '{new_option_text}'")
+            # Log the regeneration for debugging
+            option_index = next((i for i, opt in enumerate(question.options) if opt.option_text == option.option_text), -1)
+            debug_dir = os.path.join("wrong_answer_debug")
+            os.makedirs(debug_dir, exist_ok=True)
+            # Create a log file for this question
+            log_file = os.path.join(debug_dir, f"question_{question.id}_option_{option_index}.txt")
+            # Format the log message
+            log_message = f"""
+Question ID: {question.id}
+Question: {question.question_text}
+REASON FOR REGENERATION:
+{reason}
+BEFORE:
+Option Text: {option.option_text}
+Feedback: {option.feedback}
+AFTER:
+Option Text: {new_option.option_text}
+Feedback: {new_option.feedback}
+"""
+            # Write to the log file
+            with open(log_file, "w", encoding="utf-8") as f:
+                f.write(log_message)
+            # Also print to console
+            print(f"\n--- Regenerated Option for Question {question.id}, Option {option_index} ---")
+            print(f"BEFORE: {option.option_text}")
+            print(f"AFTER:  {new_option.option_text}")
+            print(f"Log saved to {log_file}")
+            return new_option
+        except json.JSONDecodeError as e:
+            print(f"Error parsing JSON response: {str(e)}")
+            print(f"Raw response: {response_text}")
+            return None
+    except Exception as e:
+        print(f"Error regenerating option: {str(e)}")
+        return None
+def judge_question_quality(client: OpenAI, model: str, temperature: float, question: MultipleChoiceQuestion) -> Tuple[bool, str]:
+        """
+        Judge the quality of a question based on quality standards.
+        Args:
+            question: Question to judge
+        Returns:
+            Tuple of (approved, feedback)
+        """
+        run_manager = _get_run_manager()
+        # Create the prompt
+        prompt = f"""
+        Evaluate the quality of the following multiple choice question based on the provided quality standards.
+        Question: {question.question_text}
+        Options:
+        {json.dumps([{"text": opt.option_text, "is_correct": opt.is_correct, "feedback": opt.feedback} for opt in question.options], indent=2)}
+        Learning Objective: The question is testing the following learning objective:
+        {question.learning_objective}
+        Quality Standards to evaluate against:
+        1. Alignment: The question should align with the learning objective and test understanding of course content.
+        2. Clarity: The question should be clear, unambiguous, and well-written.
+        3. Difficulty: The question should be challenging but fair for someone who has studied the material.
+        4. Options: The options should be plausible, with one clearly correct answer.
+        5. Feedback: Each option should have appropriate feedback that explains why it is correct or incorrect.
+        Provide:
+        1. Detailed feedback on the question's strengths and weaknesses. Two or three sentences
+        2. A final decision on whether to approve the question (true/false)
+        Format your response as a JSON object with the following fields:
+        {{
+            "feedback": string,
+            "approved": boolean
+        }}
+        """
+        # Generate judgment
+        # Different parameter handling for different model families
+        params = {
+            "model": model,
+            "messages": [
+                {"role": "system", "content": "You are an expert educational assessment evaluator."},
+                {"role": "user", "content": prompt}
+            ]
+        }
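+        # Add temperature only for models listed as accepting it in TEMPERATURE_UNAVAILABLE
+        # (unknown models default to "unavailable", so no temperature is sent for them)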
+        if not TEMPERATURE_UNAVAILABLE.get(model, True):
+            params["temperature"] = temperature
+        response = client.chat.completions.create(**params)
+        # Parse the response
+        try:
+            # Get the raw response content
+            raw_content = response.choices[0].message.content
+            if run_manager:
+                # Log full response in DEBUG for detailed tracking
+                run_manager.log(f"DEBUG - Raw judge response: {raw_content}", level="DEBUG")
+            # Try to extract JSON from the response if it's not pure JSON
+            # Sometimes the model includes explanatory text before or after the JSON
+            import re
+            json_match = re.search(r'\{[\s\S]*\}', raw_content)
+            if json_match:
+                json_str = json_match.group(0)
+                result = json.loads(json_str)
+            else:
+                # If no JSON pattern found, try the raw content
+                result = json.loads(raw_content)
+            return result["approved"], result["feedback"]
+        except Exception as e:
+            if run_manager:
+                run_manager.log(f"Error parsing judge response: {e}", level="ERROR")
+                run_manager.log(f"Raw response content: {response.choices[0].message.content[:200]}...", level="DEBUG")
+            # Return default values if parsing fails
+            return True, "Question meets basic quality standards"
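
The source-matching loop used when regenerating incorrect options (exact `<source file='…'>` tag match, then a substring match, then a fallback to all content) could be factored into one helper. The following is a minimal sketch under stated assumptions, not code from this commit: `collect_source_content` is a hypothetical name, and unlike the loops in the diff it skips a file's content if that content has already been collected.

```python
from typing import Iterable, List


def collect_source_content(source_references: Iterable[str], file_contents: List[str]) -> str:
    """Hypothetical helper: gather the file contents matching the given source references."""
    matched: List[str] = []
    for source_file in source_references:
        tag = f"<source file='{source_file}'>"
        # Prefer an exact match on the XML source tag.
        hit = next((c for c in file_contents if tag in c), None)
        if hit is None:
            # Otherwise accept the first content that mentions the file name at all.
            hit = next((c for c in file_contents if source_file in c), None)
        # Assumption: the same content is not added twice (the diff's loops do not dedupe).
        if hit is not None and hit not in matched:
            matched.append(hit)
    # With no match at all, fall back to every file's content.
    return "\n\n".join(matched or file_contents)
```

A call such as `collect_source_content(["lesson1.md"], file_contents)` would return only the matching file's content, or all content joined by blank lines when nothing matches.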

quiz_generator/question_ranking.py ADDED Viewed

	@@ -0,0 +1,474 @@

+from typing import List
+from openai import OpenAI
+from models import MultipleChoiceQuestion, GroupedMultipleChoiceQuestion, RankedMultipleChoiceQuestion, RankedMultipleChoiceQuestionsResponse, GroupedMultipleChoiceQuestionsResponse
+from prompts.questions import RANK_QUESTIONS_PROMPT, GROUP_QUESTIONS_PROMPT
+import json
+def rank_questions(client: OpenAI, model: str, temperature: float, questions: List[GroupedMultipleChoiceQuestion], file_contents: List[str]) -> dict:
+    """
+    Rank multiple choice questions based on quality criteria.
+    Args:
+        questions: List of questions to rank
+        file_contents: List of file contents with source tags
+    Returns:
+        Dict with key "ranked" holding the list of ranked questions with ranking explanations
+    """
+    try:
+        print(f"Ranking {len(questions)} questions")
+        # Separate out the question for learning objective ID=1 (if present)
+        lo_one_question = None
+        questions_to_rank = []
+        for q in questions:
+            if q.learning_objective_id == 1:
+                lo_one_question = q
+            else:
+                questions_to_rank.append(q)
+        if not questions_to_rank:
+            return {"ranked": questions}  # Nothing to rank
+        # Format questions for display in the prompt
+        questions_display = "\n\n".join([
+            f"ID: {q.id}\n"
+            f"Question: {q.question_text}\n"
+            f"Options: {json.dumps([{'text': o.option_text, 'is_correct': o.is_correct, 'feedback': o.feedback} for o in q.options])}\n"
+            f"Learning Objective: {q.learning_objective}\n"
+            f"Learning Objective ID: {q.learning_objective_id}\n"
+            f"Correct Answer: {q.correct_answer}\n"
+            f"Source Reference: {q.source_reference}\n"
+            f"Judge Feedback: {getattr(q, 'judge_feedback', 'N/A')}\n"
+            f"Approved: {getattr(q, 'approved', 'N/A')}\n"
+            f"Group Members: {q.group_members}\n"
+            f"In Group: {q.in_group}\n"
+            f"Best in Group: {q.best_in_group}\n"
+            for q in questions_to_rank
+        ])
+        # Extract all unique source references from questions
+        all_source_references = set()
+        for q in questions:
+            if isinstance(q.source_reference, list):
+                all_source_references.update(q.source_reference)
+            else:
+                all_source_references.add(q.source_reference)
+        # Combine content from all source references
+        combined_content = ""
+        for source_file in all_source_references:
+            source_found = False
+            for file_content in file_contents:
+                # Look for the XML source tag with the matching filename
+                if f"<source file='{source_file}'>" in file_content:
+                    print(f"Found matching source content for {source_file}")
+                    if combined_content:
+                        combined_content += "\n\n"
+                    combined_content += file_content
+                    source_found = True
+                    break
+            # If no exact match found, try a more flexible match
+            if not source_found:
+                print(f"No exact match for {source_file}, looking for partial matches")
+                for file_content in file_contents:
+                    if source_file in file_content:
+                        print(f"Found partial match for {source_file}")
+                        if combined_content:
+                            combined_content += "\n\n"
+                        combined_content += file_content
+                        source_found = True
+                        break
+        # If still no matching content, use all file contents combined
+        if not combined_content:
+            print(f"No content found for any source files, using all content")
+            combined_content = "\n\n".join(file_contents)
+        # Create ranking prompt
+        ranking_prompt = f"""
+        {RANK_QUESTIONS_PROMPT}
+        Consider the questions' relevance with respect to the course content as well:
+        <course_content>
+        {combined_content}
+        </course_content>
+        Here are the questions to rank:
+        <questions>
+        {questions_display}
+        </questions>
+        For each question:
+        1. Assign a rank (1 = best, 2 = second best, etc.)
+        2. Provide a brief explanation for the ranking (2-3 sentences)
+        """
+        # Count tokens in the prompt
+        # try:
+        #     encoding = tiktoken.get_encoding("cl100k_base")
+        #     token_count = len(encoding.encode(ranking_prompt))
+        #     print(f"DEBUG - Ranking prompt token count: {token_count}")
+        #     estimated_output_tokens = len(questions_to_rank) * 250  # ~250 tokens per question in output
+        #     print(f"DEBUG - Estimated output tokens: {estimated_output_tokens}")
+        # except ImportError:
+        #     print("DEBUG - Tiktoken not installed, cannot count tokens")
+        # except Exception as e:
+        #     print(f"DEBUG - Error counting tokens: {str(e)}")
+        # # Create a simple list of dictionaries for the response
+        # class RankingItem(BaseModel):
+        #     id: int
+        #     rank: int
+        #     ranking_reasoning: str
+        # Call OpenAI API
+        print(f"DEBUG - Using model {model} for question ranking (temperature {temperature} is not sent with this call)")
+        print(f"DEBUG - Sending {len(questions_to_rank)} questions to rank")
+        print(f"DEBUG - Question IDs being sent: {[q.id for q in questions_to_rank]}")
+        system_prompt = "You are an expert educational content evaluator"
+        params = {
+            #"model": self.model,
+            "model": model,
+            "messages": [
+                {"role": "system", "content": system_prompt},
+                {"role": "user", "content": ranking_prompt}
+            ],
+            "response_format": RankedMultipleChoiceQuestionsResponse
+        }
+        # if not is_o_series_model:
+        #     params["temperature"] = self.temperature
+        print(f"DEBUG - Making API call to rank questions")
+        completion = client.beta.chat.completions.parse(**params)
+        ranking_results = completion.choices[0].message.parsed.ranked_questions
+        print(f"DEBUG - API call successful")
+        print(f"Received {len(ranking_results)} ranking results")
+        print(f"DEBUG - Question IDs received in ranking: {[q.id for q in ranking_results]}")
+        # Check for missing questions
+        sent_ids = set(q.id for q in questions_to_rank)
+        received_ids = set(q.id for q in ranking_results)
+        missing_ids = sent_ids - received_ids
+        if missing_ids:
+            print(f"DEBUG - Missing questions with IDs: {missing_ids}")
+        # Always keep ID=1 as the first question if present
+        final_questions = []
+        if lo_one_question:
+            # Convert to RankedMultipleChoiceQuestion for consistency
+            lo_one_ranked = RankedMultipleChoiceQuestion(
+                **lo_one_question.model_dump(),
+                rank=1,
+                ranking_reasoning="First question, always rank 1"
+            )
+            final_questions.append(lo_one_ranked)
+        # Sort questions by their original rank and then reassign ranks sequentially
+        # If we have a learning_objective_id=1 question, start from rank 2, otherwise start from rank 1
+        start_rank = 2 if lo_one_question else 1
+        sorted_ranking_results = sorted(ranking_results, key=lambda x: x.rank)
+        # Assign sequential ranks in one go
+        for i, q in enumerate(sorted_ranking_results):
+            q.rank = i + start_rank
+        final_questions.extend(sorted_ranking_results)
+        # Ensure all questions have grouping information
+        for q in final_questions:
+            if not hasattr(q, "in_group") or q.in_group is None:
+                q.in_group = False
+            if not hasattr(q, "group_members") or q.group_members is None:
+                q.group_members = [q.id]
+            if not hasattr(q, "best_in_group") or q.best_in_group is None:
+                q.best_in_group = q.id == 1  # ID=1 is always best in group
+        return {
+            "ranked": final_questions,
+        }
+        # # Sort by rank
+        # ranked_questions = sorted(ranking_results, key=lambda x: x.rank)
+        # return ranked_questions
+    except Exception as e:
+        print(f"Error ranking questions: {str(e)}")
+        # Return original questions with default ranking, in the same dict shape as the success path
+        return {
+            "ranked": [
+                RankedMultipleChoiceQuestion(
+                    **q.model_dump(),
+                    rank=i + 1,
+                    ranking_reasoning="Ranking failed"
+                )
+                for i, q in enumerate(questions)
+            ]
+        }
+def group_questions(client: OpenAI, model: str, temperature: float, questions: List[MultipleChoiceQuestion], file_contents: List[str]) -> dict:
+    """
+    Group multiple choice questions based on quality criteria.
+    Args:
+        questions: List of questions to group
+        file_contents: List of file contents with source tags
+    Returns:
+        Dict with keys "grouped" (all questions with grouping information) and
+        "best_in_group" (the questions flagged as best in their group)
+    """
+    try:
+        print(f"Grouping {len(questions)} questions")
+        # Nothing to group if no questions were passed in
+        if not questions:
+            return {"grouped": questions, "best_in_group": questions}  # Nothing to group
+        # Find all questions with learning_objective_id=1
+        lo_one_questions = [q for q in questions if q.learning_objective_id == 1]
+        if lo_one_questions:
+            print(f"Found {len(lo_one_questions)} questions with learning_objective_id=1")
+        # lo_one_question = None
+        # questions_to_group = []
+        # for q in questions:
+        #     if q.learning_objective_id == 1:
+        #         lo_one_question = q
+        #     else:
+        #         questions_to_group.append(q)
+        # if not questions_to_group:
+        #     return {"grouped": questions, "best_in_group": questions}  # Nothing to rank
+        # Format questions for display in the prompt
+        questions_display = "\n\n".join([
+            f"ID: {q.id}\n"
+            f"Question: {q.question_text}\n"
+            f"Options: {json.dumps([{'text': o.option_text, 'is_correct': o.is_correct, 'feedback': o.feedback} for o in q.options])}\n"
+            f"Learning Objective: {q.learning_objective}\n"
+            f"Learning Objective ID: {q.learning_objective_id}\n"
+            f"Correct Answer: {q.correct_answer}\n"
+            f"Source Reference: {q.source_reference}\n"
+            f"Judge Feedback: {getattr(q, 'judge_feedback', 'N/A')}\n"
+            f"Approved: {getattr(q, 'approved', 'N/A')}\n"
+            for q in questions
+        ])
+        # Extract all unique source references from questions
+        all_source_references = set()
+        for q in questions:
+            if isinstance(q.source_reference, list):
+                all_source_references.update(q.source_reference)
+            else:
+                all_source_references.add(q.source_reference)
+        # Combine content from all source references
+        combined_content = ""
+        for source_file in all_source_references:
+            source_found = False
+            for file_content in file_contents:
+                # Look for the XML source tag with the matching filename
+                if f"<source file='{source_file}'>" in file_content:
+                    print(f"Found matching source content for {source_file}")
+                    if combined_content:
+                        combined_content += "\n\n"
+                    combined_content += file_content
+                    source_found = True
+                    break
+            # If no exact match found, try a more flexible match
+            if not source_found:
+                print(f"No exact match for {source_file}, looking for partial matches")
+                for file_content in file_contents:
+                    if source_file in file_content:
+                        print(f"Found partial match for {source_file}")
+                        if combined_content:
+                            combined_content += "\n\n"
+                        combined_content += file_content
+                        source_found = True
+                        break
+        # If still no matching content, use all file contents combined
+        if not combined_content:
+            print(f"No content found for any source files, using all content")
+            combined_content = "\n\n".join(file_contents)
+        # Create grouping prompt
+        grouping_prompt = f"""
+        {GROUP_QUESTIONS_PROMPT}
+        For grouping, consider the questions' relevance with respect to the course content as well:
+        <course_content>
+        {combined_content}
+        </course_content>
+        Here are the questions to group:
+        <questions>
+        {questions_display}
+        </questions>
+        """
+        # # Count tokens in the prompt
+        # try:
+        #     encoding = tiktoken.get_encoding("cl100k_base")
+        #     token_count = len(encoding.encode(grouping_prompt))
+        #     print(f"DEBUG - Grouping prompt token count: {token_count}")
+        #     estimated_output_tokens = len(questions_to_rank) * 250  # ~250 tokens per question in output
+        #     print(f"DEBUG - Estimated output tokens: {estimated_output_tokens}")
+        # except ImportError:
+        #     print("DEBUG - Tiktoken not installed, cannot count tokens")
+        # except Exception as e:
+        #     print(f"DEBUG - Error counting tokens: {str(e)}")
+        # # Create a simple list of dictionaries for the response
+        # class RankingItem(BaseModel):
+        #     id: int
+        #     rank: int
+        #     ranking_reasoning: str
+        # Call OpenAI API
+        print("DEBUG - Using model gpt-5-mini for question grouping; the model and temperature arguments are not applied to this call")
+        print(f"DEBUG - Sending {len(questions)} questions to group")
+        print(f"DEBUG - Question IDs being sent: {[q.id for q in questions]}")
+        system_prompt = "You are an expert educational content evaluator"
+        params = {
+            #"model": self.model,
+            "model": "gpt-5-mini",
+            "messages": [
+                {"role": "system", "content": system_prompt},
+                {"role": "user", "content": grouping_prompt}
+            ],
+            "response_format": GroupedMultipleChoiceQuestionsResponse
+        }
+        # if not is_o_series_model:
+        #     params["temperature"] = self.temperature
+        print(f"DEBUG - Making API call to group questions")
+        completion = client.beta.chat.completions.parse(**params)
+        grouping_results = completion.choices[0].message.parsed.grouped_questions
+        print(f"DEBUG - API call successful")
+        print(f"Received {len(grouping_results)} grouping results")
+        print(f"DEBUG - Question IDs received in grouping: {[q.id for q in grouping_results]}")
+        # Check for missing questions
+        sent_ids = set(q.id for q in questions)
+        received_ids = set(q.id for q in grouping_results)
+        missing_ids = sent_ids - received_ids
+        if missing_ids:
+            print(f"DEBUG - Missing questions with IDs: {missing_ids}")
+        # Collect the grouped questions (the ID=1 special-casing below is currently disabled)
+        final_questions = []
+        # if lo_one_question:
+        #     # Convert to GroupedMultipleChoiceQuestion for consistency
+        #     lo_one_grouped = GroupedMultipleChoiceQuestion(
+        #         id=lo_one_question.id,
+        #         question_text=lo_one_question.question_text,
+        #         options=lo_one_question.options,
+        #         learning_objective_id=lo_one_question.learning_objective_id,
+        #         learning_objective=lo_one_question.learning_objective,
+        #         correct_answer=lo_one_question.correct_answer,
+        #         source_reference=lo_one_question.source_reference,
+        #         judge_feedback=getattr(lo_one_question, "judge_feedback", None),
+        #         approved=getattr(lo_one_question, "approved", None),
+        #         #rank=1,
+        #         #ranking_reasoning="First question, always rank 1",
+        #         in_group=False,
+        #         group_members=[lo_one_question.id],
+        #         best_in_group=True
+        #     )
+        #     final_questions.append(lo_one_grouped)
+        # Add the rest of the questions in their ranked order
+        #sorted_grouping_results = sorted(grouping_results, key=lambda x: x.rank)
+        # Normalize best_in_group to Python bool
+        for q in grouping_results:
+            val = getattr(q, "best_in_group", False)
+            if isinstance(val, str):
+                q.best_in_group = val.lower() == "true"
+            elif isinstance(val, (bool, int)):
+                q.best_in_group = bool(val)
+            else:
+                q.best_in_group = False
+        # if lo_one_question:
+        #     final_questions[0].best_in_group = True
+        final_questions.extend(grouping_results)
+        # # Filter for best-in-group questions (including id==1 always)
+        # best_in_group_questions = [q for q in final_questions if (q.learning_objective_id == 1 and getattr(q, "best_in_group", False) is True)
+        # or getattr(q, "best_in_group", False) is True]
+        best_in_group_questions = [q for q in final_questions if getattr(q, "best_in_group", False) is True]
+        # Check if any learning objective ID=1 question is already in best_in_group
+        lo_one_in_best = any(q.learning_objective_id == 1 for q in best_in_group_questions)
+        # If no learning objective ID=1 question is in best_in_group, add the best one
+        if not lo_one_in_best and lo_one_questions:
+            print("No learning objective ID=1 question in best_in_group, adding one")
+            # Find the best question for learning objective ID=1
+            # First, check if any are already in a group
+            lo_one_in_group = [q for q in lo_one_questions if getattr(q, "in_group", False)]
+            if lo_one_in_group:
+                # Use the first one that's already in a group
+                best_lo_one = lo_one_in_group[0]
+            else:
+                # Otherwise, use the first one
+                best_lo_one = lo_one_questions[0]
+            print(f"Selected question ID={best_lo_one.id} for learning objective ID=1")
+            # Mark it as best_in_group
+            best_lo_one.best_in_group = True
+            # Make sure it has the other grouping attributes
+            if not hasattr(best_lo_one, "in_group") or best_lo_one.in_group is None:
+                best_lo_one.in_group = True
+            if not hasattr(best_lo_one, "group_members") or best_lo_one.group_members is None:
+                best_lo_one.group_members = [best_lo_one.id]
+            # Add it to best_in_group_questions if not already there
+            if best_lo_one.id not in [q.id for q in best_in_group_questions]:
+                best_in_group_questions.append(best_lo_one)
+            # Update it in final_questions if it's already there
+            for i, q in enumerate(final_questions):
+                if q.id == best_lo_one.id:
+                    final_questions[i] = best_lo_one
+                    break
+        return {
+            "grouped": final_questions,
+            "best_in_group": best_in_group_questions
+        }
+        # # Sort by rank
+        # ranked_questions = sorted(ranking_results, key=lambda x: x.rank)
+        # return ranked_questions
+    except Exception as e:
+        print(f"Error grouping questions: {str(e)}")
+        # Fall back to the original, ungrouped questions
+        return {
+            "grouped": questions,
+            "best_in_group": [q for q in questions if getattr(q, "best_in_group", False) is True]
+        }
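
Note on the `best_in_group` normalization above: the loop accepts strings, bools, and ints from the parsed response and reduces them to a strict Python bool. A minimal standalone sketch of that coercion follows; `to_bool` is a hypothetical helper name used only for illustration and does not exist in the commit.

```python
def to_bool(value):
    """Coerce a loosely typed flag to a strict bool (hypothetical helper)."""
    if isinstance(value, str):
        # Only the literal text "true" (any casing) counts as True
        return value.lower() == "true"
    if isinstance(value, (bool, int)):
        return bool(value)
    # None and any other type are treated as False
    return False


for raw in ["True", "false", 1, 0, None]:
    print(repr(raw), "->", to_bool(raw))
```

Under this rule a string such as "yes" maps to False, so the filter that checks `is True` would drop such a question from the best-in-group list.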

requirements.txt ADDED Viewed

	@@ -0,0 +1,5 @@

+gradio
+openai
+pydantic
+python-dotenv
+nbformat

ui/__init__.py ADDED Viewed

	@@ -0,0 +1,3 @@


+from .app import create_ui
+
+__all__ = ['create_ui']

ui/app.py ADDED Viewed

	@@ -0,0 +1,182 @@

+import gradio as gr
+from dotenv import load_dotenv
+from .objective_handlers import process_files, regenerate_objectives, process_files_and_generate_questions
+from .question_handlers import generate_questions
+from .edit_handlers import load_quiz_for_editing, accept_and_next, go_previous, save_and_download
+from .formatting import format_quiz_for_ui
+from .run_manager import get_run_manager
+from models import MODELS
+# Load environment variables
+load_dotenv()
+# Set to False to disable saving output files (folders, logs, JSON, markdown).
+# Tab 3 download of edited questions still works.
+SAVE_OUTPUTS = False
+def create_ui():
+    """Create the Gradio UI."""
+    get_run_manager().save_outputs = SAVE_OUTPUTS
+    with gr.Blocks(title="AI Course Assessment Generator") as app:
+        gr.Markdown("# AI Course Assessment Generator")
+        gr.Markdown("Upload course materials and generate learning objectives and quiz questions.")
+        with gr.Tab("Generate Learning Objectives"):
+            with gr.Row():
+                with gr.Column():
+                    files_input = gr.File(
+                        file_count="multiple",
+                        label="Upload Course Materials (.vtt, .srt, .ipynb, .md)",
+                        file_types=[".ipynb", ".vtt", ".srt", ".md"],
+                        type="filepath"
+                    )
+                    num_objectives = gr.Slider(minimum=1, maximum=20, value=4, step=1, label="Number of Learning Objectives per Run")
+                    num_runs = gr.Dropdown(
+                        choices=["1", "2", "3", "4", "5"],
+                        value="2",
+                        label="Number of Generation Runs"
+                    )
+                    model_dropdown = gr.Dropdown(
+                        choices=MODELS,
+                        value="gpt-5.2",
+                        label="Model"
+                    )
+                    incorrect_answer_model_dropdown = gr.Dropdown(
+                        choices=MODELS,
+                        value="gpt-5.2",
+                        label="Model for Incorrect Answer Suggestions"
+                    )
+                    temperature_dropdown = gr.Dropdown(
+                        choices=["0.0", "0.1", "0.2", "0.3", "0.4", "0.5", "0.6", "0.7", "0.8", "0.9", "1.0"],
+                        value="1.0",
+                        label="Temperature (0.0: Deterministic, 1.0: Creative)"
+                    )
+                    generate_button = gr.Button("Generate Learning Objectives")
+                    generate_all_button = gr.Button("Generate all", variant="primary")
+                with gr.Column():
+                    status_output = gr.Textbox(label="Status")
+                    objectives_output = gr.Textbox(label="Best-in-Group Learning Objectives", lines=10)
+                    grouped_output = gr.Textbox(label="All Grouped Learning Objectives", lines=10)
+                    raw_ungrouped_output = gr.Textbox(label="Raw Ungrouped Learning Objectives (Debug)", lines=10)
+                    feedback_input = gr.Textbox(label="Feedback on Learning Objectives")
+                    regenerate_button = gr.Button("Regenerate Learning Objectives Based on Feedback")
+        with gr.Tab("Generate Questions"):
+            with gr.Row():
+                with gr.Column():
+                    objectives_input = gr.Textbox(label="Learning Objectives JSON", lines=10, max_lines=10)
+                    model_dropdown_q = gr.Dropdown(
+                        choices=MODELS,
+                        value="gpt-5.2",
+                        label="Model"
+                    )
+                    temperature_dropdown_q = gr.Dropdown(
+                        choices=["0.0", "0.1", "0.2", "0.3", "0.4", "0.5", "0.6", "0.7", "0.8", "0.9", "1.0"],
+                        value="1.0",
+                        label="Temperature (0.0: Deterministic, 1.0: Creative)"
+                    )
+                    num_questions_slider = gr.Slider(minimum=1, maximum=10, value=10, step=1, label="Number of questions")
+                    num_runs_q = gr.Slider(minimum=1, maximum=5, value=2, step=1, label="Number of Question Generation Runs")
+                    generate_q_button = gr.Button("Generate Questions")
+                with gr.Column():
+                    status_q_output = gr.Textbox(label="Status")
+                    best_questions_output = gr.Textbox(label="Ranked Best-in-Group Questions", lines=10)
+                    all_questions_output = gr.Textbox(label="All Grouped Questions", lines=10)
+                    formatted_quiz_output = gr.Textbox(label="Formatted Quiz", lines=15)
+        with gr.Tab("Propose/Edit Question"):
+            # State for editing flow
+            questions_state = gr.State([])
+            index_state = gr.State(0)
+            edited_state = gr.State([])
+            with gr.Row():
+                with gr.Column():
+                    edit_status = gr.Textbox(label="Status", interactive=False)
+                    edit_button = gr.Button("Edit questions", variant="primary")
+                    question_editor = gr.Textbox(
+                        label="Question",
+                        lines=15,
+                        interactive=True,
+                        placeholder="Click 'Edit questions' to load the generated quiz."
+                    )
+                    with gr.Row():
+                        prev_button = gr.Button("Previous")
+                        next_button = gr.Button("Accept & Next", variant="primary")
+                    download_button = gr.Button("Download edited quiz")
+                    download_file = gr.File(label="Download", interactive=False)
+        # Set up event handlers
+        generate_button.click(
+            process_files,
+            inputs=[files_input, num_objectives, num_runs, model_dropdown, incorrect_answer_model_dropdown, temperature_dropdown],
+            outputs=[status_output, objectives_output, grouped_output, raw_ungrouped_output]
+        )
+        generate_all_button.click(
+            process_files_and_generate_questions,
+            inputs=[
+                files_input, num_objectives, num_runs, model_dropdown, incorrect_answer_model_dropdown, temperature_dropdown,
+                model_dropdown_q, temperature_dropdown_q, num_questions_slider, num_runs_q
+            ],
+            outputs=[
+                status_output, objectives_output, grouped_output, raw_ungrouped_output,
+                status_q_output, best_questions_output, all_questions_output, formatted_quiz_output
+            ]
+        )
+        regenerate_button.click(
+            regenerate_objectives,
+            inputs=[objectives_output, feedback_input, num_objectives, num_runs, model_dropdown, temperature_dropdown],
+            outputs=[status_output, objectives_output]
+        )
+        objectives_output.change(
+            lambda x: x,
+            inputs=[objectives_output],
+            outputs=[objectives_input]
+        )
+        generate_q_button.click(
+            generate_questions,
+            inputs=[objectives_input, model_dropdown_q, temperature_dropdown_q, num_questions_slider, num_runs_q],
+            outputs=[status_q_output, best_questions_output, all_questions_output, formatted_quiz_output]
+        )
+        best_questions_output.change(
+            format_quiz_for_ui,
+            inputs=[best_questions_output],
+            outputs=[formatted_quiz_output]
+        )
+        edit_button.click(
+            load_quiz_for_editing,
+            inputs=[formatted_quiz_output],
+            outputs=[edit_status, question_editor, questions_state, index_state, edited_state, next_button]
+        )
+        next_button.click(
+            accept_and_next,
+            inputs=[question_editor, questions_state, index_state, edited_state],
+            outputs=[edit_status, question_editor, questions_state, index_state, edited_state, next_button]
+        )
+        prev_button.click(
+            go_previous,
+            inputs=[question_editor, questions_state, index_state, edited_state],
+            outputs=[edit_status, question_editor, questions_state, index_state, edited_state, next_button]
+        )
+        download_button.click(
+            save_and_download,
+            inputs=[question_editor, questions_state, index_state, edited_state],
+            outputs=[edit_status, download_file]
+        )
+    return app
+if __name__ == "__main__":
+    app = create_ui()
+    app.launch()

ui/content_processor.py ADDED Viewed

	@@ -0,0 +1,186 @@

+import os
+import nbformat
+from typing import List, Dict, Any, Tuple
+def _get_run_manager():
+    """Get run manager if available, otherwise return None."""
+    try:
+        from .run_manager import get_run_manager
+        return get_run_manager()
+    except Exception:
+        return None
+class ContentProcessor:
+    """Processes content from .vtt, .srt, .ipynb, and .md files."""
+    def __init__(self):
+        """Initialize the ContentProcessor."""
+        self.file_contents = []
+        self.run_manager = _get_run_manager()
+    def process_file(self, file_path: str) -> List[str]:
+        """
+        Process a file based on its extension and return the content.
+        Args:
+            file_path: Path to the file to process
+        Returns:
+            List containing the file content with source tags
+        """
+        _, ext = os.path.splitext(file_path)
+        if ext.lower() in ['.vtt', '.srt']:
+            return self._process_subtitle_file(file_path)
+        elif ext.lower() == '.ipynb':
+            return self._process_notebook_file(file_path)
+        elif ext.lower() == '.md':
+            return self._process_markdown_file(file_path)
+        else:
+            raise ValueError(f"Unsupported file type: {ext}")
+    def _process_subtitle_file(self, file_path: str) -> List[str]:
+        """Process a subtitle file (.vtt or .srt)."""
+        try:
+            filename = os.path.basename(file_path)
+            if self.run_manager:
+                self.run_manager.log(f"Found source file: {filename}", level="DEBUG")
+            with open(file_path, 'r', encoding='utf-8') as f:
+                content = f.read()
+            # Simple processing for subtitle files
+            # Remove timestamp lines and other metadata
+            lines = content.split('\n')
+            text_content = []
+            for line in lines:
+                # Skip empty lines, timestamp lines, and subtitle numbers
+                if (line.strip() and
+                    not line.strip().isdigit() and
+                    '-->' not in line and
+                    not line.strip().startswith('WEBVTT')):
+                    text_content.append(line.strip())
+            # Combine all text into a single content string
+            combined_text = "\n".join(text_content)
+            # Add XML source tags at the beginning and end of the content
+            tagged_content = f"<source file='{filename}'>\n{combined_text}\n</source>"
+            return [tagged_content]
+        except Exception as e:
+            if self.run_manager:
+                self.run_manager.log(f"Error processing subtitle file {file_path}: {e}", level="ERROR")
+            return []
+    def _process_markdown_file(self, file_path: str) -> List[str]:
+        """Process a Markdown file (.md)."""
+        try:
+            filename = os.path.basename(file_path)
+            if self.run_manager:
+                self.run_manager.log(f"Found source file: {filename}", level="DEBUG")
+            with open(file_path, 'r', encoding='utf-8') as f:
+                content = f.read()
+            # Add XML source tags at the beginning and end of the content
+            tagged_content = f"<source file='{filename}'>\n{content}\n</source>"
+            return [tagged_content]
+        except Exception as e:
+            if self.run_manager:
+                self.run_manager.log(f"Error processing markdown file {file_path}: {e}", level="ERROR")
+            return []
+    def _process_notebook_file(self, file_path: str) -> List[str]:
+        """Process a Jupyter notebook file (.ipynb)."""
+        try:
+            filename = os.path.basename(file_path)
+            if self.run_manager:
+                self.run_manager.log(f"Found source file: {filename}", level="DEBUG")
+            # First check if the file is valid JSON
+            import json  # imported before the try so the except clause can always resolve json.JSONDecodeError
+            try:
+                with open(file_path, 'r', encoding='utf-8') as f:
+                    # Try to parse as JSON first
+                    json.load(f)
+            except json.JSONDecodeError as json_err:
+                if self.run_manager:
+                    self.run_manager.log(f"File {file_path} is not valid JSON: {json_err}", level="DEBUG")
+                # If it's not valid JSON, add it as plain text with a source tag
+                with open(file_path, 'r', encoding='utf-8') as f:
+                    content = f.read()
+                    tagged_content = f"<source file='{filename}'>\n```\n{content}\n```\n</source>"
+                    return [tagged_content]
+            # If we get here, the file is valid JSON, try to parse as notebook
+            with open(file_path, 'r', encoding='utf-8') as f:
+                notebook = nbformat.read(f, as_version=4)
+            # Extract text from markdown and code cells
+            content_parts = []
+            for cell in notebook.cells:
+                if cell.cell_type == 'markdown':
+                    content_parts.append(f"[Markdown]\n{cell.source}")
+                elif cell.cell_type == 'code':
+                    content_parts.append(f"[Code]\n```python\n{cell.source}\n```")
+                    # # Include output if present
+                    # if hasattr(cell, 'outputs') and cell.outputs:
+                    #     for output in cell.outputs:
+                    #         if 'text' in output:
+                    #             content_parts.append(f"[Output]\n```\n{output.text}\n```")
+                    #         elif 'data' in output and 'text/plain' in output.data:
+                    #             content_parts.append(f"[Output]\n```\n{output.data['text/plain']}\n```")
+            # Combine all content into a single string
+            combined_content = "\n\n".join(content_parts)
+            # Add XML source tags at the beginning and end of the content
+            tagged_content = f"<source file='{filename}'>\n{combined_content}\n</source>"
+            return [tagged_content]
+        except Exception as e:
+            if self.run_manager:
+                self.run_manager.log(f"Error processing notebook file {file_path}: {e}", level="ERROR")
+            # Try to extract content as plain text if notebook parsing fails
+            try:
+                with open(file_path, 'r', encoding='utf-8') as f:
+                    content = f.read()
+                    tagged_content = f"<source file='{filename}'>\n```\n{content}\n```\n</source>"
+                    return [tagged_content]
+            except Exception as read_err:
+                if self.run_manager:
+                    self.run_manager.log(f"Could not read file as text either: {read_err}", level="ERROR")
+                return []
+    def process_files(self, file_paths: List[str]) -> List[str]:
+        """
+        Process multiple files and combine their content.
+        Args:
+            file_paths: List of paths to files to process
+        Returns:
+            List of file contents with source tags
+        """
+        all_file_contents = []
+        for file_path in file_paths:
+            file_content = self.process_file(file_path)
+            all_file_contents.extend(file_content)
+        # Store the processed file contents
+        self.file_contents = all_file_contents
+        # The entire content of each file is used as context
+        # Each file's content is wrapped in XML source tags
+        # This approach ensures that the LLM has access to the complete context
+        return all_file_contents
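
Note on the subtitle handling in `_process_subtitle_file`: the filter drops blank lines, cue numbers, timestamp lines, and the `WEBVTT` header, then joins what remains. The sketch below reproduces that filter as a standalone function so it can be tried on a small sample; `strip_subtitle_metadata` and the sample text are assumptions made for illustration, not part of the commit.

```python
def strip_subtitle_metadata(text: str) -> str:
    """Keep only caption text from .vtt/.srt content (illustrative sketch)."""
    kept = []
    for line in text.split("\n"):
        stripped = line.strip()
        if not stripped:
            continue  # blank separator between cues
        if stripped.isdigit():
            continue  # cue number
        if "-->" in line:
            continue  # timestamp line
        if stripped.startswith("WEBVTT"):
            continue  # WebVTT file header
        kept.append(stripped)
    return "\n".join(kept)


sample = (
    "WEBVTT\n\n"
    "1\n00:00:01.000 --> 00:00:03.000\nWelcome to the course.\n\n"
    "2\n00:00:03.500 --> 00:00:05.000\nToday we cover 3 topics.\n"
)
print(strip_subtitle_metadata(sample))
```

One consequence of the digit check is that a caption line consisting only of digits (for example a spoken "42" on its own line) is discarded along with the cue numbers.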

ui/edit_handlers.py ADDED Viewed

	@@ -0,0 +1,197 @@

+import re
+import tempfile
+import gradio as gr
+from .run_manager import get_run_manager
+def _next_button_label(index, total):
+    """Return 'Accept & Finish' for the last question, 'Accept & Next' otherwise."""
+    if total > 0 and index >= total - 1:
+        return gr.update(value="Accept & Finish")
+    return gr.update(value="Accept & Next")
+def _parse_questions(md_content):
+    """Split formatted_quiz.md content into individual question blocks."""
+    parts = re.split(r'(?=\*\*Question \d+)', md_content.strip())
+    return [p.strip() for p in parts if p.strip()]
+def _parse_question_block(block_text):
+    """Parse a single markdown question block into structured data."""
+    prompt = ""
+    options = []
+    current_option = None
+    for line in block_text.split('\n'):
+        stripped = line.strip()
+        # Question text line (colon may be inside or outside the bold markers)
+        q_match = re.match(r'\*\*Question \d+.*?\*\*:?\s*(.+)', stripped)
+        if q_match:
+            prompt = q_match.group(1).strip()
+            continue
+        # Skip ranking reasoning
+        if stripped.startswith('Ranking Reasoning:'):
+            continue
+        # Option line: • A [Correct]: text  or  • A: text
+        opt_match = re.match(r'•\s*[A-D]\s*(\[Correct\])?\s*:\s*(.+)', stripped)
+        if opt_match:
+            if current_option:
+                options.append(current_option)
+            current_option = {
+                'answer': opt_match.group(2).strip(),
+                'isCorrect': opt_match.group(1) is not None,
+                'feedback': ''
+            }
+            continue
+        # Feedback line
+        fb_match = re.match(r'◦\s*Feedback:\s*(.+)', stripped)
+        if fb_match and current_option:
+            current_option['feedback'] = fb_match.group(1).strip()
+            continue
+    if current_option:
+        options.append(current_option)
+    return {'prompt': prompt, 'options': options}
+def _generate_yml(questions_data):
+    """Generate YAML quiz format from parsed question data."""
+    lines = [
+        "name: Quiz 1",
+        "passingThreshold: 4",
+        "estimatedTimeSec: 600",
+        "maxTrialsPer24Hrs: 3",
+        "courseSlug: course_Slug",
+        "insertAfterConclusion: true",
+        "RandomQuestionPosition: true",
+        "questions:",
+    ]
+    for q in questions_data:
+        lines.append("  - typeName: multipleChoice")
+        lines.append("    points: 1")
+        lines.append("    shuffle: true")
+        lines.append("    prompt: |-")
+        for prompt_line in q['prompt'].split('\n'):
+            lines.append(f"      {prompt_line}")
+        lines.append("    options:")
+        for opt in q['options']:
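+            # Escape backslashes before quotes so both survive YAML double-quoted scalars
+            answer = opt['answer'].replace('\\', '\\\\').replace('"', '\\"')
+            feedback = opt['feedback'].replace('\\', '\\\\').replace('"', '\\"')
+            is_correct = 'true' if opt['isCorrect'] else 'false'
+            lines.append(f'      - answer: "{answer}"')
+            lines.append(f"        isCorrect: {is_correct}")
+            lines.append(f'        feedback: "{feedback}"')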
+    return '\n'.join(lines) + '\n'
+def load_quiz_for_editing(formatted_quiz_text=""):
+    """Load formatted quiz for editing. Tries disk first, falls back to UI text."""
+    run_manager = get_run_manager()
+    content = None
+    # Try loading from disk
+    quiz_path = run_manager.get_latest_formatted_quiz_path()
+    if quiz_path is not None:
+        with open(quiz_path, "r", encoding="utf-8") as f:
+            content = f.read()
+    # Fall back to the formatted quiz text from the UI
+    if not content and formatted_quiz_text:
+        content = formatted_quiz_text
+    if not content:
+        return (
+            "No formatted quiz found. Generate questions in the 'Generate Questions' tab first.",
+            "",
+            [],
+            0,
+            [],
+            gr.update(),
+        )
+    questions = _parse_questions(content)
+    if not questions:
+        return "The quiz file is empty.", "", [], 0, [], gr.update()
+    status = f"Question 1 of {len(questions)}"
+    edited = list(questions)  # start with originals
+    return status, questions[0], questions, 0, edited, _next_button_label(0, len(questions))
+def accept_and_next(current_text, questions, index, edited):
+    """Save current edit and advance to the next question."""
+    if not questions:
+        return "No quiz loaded.", "", questions, index, edited, gr.update()
+    # Save the current edit
+    edited[index] = current_text
+    if index + 1 < len(questions):
+        new_index = index + 1
+        status = f"Question {new_index + 1} of {len(questions)}"
+        return status, edited[new_index], questions, new_index, edited, _next_button_label(new_index, len(questions))
+    else:
+        # All questions reviewed
+        return (
+            f"All {len(questions)} questions reviewed. Click 'Download edited quiz' to save.",
+            current_text,
+            questions,
+            index,
+            edited,
+            gr.update(value="Accept & Finish"),
+        )
+def go_previous(current_text, questions, index, edited):
+    """Save current edit and go back to the previous question."""
+    if not questions:
+        return "No quiz loaded.", "", questions, index, edited, gr.update()
+    # Save the current edit before moving
+    edited[index] = current_text
+    if index > 0:
+        new_index = index - 1
+        status = f"Question {new_index + 1} of {len(questions)}"
+        return status, edited[new_index], questions, new_index, edited, _next_button_label(new_index, len(questions))
+    else:
+        return f"Question 1 of {len(questions)} (already at first question)", current_text, questions, index, edited, _next_button_label(index, len(questions))
+def save_and_download(current_text, questions, index, edited):
+    """Join edited questions, save to output folder, and return files for download."""
+    if not edited:
+        return "No edited questions to save.", None
+    # Save the current edit in case user didn't click accept
+    edited[index] = current_text
+    combined_md = "\n\n".join(edited) + "\n"
+    # Generate YAML
+    questions_data = [_parse_question_block(q) for q in edited]
+    yml_content = _generate_yml(questions_data)
+    # Save to output folder
+    run_manager = get_run_manager()
+    saved_path = run_manager.save_edited_quiz(combined_md, "formatted_quiz_edited.md")
+    run_manager.save_edited_quiz(yml_content, "formatted_quiz_edited.yml")
+    # Create temp files for Gradio download
+    tmp_md = tempfile.NamedTemporaryFile(delete=False, suffix=".md", mode="w", encoding="utf-8")
+    tmp_md.write(combined_md)
+    tmp_md.close()
+    tmp_yml = tempfile.NamedTemporaryFile(delete=False, suffix=".yml", mode="w", encoding="utf-8")
+    tmp_yml.write(yml_content)
+    tmp_yml.close()
+    status = f"Saved to {saved_path}" if saved_path else "Download ready."
+    return status, [tmp_md.name, tmp_yml.name]
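
Note on `_generate_yml`: answer and feedback strings are free text, so they can contain colons, quotes, or `#`, all of which matter inside a YAML scalar. One possible way to quote them with only the standard library is sketched below. It relies on the assumption that a JSON string literal is also a valid YAML double-quoted scalar (YAML 1.2 is designed as a superset of JSON); `yaml_quote` is a hypothetical helper, not something the commit defines.

```python
import json


def yaml_quote(text: str) -> str:
    """Return text as a double-quoted scalar (hypothetical helper).

    Assumes the consuming YAML parser accepts JSON-style escapes.
    """
    return json.dumps(text, ensure_ascii=False)


feedback = 'Not quite: the "parse" call needs a schema. # see lesson 2'
print(f"        feedback: {yaml_quote(feedback)}")
```

If PyYAML were added to requirements.txt, building a dict and dumping it would avoid manual quoting altogether; the sketch stays with the stdlib because the commit lists no YAML dependency.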

ui/feedback_handlers.py ADDED Viewed

	@@ -0,0 +1,37 @@

+import os
+import json
+from quiz_generator import QuizGenerator
+from .state import get_processed_contents
+def propose_question_handler(guidance, model_name, temperature):
+    """Generate a single question based on user guidance or feedback."""
+    if not get_processed_contents():
+        return "Please upload and process files in the 'Generate Learning Objectives' tab first.", None
+    if not os.getenv("OPENAI_API_KEY"):
+        return "OpenAI API key not found.", None
+    try:
+        quiz_generator = QuizGenerator(
+            api_key=os.getenv("OPENAI_API_KEY"),
+            model=model_name,
+            temperature=float(temperature)
+        )
+        question = quiz_generator.generate_multiple_choice_question_from_feedback(
+            guidance, get_processed_contents()
+        )
+        formatted_question = {
+            "id": question.id,
+            "question_text": question.question_text,
+            "options": [{"text": opt.option_text, "is_correct": opt.is_correct, "feedback": opt.feedback} for opt in question.options],
+            "learning_objective": question.learning_objective,
+            "source_reference": question.source_reference,
+            "feedback": question.feedback
+        }
+        return "Question generated successfully.", json.dumps(formatted_question, indent=2)
+    except Exception as e:
+        return f"Error: {str(e)}", None

ui/formatting.py ADDED Viewed

	@@ -0,0 +1,46 @@

+import json
+def format_quiz_for_ui(questions_json):
+    """Format quiz questions for display in the UI."""
+    if not questions_json:
+        return "No questions to format."
+    try:
+        questions = json.loads(questions_json)
+        # Sort questions by rank if available
+        try:
+            questions = sorted(questions, key=lambda q: q.get('rank', 999))
+        except Exception as e:
+            print(f"Warning: Could not sort by rank: {e}")
+        formatted_output = ""
+        for i, question in enumerate(questions, 1):
+            # Add question with rank if available
+            rank_info = ""
+            if 'rank' in question:
+                rank_info = f" [Rank: {question['rank']}]"
+            formatted_output += f"**Question {i}{rank_info}:** {question['question_text']}\n\n"
+            # Add ranking reasoning if available
+            if 'ranking_reasoning' in question:
+                formatted_output += f"Ranking Reasoning: {question['ranking_reasoning']}\n\n"
+            options = question['options']
+            option_letters = ['A', 'B', 'C', 'D']
+            # Add each option with its letter
+            for j, option in enumerate(options):
+                letter = option_letters[j]
+                correct_marker = " [Correct]" if option['is_correct'] else ""
+                formatted_output += f"\t• {letter}{correct_marker}: {option['text']}\n"
+                formatted_output += f"\t  ◦ Feedback: {option['feedback']}\n\n"
+            formatted_output += "\n"
+        return formatted_output
+    except Exception as e:
+        return f"Error formatting quiz: {str(e)}"
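
Note on the coupling between `format_quiz_for_ui` and `_parse_question_block` in ui/edit_handlers.py: the editor tab re-parses the exact text this formatter writes, so the option line layout and the option regex have to stay in sync. The sketch below round-trips one option line using the same regex pattern; `format_option` and the sample option are assumptions introduced for illustration.

```python
import re

# Same pattern as the option matcher in ui/edit_handlers.py
OPTION_RE = re.compile(r'•\s*[A-D]\s*(\[Correct\])?\s*:\s*(.+)')


def format_option(letter: str, option: dict) -> str:
    """Render one option line the way the formatter does (illustrative)."""
    marker = " [Correct]" if option["is_correct"] else ""
    return f"\t• {letter}{marker}: {option['text']}\n"


option = {"text": "A Pydantic model", "is_correct": True, "feedback": "Yes."}
line = format_option("A", option)

# The parser strips each line before matching
match = OPTION_RE.match(line.strip())
print(match.group(1) is not None, match.group(2))
```

Both sides assume at most four options: the formatter indexes a four-letter list and the regex only accepts the letters A through D.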

ui/objective_handlers.py ADDED Viewed

	@@ -0,0 +1,403 @@

+import os
+import json
+import shutil
+from typing import List
+from models.learning_objectives import LearningObjective
+from .content_processor import ContentProcessor
+from quiz_generator import QuizGenerator
+from .state import get_processed_contents, set_processed_contents, set_learning_objectives
+from .run_manager import get_run_manager
+from .question_handlers import generate_questions
+def process_files(files, num_objectives, num_runs, model_name, incorrect_answer_model_name, temperature):
+    """Process uploaded files and generate learning objectives."""
+    run_manager = get_run_manager()
+    # Input validation
+    if not files:
+        return "Please upload at least one file.", None, None, None
+    if not os.getenv("OPENAI_API_KEY"):
+        return "OpenAI API key not found. Please set the OPENAI_API_KEY environment variable.", None, None, None
+    # Extract file paths
+    file_paths = _extract_file_paths(files)
+    if not file_paths:
+        return "No valid files found. Please upload valid .ipynb, .vtt, .srt, or .md files.", None, None, None
+    # Start run and logging
+    run_id = run_manager.start_objective_run(
+        files=file_paths,
+        num_objectives=num_objectives,
+        num_runs=num_runs,
+        model=model_name,
+        incorrect_answer_model=incorrect_answer_model_name,
+        temperature=temperature
+    )
+    run_manager.log(f"Processing {len(file_paths)} files: {[os.path.basename(f) for f in file_paths]}", level="DEBUG")
+    # Process files
+    processor = ContentProcessor()
+    file_contents = processor.process_files(file_paths)
+    if not file_contents:
+        run_manager.log("No content extracted from the uploaded files", level="ERROR")
+        return "No content extracted from the uploaded files.", None, None, None
+    run_manager.log(f"Successfully extracted content from {len(file_contents)} files", level="INFO")
+    # Store file contents for later use
+    set_processed_contents(file_contents)
+    # Generate learning objectives
+    run_manager.log(f"Creating QuizGenerator with model={model_name}, temperature={temperature}", level="INFO")
+    quiz_generator = QuizGenerator(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        model=model_name,
+        temperature=float(temperature)
+    )
+    all_learning_objectives = _generate_multiple_runs(
+        quiz_generator, file_contents, num_objectives, num_runs, incorrect_answer_model_name, run_manager
+    )
+    # Group and rank objectives
+    grouped_result = _group_base_objectives_add_incorrect_answers(
+        quiz_generator, all_learning_objectives, file_contents, incorrect_answer_model_name, run_manager
+    )
+    # Format results for display
+    formatted_results = _format_objective_results(grouped_result, all_learning_objectives, num_objectives, run_manager)
+    # Store results
+    set_learning_objectives(grouped_result["all_grouped"])
+    # Save outputs to files
+    params = {
+        "files": [os.path.basename(f) for f in file_paths],
+        "num_objectives": num_objectives,
+        "num_runs": num_runs,
+        "model": model_name,
+        "incorrect_answer_model": incorrect_answer_model_name,
+        "temperature": temperature
+    }
+    run_manager.save_objectives_outputs(
+        best_in_group=formatted_results[1],
+        all_grouped=formatted_results[2],
+        raw_ungrouped=formatted_results[3],
+        params=params
+    )
+    # End run
+    run_manager.end_run(run_type="Learning Objectives")
+    return formatted_results
+def regenerate_objectives(objectives_json, feedback, num_objectives, num_runs, model_name, temperature):
+    """Regenerate learning objectives based on feedback."""
+    if not get_processed_contents():
+        return "No processed content available. Please upload files first.", objectives_json, objectives_json
+    if not os.getenv("OPENAI_API_KEY"):
+        return "OpenAI API key not found.", objectives_json, objectives_json
+    if not feedback:
+        return "Please provide feedback to regenerate learning objectives.", objectives_json, objectives_json
+    # Add feedback to file contents
+    file_contents_with_feedback = get_processed_contents().copy()
+    file_contents_with_feedback.append(f"FEEDBACK ON PREVIOUS OBJECTIVES: {feedback}")
+    # Generate with feedback
+    quiz_generator = QuizGenerator(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        model=model_name,
+        temperature=float(temperature)
+    )
+    run_manager = get_run_manager()
+    try:
+        # Generate multiple runs of learning objectives with feedback
+        all_learning_objectives = _generate_multiple_runs(
+            quiz_generator,
+            file_contents_with_feedback,
+            num_objectives,
+            num_runs,
+            model_name,  # Use the same model for incorrect answer suggestions
+            run_manager
+        )
+        # Group and rank the objectives
+        grouping_result = _group_base_objectives_add_incorrect_answers(
+            quiz_generator, all_learning_objectives, file_contents_with_feedback, model_name, run_manager
+        )
+        # Get the results
+        grouped_objectives = grouping_result["all_grouped"]
+        best_in_group_objectives = grouping_result["best_in_group"]
+        # Convert to JSON
+        grouped_objectives_json = json.dumps([obj.dict() for obj in grouped_objectives])
+        best_in_group_json = json.dumps([obj.dict() for obj in best_in_group_objectives])
+        return f"Generated {len(all_learning_objectives)} learning objectives, {len(best_in_group_objectives)} unique after grouping.", grouped_objectives_json, best_in_group_json
+    except Exception as e:
+        print(f"Error regenerating learning objectives: {e}")
+        import traceback
+        traceback.print_exc()
+        return f"Error regenerating learning objectives: {str(e)}", objectives_json, objectives_json
+def _extract_file_paths(files):
+    """Extract file paths from different input formats."""
+    file_paths = []
+    if isinstance(files, list):
+        for file in files:
+            if file and os.path.exists(file):
+                file_paths.append(file)
+    elif isinstance(files, str) and os.path.exists(files):
+        file_paths.append(files)
+    elif hasattr(files, 'name') and os.path.exists(files.name):
+        file_paths.append(files.name)
+    return file_paths
+def _generate_multiple_runs(quiz_generator, file_contents, num_objectives, num_runs, incorrect_answer_model_name, run_manager):
+    """Generate learning objectives across multiple runs."""
+    all_learning_objectives = []
+    num_runs_int = int(num_runs)
+    for run in range(num_runs_int):
+        run_manager.log(f"Starting generation run {run+1}/{num_runs_int}", level="INFO")
+        # Generate base learning objectives without grouping or incorrect answers
+        learning_objectives = quiz_generator.generate_base_learning_objectives(
+            file_contents, num_objectives, incorrect_answer_model_name
+        )
+        run_manager.log(f"Generated {len(learning_objectives)} learning objectives in run {run+1}", level="INFO")
+        # Assign temporary IDs
+        for i, obj in enumerate(learning_objectives):
+            obj.id = 1000 * (run + 1) + (i + 1)
+        all_learning_objectives.extend(learning_objectives)
+    run_manager.log(f"Total learning objectives from all runs: {len(all_learning_objectives)}", level="INFO")
+    return all_learning_objectives
+def _group_base_objectives_add_incorrect_answers(quiz_generator, all_base_learning_objectives, file_contents, incorrect_answer_model_name=None, run_manager=None):
+    """Group base learning objectives and add incorrect answers to best-in-group objectives."""
+    run_manager.log("Grouping base learning objectives...", level="INFO")
+    grouping_result = quiz_generator.group_base_learning_objectives(all_base_learning_objectives, file_contents)
+    grouped_objectives = grouping_result["all_grouped"]
+    best_in_group_objectives = grouping_result["best_in_group"]
+    run_manager.log(f"Grouped into {len(best_in_group_objectives)} best-in-group objectives", level="INFO")
+    # Find and reassign the best first objective to ID=1
+    _reassign_objective_ids(grouped_objectives, run_manager)
+    # Step 1: Generate incorrect answer suggestions only for best-in-group objectives
+    run_manager.log("Generating incorrect answer options only for best-in-group objectives...", level="INFO")
+    enhanced_best_in_group = quiz_generator.generate_lo_incorrect_answer_options(
+        file_contents, best_in_group_objectives, incorrect_answer_model_name
+    )
+    run_manager.log("Generated incorrect answer options", level="INFO")
+    # Clear debug directory for incorrect answer regeneration logs
+    debug_dir = "incorrect_suggestion_debug"
+    if os.path.exists(debug_dir):
+        shutil.rmtree(debug_dir)
+    os.makedirs(debug_dir, exist_ok=True)
+    # Step 2: Run the improvement workflow on the generated incorrect answers
+    run_manager.log("Improving incorrect answer options for best-in-group objectives...", level="INFO")
+    improved_best_in_group = quiz_generator.learning_objective_generator.regenerate_incorrect_answers(
+        enhanced_best_in_group, file_contents
+    )
+    run_manager.log("Completed improvement of incorrect answer options", level="INFO")
+    # Create a map of best-in-group objectives by ID for easy lookup
+    best_in_group_map = {obj.id: obj for obj in improved_best_in_group}
+    # Process all grouped objectives
+    final_grouped_objectives = []
+    for grouped_obj in grouped_objectives:
+        if getattr(grouped_obj, "best_in_group", False):
+            # For best-in-group objectives, use the enhanced version with incorrect answers
+            if grouped_obj.id in best_in_group_map:
+                final_grouped_objectives.append(best_in_group_map[grouped_obj.id])
+            else:
+                # This shouldn't happen, but just in case
+                final_grouped_objectives.append(grouped_obj)
+        else:
+            # For non-best-in-group objectives, ensure they have empty incorrect answers
+            final_grouped_objectives.append(LearningObjective(
+                id=grouped_obj.id,
+                learning_objective=grouped_obj.learning_objective,
+                source_reference=grouped_obj.source_reference,
+                correct_answer=grouped_obj.correct_answer,
+                incorrect_answer_options=[],  # Empty list for non-best-in-group
+                in_group=getattr(grouped_obj, 'in_group', None),
+                group_members=getattr(grouped_obj, 'group_members', None),
+                best_in_group=getattr(grouped_obj, 'best_in_group', None)
+            ))
+    return {
+        "all_grouped": final_grouped_objectives,
+        "best_in_group": improved_best_in_group
+    }
+def _reassign_objective_ids(grouped_objectives, run_manager):
+    """Reassign IDs to ensure best first objective gets ID=1."""
+    # Find best first objective
+    best_first_objective = None
+    # First identify all groups containing objectives with IDs ending in 001
+    groups_with_001 = {}
+    for obj in grouped_objectives:
+        if obj.id % 1000 == 1:  # ID ends in 001
+            group_members = getattr(obj, "group_members", [obj.id])
+            for member_id in group_members:
+                if member_id not in groups_with_001:
+                    groups_with_001[member_id] = True
+    # Now find the best_in_group objective from these groups
+    for obj in grouped_objectives:
+        obj_id = getattr(obj, "id", 0)
+        group_members = getattr(obj, "group_members", [obj_id])
+        # Check if this objective is in a group with 001 objectives
+        is_in_001_group = any(member_id in groups_with_001 for member_id in group_members)
+        if is_in_001_group and getattr(obj, "best_in_group", False):
+            best_first_objective = obj
+            run_manager.log(f"Found best_in_group objective in a 001 group with ID={obj.id}", level="DEBUG")
+            break
+    # If no best_in_group from 001 groups found, fall back to the first 001 objective
+    if not best_first_objective:
+        for obj in grouped_objectives:
+            if obj.id % 1000 == 1:  # First objective from a run
+                best_first_objective = obj
+                run_manager.log(f"No best_in_group from 001 groups found, using first 001 with ID={obj.id}", level="DEBUG")
+                break
+    # Reassign IDs
+    id_counter = 2
+    if best_first_objective:
+        best_first_objective.id = 1
+        run_manager.log("Reassigned primary objective to ID=1", level="INFO")
+    for obj in grouped_objectives:
+        if obj is best_first_objective:
+            continue
+        obj.id = id_counter
+        id_counter += 1
+def _format_objective_results(grouped_result, all_learning_objectives, num_objectives, run_manager):
+    """Format objective results for display."""
+    sorted_best_in_group = sorted(grouped_result["best_in_group"], key=lambda obj: obj.id)
+    sorted_all_grouped = sorted(grouped_result["all_grouped"], key=lambda obj: obj.id)
+    # Limit best-in-group to the requested number of objectives
+    sorted_best_in_group = sorted_best_in_group[:num_objectives]
+    run_manager.log("Formatting objective results for display", level="INFO")
+    run_manager.log(f"Best-in-group objectives limited to top {len(sorted_best_in_group)} (requested: {num_objectives})", level="INFO")
+    # Format best-in-group
+    formatted_best_in_group = []
+    for obj in sorted_best_in_group:
+        formatted_best_in_group.append({
+            "id": obj.id,
+            "learning_objective": obj.learning_objective,
+            "source_reference": obj.source_reference,
+            "correct_answer": obj.correct_answer,
+            "incorrect_answer_options": getattr(obj, 'incorrect_answer_options', None),
+            "in_group": getattr(obj, 'in_group', None),
+            "group_members": getattr(obj, 'group_members', None),
+            "best_in_group": getattr(obj, 'best_in_group', None)
+        })
+    # Format grouped
+    formatted_grouped = []
+    for obj in sorted_all_grouped:
+        formatted_grouped.append({
+            "id": obj.id,
+            "learning_objective": obj.learning_objective,
+            "source_reference": obj.source_reference,
+            "correct_answer": obj.correct_answer,
+            "incorrect_answer_options": getattr(obj, 'incorrect_answer_options', None),
+            "in_group": getattr(obj, 'in_group', None),
+            "group_members": getattr(obj, 'group_members', None),
+            "best_in_group": getattr(obj, 'best_in_group', None)
+        })
+    # Format unranked
+    formatted_unranked = []
+    for obj in all_learning_objectives:
+        formatted_unranked.append({
+            "id": obj.id,
+            "learning_objective": obj.learning_objective,
+            "source_reference": obj.source_reference,
+            "correct_answer": obj.correct_answer
+        })
+    run_manager.log(f"Formatted {len(formatted_best_in_group)} best-in-group, {len(formatted_grouped)} grouped, {len(formatted_unranked)} raw objectives", level="INFO")
+    return (
+        f"Generated and grouped {len(formatted_best_in_group)} unique learning objectives successfully. Saved to run: {run_manager.get_current_run_id()}",
+        json.dumps(formatted_best_in_group, indent=2),
+        json.dumps(formatted_grouped, indent=2),
+        json.dumps(formatted_unranked, indent=2)
+    )
+def process_files_and_generate_questions(files, num_objectives, num_runs, model_name, incorrect_answer_model_name,
+                                        temperature, model_name_q, temperature_q, num_questions, num_runs_q):
+    """Process files, generate learning objectives, and then generate questions in one flow."""
+    # First, generate learning objectives
+    obj_results = process_files(files, num_objectives, num_runs, model_name, incorrect_answer_model_name, temperature)
+    # obj_results contains: (status, objectives_output, grouped_output, raw_ungrouped_output)
+    status_obj, objectives_output, grouped_output, raw_ungrouped_output = obj_results
+    # Check if objectives generation failed
+    if not objectives_output:
+        # Return error status for objectives and empty values for questions
+        return (
+            status_obj,  # status_output
+            objectives_output,  # objectives_output
+            grouped_output,  # grouped_output
+            raw_ungrouped_output,  # raw_ungrouped_output
+            "Learning objectives generation failed. Cannot proceed with questions.",  # status_q_output
+            None,  # best_questions_output
+            None,  # all_questions_output
+            None   # formatted_quiz_output
+        )
+    # Now generate questions using the objectives
+    question_results = generate_questions(objectives_output, model_name_q, temperature_q, num_questions, num_runs_q)
+    # question_results contains: (status_q, best_questions_output, all_questions_output, formatted_quiz_output)
+    status_q, best_questions_output, all_questions_output, formatted_quiz_output = question_results
+    # Combine the status messages
+    combined_status = f"{status_obj}\n\nThen:\n{status_q}"
+    # Return all 8 outputs
+    return (
+        combined_status,  # status_output
+        objectives_output,  # objectives_output
+        grouped_output,  # grouped_output
+        raw_ungrouped_output,  # raw_ungrouped_output
+        status_q,  # status_q_output
+        best_questions_output,  # best_questions_output
+        all_questions_output,  # all_questions_output
+        formatted_quiz_output  # formatted_quiz_output
+    )
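
The temporary ID scheme used by `_generate_multiple_runs` and `_reassign_objective_ids` is easier to follow with concrete numbers. The sketch below reproduces only the arithmetic from the code above; `temp_id` and `is_first_of_run` are hypothetical helper names introduced for illustration.

```python
def temp_id(run: int, index: int) -> int:
    """Temporary ID for the objective at position `index` of run `run` (both 0-based)."""
    return 1000 * (run + 1) + (index + 1)


def is_first_of_run(obj_id: int) -> bool:
    """True for IDs ending in 001, i.e. the first objective of any run."""
    return obj_id % 1000 == 1


# Two runs with three objectives each
ids = [temp_id(run, i) for run in range(2) for i in range(3)]
print(ids)                                      # [1001, 1002, 1003, 2001, 2002, 2003]
print([i for i in ids if is_first_of_run(i)])   # [1001, 2001]
```

The `% 1000 == 1` check stays unambiguous as long as a single run produces fewer than 1000 objectives.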

ui/question_handlers.py ADDED Viewed

	@@ -0,0 +1,245 @@

+import os
+import json
+import shutil
+from typing import List
+from quiz_generator import QuizGenerator
+from models import LearningObjective
+from .state import get_processed_contents
+from .formatting import format_quiz_for_ui
+from .run_manager import get_run_manager
+def generate_questions(objectives_json, model_name, temperature, num_questions, num_runs):
+    """Generate questions based on approved learning objectives."""
+    run_manager = get_run_manager()
+    # Input validation
+    if not objectives_json:
+        return "No learning objectives provided.", None, None, None
+    if not os.getenv("OPENAI_API_KEY"):
+        return "OpenAI API key not found.", None, None, None
+    if not get_processed_contents():
+        return "No processed content available. Please go back to the first tab and upload files.", None, None, None
+    # Parse and create learning objectives
+    learning_objectives = _parse_learning_objectives(objectives_json)
+    if not learning_objectives:
+        run_manager.log("Invalid learning objectives JSON", level="ERROR")
+        return "Invalid learning objectives JSON.", None, None, None
+    # Start question run
+    run_id = run_manager.start_question_run(
+        objectives_count=len(learning_objectives),
+        model=model_name,
+        temperature=temperature,
+        num_questions=int(num_questions),
+        num_runs=int(num_runs)
+    )
+    run_manager.log(f"Parsed {len(learning_objectives)} learning objectives", level="INFO")
+    run_manager.log(f"Target total questions: {num_questions}", level="INFO")
+    # Generate questions
+    run_manager.log(f"Creating QuizGenerator with model={model_name}, temperature={temperature}", level="INFO")
+    quiz_generator = QuizGenerator(
+        api_key=os.getenv("OPENAI_API_KEY"),
+        model=model_name,
+        temperature=float(temperature)
+    )
+    all_questions = _generate_questions_multiple_runs(
+        quiz_generator, learning_objectives, int(num_questions), num_runs, run_manager
+    )
+    # Group and rank questions
+    results = _group_and_rank_questions(quiz_generator, all_questions, run_manager)
+    # Improve incorrect answers
+    #_improve_incorrect_answers(quiz_generator, results["best_in_group_ranked"])
+    # Format results
+    formatted_results = _format_question_results(results, int(num_questions), run_manager)
+    # Save outputs to files
+    params = {
+        "objectives_count": len(learning_objectives),
+        "model": model_name,
+        "temperature": temperature,
+        "num_questions": int(num_questions),
+        "num_runs": int(num_runs)
+    }
+    run_manager.save_questions_outputs(
+        best_ranked=formatted_results[1],
+        all_grouped=formatted_results[2],
+        formatted_quiz=formatted_results[3],
+        params=params
+    )
+    # End run
+    run_manager.end_run(run_type="Questions")
+    return formatted_results
+def _parse_learning_objectives(objectives_json):
+    """Parse learning objectives from JSON."""
+    try:
+        objectives_data = json.loads(objectives_json)
+        learning_objectives = []
+        for obj_data in objectives_data:
+            obj = LearningObjective(
+                id=obj_data["id"],
+                learning_objective=obj_data["learning_objective"],
+                source_reference=obj_data["source_reference"],
+                correct_answer=obj_data["correct_answer"],
+                incorrect_answer_options=obj_data["incorrect_answer_options"]
+            )
+            learning_objectives.append(obj)
+        return learning_objectives
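+    except (json.JSONDecodeError, KeyError, TypeError):
+        # Malformed JSON, a missing field, or a non-list payload all count as invalid input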
+        return None
+def _generate_questions_multiple_runs(quiz_generator, learning_objectives, num_questions, num_runs, run_manager):
+    """Generate questions across multiple runs with proportional distribution."""
+    all_questions = []
+    num_runs_int = int(num_runs)
+    num_objectives = len(learning_objectives)
+    # Calculate proportional distribution of questions across objectives
+    distribution = _calculate_proportional_distribution(num_questions, num_objectives)
+    run_manager.log(f"Question distribution across {num_objectives} objectives: {distribution}", level="INFO")
+    # Select which objectives to use based on distribution
+    objectives_to_use = []
+    for i, count in enumerate(distribution):
+        if count > 0 and i < len(learning_objectives):
+            objectives_to_use.append((learning_objectives[i], count))
+    run_manager.log(f"Using {len(objectives_to_use)} learning objectives for question generation", level="INFO")
+    for run in range(num_runs_int):
+        run_manager.log(f"Starting question generation run {run+1}/{num_runs_int}", level="INFO")
+        # Generate questions for each selected objective with its assigned count
+        for obj, question_count in objectives_to_use:
+            run_manager.log(f"Generating {question_count} question(s) for objective {obj.id}: {obj.learning_objective[:80]}...", level="INFO")
+            for q_num in range(question_count):
+                run_questions = quiz_generator.generate_questions_in_parallel(
+                    [obj], get_processed_contents()
+                )
+                if run_questions:
+                    run_manager.log(f"Generated question {q_num+1}/{question_count} for objective {obj.id}", level="DEBUG")
+                    all_questions.extend(run_questions)
+        run_manager.log(f"Generated {len(all_questions)} questions so far in run {run+1}", level="INFO")
+    # Assign unique IDs
+    for i, q in enumerate(all_questions):
+        q.id = i + 1
+    run_manager.log(f"Total questions from all runs: {len(all_questions)}", level="INFO")
+    return all_questions
+def _calculate_proportional_distribution(num_questions, num_objectives):
+    """Calculate how to distribute N questions across M objectives proportionally."""
+    if num_questions <= 0 or num_objectives <= 0:
+        return []
+    # If we have more objectives than questions, only use as many objectives as we have questions
+    if num_questions < num_objectives:
+        distribution = [1] * num_questions + [0] * (num_objectives - num_questions)
+        return distribution
+    # Calculate base questions per objective and remainder
+    base_per_objective = num_questions // num_objectives
+    remainder = num_questions % num_objectives
+    # Distribute evenly, giving extra questions to the first 'remainder' objectives
+    distribution = [base_per_objective + (1 if i < remainder else 0) for i in range(num_objectives)]
+    return distribution
+def _group_and_rank_questions(quiz_generator, all_questions, run_manager):
+    """Group and rank questions."""
+    run_manager.log(f"Grouping {len(all_questions)} questions by similarity...", level="INFO")
+    grouping_result = quiz_generator.group_questions(all_questions, get_processed_contents())
+    run_manager.log(f"Grouped into {len(grouping_result['best_in_group'])} best-in-group questions", level="INFO")
+    # Rank ALL grouped questions (not just best-in-group) to ensure we have enough questions for selection
+    run_manager.log(f"Ranking all {len(grouping_result['grouped'])} grouped questions...", level="INFO")
+    ranking_result = quiz_generator.rank_questions(grouping_result['grouped'], get_processed_contents())
+    run_manager.log("Completed ranking of questions", level="INFO")
+    return {
+        "grouped": grouping_result["grouped"],
+        "all_ranked": ranking_result["ranked"]
+    }
+def _improve_incorrect_answers(quiz_generator, questions):
+    """Improve incorrect answer options."""
+    # Clear debug directory
+    debug_dir = "wrong_answer_debug"
+    if os.path.exists(debug_dir):
+        shutil.rmtree(debug_dir)
+    os.makedirs(debug_dir, exist_ok=True)
+    quiz_generator.regenerate_incorrect_answers(questions, get_processed_contents())
+def _format_question_results(results, num_questions, run_manager):
+    """Format question results for display."""
+    run_manager.log("Formatting question results for display", level="INFO")
+    # Format all ranked questions (these will be the top N questions from all grouped questions)
+    formatted_best_questions = []
+    for q in results["all_ranked"]:
+        formatted_best_questions.append({
+            "id": q.id,
+            "question_text": q.question_text,
+            "options": [{"text": opt.option_text, "is_correct": opt.is_correct, "feedback": opt.feedback} for opt in q.options],
+            "learning_objective_id": q.learning_objective_id,
+            "learning_objective": q.learning_objective,
+            "correct_answer": q.correct_answer,
+            "source_reference": q.source_reference,
+            "rank": getattr(q, "rank", None),
+            "ranking_reasoning": getattr(q, "ranking_reasoning", None),
+            "in_group": getattr(q, "in_group", None),
+            "group_members": getattr(q, "group_members", None),
+            "best_in_group": getattr(q, "best_in_group", None)
+        })
+    # Format all grouped questions
+    formatted_all_questions = []
+    for q in results["grouped"]:
+        formatted_all_questions.append({
+            "id": q.id,
+            "question_text": q.question_text,
+            "options": [{"text": opt.option_text, "is_correct": opt.is_correct, "feedback": opt.feedback} for opt in q.options],
+            "learning_objective_id": q.learning_objective_id,
+            "learning_objective": q.learning_objective,
+            "correct_answer": q.correct_answer,
+            "source_reference": q.source_reference,
+            "in_group": getattr(q, "in_group", None),
+            "group_members": getattr(q, "group_members", None),
+            "best_in_group": getattr(q, "best_in_group", None)
+        })
+    # Limit formatted quiz and best-ranked to the requested number of questions
+    formatted_best_questions_limited = formatted_best_questions[:num_questions]
+    formatted_quiz = format_quiz_for_ui(json.dumps(formatted_best_questions_limited, indent=2))
+    run_manager.log(f"Formatted {len(formatted_best_questions)} best-ranked, {len(formatted_all_questions)} grouped questions", level="INFO")
+    run_manager.log(f"Best-ranked and formatted quiz limited to top {len(formatted_best_questions_limited)} questions (requested: {num_questions})", level="INFO")
+    return (
+        f"Generated and ranked {len(formatted_best_questions_limited)} unique questions successfully. Saved to run: {run_manager.get_current_run_id()}/{run_manager.get_current_question_run_id()}",
+        json.dumps(formatted_best_questions_limited, indent=2),
+        json.dumps(formatted_all_questions, indent=2),
+        formatted_quiz
+    )
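
A worked example makes the split performed by `_calculate_proportional_distribution` concrete. The sketch below restates the same logic as a standalone function; the name `distribute` is hypothetical, and `divmod` replaces the separate `//` and `%` steps.

```python
def distribute(num_questions: int, num_objectives: int) -> list[int]:
    """Spread num_questions across num_objectives, front-loading any remainder."""
    if num_questions <= 0 or num_objectives <= 0:
        return []
    if num_questions < num_objectives:
        # Fewer questions than objectives: only the first objectives get one each
        return [1] * num_questions + [0] * (num_objectives - num_questions)
    base, remainder = divmod(num_questions, num_objectives)
    return [base + (1 if i < remainder else 0) for i in range(num_objectives)]


print(distribute(10, 4))  # [3, 3, 2, 2]
print(distribute(2, 5))   # [1, 1, 0, 0, 0]
```

Because `_generate_questions_multiple_runs` applies this distribution once per run, the raw pool handed to grouping and ranking grows with `num_runs`; the cut to the requested number of questions happens later, in `_format_question_results`.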

ui/run_manager.py ADDED Viewed

	@@ -0,0 +1,323 @@

+"""Run manager for tracking test runs and managing output folders."""
+import os
+import json
+import time
+from datetime import datetime
+from typing import Dict, Any, Optional, List
+from pathlib import Path
+class RunManager:
+    """Manages test runs, folders, and logging."""
+    def __init__(self, base_dir: str = "results", save_outputs: bool = True):
+        self.base_dir = base_dir
+        self.save_outputs = save_outputs
+        self.current_run_id: Optional[str] = None
+        self.current_run_dir: Optional[str] = None
+        self.current_question_run_id: Optional[str] = None  # Track current question run ID
+        self.log_file: Optional[str] = None
+        self.run_start_time: Optional[float] = None
+        self.last_objective_params: Optional[Dict[str, Any]] = None
+        # Create base results directory
+        if self.save_outputs:
+            os.makedirs(self.base_dir, exist_ok=True)
+    def _get_next_run_id(self) -> str:
+        """Generate the next unique run ID."""
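+        if not os.path.isdir(self.base_dir):
+            # The base directory is only created when save_outputs is True
+            return "test_id00001"
+        existing_runs = [d for d in os.listdir(self.base_dir)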
+                        if d.startswith("test_id") and os.path.isdir(os.path.join(self.base_dir, d))]
+        if not existing_runs:
+            return "test_id00001"
+        # Extract numbers and find max
+        numbers = []
+        for run in existing_runs:
+            try:
+                num = int(run.replace("test_id", ""))
+                numbers.append(num)
+            except ValueError:
+                continue
+        next_num = max(numbers) + 1 if numbers else 1
+        return f"test_id{next_num:05d}"
+    def _create_run_structure(self, run_id: str) -> str:
+        """Create folder structure for a run."""
+        run_dir = os.path.join(self.base_dir, run_id)
+        if self.save_outputs:
+            os.makedirs(run_dir, exist_ok=True)
+            os.makedirs(os.path.join(run_dir, "learning objectives"), exist_ok=True)
+            os.makedirs(os.path.join(run_dir, "questions"), exist_ok=True)
+        return run_dir
+    def _get_next_question_run_id(self) -> str:
+        """Generate the next unique question run ID for the current test run."""
+        if self.current_run_dir is None:
+            return "q_run_001"
+        questions_dir = os.path.join(self.current_run_dir, "questions")
+        if not os.path.exists(questions_dir):
+            return "q_run_001"
+        # Find existing question run folders
+        existing_q_runs = [d for d in os.listdir(questions_dir)
+                          if d.startswith("q_run_") and os.path.isdir(os.path.join(questions_dir, d))]
+        if not existing_q_runs:
+            return "q_run_001"
+        # Extract numbers and find max
+        numbers = []
+        for run in existing_q_runs:
+            try:
+                num = int(run.replace("q_run_", ""))
+                numbers.append(num)
+            except ValueError:
+                continue
+        next_num = max(numbers) + 1 if numbers else 1
+        return f"q_run_{next_num:03d}"
+    def _params_changed(self, new_params: Dict[str, Any]) -> bool:
+        """Check if objective generation parameters have changed."""
+        if self.last_objective_params is None:
+            return True
+        # Compare relevant parameters
+        keys_to_compare = ["files", "num_objectives", "num_runs", "model",
+                          "incorrect_answer_model", "temperature"]
+        for key in keys_to_compare:
+            if new_params.get(key) != self.last_objective_params.get(key):
+                return True
+        return False
+    def start_objective_run(self, files: List[str], num_objectives: int, num_runs: str,
+                           model: str, incorrect_answer_model: str, temperature: str) -> str:
+        """
+        Start a new objective generation run or continue existing one.
+        Returns the run ID.
+        """
+        params = {
+            "files": sorted(files),  # Sort for consistent comparison
+            "num_objectives": num_objectives,
+            "num_runs": num_runs,
+            "model": model,
+            "incorrect_answer_model": incorrect_answer_model,
+            "temperature": temperature
+        }
+        # Check if we need a new run
+        if self._params_changed(params):
+            # Create new run
+            self.current_run_id = self._get_next_run_id()
+            self.current_run_dir = self._create_run_structure(self.current_run_id)
+            self.log_file = os.path.join(self.current_run_dir, "log.log")
+            self.last_objective_params = params
+            # Log header
+            self.log(f"=== New Learning Objectives Run: {self.current_run_id} ===", level="INFO")
+            self.log(f"Inputs: {[os.path.basename(f) for f in files]}", level="INFO")
+            self.log("Variables:", level="INFO")
+            self.log(f"  Number of Learning Objectives per Run: {num_objectives}", level="INFO")
+            self.log(f"  Number of Generation Runs: {num_runs}", level="INFO")
+            self.log(f"  Model: {model}", level="INFO")
+            self.log(f"  Model for Incorrect Answer Suggestions: {incorrect_answer_model}", level="INFO")
+            self.log(f"  Temperature (0.0: Deterministic, 1.0: Creative): {temperature}", level="INFO")
+            self.log("", level="INFO")  # Blank line
+        else:
+            # Continue existing run
+            self.log("", level="INFO")  # Blank line
+            self.log(f"=== Continuing Learning Objectives Run: {self.current_run_id} ===", level="INFO")
+        self.run_start_time = time.time()
+        return self.current_run_id
+    def start_question_run(self, objectives_count: int, model: str,
+                          temperature: str, num_questions: int, num_runs: int) -> str:
+        """
+        Start a question generation run (continues logging to the same run).
+        Returns the parent run ID; the question run ID is available via get_current_question_run_id().
+        """
+        if self.current_run_id is None:
+            # No objective run exists, create new run
+            self.current_run_id = self._get_next_run_id()
+            self.current_run_dir = self._create_run_structure(self.current_run_id)
+            self.log_file = os.path.join(self.current_run_dir, "log.log")
+            self.log(f"=== New Questions Run: {self.current_run_id} ===", level="INFO")
+        else:
+            self.log("", level="INFO")  # Blank line
+            self.log("=== Generate Questions Run ===", level="INFO")
+        # Get next question run ID for this test run
+        self.current_question_run_id = self._get_next_question_run_id()
+        self.log(f"Question Run ID: {self.current_question_run_id}", level="INFO")
+        self.log("Variables:", level="INFO")
+        self.log(f"  Number of Learning Objectives: {objectives_count}", level="INFO")
+        self.log(f"  Number of Questions to Generate: {num_questions}", level="INFO")
+        self.log(f"  Model: {model}", level="INFO")
+        self.log(f"  Temperature (0.0: Deterministic, 1.0: Creative): {temperature}", level="INFO")
+        self.log(f"  Number of Question Generation Runs: {num_runs}", level="INFO")
+        self.log("", level="INFO")  # Blank line
+        self.run_start_time = time.time()
+        return self.current_run_id
+    def log(self, message: str, level: str = "INFO"):
+        """Write a log message with timestamp."""
+        # Always print to console
+        print(f"[{level}] {message}")
+        if not self.save_outputs or self.log_file is None:
+            return
+        timestamp = datetime.now().strftime("%m/%d %H:%M:%S")
+        log_line = f"[{timestamp}][{level}] {message}\n"
+        with open(self.log_file, "a", encoding="utf-8") as f:
+            f.write(log_line)
+    def end_run(self, run_type: str = "Learning Objectives"):
+        """End the current run and log total time."""
+        if self.run_start_time is None:
+            return
+        elapsed = time.time() - self.run_start_time
+        self.log(f"Total time for {run_type}: +{elapsed:.0f}s", level="INFO")
+        self.log("", level="INFO")  # Blank line
+    def save_objectives_outputs(self, best_in_group: str, all_grouped: str,
+                               raw_ungrouped: str, params: Dict[str, Any]):
+        """Save learning objectives outputs to files."""
+        if not self.save_outputs or self.current_run_dir is None:
+            return
+        obj_dir = os.path.join(self.current_run_dir, "learning objectives")
+        # Save JSON outputs
+        with open(os.path.join(obj_dir, "best_in_group.json"), "w", encoding="utf-8") as f:
+            f.write(best_in_group)
+        with open(os.path.join(obj_dir, "all_grouped.json"), "w", encoding="utf-8") as f:
+            f.write(all_grouped)
+        with open(os.path.join(obj_dir, "raw_ungrouped.json"), "w", encoding="utf-8") as f:
+            f.write(raw_ungrouped)
+        # Save input parameters
+        with open(os.path.join(obj_dir, "input_parameters.json"), "w", encoding="utf-8") as f:
+            json.dump(params, f, indent=2)
+        # Save best-in-group learning objectives as Markdown
+        try:
+            objectives_data = json.loads(best_in_group)
+            md_content = "# Learning Objectives\n\n"
+            for i, obj in enumerate(objectives_data, 1):
+                learning_objective = obj.get("learning_objective", "")
+                md_content += f"{i}. {learning_objective}\n"
+            with open(os.path.join(obj_dir, "best_in_group.md"), "w", encoding="utf-8") as f:
+                f.write(md_content)
+        except Exception as e:
+            self.log(f"Error creating markdown output: {e}", level="ERROR")
+        self.log(f"Saved learning objectives outputs to {obj_dir}", level="INFO")
+    def save_questions_outputs(self, best_ranked: str, all_grouped: str,
+                              formatted_quiz: str, params: Dict[str, Any]):
+        """Save questions outputs to files in a numbered subfolder."""
+        if not self.save_outputs or self.current_run_dir is None:
+            return
+        # Create subfolder for this question run
+        q_base_dir = os.path.join(self.current_run_dir, "questions")
+        q_run_dir = os.path.join(q_base_dir, self.current_question_run_id or "q_run_001")
+        os.makedirs(q_run_dir, exist_ok=True)
+        # Save JSON outputs
+        with open(os.path.join(q_run_dir, "best_ranked.json"), "w", encoding="utf-8") as f:
+            f.write(best_ranked)
+        with open(os.path.join(q_run_dir, "all_grouped.json"), "w", encoding="utf-8") as f:
+            f.write(all_grouped)
+        # Save formatted quiz as markdown
+        with open(os.path.join(q_run_dir, "formatted_quiz.md"), "w", encoding="utf-8") as f:
+            f.write(formatted_quiz)
+        # Save input parameters
+        with open(os.path.join(q_run_dir, "input_parameters.json"), "w", encoding="utf-8") as f:
+            json.dump(params, f, indent=2)
+        self.log(f"Saved questions outputs to {q_run_dir}", level="INFO")
+    def get_current_run_id(self) -> Optional[str]:
+        """Get the current run ID."""
+        return self.current_run_id
+    def get_current_run_dir(self) -> Optional[str]:
+        """Get the current run directory."""
+        return self.current_run_dir
+    def get_current_question_run_id(self) -> Optional[str]:
+        """Get the current question run ID."""
+        return self.current_question_run_id
+    def get_latest_formatted_quiz_path(self) -> Optional[str]:
+        """Find the formatted_quiz.md from the latest question run."""
+        if self.current_run_dir is None:
+            return None
+        questions_dir = os.path.join(self.current_run_dir, "questions")
+        if not os.path.exists(questions_dir):
+            return None
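+        # Lexicographic sort matches numeric order only while IDs stay zero-padded to 3 digits (<= q_run_999)
+        q_runs = sorted([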
+            d for d in os.listdir(questions_dir)
+            if d.startswith("q_run_") and os.path.isdir(os.path.join(questions_dir, d))
+        ])
+        if not q_runs:
+            return None
+        quiz_path = os.path.join(questions_dir, q_runs[-1], "formatted_quiz.md")
+        return quiz_path if os.path.exists(quiz_path) else None
+    def save_edited_quiz(self, content: str, filename: str = "formatted_quiz_edited.md") -> Optional[str]:
+        """Save edited quiz to the latest question run folder."""
+        if not self.save_outputs or self.current_run_dir is None:
+            return None
+        questions_dir = os.path.join(self.current_run_dir, "questions")
+        if not os.path.exists(questions_dir):
+            return None
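+        # Lexicographic sort matches numeric order only while IDs stay zero-padded to 3 digits (<= q_run_999)
+        q_runs = sorted([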
+            d for d in os.listdir(questions_dir)
+            if d.startswith("q_run_") and os.path.isdir(os.path.join(questions_dir, d))
+        ])
+        if not q_runs:
+            return None
+        output_path = os.path.join(questions_dir, q_runs[-1], filename)
+        with open(output_path, "w", encoding="utf-8") as f:
+            f.write(content)
+        self.log(f"Saved edited quiz to {output_path}", level="INFO")
+        return output_path
+# Global run manager instance
+_run_manager = None
+def get_run_manager() -> RunManager:
+    """Get or create the global run manager instance."""
+    global _run_manager
+    if _run_manager is None:
+        _run_manager = RunManager()
+    return _run_manager
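
Note on the `q_run_NNN` numbering above: `_get_next_question_run_id` parses folder names with `replace` plus `int()`, while `get_latest_formatted_quiz_path` and `save_edited_quiz` pick the latest folder with a plain `sorted(...)`. The following is a minimal standalone sketch, not part of this commit, of one way both lookups could share a numeric parse. The helper names (`_numbered_runs`, `next_question_run_id`, `latest_question_run`) are hypothetical, and the regex is a swapped-in, stricter alternative to the `replace`/`int()` parse used in the diff.

```python
import os
import re
import tempfile
from typing import List, Optional, Tuple

# Hypothetical pattern: accept only names of the form q_run_<digits>.
_Q_RUN_NAME = re.compile(r"q_run_(\d+)")


def _numbered_runs(questions_dir: str) -> List[Tuple[int, str]]:
    """Return (number, folder name) pairs for every q_run_<digits> subfolder."""
    if not os.path.isdir(questions_dir):
        return []
    runs = []
    for name in os.listdir(questions_dir):
        match = _Q_RUN_NAME.fullmatch(name)
        if match and os.path.isdir(os.path.join(questions_dir, name)):
            runs.append((int(match.group(1)), name))
    return runs


def next_question_run_id(questions_dir: str) -> str:
    """Return the next free ID, zero-padded to at least three digits."""
    runs = _numbered_runs(questions_dir)
    next_num = max(num for num, _ in runs) + 1 if runs else 1
    return f"q_run_{next_num:03d}"


def latest_question_run(questions_dir: str) -> Optional[str]:
    """Return the folder with the highest run number, or None if there is none."""
    runs = _numbered_runs(questions_dir)
    # Tuples compare by number first, so this is a numeric (not lexicographic) max.
    return max(runs)[1] if runs else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for name in ("q_run_999", "q_run_1000", "q_run_draft"):
            os.mkdir(os.path.join(root, name))
        print(next_question_run_id(root))
        print(latest_question_run(root))
```

With the three folders created in the demo, the numeric comparison selects `q_run_1000` as the latest run and `q_run_1001` as the next ID, whereas a plain string sort would place `q_run_999` last. The non-numeric `q_run_draft` folder is ignored in both lookups.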

ui/state.py ADDED

	@@ -0,0 +1,29 @@

+"""Global state management for the UI."""
+# Global variables to store processed content and generated objectives
+processed_file_contents = []
+generated_learning_objectives = []
+def get_processed_contents():
+    """Get the current processed file contents."""
+    return processed_file_contents
+def set_processed_contents(contents):
+    """Set the processed file contents."""
+    global processed_file_contents
+    processed_file_contents = contents
+def get_learning_objectives():
+    """Get the current learning objectives."""
+    return generated_learning_objectives
+def set_learning_objectives(objectives):
+    """Set the learning objectives."""
+    global generated_learning_objectives
+    generated_learning_objectives = objectives
+def clear_state():
+    """Clear all state."""
+    global processed_file_contents, generated_learning_objectives
+    processed_file_contents = []
+    generated_learning_objectives = []