# AI Course Assessment Generator - Functionality Report
## Table of Contents
1. [Overview](#overview)
2. [System Architecture](#system-architecture)
3. [Data Models](#data-models)
4. [Application Entry Point](#application-entry-point)
5. [User Interface Structure](#user-interface-structure)
6. [Complete Workflow](#complete-workflow)
7. [Detailed Component Functionality](#detailed-component-functionality)
8. [Quality Standards and Prompts](#quality-standards-and-prompts)
---
## Overview
The AI Course Assessment Generator is a sophisticated educational tool that automates the creation of learning objectives and multiple-choice questions from course materials. It leverages OpenAI's language models with structured output generation to produce high-quality educational assessments that adhere to specified quality standards and Bloom's Taxonomy levels.
### Key Capabilities
- **Multi-format Content Processing**: Accepts `.vtt`, `.srt` (subtitle files), and `.ipynb` (Jupyter notebooks)
- **AI-Powered Generation**: Uses OpenAI's GPT models with configurable parameters
- **Quality Assurance**: Implements LLM-based quality assessment and ranking
- **Source Tracking**: Maintains XML-tagged references from source materials to generated content
- **Iterative Improvement**: Supports feedback-based regeneration and enhancement
- **Parallel Processing**: Generates questions concurrently for improved performance
---
## System Architecture
### Architectural Patterns
#### 1. **Orchestrator Pattern**
Both `LearningObjectiveGenerator` and `QuizGenerator` act as orchestrators that coordinate calls to specialized generation functions rather than implementing generation logic directly.
#### 2. **Modular Prompt System**
The `prompts/` directory contains reusable prompt components that are imported and combined in generation modules, allowing for consistent quality standards across different generation tasks.
#### 3. **Structured Output Generation**
All LLM interactions use Pydantic models with the `instructor` library to ensure consistent, validated output formats using OpenAI's structured output API.
#### 4. **Source Tracking via XML Tags**
Content is wrapped in XML-style tags named after the source file (e.g., `<source_filename>content</source_filename>`) throughout the pipeline to maintain traceability from source files to generated questions.
### Technology Stack
- **Python 3.8+**
- **Gradio 5.29.0+**: Web-based UI framework
- **Pydantic 2.8.0+**: Data validation and schema management
- **OpenAI 1.52.0+**: LLM API integration
- **Instructor 1.7.9+**: Structured output generation
- **nbformat 5.9.2**: Jupyter notebook parsing
- **python-dotenv 1.0.0**: Environment variable management
---
## Data Models
### Learning Objectives Progression
The system uses a hierarchical progression of learning objective models:
#### 1. **BaseLearningObjectiveWithoutCorrectAnswer**
```python
- id: int
- learning_objective: str
- source_reference: Union[List[str], str]
```
Initial generation without correct answers.
#### 2. **BaseLearningObjective**
```python
- id: int
- learning_objective: str
- source_reference: Union[List[str], str]
- correct_answer: str
```
Base objectives with correct answers added.
#### 3. **LearningObjective**
```python
- id: int
- learning_objective: str
- source_reference: Union[List[str], str]
- correct_answer: str
- incorrect_answer_options: Union[List[str], str]
- in_group: Optional[bool]
- group_members: Optional[List[int]]
- best_in_group: Optional[bool]
```
Enhanced with incorrect answer suggestions and grouping metadata.
#### 4. **GroupedLearningObjective**
```python
(All fields from LearningObjective)
- in_group: bool (required)
- group_members: List[int] (required)
- best_in_group: bool (required)
```
Fully grouped and ranked objectives.
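The field progression above can be mirrored in a minimal sketch, here using stdlib dataclasses standing in for the project's Pydantic models (field names follow the report; the defaults are assumptions):

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

@dataclass
class BaseLearningObjective:
    id: int
    learning_objective: str
    source_reference: Union[List[str], str]
    correct_answer: str

@dataclass
class LearningObjective(BaseLearningObjective):
    # Enhancement stage: distractor suggestions plus grouping metadata
    incorrect_answer_options: Union[List[str], str] = field(default_factory=list)
    in_group: Optional[bool] = None
    group_members: Optional[List[int]] = None
    best_in_group: Optional[bool] = None

lo = LearningObjective(
    id=1,
    learning_objective="Identify the purpose of structured output generation",
    source_reference=["lesson1.vtt"],
    correct_answer="It guarantees validated, schema-conforming LLM responses",
)
print(lo.incorrect_answer_options)  # []
```

The `GroupedLearningObjective` stage then tightens the three optional grouping fields to required values.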
### Question Models Progression
#### 1. **MultipleChoiceOption**
```python
- option_text: str
- is_correct: bool
- feedback: str
```
#### 2. **MultipleChoiceQuestion**
```python
- id: int
- question_text: str
- options: List[MultipleChoiceOption]
- learning_objective_id: int
- learning_objective: str
- correct_answer: str
- source_reference: Union[List[str], str]
- judge_feedback: Optional[str]
- approved: Optional[bool]
```
#### 3. **RankedMultipleChoiceQuestion**
```python
(All fields from MultipleChoiceQuestion)
- rank: int
- ranking_reasoning: str
- in_group: bool
- group_members: List[int]
- best_in_group: bool
```
#### 4. **Assessment**
```python
- learning_objectives: List[LearningObjective]
- questions: List[RankedMultipleChoiceQuestion]
```
Final output containing both objectives and questions.
### Configuration Models
#### **MODELS**
Available OpenAI models: `["o3-mini", "o1", "gpt-4.1", "gpt-4o", "gpt-4o-mini", "gpt-4", "gpt-3.5-turbo", "gpt-5", "gpt-5-mini", "gpt-5-nano"]`
#### **TEMPERATURE_UNAVAILABLE**
Dictionary mapping model names to whether temperature is unsupported; models such as o1, o3-mini, and the gpt-5 variants do not accept a temperature setting, so the parameter is omitted for them.
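The temperature gate can be sketched as follows (a minimal stand-in; the dictionary contents here are illustrative, not the project's exact mapping):

```python
# Models whose APIs reject an explicit temperature parameter (illustrative subset)
TEMPERATURE_UNAVAILABLE = {
    "o1": True,
    "o3-mini": True,
    "gpt-5": True,
    "gpt-4o": False,
    "gpt-4o-mini": False,
}

def build_request_params(model: str, temperature: float) -> dict:
    """Attach temperature only when the chosen model supports it."""
    params = {"model": model}
    if not TEMPERATURE_UNAVAILABLE.get(model, True):
        params["temperature"] = temperature
    return params

print(build_request_params("gpt-4o", 0.7))   # includes temperature
print(build_request_params("o3-mini", 0.7))  # omits temperature
```

Note the defensive default: an unknown model is treated as temperature-unavailable, so a missing dictionary entry can never produce a rejected API call.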
---
## Application Entry Point
### `app.py`
The root-level entry point that:
1. Loads environment variables from `.env` file
2. Checks for `OPENAI_API_KEY` presence
3. Creates the Gradio UI via `ui.app.create_ui()`
4. Launches the web interface at `http://127.0.0.1:7860`
```python
# Workflow:
# load_dotenv() → Check API key → create_ui() → app.launch()
```
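The startup guard can be sketched with stdlib calls only; the dotenv and Gradio steps are shown as comments since their exact invocation here is an approximation, not the project's verbatim code:

```python
import os

def check_api_key() -> bool:
    """Mirror of app.py's startup guard: refuse to launch without a key."""
    key = os.environ.get("OPENAI_API_KEY", "")
    return bool(key.strip())

# In the real entry point (sketch only):
#   load_dotenv()                      # python-dotenv pulls .env into os.environ
#   if not check_api_key():
#       raise SystemExit("OPENAI_API_KEY is not set")
#   create_ui().launch()               # serves at http://127.0.0.1:7860
```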
---
## User Interface Structure
### `ui/app.py` - Gradio Interface
The UI is organized into **3 main tabs**:
#### **Tab 1: Generate Learning Objectives**
**Input Components:**
- File uploader (accepts `.ipynb`, `.vtt`, `.srt`)
- Number of objectives per run (slider: 1-20, default: 3)
- Number of generation runs (dropdown: 1-5, default: 3)
- Model selection (dropdown, default: "gpt-5")
- Incorrect answer model selection (dropdown, default: "gpt-5")
- Temperature setting (dropdown: 0.0-1.0, default: 1.0)
- Generate button
- Feedback input textbox
- Regenerate button
**Output Components:**
- Status textbox
- Best-in-Group Learning Objectives (JSON)
- All Grouped Learning Objectives (JSON)
- Raw Ungrouped Learning Objectives (JSON) - for debugging
**Event Handler:** `process_files()` from `objective_handlers.py`
#### **Tab 2: Generate Questions**
**Input Components:**
- Learning Objectives JSON (auto-populated from Tab 1)
- Model selection
- Temperature setting
- Number of question generation runs (slider: 1-5, default: 1)
- Generate Questions button
**Output Components:**
- Status textbox
- Ranked Best-in-Group Questions (JSON)
- All Grouped Questions (JSON)
- Formatted Quiz (human-readable format)
**Event Handler:** `generate_questions()` from `question_handlers.py`
#### **Tab 3: Propose/Edit Question**
**Input Components:**
- Question guidance/feedback textbox
- Model selection
- Temperature setting
- Generate Question button
**Output Components:**
- Status textbox
- Generated Question (JSON)
**Event Handler:** `propose_question_handler()` from `feedback_handlers.py`
---
## Complete Workflow
### Phase 1: File Upload and Content Processing
#### Step 1.1: File Upload
User uploads one or more files (`.vtt`, `.srt`, `.ipynb`) through the Gradio interface.
#### Step 1.2: File Path Extraction (`objective_handlers._extract_file_paths()`)
```python
# Handles different input formats:
- List of file paths
- Single file path string
- File objects with .name attribute
```
#### Step 1.3: Content Processing (`ui/content_processor.py`)
**For Subtitle Files (`.vtt`, `.srt`):**
```python
1. Read file with UTF-8 encoding
2. Split into lines
3. Filter out:
- Empty lines
- Numeric timestamp indicators
- Lines containing '-->' (timestamps)
- 'WEBVTT' header lines
4. Combine remaining text lines
5. Wrap in XML tags: <source_filename>content</source_filename>
```
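The five filtering steps above reduce to plain string handling; a rough approximation of `ui/content_processor.py` (the tag naming follows the report's source-tracking convention):

```python
import os

def process_subtitle_text(text: str, filename: str) -> str:
    """Strip WEBVTT headers, cue numbers, and timestamp lines; keep spoken text."""
    kept = []
    for line in text.splitlines():
        line = line.strip()
        if not line:                   # empty lines
            continue
        if line.isdigit():             # numeric cue indicators
            continue
        if "-->" in line:              # timestamp lines
            continue
        if line.startswith("WEBVTT"):  # file header
            continue
        kept.append(line)
    name = os.path.basename(filename)
    return f"<{name}>{' '.join(kept)}</{name}>"

sample = "WEBVTT\n\n1\n00:00:01.000 --> 00:00:04.000\nWelcome to the course.\n"
print(process_subtitle_text(sample, "intro.vtt"))
# <intro.vtt>Welcome to the course.</intro.vtt>
```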
**For Jupyter Notebooks (`.ipynb`):**
```python
1. Validate JSON format
2. Parse with nbformat.read()
3. Extract from cells:
- Markdown cells: [Markdown]\n{content}
- Code cells: [Code]\n```python\n{content}\n```
4. Combine all cell content
5. Wrap in XML tags: <source_filename>content</source_filename>
```
**Error Handling:**
- Invalid JSON: Wraps raw content in code blocks
- Parsing failures: Falls back to plain text extraction
- All errors logged to console
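A rough stdlib equivalent of the notebook path (the real code uses `nbformat.read()`; this json-only sketch just shows the cell-to-text mapping described above):

```python
import json

def extract_notebook_text(raw: str) -> str:
    """Turn notebook JSON into the [Markdown]/[Code] text layout."""
    nb = json.loads(raw)
    parts = []
    for cell in nb.get("cells", []):
        src = "".join(cell.get("source", []))
        if cell.get("cell_type") == "markdown":
            parts.append(f"[Markdown]\n{src}")
        elif cell.get("cell_type") == "code":
            parts.append(f"[Code]\n```python\n{src}\n```")
    return "\n\n".join(parts)

raw = json.dumps({"cells": [
    {"cell_type": "markdown", "source": ["# Intro"]},
    {"cell_type": "code", "source": ["print('hi')"]},
]})
text = extract_notebook_text(raw)
```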
#### Step 1.4: State Storage
Processed content stored in global state (`ui/state.py`):
```python
processed_file_contents = [tagged_content_1, tagged_content_2, ...]
```
### Phase 2: Learning Objective Generation
#### Step 2.1: Multi-Run Base Generation
**Process:** `objective_handlers._generate_multiple_runs()`
For each run (user-specified, typically 3 runs):
1. **Call:** `QuizGenerator.generate_base_learning_objectives()`
2. **Workflow:**
```
generate_base_learning_objectives()
↓
generate_base_learning_objectives_without_correct_answers()
→ Creates prompt with:
- BASE_LEARNING_OBJECTIVES_PROMPT
- BLOOMS_TAXONOMY_LEVELS
- LEARNING_OBJECTIVE_EXAMPLES_WITHOUT_ANSWERS
- Combined file contents
→ Calls OpenAI API with structured output
→ Returns List[BaseLearningObjectiveWithoutCorrectAnswer]
↓
generate_correct_answers_for_objectives()
→ For each objective:
- Creates prompt with objective and course content
- Calls OpenAI API (unstructured text response)
- Extracts correct answer
→ Returns List[BaseLearningObjective]
```
3. **ID Assignment:**
```python
# Temporary IDs by run:
Run 1: 1001, 1002, 1003
Run 2: 2001, 2002, 2003
Run 3: 3001, 3002, 3003
```
4. **Aggregation:**
All objectives from all runs combined into single list.
**Example:** 3 runs × 3 objectives = 9 total base objectives
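The temporary-ID scheme is simply `run_number * 1000 + position`; a sketch (dicts stand in for the objective models):

```python
def assign_temporary_ids(runs):
    """Give each objective a run-scoped ID: run 1 → 1001.., run 2 → 2001.., etc."""
    flat = []
    for run_number, objectives in enumerate(runs, start=1):
        for position, objective in enumerate(objectives, start=1):
            flat.append({**objective, "id": run_number * 1000 + position})
    return flat

runs = [[{"learning_objective": f"run{r} obj{i}"} for i in range(3)] for r in range(1, 4)]
ids = [o["id"] for o in assign_temporary_ids(runs)]
print(ids)  # [1001, 1002, 1003, 2001, 2002, 2003, 3001, 3002, 3003]
```

The run prefix is what later lets the grouping step find "all objectives with IDs ending in 1" across runs.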
#### Step 2.2: Grouping and Ranking
**Process:** `objective_handlers._group_base_objectives_add_incorrect_answers()`
**Step 2.2.1: Group Base Objectives**
```python
QuizGenerator.group_base_learning_objectives()
↓
learning_objective_generator/grouping_and_ranking.py
→ group_base_learning_objectives()
```
**Grouping Logic:**
1. Creates prompt containing:
- Original generation criteria
- All base objectives with IDs
- Course content for context
- Grouping instructions
2. **Special Rule:** All objectives with IDs ending in 1 (1001, 2001, 3001) are grouped together and ONE is marked as best-in-group (this becomes the primary/first objective)
3. **LLM Call:**
- Model: `gpt-5-mini`
- Response format: `GroupedBaseLearningObjectivesResponse`
- Returns: Grouped objectives with metadata
4. **Output Structure:**
```python
{
"all_grouped": [all objectives with group metadata],
"best_in_group": [objectives marked as best in their groups]
}
```
**Step 2.2.2: ID Reassignment** (`_reassign_objective_ids()`)
```python
1. Find best objective from the 001 group
2. Assign it ID = 1
3. Assign remaining objectives IDs starting from 2
```
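The reassignment rule can be sketched as follows (a hypothetical stand-in for `_reassign_objective_ids()`, assuming objectives carry `id` and `best_in_group` fields):

```python
def reassign_objective_ids(objectives):
    """Promote the best objective from the x001 group to ID 1; renumber the rest."""
    primary = next(
        o for o in objectives
        if o["id"] % 1000 == 1 and o.get("best_in_group")
    )
    primary["id"] = 1
    next_id = 2
    for o in objectives:
        if o is not primary:
            o["id"] = next_id
            next_id += 1
    return objectives

objs = [
    {"id": 1001, "best_in_group": True},
    {"id": 2001, "best_in_group": False},
    {"id": 1002, "best_in_group": True},
]
print([o["id"] for o in reassign_objective_ids(objs)])  # [1, 2, 3]
```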
**Step 2.2.3: Generate Incorrect Answer Options**
Only for **best-in-group** objectives:
```python
QuizGenerator.generate_lo_incorrect_answer_options()
↓
learning_objective_generator/enhancement.py
→ generate_incorrect_answer_options()
```
**Process:**
1. For each best-in-group objective:
- Creates prompt with:
- Objective and correct answer
- INCORRECT_ANSWER_PROMPT guidelines
- INCORRECT_ANSWER_EXAMPLES
- Course content
- Calls OpenAI API (with optional model override)
- Generates 5 plausible incorrect answer options
2. **Returns:** `List[LearningObjective]` with incorrect_answer_options populated
**Step 2.2.4: Improve Incorrect Answers**
```python
learning_objective_generator.regenerate_incorrect_answers()
↓
learning_objective_generator/suggestion_improvement.py
```
**Quality Check Process:**
1. For each objective's incorrect answers:
- Checks for red flags (contradictory phrases, absolute terms)
- Examples of red flags:
- "but not necessarily"
- "at the expense of"
- "rather than"
- "always", "never", "exclusively"
2. If problems found:
- Logs issue to `incorrect_suggestion_debug/` directory
- Regenerates incorrect answers with additional constraints
- Updates objective with improved answers
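The project's actual check is LLM-based, but the red-flag criterion amounts to a phrase scan; a heuristic approximation using the phrases listed above:

```python
# Phrase list taken from the red-flag examples in this report (illustrative subset)
IMMEDIATE_RED_FLAGS = (
    "but not necessarily", "at the expense of", "rather than",
    "always", "never", "exclusively",
)

def has_red_flags(option: str) -> bool:
    """Heuristic pre-screen: flag distractors containing known problem phrases."""
    lowered = option.lower()
    return any(flag in lowered for flag in IMMEDIATE_RED_FLAGS)

print(has_red_flags("Models always converge to a global optimum"))    # True
print(has_red_flags("Gradient descent updates weights iteratively"))  # False
```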
**Step 2.2.5: Final Assembly**
Creates final list where:
- Best-in-group objectives have enhanced incorrect answers
- Non-best-in-group objectives have empty `incorrect_answer_options: []`
#### Step 2.3: Display Results
**Three output formats:**
1. **Best-in-Group Objectives** (primary output):
- Only objectives marked as best_in_group
- Includes incorrect answer options
- Sorted by ID
- Formatted as JSON
2. **All Grouped Objectives**:
- All objectives with grouping metadata
- Shows group_members arrays
- Best-in-group flags visible
3. **Raw Ungrouped** (debug):
- Original objectives from all runs
- No grouping metadata
- Original temporary IDs
#### Step 2.4: State Update
```python
set_learning_objectives(grouped_result["all_grouped"])
set_processed_contents(file_contents) # Already set, but persisted
```
### Phase 3: Question Generation
#### Step 3.1: Parse Learning Objectives
**Process:** `question_handlers._parse_learning_objectives()`
```python
1. Parse JSON from Tab 1 output
2. Create LearningObjective objects from dictionaries
3. Validate required fields
4. Return List[LearningObjective]
```
#### Step 3.2: Multi-Run Question Generation
**Process:** `question_handlers._generate_questions_multiple_runs()`
For each run (user-specified, typically 1 run):
```python
QuizGenerator.generate_questions_in_parallel()
↓
quiz_generator/assessment.py
→ generate_questions_in_parallel()
```
**Parallel Generation Process:**
1. **Thread Pool Setup:**
```python
max_workers = min(len(learning_objectives), 5)
ThreadPoolExecutor(max_workers=max_workers)
```
2. **For Each Learning Objective (in parallel):**
**Step 3.2.1: Question Generation** (`quiz_generator/question_generation.py`)
```python
generate_multiple_choice_question()
```
**a) Source Content Matching:**
```python
- Extract source_reference from objective
- Search file_contents for matching XML tags
   - Exact match: <source_filename> tag present in file content
- Fallback: Partial filename match
- Last resort: Use all file contents combined
```
**b) Multi-Source Handling:**
```python
if len(source_references) > 1:
Add special instruction:
"Question should synthesize information across sources"
```
**c) Prompt Construction:**
```python
Combines:
- Learning objective
- Correct answer
- Incorrect answer options from objective
- GENERAL_QUALITY_STANDARDS
- MULTIPLE_CHOICE_STANDARDS
- EXAMPLE_QUESTIONS
- QUESTION_SPECIFIC_QUALITY_STANDARDS
- CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS
- INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION
- ANSWER_FEEDBACK_QUALITY_STANDARDS
- Matched course content
```
**d) API Call:**
```python
- Model: User-selected (default: gpt-5)
- Temperature: User-selected (if supported by model)
- Response format: MultipleChoiceQuestion
- Returns: Question with 4 options, each with feedback
```
**e) Post-Processing:**
```python
- Set question ID = learning_objective ID
- Verify all options have feedback
- Add default feedback if missing
```
**Step 3.2.2: Quality Assessment** (`quiz_generator/question_improvement.py`)
```python
judge_question_quality()
```
**Quality Judging Process:**
```python
1. Creates evaluation prompt with:
- Question text and all options
- Quality criteria from prompts
- Evaluation instructions
2. LLM evaluates question for:
- Clarity and unambiguity
- Alignment with learning objective
- Quality of incorrect options
- Feedback quality
- Appropriate difficulty
3. Returns:
- approved: bool
- feedback: str (reasoning for judgment)
4. Updates question:
question.approved = approved
question.judge_feedback = feedback
```
3. **Results Collection:**
```python
- Questions collected as futures complete
- IDs assigned sequentially across runs
- All questions aggregated into single list
```
**Example:** 3 objectives × 1 run = 3 questions generated in parallel
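The fan-out/fan-in shape of this phase can be sketched with `concurrent.futures`; a dummy worker stands in for the real question generator and judge:

```python
import concurrent.futures

def generate_question_for_objective(objective: dict, idx: int) -> dict:
    """Stand-in worker: the real one calls the LLM, then the quality judge."""
    return {"id": objective["id"], "question_text": f"Q for LO {objective['id']}"}

def generate_questions_in_parallel(learning_objectives: list) -> list:
    max_workers = min(len(learning_objectives), 5)  # cap at 5 concurrent threads
    questions = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_idx = {
            executor.submit(generate_question_for_objective, obj, i): i
            for i, obj in enumerate(learning_objectives)
        }
        for future in concurrent.futures.as_completed(future_to_idx):
            questions.append(future.result())  # collected as each thread finishes
    return sorted(questions, key=lambda q: q["id"])

qs = generate_questions_in_parallel([{"id": i} for i in range(1, 4)])
print([q["id"] for q in qs])  # [1, 2, 3]
```

Because `as_completed()` yields results in finish order, the sketch sorts by ID afterward; the real pipeline similarly reassigns IDs sequentially across runs.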
#### Step 3.3: Grouping Questions
**Process:** `quiz_generator/question_ranking.py → group_questions()`
```python
1. Creates prompt with:
- All generated questions
- Grouping instructions
- Example format
2. LLM identifies:
- Questions testing same concept (same learning_objective_id)
- Groups of similar questions
- Best question in each group
3. Model: gpt-5-mini
Response format: GroupedMultipleChoiceQuestionsResponse
4. Returns:
{
"grouped": [all questions with group metadata],
"best_in_group": [best questions from each group]
}
```
#### Step 3.4: Ranking Questions
**Process:** `quiz_generator/question_ranking.py → rank_questions()`
**Only ranks best-in-group questions:**
```python
1. Creates prompt with:
- RANK_QUESTIONS_PROMPT
- All quality standards
- Best-in-group questions only
- Course content for context
2. Ranking Criteria:
- Question clarity and unambiguity
- Alignment with learning objective
- Quality of incorrect options
- Feedback quality
- Appropriate difficulty (prefers simple English)
- Adherence to all guidelines
- Avoidance of absolute terms
3. Special Instructions:
- NEVER change question with ID=1
- Each question gets unique rank (2, 3, 4, ...)
- Rank 1 is reserved
- All questions must be returned
4. Model: User-selected
Response format: RankedMultipleChoiceQuestionsResponse
5. Returns:
{
"ranked": [questions with rank and ranking_reasoning]
}
```
#### Step 3.5: Format Results
**Process:** `question_handlers._format_question_results()`
**Three outputs:**
1. **Best-in-Group Ranked Questions:**
```python
- Sorted by rank
- Includes all question data
- Includes rank and ranking_reasoning
- Includes group metadata
- Formatted as JSON
```
2. **All Grouped Questions:**
```python
- All questions with group metadata
- No ranking information
- Shows which questions are in groups
- Formatted as JSON
```
3. **Formatted Quiz:**
```python
format_quiz_for_ui() creates human-readable format:
**Question 1 [Rank: 2]:** What is...
Ranking Reasoning: ...
• A [Correct]: Option text
◦ Feedback: Correct feedback
• B: Option text
◦ Feedback: Incorrect feedback
[continues for all questions]
```
### Phase 4: Custom Question Generation (Optional)
**Tab 3 Workflow:**
#### Step 4.1: User Input
User provides:
- Free-form guidance/feedback text
- Model selection
- Temperature setting
#### Step 4.2: Generation
**Process:** `feedback_handlers.propose_question_handler()`
```python
QuizGenerator.generate_multiple_choice_question_from_feedback()
↓
quiz_generator/feedback_questions.py
```
**Workflow:**
```python
1. Retrieves processed file contents from state
2. Creates prompt combining:
- User feedback/guidance
- All quality standards
- Course content
- Generation criteria
3. Model generates:
- Single question
- With learning objective inferred from guidance
- 4 options with feedback
- Source references
4. Returns: MultipleChoiceQuestionFromFeedback object
(includes user feedback as metadata)
5. Formatted as JSON for display
```
### Phase 5: Assessment Export (Automated)
The final assessment can be saved using:
```python
QuizGenerator.save_assessment_to_json()
↓
quiz_generator/assessment.py → save_assessment_to_json()
```
**Process:**
```python
1. Convert Assessment object to dictionary
assessment_dict = assessment.model_dump()
2. Write to JSON file with indent=2
Default filename: "assessment.json"
3. Contains:
- All learning objectives (best-in-group)
- All ranked questions
- Complete metadata
```
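The export step reduces to `model_dump()` plus `json.dump`; a stdlib sketch in which plain dicts stand in for the Pydantic `Assessment`:

```python
import json
import os
import tempfile

def save_assessment_to_json(assessment: dict, filename: str = "assessment.json") -> str:
    """Write the assessment dictionary to disk, pretty-printed with indent=2."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(assessment, f, indent=2)
    return filename

assessment = {"learning_objectives": [{"id": 1}], "questions": [{"id": 1, "rank": 2}]}
path = os.path.join(tempfile.gettempdir(), "assessment.json")
save_assessment_to_json(assessment, path)

with open(path, encoding="utf-8") as f:
    restored = json.load(f)  # round-trips back to the original structure
```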
---
## Detailed Component Functionality
### Content Processor (`ui/content_processor.py`)
**Class: `ContentProcessor`**
**Methods:**
1. **`process_files(file_paths: List[str]) -> List[str]`**
- Main entry point for processing multiple files
- Returns list of XML-tagged content strings
- Stores results in `self.file_contents`
2. **`process_file(file_path: str) -> List[str]`**
- Routes to appropriate handler based on file extension
- Returns single-item list with tagged content
3. **`_process_subtitle_file(file_path: str) -> List[str]`**
- Filters out timestamps and metadata
- Preserves actual subtitle text
   - Wraps in `<source_filename>` tags
4. **`_process_notebook_file(file_path: str) -> List[str]`**
- Validates JSON structure
- Parses with nbformat
- Extracts markdown and code cells
- Falls back to raw text on parsing errors
   - Wraps in `<source_filename>` tags
### Learning Objective Generator (`learning_objective_generator/`)
#### **generator.py - LearningObjectiveGenerator Class**
**Orchestrator that delegates to specialized modules:**
**Methods:**
1. **`generate_base_learning_objectives()`**
- Delegates to `base_generation.py`
- Returns base objectives with correct answers
2. **`group_base_learning_objectives()`**
- Delegates to `grouping_and_ranking.py`
- Groups similar objectives
- Identifies best in each group
3. **`generate_incorrect_answer_options()`**
- Delegates to `enhancement.py`
- Adds 5 incorrect answer suggestions per objective
4. **`regenerate_incorrect_answers()`**
- Delegates to `suggestion_improvement.py`
- Quality-checks and improves incorrect answers
5. **`generate_and_group_learning_objectives()`**
- Complete workflow method
- Combines: base generation → grouping → incorrect answers
- Returns dict with all_grouped and best_in_group
#### **base_generation.py**
**Key Functions:**
**`generate_base_learning_objectives()`**
- Wrapper that calls two separate functions
- First: Generate objectives without correct answers
- Second: Generate correct answers for those objectives
**`generate_base_learning_objectives_without_correct_answers()`**
**Process:**
```python
1. Extract source filenames from XML tags
2. Combine all file contents
3. Create prompt with:
- BASE_LEARNING_OBJECTIVES_PROMPT
- BLOOMS_TAXONOMY_LEVELS
- LEARNING_OBJECTIVE_EXAMPLES_WITHOUT_ANSWERS
- Course content
4. API call:
- Model: User-selected
- Temperature: User-selected (if supported)
- Response format: BaseLearningObjectivesWithoutCorrectAnswerResponse
5. Post-process:
- Assign sequential IDs
- Normalize source_reference (extract basenames)
6. Returns: List[BaseLearningObjectiveWithoutCorrectAnswer]
```
**`generate_correct_answers_for_objectives()`**
**Process:**
```python
1. For each objective without answer:
- Create prompt with objective + course content
- Call OpenAI API (text response, not structured)
- Extract correct answer
- Create BaseLearningObjective with answer
2. Error handling: Add "[Error generating correct answer]" on failure
3. Returns: List[BaseLearningObjective]
```
**Quality Guidelines in Prompt:**
- Objectives must be assessable via multiple-choice
- Start with action verbs (identify, describe, define, list, compare)
- One goal per objective
- Derived directly from course content
- Tool/framework agnostic (focus on principles, not specific implementations)
- First objective should be relatively easy recall question
- Avoid objectives about "building" or "creating" (not MC-assessable)
#### **grouping_and_ranking.py**
**Key Functions:**
**`group_base_learning_objectives()`**
**Process:**
```python
1. Format objectives for display in prompt
2. Create grouping prompt with:
- Original generation criteria
- All base objectives
- Course content
- Grouping instructions
3. Special rule:
- All objectives with IDs ending in 1 grouped together
- Best one selected from this group
- Will become primary objective (ID=1)
4. API call:
- Model: "gpt-5-mini" (hardcoded for efficiency)
- Response format: GroupedBaseLearningObjectivesResponse
5. Post-process:
- Normalize best_in_group to Python bool
- Filter for best-in-group objectives
6. Returns:
{
"all_grouped": List[GroupedBaseLearningObjective],
"best_in_group": List[GroupedBaseLearningObjective]
}
```
**Grouping Criteria:**
- Topic overlap
- Similarity of concepts
- Quality based on original generation criteria
- Clarity and specificity
- Alignment with course content
#### **enhancement.py**
**Key Function: `generate_incorrect_answer_options()`**
**Process:**
```python
1. For each base objective:
- Create prompt with:
- Learning objective and correct answer
- INCORRECT_ANSWER_PROMPT (detailed guidelines)
- INCORRECT_ANSWER_EXAMPLES
- Course content
- Request 5 plausible incorrect options
2. API call:
- Model: model_override or default
- Temperature: User-selected (if supported)
- Response format: LearningObjective (includes incorrect_answer_options)
3. Returns: List[LearningObjective] with all fields populated
```
**Incorrect Answer Quality Principles:**
- Create common misunderstandings
- Maintain identical structure to correct answer
- Use course terminology correctly but in wrong contexts
- Include partially correct information
- Avoid obviously wrong answers
- Mirror detail level and style of correct answer
- Avoid absolute terms ("always", "never", "exclusively")
- Avoid contradictory second clauses
#### **suggestion_improvement.py**
**Key Function: `regenerate_incorrect_answers()`**
**Process:**
```python
1. For each learning objective:
- Call should_regenerate_incorrect_answers()
2. should_regenerate_incorrect_answers():
- Creates evaluation prompt with:
- Objective and all incorrect options
- IMMEDIATE_RED_FLAGS checklist
- RULES_FOR_SECOND_CLAUSES
- LLM evaluates each option
- Returns: needs_regeneration: bool
3. If regeneration needed:
- Logs to incorrect_suggestion_debug/{id}.txt
- Creates new prompt with additional constraints
- Regenerates incorrect answers
- Validates again
4. Returns: List[LearningObjective] with improved incorrect answers
```
**Red Flags Checked:**
- Contradictory second clauses ("but not necessarily")
- Explicit negations ("without automating")
- Opposite descriptions ("fixed steps" for flexible systems)
- Absolute/comparative terms
- Hedging that creates limitations
- Trade-off language creating false dichotomies
### Quiz Generator (`quiz_generator/`)
#### **generator.py - QuizGenerator Class**
**Orchestrator with LearningObjectiveGenerator embedded:**
**Initialization:**
```python
def __init__(self, api_key, model="gpt-5", temperature=1.0):
self.client = OpenAI(api_key=api_key)
self.model = model
self.temperature = temperature
self.learning_objective_generator = LearningObjectiveGenerator(
api_key=api_key, model=model, temperature=temperature
)
```
**Methods (delegates to specialized modules):**
1. **`generate_base_learning_objectives()`** → delegates to LearningObjectiveGenerator
2. **`generate_lo_incorrect_answer_options()`** → delegates to LearningObjectiveGenerator
3. **`group_base_learning_objectives()`** → delegates to grouping_and_ranking.py
4. **`generate_multiple_choice_question()`** → delegates to question_generation.py
5. **`generate_questions_in_parallel()`** → delegates to assessment.py
6. **`group_questions()`** → delegates to question_ranking.py
7. **`rank_questions()`** → delegates to question_ranking.py
8. **`judge_question_quality()`** → delegates to question_improvement.py
9. **`regenerate_incorrect_answers()`** → delegates to question_improvement.py
10. **`generate_multiple_choice_question_from_feedback()`** → delegates to feedback_questions.py
11. **`save_assessment_to_json()`** → delegates to assessment.py
#### **question_generation.py**
**Key Function: `generate_multiple_choice_question()`**
**Detailed Process:**
**1. Source Content Matching:**
```python
source_references = learning_objective.source_reference
if isinstance(source_references, str):
    source_references = [source_references]

combined_content = ""
for source_file in source_references:
    found = False
    # Try exact match on the source's XML tag:
    for file_content in file_contents:
        if f"<{source_file}>" in file_content:
            combined_content += file_content
            found = True
            break
    # Fallback: partial filename match
    if not found:
        for file_content in file_contents:
            if source_file in file_content:
                combined_content += file_content
                break

# Last resort: use all content combined
if not combined_content:
    combined_content = "\n\n".join(file_contents)
```
**2. Multi-Source Instruction:**
```python
if len(source_references) > 1:
Add special instruction:
"This learning objective spans multiple sources.
Your question should:
1. Synthesize information across these sources
2. Test understanding of overarching themes
3. Require knowledge from multiple sources"
```
**3. Prompt Construction:**
Combines extensive quality standards:
```python
- Learning objective
- Correct answer
- Incorrect answer options from objective
- GENERAL_QUALITY_STANDARDS
- MULTIPLE_CHOICE_STANDARDS
- EXAMPLE_QUESTIONS
- QUESTION_SPECIFIC_QUALITY_STANDARDS
- CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS
- INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION
- ANSWER_FEEDBACK_QUALITY_STANDARDS
- Multi-source instruction (if applicable)
- Matched course content
```
**4. API Call:**
```python
params = {
"model": model,
"messages": [
{"role": "system", "content": "Expert educational assessment creator"},
{"role": "user", "content": prompt}
],
"response_format": MultipleChoiceQuestion
}
if not TEMPERATURE_UNAVAILABLE.get(model, True):
params["temperature"] = temperature
response = client.beta.chat.completions.parse(**params)
```
**5. Post-Processing:**
```python
- Set response.id = learning_objective.id
- Set response.learning_objective_id = learning_objective.id
- Set response.learning_objective = learning_objective.learning_objective
- Set response.source_reference = learning_objective.source_reference
- Verify all options have feedback
- Add default feedback if missing
```
**6. Error Handling:**
```python
On exception:
- Create fallback question with 4 generic options
- Include error message in question_text
- Mark as questionable quality
```
#### **question_ranking.py**
**Key Functions:**
**`group_questions(questions, file_contents)`**
**Process:**
```python
1. Create prompt with:
- GROUP_QUESTIONS_PROMPT
- All questions with complete data
- Grouping instructions
2. Grouping Logic:
- Questions with same learning_objective_id are similar
- Group by topic overlap
- Mark best_in_group within each group
- Single-member groups: best_in_group = true by default
3. API call:
- Model: User-selected
- Response format: GroupedMultipleChoiceQuestionsResponse
4. Critical Instructions:
- MUST return ALL questions
- Each question must have group metadata
- best_in_group set appropriately
5. Returns:
{
"grouped": List[GroupedMultipleChoiceQuestion],
"best_in_group": [questions where best_in_group=true]
}
```
**`rank_questions(questions, file_contents)`**
**Process:**
```python
1. Create prompt with:
- RANK_QUESTIONS_PROMPT
- ALL quality standards (comprehensive)
- Best-in-group questions only
- Course content
2. Ranking Criteria (from prompt):
- Question clarity and unambiguity
- Alignment with learning objective
- Quality of incorrect options
- Feedback quality
- Appropriate difficulty (simple English preferred)
- Adherence to all guidelines
- Avoidance of problematic words/phrases
3. Special Instructions:
- DO NOT change question with ID=1
- Rank starting from 2 (rank 1 reserved)
- Each question gets unique rank
- Must return ALL questions
4. API call:
- Model: User-selected
- Response format: RankedMultipleChoiceQuestionsResponse
5. Returns:
{
"ranked": List[RankedMultipleChoiceQuestion]
(includes rank and ranking_reasoning for each)
}
```
**Simple vs Complex English Examples (from ranking criteria):**
```
Simple: "AI engineers create computer programs that can learn from data"
Complex: "AI engineering practitioners architect computational paradigms
exhibiting autonomous erudition capabilities"
```
#### **question_improvement.py**
**Key Functions:**
**`judge_question_quality(client, model, temperature, question)`**
**Process:**
```python
1. Create evaluation prompt with:
- Question text
- All options with feedback
- Quality criteria
- Evaluation instructions
2. LLM evaluates:
- Clarity and lack of ambiguity
- Alignment with learning objective
- Quality of distractors (incorrect options)
- Feedback quality and helpfulness
- Appropriate difficulty level
- Adherence to all standards
3. API call:
- Unstructured text response
- LLM returns: APPROVED or NOT APPROVED + reasoning
4. Parsing:
approved = "APPROVED" in response.upper()
feedback = full response text
5. Returns: (approved: bool, feedback: str)
```
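Note that a bare substring check (`"APPROVED" in response.upper()`) also matches "NOT APPROVED". A stricter parse (a sketch, not the project's code) disambiguates the verdict:

```python
import re
from typing import Tuple

def parse_verdict(response: str) -> Tuple[bool, str]:
    """Return (approved, feedback); 'NOT APPROVED' must not count as approval."""
    upper = response.upper()
    approved = bool(re.search(r"\bAPPROVED\b", upper)) and "NOT APPROVED" not in upper
    return approved, response

print(parse_verdict("APPROVED - clear and well aligned")[0])      # True
print(parse_verdict("NOT APPROVED: distractors too obvious")[0])  # False
```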
**`should_regenerate_incorrect_answers(client, question, file_contents, model_name)`**
**Process:**
```python
1. Extract incorrect options from question
2. Create evaluation prompt with:
- Each incorrect option
- IMMEDIATE_RED_FLAGS checklist
- Course content for context
3. LLM checks each option for:
- Contradictory second clauses
- Explicit negations
- Absolute terms
- Opposite descriptions
- Trade-off language
4. Returns: needs_regeneration: bool
5. If true:
- Log to wrong_answer_debug/ directory
- Provides detailed feedback on issues
```
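The report describes this check as LLM-based, but many of the red flags are literal phrases, so a deterministic pre-filter could catch the obvious cases before (or alongside) an LLM call. A minimal sketch, with the phrase list abridged from the IMMEDIATE_RED_FLAGS checklist later in this report:

```python
# Abridged from the IMMEDIATE_RED_FLAGS checklist; hypothetical helper,
# not the project's actual implementation.
IMMEDIATE_RED_FLAGS = [
    "but not necessarily", "at the expense of", "rather than",
    "without necessarily", "but has no impact on", "but cannot",
    "but prevents", "but limits", "without automating", "manual intervention",
]

def has_red_flag(option_text: str) -> bool:
    """Return True if an incorrect option contains a known red-flag phrase."""
    lowered = option_text.lower()
    return any(flag in lowered for flag in IMMEDIATE_RED_FLAGS)
```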
**`regenerate_incorrect_answers(client, model, temperature, questions, file_contents)`**
**Process:**
```python
1. For each question:
- Check if regeneration needed
- If yes:
a. Create new prompt with stricter constraints
b. Include original question for context
c. Add specific rules about avoiding red flags
d. Regenerate options
e. Validate again
- If no: keep original
2. Returns: List of questions with improved incorrect answers
```
#### **feedback_questions.py**
**Key Function: `generate_multiple_choice_question_from_feedback()`**
**Process:**
```python
1. Accept user feedback/guidance as free-form text
2. Create prompt combining:
- User feedback
- All quality standards
- Course content
- Standard generation criteria
3. LLM infers:
- Learning objective from feedback
- Appropriate question
- 4 options with feedback
- Source references
4. API call:
- Model: User-selected
- Response format: MultipleChoiceQuestionFromFeedback
5. Includes user feedback as metadata in response
6. Returns: Single question object
```
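Step 5 (attaching the user's guidance as metadata) can be illustrated with the shape of the response object. The sketch below uses dataclasses as a stand-in for the project's Pydantic models; the real field names may differ:

```python
from dataclasses import dataclass

# Hypothetical shapes for illustration only (dataclasses standing in for
# the project's Pydantic models in models/).
@dataclass
class Option:
    text: str
    is_correct: bool
    feedback: str

@dataclass
class MultipleChoiceQuestionFromFeedback:
    learning_objective: str            # inferred by the LLM from the feedback
    question: str
    options: list[Option]
    source_references: list[str]
    user_feedback: str = ""            # step 5: original guidance kept as metadata

# After the structured-output call returns `q`, the handler attaches the
# original guidance before returning it, e.g.:
#     q.user_feedback = guidance
```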
#### **assessment.py**
**Key Functions:**
**`generate_questions_in_parallel()`**
**Parallel Processing Details:**
```python
1. Setup:
max_workers = min(len(learning_objectives), 5)
# Limits to 5 concurrent threads
2. Thread Pool Executor:
with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
3. For each objective (in separate thread):
Worker function:
def generate_question_for_objective(objective, idx):
- Generate question
- Judge quality
- Update with approval and feedback
- Handle errors gracefully
- Return complete question
4. Submit all tasks:
future_to_idx = {
executor.submit(generate_question_for_objective, obj, i): i
for i, obj in enumerate(learning_objectives)
}
5. Collect results as completed:
for future in concurrent.futures.as_completed(future_to_idx):
question = future.result()
questions.append(question)
print progress
6. Error handling:
- Individual failures don't stop other threads
- Placeholder questions created on error
- All errors logged
7. Returns: List[MultipleChoiceQuestion] with quality judgments
```
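The fan-out/fan-in pattern above can be sketched end to end. This is a minimal runnable version with a stub in place of the real LLM-backed generate-and-judge worker:

```python
import concurrent.futures

# Stub worker: the real version generates a question, judges its quality,
# and attaches approval + feedback before returning.
def generate_question_for_objective(objective: str, idx: int) -> dict:
    try:
        return {"objective": objective, "idx": idx, "approved": True}
    except Exception as exc:  # individual failures must not kill the pool
        return {"objective": objective, "idx": idx, "error": str(exc)}

def generate_questions_in_parallel(learning_objectives: list[str]) -> list[dict]:
    max_workers = min(len(learning_objectives), 5)  # cap at 5 threads
    questions = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_idx = {
            executor.submit(generate_question_for_objective, obj, i): i
            for i, obj in enumerate(learning_objectives)
        }
        for future in concurrent.futures.as_completed(future_to_idx):
            questions.append(future.result())  # collected as threads finish
    return questions
```

Note that results arrive in completion order, not submission order, which is why each result carries its own index.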
**`save_assessment_to_json(assessment, output_path)`**
```python
1. Convert Pydantic model to dict:
assessment_dict = assessment.model_dump()
2. Write to JSON file:
with open(output_path, "w") as f:
json.dump(assessment_dict, f, indent=2)
3. File contains:
{
"learning_objectives": [...],
"questions": [...]
}
```
### State Management (`ui/state.py`)
**Global State Variables:**
```python
processed_file_contents = [] # List of XML-tagged content strings
generated_learning_objectives = [] # List of learning objective objects
```
**Functions:**
- `get_processed_contents()` → retrieves file contents
- `set_processed_contents(contents)` → stores file contents
- `get_learning_objectives()` → retrieves objectives
- `set_learning_objectives(objectives)` → stores objectives
- `clear_state()` → resets both variables
**Purpose:**
- Persists data between UI tabs
- Allows Tab 2 to access content processed in Tab 1
- Allows Tab 3 to access content for custom questions
- Enables regeneration with feedback
### UI Handlers
#### **objective_handlers.py**
**`process_files(files, num_objectives, num_runs, model_name, incorrect_answer_model_name, temperature)`**
**Complete Workflow:**
```python
1. Validate inputs (files exist, API key present)
2. Extract file paths from Gradio file objects
3. Process files → get XML-tagged content
4. Store in state
5. Create QuizGenerator
6. Generate multiple runs of base objectives
7. Group and rank objectives
8. Generate incorrect answers for best-in-group
9. Improve incorrect answers
10. Reassign IDs (best from 001 group → ID=1)
11. Format results for display
12. Store in state
13. Return 4 outputs: status, best-in-group, all-grouped, raw
```
**`regenerate_objectives(objectives_json, feedback, num_objectives, num_runs, model_name, temperature)`**
**Workflow:**
```python
1. Retrieve processed contents from state
2. Append feedback to content:
file_contents_with_feedback.append(f"FEEDBACK: {feedback}")
3. Generate new objectives with feedback context
4. Group and rank
5. Return regenerated objectives
```
**`_reassign_objective_ids(grouped_objectives)`**
**ID Assignment Logic:**
```python
1. Find all objectives with IDs ending in 001 (1001, 2001, etc.)
2. Identify their groups
3. Find best_in_group objective from these groups
4. Assign it ID = 1
5. Assign all other objectives sequential IDs starting from 2
```
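The five steps above can be sketched concretely. Here each objective is a plain dict with `id`, `group_id`, and `best_in_group` keys (the real code operates on Pydantic objects, and this helper is a hypothetical reconstruction):

```python
def reassign_objective_ids(objectives: list[dict]) -> list[dict]:
    # Steps 1-3: find the group(s) whose member IDs end in 001
    # (1001, 2001, ...) and take the best_in_group member from them.
    first_groups = {o["group_id"] for o in objectives if o["id"] % 1000 == 1}
    best_first = next(
        o for o in objectives
        if o["group_id"] in first_groups and o["best_in_group"]
    )
    # Step 4: the best objective from the "001" group becomes ID 1.
    best_first["id"] = 1
    # Step 5: everything else gets sequential IDs starting from 2.
    next_id = 2
    for o in objectives:
        if o is not best_first:
            o["id"] = next_id
            next_id += 1
    return objectives
```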
**`_format_objective_results(grouped_result, all_learning_objectives)`**
**Formatting:**
```python
1. Sort by ID
2. Create dictionaries from Pydantic objects
3. Include all metadata fields
4. Convert to JSON with indent=2
5. Return 3 formatted outputs + status message
```
#### **question_handlers.py**
**`generate_questions(objectives_json, model_name, temperature, num_runs)`**
**Complete Workflow:**
```python
1. Validate inputs
2. Parse objectives JSON → create LearningObjective objects
3. Retrieve processed contents from state
4. Create QuizGenerator
5. Generate questions (multiple runs in parallel)
6. Group questions by similarity
7. Rank best-in-group questions
8. Optionally improve incorrect answers (currently commented out)
9. Format results
10. Return 4 outputs: status, best-ranked, all-grouped, formatted
```
**`_generate_questions_multiple_runs()`**
```python
For each run:
1. Call generate_questions_in_parallel()
2. Assign unique IDs across runs:
start_id = len(all_questions) + 1
for i, q in enumerate(run_questions):
q.id = start_id + i
3. Aggregate all questions
```
**`_group_and_rank_questions()`**
```python
1. Group all questions → get grouped and best_in_group
2. Rank only best_in_group questions
3. Return:
{
"grouped": all with group metadata,
"best_in_group_ranked": best with ranks
}
```
#### **feedback_handlers.py**
**`propose_question_handler(guidance, model_name, temperature)`**
**Workflow:**
```python
1. Validate state (processed contents available)
2. Create QuizGenerator
3. Call generate_multiple_choice_question_from_feedback()
- Passes user guidance and course content
- LLM infers learning objective
- Generates complete question
4. Format as JSON
5. Return status and question JSON
```
### Formatting Utilities (`ui/formatting.py`)
**`format_quiz_for_ui(questions_json)`**
**Process:**
```python
1. Parse JSON to list of question dictionaries
2. Sort by rank if available
3. For each question:
- Add header: "**Question N [Rank: X]:** {question_text}"
- Add ranking reasoning if available
- For each option:
- Add letter (A, B, C, D)
- Mark correct option
- Include option text
- Include feedback indented
4. Return formatted string with markdown
```
**Output Example:**
```
**Question 1 [Rank: 2]:** What is the primary purpose of AI agents?
Ranking Reasoning: Clear question that tests fundamental understanding...
• A [Correct]: To automate tasks and make decisions
◦ Feedback: Correct! AI agents are designed to automate tasks...
• B: To replace human workers entirely
◦ Feedback: While AI agents can automate tasks, they are not...
[continues...]
```
---
## Quality Standards and Prompts
### Learning Objectives Quality Standards
**From `prompts/learning_objectives.py`:**
**BASE_LEARNING_OBJECTIVES_PROMPT - Key Requirements:**
1. **Assessability:**
- Must be testable via multiple-choice questions
- Cannot be about "building", "creating", "developing"
- Should use verbs like: identify, list, describe, define, compare
2. **Specificity:**
- One goal per objective
- Don't combine multiple action verbs
- Example of what NOT to do: "identify X and explain Y"
3. **Source Alignment:**
- Derived DIRECTLY from course content
- No topics not covered in content
- Appropriate difficulty level for course
4. **Independence:**
- Each objective stands alone
- No dependencies on other objectives
- No context required from other objectives
5. **Focus:**
- Address "why" over "what" when possible
- Critical knowledge over trivial facts
- Principles over specific implementation details
6. **Tool/Framework Agnosticism:**
- Don't mention specific tools/frameworks
- Focus on underlying principles
- Example: Don't ask about "Pandas DataFrame methods",
ask about "data filtering concepts"
7. **First Objective Rule:**
- Should be relatively easy recall question
- Address main topic/concept of course
- Format: "Identify what X is" or "Explain why X is important"
8. **Answer Length:**
- Aim for ≤20 words in correct answer
- Avoid unnecessary elaboration
- No compound sentences with extra consequences
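The ≤20-word guideline is easy to enforce deterministically. A quick check (a sketch; whitespace splitting is a rough proxy for word count):

```python
def within_length_guideline(answer: str, max_words: int = 20) -> bool:
    """Return True if the correct answer meets the <=20-word guideline."""
    return len(answer.split()) <= max_words
```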
**BLOOMS_TAXONOMY_LEVELS:**
Levels from lowest to highest:
- **Recall:** Retention of key concepts (not trivialities)
- **Comprehension:** Connect ideas, demonstrate understanding
- **Application:** Apply concept to new but similar scenario
- **Analysis:** Examine parts, determine relationships, make inferences
- **Evaluation:** Make judgments requiring critical thinking
**LEARNING_OBJECTIVE_EXAMPLES:**
Includes 7 high-quality examples with:
- Appropriate action verbs
- Clear learning objectives
- Concise correct answers (mostly <20 words)
- Multiple source references
- Framework-agnostic language
### Question Quality Standards
**From `prompts/questions.py`:**
**GENERAL_QUALITY_STANDARDS:**
- Overall goal: Set learner up for success
- Perfect score attainable for thoughtful students
- Aligned with course content
- Aligned with learning objective and correct answer
- No references to manual intervention (software/AI course)
**MULTIPLE_CHOICE_STANDARDS:**
- **EXACTLY ONE** correct answer per question
- Clear, unambiguous correct answer
- Plausible distractors representing common misconceptions
- Not obviously wrong distractors
- All options similar length and detail
- Mutually exclusive options
- Avoid "all/none of the above"
- Typically 4 options (A, B, C, D)
- Don't start feedback with "Correct" or "Incorrect"
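Several of these standards are mechanically checkable. A deterministic lint pass could look like this sketch (a hypothetical helper, not part of the project; each option is a `(text, is_correct)` pair):

```python
def lint_multiple_choice(options: list[tuple[str, bool]]) -> list[str]:
    """Return a list of standards violations (empty list means clean)."""
    problems = []
    if len(options) != 4:
        problems.append(f"expected 4 options, got {len(options)}")
    correct = sum(1 for _, is_correct in options if is_correct)
    if correct != 1:
        problems.append(f"expected exactly 1 correct answer, got {correct}")
    for text, _ in options:
        if "of the above" in text.lower():
            problems.append(f"avoid 'all/none of the above': {text!r}")
    return problems
```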
**QUESTION_SPECIFIC_QUALITY_STANDARDS:**
Questions must:
- Match language and tone of course
- Match difficulty level of course
- Assess only course information
- Not teach as part of quiz
- Use clear, concise language
- Not induce confusion
- Provide slight (not major) challenge
- Be easily interpreted and unambiguous
- Have proper grammar and sentence structure
- Be thoughtful and specific (not broad and ambiguous)
- Be complete in wording (understanding the question shouldn't be part of the assessment)
**CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS:**
Correct answers must:
- Be factually correct and unambiguous
- Match course language and tone
- Be complete sentences
- Match course difficulty level
- Contain only course information
- Not teach during quiz
- Use clear, concise language
- Be thoughtful and specific
- Be complete (identifying the correct answer shouldn't require interpretation)
**INCORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS:**
Incorrect answers should:
- Represent reasonable potential misconceptions
- Sound plausible to non-experts
- Require thought even from diligent learners
- Not be obviously wrong
- Use incorrect_answer_suggestions from the objective (as a starting point)
**Avoid:**
- Obviously wrong options anyone can eliminate
- Absolute terms: "always", "never", "only", "exclusively"
- Phrases like "used exclusively for scenarios where..."
**ANSWER_FEEDBACK_QUALITY_STANDARDS:**
**For Incorrect Answers:**
- Be informational and encouraging (not punitive)
- Single sentence, concise
- Do NOT say "Incorrect" or "Wrong"
**For Correct Answers:**
- Be informational and encouraging
- Single sentence, concise
- Do NOT say "Correct!" (redundant after "Correct: " prefix)
### Incorrect Answer Generation Guidelines
**From `prompts/incorrect_answers.py`:**
**Core Principles:**
1. **Create Common Misunderstandings:**
- Represent how students actually misunderstand
- Confuse related concepts
- Mix up terminology
2. **Maintain Identical Structure:**
- Match grammatical pattern of correct answer
- Same length and complexity
- Same formatting style
3. **Use Course Terminology Correctly but in Wrong Contexts:**
- Apply correct terms incorrectly
- Confuse with related concepts
- Example: an option framed as describing backpropagation that actually describes forward propagation
4. **Include Partially Correct Information:**
- First part correct, second part wrong
- Correct process but wrong application
- Correct concept but incomplete
5. **Avoid Obviously Wrong Answers:**
- No contradictions with basic knowledge
- Not immediately eliminable
- Require course knowledge to reject
6. **Mirror Detail Level and Style:**
- Match technical depth
- Match tone
- Same level of specificity
7. **For Lists, Maintain Consistency:**
- Same number of items
- Same format
- Mix some correct with incorrect items
8. **AVOID ABSOLUTE TERMS:**
- "always", "never", "exclusively", "primarily"
- "all", "every", "none", "nothing", "only"
- "must", "required", "impossible"
- "rather than", "as opposed to", "instead of"
**IMMEDIATE_RED_FLAGS** (triggers regeneration):
**Contradictory Second Clauses:**
- "but not necessarily"
- "at the expense of"
- "rather than [core concept]"
- "ensuring X rather than Y"
- "without necessarily"
- "but has no impact on"
- "but cannot", "but prevents", "but limits"
**Explicit Negations:**
- "without automating", "without incorporating"
- "preventing [main benefit]"
- "limiting [main capability]"
**Opposite Descriptions:**
- "fixed steps" (for flexible systems)
- "manual intervention" (for automation)
- "simple question answering" (for complex processing)
**Hedging Creating Limitations:**
- "sometimes", "occasionally", "might"
- "to some extent", "partially", "somewhat"
**INCORRECT_ANSWER_EXAMPLES:**
Includes 10 detailed examples showing:
- Learning objective
- Correct answer
- 3 plausible incorrect suggestions
- Explanation of why each is plausible but wrong
- Consistent formatting across all options
### Ranking and Grouping
**RANK_QUESTIONS_PROMPT:**
**Criteria:**
1. Question clarity and unambiguity
2. Alignment with learning objective
3. Quality of incorrect options
4. Quality of feedback
5. Appropriate difficulty (simple English preferred)
6. Adherence to all guidelines
**Critical Instructions:**
- DO NOT change question with ID=1
- Rank starting from 2
- Each question gets a unique rank
- Must return ALL questions
- No omissions
- No duplicate ranks
**Simple vs Complex English:**
```
Simple: "AI engineers create computer programs that learn from data"
Complex: "AI engineering practitioners architect computational paradigms
exhibiting autonomous erudition capabilities"
```
**GROUP_QUESTIONS_PROMPT:**
**Grouping Logic:**
- Questions with same learning_objective_id are similar
- Identify topic overlap
- Mark best_in_group within each group
- Single-member groups: best_in_group = true
**Critical Instructions:**
- Must return ALL questions
- Each question needs group metadata
- No omissions
- Best in group marked appropriately
---
## Summary of Data Flow
### Complete End-to-End Flow
```
User Uploads Files
↓
ContentProcessor extracts and tags content
↓
[Stored in global state]
↓
Generate Base Objectives (multiple runs)
↓
Group Base Objectives (by similarity)
↓
Generate Incorrect Answers (for best-in-group only)
↓
Improve Incorrect Answers (quality check)
↓
Reassign IDs (best from 001 group → ID=1)
↓
[Objectives displayed in UI, stored in state]
↓
Generate Questions (parallel, multiple runs)
↓
Judge Question Quality (parallel)
↓
Group Questions (by similarity)
↓
Rank Questions (best-in-group only)
↓
[Questions displayed in UI]
↓
Format for Display
↓
Export to JSON (optional)
```
### Key Optimization Strategies
1. **Multiple Generation Runs:**
- Generates variety of objectives/questions
- Grouping identifies best versions
- Reduces risk of poor quality individual outputs
2. **Hierarchical Processing:**
- Generate base → Group → Enhance → Improve
- Only enhances best candidates (saves API calls)
- Progressive refinement
3. **Parallel Processing:**
- Questions generated concurrently (up to 5 threads)
- Significant time savings for multiple objectives
- Independent evaluations
4. **Quality Gating:**
- LLM judges question quality
- Checks for red flags in incorrect answers
- Regenerates problematic content
5. **Source Tracking:**
- XML tags preserve origin
- Questions link back to source materials
- Enables accurate content matching
6. **Modular Prompts:**
- Reusable quality standards
- Consistent across all generations
- Easy to update centrally
---
## Configuration and Customization
### Available Models
**Configured in `models/config.py`:**
```python
MODELS = [
"o3-mini", "o1", # Reasoning models (no temperature)
"gpt-4.1", "gpt-4o", # GPT-4 variants
"gpt-4o-mini", "gpt-4",
"gpt-3.5-turbo", # Legacy
"gpt-5", # Latest (no temperature)
"gpt-5-mini", # Efficient (no temperature)
"gpt-5-nano" # Ultra-efficient (no temperature)
]
```
**Temperature Support:**
- Models with reasoning (o1, o3-mini, gpt-5 variants): No temperature
- Other models: Temperature 0.0 to 1.0
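The temperature split implies conditional request construction: reasoning models reject the `temperature` parameter, so it should only be sent to models that support it. A minimal sketch (model names abridged from the MODELS list; the helper is hypothetical):

```python
# Models that do not accept a temperature parameter, per the list above.
NO_TEMPERATURE_MODELS = {"o1", "o3-mini", "gpt-5", "gpt-5-mini", "gpt-5-nano"}

def build_completion_kwargs(model: str, temperature: float) -> dict:
    """Build API kwargs, omitting temperature for reasoning models."""
    kwargs = {"model": model}
    if model not in NO_TEMPERATURE_MODELS:
        kwargs["temperature"] = temperature
    return kwargs
```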
**Model Selection Strategy:**
- **Base objectives:** User-selected (default: gpt-5)
- **Grouping:** Hardcoded gpt-5-mini (efficiency)
- **Incorrect answers:** Separate user selection (default: gpt-5)
- **Questions:** User-selected (default: gpt-5)
- **Quality judging:** User-selected or gpt-5-mini
### Environment Variables
**Required:**
```
OPENAI_API_KEY=your_api_key_here
```
**Configured via `.env` file in project root**
### Customization Points
1. **Quality Standards:**
- Edit `prompts/learning_objectives.py`
- Edit `prompts/questions.py`
- Edit `prompts/incorrect_answers.py`
- Changes apply to all future generations
2. **Example Questions/Objectives:**
- Modify LEARNING_OBJECTIVE_EXAMPLES
- Modify EXAMPLE_QUESTIONS
- Modify INCORRECT_ANSWER_EXAMPLES
- LLM learns from these examples
3. **Generation Parameters:**
- Number of objectives per run
- Number of runs (variety)
- Temperature (creativity vs consistency)
- Model selection (quality vs cost/speed)
4. **Parallel Processing:**
- `max_workers` in assessment.py
- Currently: min(len(objectives), 5)
- Adjust for your rate limits
5. **Output Formats:**
- Modify `formatting.py` for display
- Assessment JSON structure in `models/assessment.py`
---
## Error Handling and Resilience
### Content Processing Errors
- **Invalid JSON notebooks:** Falls back to raw text
- **Parse failures:** Wraps in code blocks, continues
- **Missing files:** Logged, skipped
- **Encoding issues:** UTF-8 fallback
### Generation Errors
- **API failures:** Logged with traceback
- **Structured output parse errors:** Fallback responses created
- **Missing required fields:** Default values assigned
- **Validation errors:** Caught and logged
### Parallel Processing Errors
- **Individual thread failures:** Don't stop other threads
- **Placeholder questions:** Created on error
- **Complete error details:** Logged for debugging
- **Graceful degradation:** Partial results returned
### Quality Check Failures
- **Regeneration failures:** Original kept with warning
- **Judge unavailable:** Questions marked unapproved
- **Validation failures:** Detailed logs in debug directories
---
## Debug and Logging
### Debug Directories
1. **`incorrect_suggestion_debug/`**
- Created during objective enhancement
- Contains logs of problematic incorrect answers
- Format: `{objective_id}.txt`
- Includes: Original suggestions, identified issues, regeneration attempts
2. **`wrong_answer_debug/`**
- Created during question improvement
- Logs question-level incorrect answer issues
- Regeneration history
### Console Logging
**Extensive logging throughout:**
- File processing status
- Generation progress (run numbers)
- Parallel thread activity (thread IDs)
- API call results
- Error messages with tracebacks
- Timing information (start/end times)
**Example Log Output:**
```
DEBUG - Processing 3 files: ['file1.vtt', 'file2.ipynb', 'file3.srt']
DEBUG - Found source file: file1.vtt
Generating 3 learning objectives from 3 files
Successfully generated 3 learning objectives without correct answers
Generated correct answer for objective 1
Grouping 9 base learning objectives
Received 9 grouped results
Generating incorrect answer options only for best-in-group objectives...
PARALLEL: Starting ThreadPoolExecutor with 3 workers
PARALLEL: Worker 1 (Thread ID: 12345): Starting work on objective...
Question generation completed in 45.23 seconds
```
---
## Performance Considerations
### API Call Optimization
**Calls per Workflow:**
For 3 objectives × 3 runs = 9 base objectives:
1. **Learning Objectives:**
- Base generation: 3 calls (one per run)
- Correct answers: 9 calls (one per objective)
- Grouping: 1 call
- Incorrect answers: ~3 calls (best-in-group only)
- Improvement checks: ~3 calls
- **Total: ~19 calls**
2. **Questions (for 3 objectives × 1 run):**
- Question generation: 3 calls (parallel)
- Quality judging: 3 calls (parallel)
- Grouping: 1 call
- Ranking: 1 call
- **Total: ~8 calls**
**Total for complete workflow: ~27 API calls**
### Time Estimates
**Typical Execution Times:**
- File processing: <1 second
- Objective generation (3×3): 30-60 seconds
- Question generation (3×1): 20-40 seconds (with parallelization)
- **Total: 1-2 minutes for a small course**
**Factors Affecting Speed:**
- Model selection (gpt-5 slower than gpt-5-mini)
- Number of runs
- Number of objectives/questions
- API rate limits
- Network latency
- Parallel worker count
### Cost Optimization
**Strategies:**
1. Use gpt-5-mini for grouping/ranking (hardcoded)
2. Reduce number of runs (trade-off: variety)
3. Generate fewer objectives initially
4. Use faster models for initial exploration
5. Use premium models for final production
---
## Conclusion
The AI Course Assessment Generator is a sophisticated, multi-stage system that transforms raw course materials into high-quality educational assessments. It employs:
- **Modular architecture** for maintainability
- **Structured output generation** for reliability
- **Quality-driven iterative refinement** for excellence
- **Parallel processing** for efficiency
- **Comprehensive error handling** for resilience
The system successfully balances automation with quality control, producing assessments that align with educational best practices and Bloom's Taxonomy while maintaining complete traceability to source materials.