# AI Course Assessment Generator - Functionality Report

## Table of Contents

1. [Overview](#overview)
2. [System Architecture](#system-architecture)
3. [Data Models](#data-models)
4. [Application Entry Point](#application-entry-point)
5. [User Interface Structure](#user-interface-structure)
6. [Complete Workflow](#complete-workflow)
7. [Detailed Component Functionality](#detailed-component-functionality)
8. [Quality Standards and Prompts](#quality-standards-and-prompts)

---

## Overview

The AI Course Assessment Generator is a sophisticated educational tool that automates the creation of learning objectives and multiple-choice questions from course materials. It leverages OpenAI's language models with structured output generation to produce high-quality educational assessments that adhere to specified quality standards and Bloom's Taxonomy levels.

### Key Capabilities

- **Multi-format Content Processing**: Accepts `.vtt`, `.srt` (subtitle files), and `.ipynb` (Jupyter notebooks)
- **AI-Powered Generation**: Uses OpenAI's GPT models with configurable parameters
- **Quality Assurance**: Implements LLM-based quality assessment and ranking
- **Source Tracking**: Maintains XML-tagged references from source materials to generated content
- **Iterative Improvement**: Supports feedback-based regeneration and enhancement
- **Parallel Processing**: Generates questions concurrently for improved performance

---

## System Architecture

### Architectural Patterns

#### 1. **Orchestrator Pattern**

Both `LearningObjectiveGenerator` and `QuizGenerator` act as orchestrators that coordinate calls to specialized generation functions rather than implementing generation logic directly.

#### 2. **Modular Prompt System**

The `prompts/` directory contains reusable prompt components that are imported and combined in generation modules, allowing for consistent quality standards across different generation tasks.

#### 3. **Structured Output Generation**

All LLM interactions use Pydantic models with the `instructor` library to ensure consistent, validated output formats using OpenAI's structured output API.

#### 4. **Source Tracking via XML Tags**

Content is wrapped in XML tags (e.g., `<source file="example.ipynb">content</source>`) throughout the pipeline to maintain traceability from source files to generated questions.
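The tagging step can be sketched as a small helper (a hedged sketch; the function name `wrap_in_source_tag` is illustrative, not from the codebase):

```python
import os

def wrap_in_source_tag(file_path: str, content: str) -> str:
    """Wrap processed content in an XML source tag so downstream
    prompts can trace generated questions back to their file."""
    filename = os.path.basename(file_path)
    return f"<source file='{filename}'>{content}</source>"
```

For example, `wrap_in_source_tag("data/intro.vtt", "Welcome")` yields `<source file='intro.vtt'>Welcome</source>`, the same tag format the question generator later matches against.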
### Technology Stack

- **Python 3.8+**
- **Gradio 5.29.0+**: Web-based UI framework
- **Pydantic 2.8.0+**: Data validation and schema management
- **OpenAI 1.52.0+**: LLM API integration
- **Instructor 1.7.9+**: Structured output generation
- **nbformat 5.9.2**: Jupyter notebook parsing
- **python-dotenv 1.0.0**: Environment variable management

---

## Data Models

### Learning Objectives Progression

The system uses a hierarchical progression of learning objective models:

#### 1. **BaseLearningObjectiveWithoutCorrectAnswer**

```python
- id: int
- learning_objective: str
- source_reference: Union[List[str], str]
```

Initial generation without correct answers.

#### 2. **BaseLearningObjective**

```python
- id: int
- learning_objective: str
- source_reference: Union[List[str], str]
- correct_answer: str
```

Base objectives with correct answers added.

#### 3. **LearningObjective**

```python
- id: int
- learning_objective: str
- source_reference: Union[List[str], str]
- correct_answer: str
- incorrect_answer_options: Union[List[str], str]
- in_group: Optional[bool]
- group_members: Optional[List[int]]
- best_in_group: Optional[bool]
```

Enhanced with incorrect answer suggestions and grouping metadata.

#### 4. **GroupedLearningObjective**

```python
(All fields from LearningObjective)
- in_group: bool (required)
- group_members: List[int] (required)
- best_in_group: bool (required)
```

Fully grouped and ranked objectives.
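This progression maps naturally onto Pydantic inheritance; a minimal sketch with the field names from the report (defaults and class bodies are assumptions, not copied from the codebase):

```python
from typing import List, Optional, Union
from pydantic import BaseModel

class BaseLearningObjectiveWithoutCorrectAnswer(BaseModel):
    id: int
    learning_objective: str
    source_reference: Union[List[str], str]

class BaseLearningObjective(BaseLearningObjectiveWithoutCorrectAnswer):
    correct_answer: str

class LearningObjective(BaseLearningObjective):
    incorrect_answer_options: Union[List[str], str] = []
    in_group: Optional[bool] = None
    group_members: Optional[List[int]] = None
    best_in_group: Optional[bool] = None

class GroupedLearningObjective(LearningObjective):
    # Grouping metadata becomes required once grouping has run
    in_group: bool
    group_members: List[int]
    best_in_group: bool
```

Each stage only adds fields, so any later model validates anywhere an earlier one is expected.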
### Question Models Progression

#### 1. **MultipleChoiceOption**

```python
- option_text: str
- is_correct: bool
- feedback: str
```

#### 2. **MultipleChoiceQuestion**

```python
- id: int
- question_text: str
- options: List[MultipleChoiceOption]
- learning_objective_id: int
- learning_objective: str
- correct_answer: str
- source_reference: Union[List[str], str]
- judge_feedback: Optional[str]
- approved: Optional[bool]
```

#### 3. **RankedMultipleChoiceQuestion**

```python
(All fields from MultipleChoiceQuestion)
- rank: int
- ranking_reasoning: str
- in_group: bool
- group_members: List[int]
- best_in_group: bool
```

#### 4. **Assessment**

```python
- learning_objectives: List[LearningObjective]
- questions: List[RankedMultipleChoiceQuestion]
```

Final output containing both objectives and questions.

### Configuration Models

#### **MODELS**

Available OpenAI models: `["o3-mini", "o1", "gpt-4.1", "gpt-4o", "gpt-4o-mini", "gpt-4", "gpt-3.5-turbo", "gpt-5", "gpt-5-mini", "gpt-5-nano"]`

#### **TEMPERATURE_UNAVAILABLE**

Dictionary mapping models to temperature availability (some models like o1, o3-mini, and gpt-5 variants don't support temperature settings).
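The temperature gate can be expressed as a lookup applied before each API call; a sketch (the exact dictionary contents are an assumption inferred from the model list above):

```python
# True means the model rejects a temperature parameter (assumed values)
TEMPERATURE_UNAVAILABLE = {
    "o1": True, "o3-mini": True,
    "gpt-5": True, "gpt-5-mini": True, "gpt-5-nano": True,
    "gpt-4.1": False, "gpt-4o": False, "gpt-4o-mini": False,
    "gpt-4": False, "gpt-3.5-turbo": False,
}

def build_params(model: str, temperature: float) -> dict:
    """Only attach temperature when the model supports it;
    unknown models default to 'unavailable' so the call fails safe."""
    params = {"model": model}
    if not TEMPERATURE_UNAVAILABLE.get(model, True):
        params["temperature"] = temperature
    return params
```

This matches the guard used later in the API-call listings (`if not TEMPERATURE_UNAVAILABLE.get(model, True)`).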
---

## Application Entry Point

### `app.py`

The root-level entry point that:

1. Loads environment variables from `.env` file
2. Checks for `OPENAI_API_KEY` presence
3. Creates the Gradio UI via `ui.app.create_ui()`
4. Launches the web interface at `http://127.0.0.1:7860`

```python
# Workflow:
load_dotenv() → Check API key → create_ui() → app.launch()
```
---

## User Interface Structure

### `ui/app.py` - Gradio Interface

The UI is organized into **3 main tabs**:

#### **Tab 1: Generate Learning Objectives**

**Input Components:**

- File uploader (accepts `.ipynb`, `.vtt`, `.srt`)
- Number of objectives per run (slider: 1-20, default: 3)
- Number of generation runs (dropdown: 1-5, default: 3)
- Model selection (dropdown, default: "gpt-5")
- Incorrect answer model selection (dropdown, default: "gpt-5")
- Temperature setting (dropdown: 0.0-1.0, default: 1.0)
- Generate button
- Feedback input textbox
- Regenerate button

**Output Components:**

- Status textbox
- Best-in-Group Learning Objectives (JSON)
- All Grouped Learning Objectives (JSON)
- Raw Ungrouped Learning Objectives (JSON) - for debugging

**Event Handler:** `process_files()` from `objective_handlers.py`

#### **Tab 2: Generate Questions**

**Input Components:**

- Learning Objectives JSON (auto-populated from Tab 1)
- Model selection
- Temperature setting
- Number of question generation runs (slider: 1-5, default: 1)
- Generate Questions button

**Output Components:**

- Status textbox
- Ranked Best-in-Group Questions (JSON)
- All Grouped Questions (JSON)
- Formatted Quiz (human-readable format)

**Event Handler:** `generate_questions()` from `question_handlers.py`

#### **Tab 3: Propose/Edit Question**

**Input Components:**

- Question guidance/feedback textbox
- Model selection
- Temperature setting
- Generate Question button

**Output Components:**

- Status textbox
- Generated Question (JSON)

**Event Handler:** `propose_question_handler()` from `feedback_handlers.py`

---

## Complete Workflow

### Phase 1: File Upload and Content Processing

#### Step 1.1: File Upload

User uploads one or more files (`.vtt`, `.srt`, `.ipynb`) through the Gradio interface.

#### Step 1.2: File Path Extraction (`objective_handlers._extract_file_paths()`)

```python
# Handles different input formats:
- List of file paths
- Single file path string
- File objects with .name attribute
```

#### Step 1.3: Content Processing (`ui/content_processor.py`)

**For Subtitle Files (`.vtt`, `.srt`):**

```python
1. Read file with UTF-8 encoding
2. Split into lines
3. Filter out:
   - Empty lines
   - Numeric timestamp indicators
   - Lines containing '-->' (timestamps)
   - 'WEBVTT' header lines
4. Combine remaining text lines
5. Wrap in XML tags: <source file='filename.vtt'>content</source>
```
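The filtering steps above can be sketched as a standalone function (a hedged sketch, not the exact implementation; the name `clean_subtitles` is illustrative):

```python
import os

def clean_subtitles(file_path: str) -> str:
    """Strip WEBVTT headers, cue sequence numbers, and timestamp lines,
    keeping only the spoken text, wrapped in a <source> tag."""
    with open(file_path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line == "WEBVTT":
            continue
        if line.isdigit():   # cue sequence number
            continue
        if "-->" in line:    # timestamp line
            continue
        kept.append(line)
    text = " ".join(kept)
    return f"<source file='{os.path.basename(file_path)}'>{text}</source>"
```

A two-cue `.srt` file reduces to a single tagged line of dialogue, ready to be concatenated with other sources in the prompt.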
**For Jupyter Notebooks (`.ipynb`):**

```python
1. Validate JSON format
2. Parse with nbformat.read()
3. Extract from cells:
   - Markdown cells: [Markdown]\n{content}
   - Code cells: [Code]\n```python\n{content}\n```
4. Combine all cell content
5. Wrap in XML tags: <source file='filename.ipynb'>content</source>
```
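The notebook path follows the same shape; here is a dependency-free sketch that reads the notebook JSON directly (the real pipeline uses `nbformat.read()`; the function name is illustrative):

```python
import json
import os

def extract_notebook_text(file_path: str) -> str:
    """Flatten notebook cells into labeled sections and wrap the
    result in a <source> tag (the real code parses via nbformat)."""
    with open(file_path, encoding="utf-8") as f:
        nb = json.load(f)
    fence = "`" * 3  # avoid a literal fence inside this listing
    parts = []
    for cell in nb.get("cells", []):
        source = "".join(cell.get("source", []))
        if cell.get("cell_type") == "markdown":
            parts.append(f"[Markdown]\n{source}")
        elif cell.get("cell_type") == "code":
            parts.append(f"[Code]\n{fence}python\n{source}\n{fence}")
    body = "\n\n".join(parts)
    return f"<source file='{os.path.basename(file_path)}'>{body}</source>"
```

Markdown cells keep their prose; code cells are re-fenced so the LLM can distinguish explanation from executable examples.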
**Error Handling:**

- Invalid JSON: Wraps raw content in code blocks
- Parsing failures: Falls back to plain text extraction
- All errors logged to console

#### Step 1.4: State Storage

Processed content is stored in global state (`ui/state.py`):

```python
processed_file_contents = [tagged_content_1, tagged_content_2, ...]
```

### Phase 2: Learning Objective Generation

#### Step 2.1: Multi-Run Base Generation

**Process:** `objective_handlers._generate_multiple_runs()`

For each run (user-specified, typically 3 runs):

1. **Call:** `QuizGenerator.generate_base_learning_objectives()`
2. **Workflow:**

   ```
   generate_base_learning_objectives()
       ↓
   generate_base_learning_objectives_without_correct_answers()
       → Creates prompt with:
         - BASE_LEARNING_OBJECTIVES_PROMPT
         - BLOOMS_TAXONOMY_LEVELS
         - LEARNING_OBJECTIVE_EXAMPLES_WITHOUT_ANSWERS
         - Combined file contents
       → Calls OpenAI API with structured output
       → Returns List[BaseLearningObjectiveWithoutCorrectAnswer]
       ↓
   generate_correct_answers_for_objectives()
       → For each objective:
         - Creates prompt with objective and course content
         - Calls OpenAI API (unstructured text response)
         - Extracts correct answer
       → Returns List[BaseLearningObjective]
   ```

3. **ID Assignment:**

   ```python
   # Temporary IDs by run:
   Run 1: 1001, 1002, 1003
   Run 2: 2001, 2002, 2003
   Run 3: 3001, 3002, 3003
   ```

4. **Aggregation:** All objectives from all runs are combined into a single list.

**Example:** 3 runs × 3 objectives = 9 total base objectives
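The temporary ID scheme (run number × 1000 + position) can be sketched as a small helper; the real handler operates on `BaseLearningObjective` objects, but dicts show the arithmetic:

```python
def assign_run_ids(objectives_per_run: list) -> list:
    """Give each objective a temporary ID of run*1000 + position,
    so the grouping step can tell which run and slot it came from."""
    combined = []
    for run_index, run in enumerate(objectives_per_run, start=1):
        for position, objective in enumerate(run, start=1):
            objective["id"] = run_index * 1000 + position
            combined.append(objective)
    return combined
```

Three runs of three objectives each come out as 1001–1003, 2001–2003, 3001–3003, matching the table above.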
#### Step 2.2: Grouping and Ranking

**Process:** `objective_handlers._group_base_objectives_add_incorrect_answers()`

**Step 2.2.1: Group Base Objectives**

```python
QuizGenerator.group_base_learning_objectives()
    ↓
learning_objective_generator/grouping_and_ranking.py
    → group_base_learning_objectives()
```

**Grouping Logic:**

1. Creates prompt containing:
   - Original generation criteria
   - All base objectives with IDs
   - Course content for context
   - Grouping instructions
2. **Special Rule:** All objectives with IDs ending in 1 (1001, 2001, 3001) are grouped together and ONE is marked as best-in-group (this becomes the primary/first objective)
3. **LLM Call:**
   - Model: `gpt-5-mini`
   - Response format: `GroupedBaseLearningObjectivesResponse`
   - Returns: Grouped objectives with metadata
4. **Output Structure:**

   ```python
   {
       "all_grouped": [all objectives with group metadata],
       "best_in_group": [objectives marked as best in their groups]
   }
   ```

**Step 2.2.2: ID Reassignment** (`_reassign_objective_ids()`)

```python
1. Find best objective from the 001 group
2. Assign it ID = 1
3. Assign remaining objectives IDs starting from 2
```
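A sketch of that reassignment, assuming objectives are dicts carrying the `id` and `best_in_group` keys described earlier (the real function works on model objects):

```python
def reassign_objective_ids(objectives: list) -> list:
    """Promote the best objective from the x001 group to ID 1,
    then number the remaining objectives sequentially from 2."""
    def is_primary(o: dict) -> bool:
        return o["id"] % 1000 == 1 and o.get("best_in_group", False)

    primary = next(o for o in objectives if is_primary(o))
    primary["id"] = 1
    next_id = 2
    for o in objectives:
        if o is not primary:
            o["id"] = next_id
            next_id += 1
    return objectives
```

After this step the temporary run-scoped IDs disappear and the primary objective always carries ID 1, which the later ranking step treats as reserved.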
**Step 2.2.3: Generate Incorrect Answer Options**

Only for **best-in-group** objectives:

```python
QuizGenerator.generate_lo_incorrect_answer_options()
    ↓
learning_objective_generator/enhancement.py
    → generate_incorrect_answer_options()
```

**Process:**

1. For each best-in-group objective:
   - Creates prompt with:
     - Objective and correct answer
     - INCORRECT_ANSWER_PROMPT guidelines
     - INCORRECT_ANSWER_EXAMPLES
     - Course content
   - Calls OpenAI API (with optional model override)
   - Generates 5 plausible incorrect answer options
2. **Returns:** `List[LearningObjective]` with `incorrect_answer_options` populated

**Step 2.2.4: Improve Incorrect Answers**

```python
learning_objective_generator.regenerate_incorrect_answers()
    ↓
learning_objective_generator/suggestion_improvement.py
```

**Quality Check Process:**

1. For each objective's incorrect answers:
   - Checks for red flags (contradictory phrases, absolute terms)
   - Examples of red flags:
     - "but not necessarily"
     - "at the expense of"
     - "rather than"
     - "always", "never", "exclusively"
2. If problems are found:
   - Logs the issue to the `incorrect_suggestion_debug/` directory
   - Regenerates incorrect answers with additional constraints
   - Updates the objective with improved answers

**Step 2.2.5: Final Assembly**

Creates the final list where:

- Best-in-group objectives have enhanced incorrect answers
- Non-best-in-group objectives have empty `incorrect_answer_options: []`

#### Step 2.3: Display Results

**Three output formats:**

1. **Best-in-Group Objectives** (primary output):
   - Only objectives marked as best_in_group
   - Includes incorrect answer options
   - Sorted by ID
   - Formatted as JSON
2. **All Grouped Objectives**:
   - All objectives with grouping metadata
   - Shows group_members arrays
   - Best-in-group flags visible
3. **Raw Ungrouped** (debug):
   - Original objectives from all runs
   - No grouping metadata
   - Original temporary IDs

#### Step 2.4: State Update

```python
set_learning_objectives(grouped_result["all_grouped"])
set_processed_contents(file_contents)  # Already set, but persisted
```
### Phase 3: Question Generation

#### Step 3.1: Parse Learning Objectives

**Process:** `question_handlers._parse_learning_objectives()`

```python
1. Parse JSON from Tab 1 output
2. Create LearningObjective objects from dictionaries
3. Validate required fields
4. Return List[LearningObjective]
```

#### Step 3.2: Multi-Run Question Generation

**Process:** `question_handlers._generate_questions_multiple_runs()`

For each run (user-specified, typically 1 run):

```python
QuizGenerator.generate_questions_in_parallel()
    ↓
quiz_generator/assessment.py
    → generate_questions_in_parallel()
```

**Parallel Generation Process:**

1. **Thread Pool Setup:**

   ```python
   max_workers = min(len(learning_objectives), 5)
   ThreadPoolExecutor(max_workers=max_workers)
   ```

2. **For Each Learning Objective (in parallel):**

   **Step 3.2.1: Question Generation** (`quiz_generator/question_generation.py`)

   ```python
   generate_multiple_choice_question()
   ```

   **a) Source Content Matching:**

   ```python
   - Extract source_reference from objective
   - Search file_contents for matching XML tags
   - Exact match: <source file='filename.vtt'>
   - Fallback: Partial filename match
   - Last resort: Use all file contents combined
   ```

   **b) Multi-Source Handling:**

   ```python
   if len(source_references) > 1:
       Add special instruction:
       "Question should synthesize information across sources"
   ```

   **c) Prompt Construction:**

   ```python
   Combines:
   - Learning objective
   - Correct answer
   - Incorrect answer options from objective
   - GENERAL_QUALITY_STANDARDS
   - MULTIPLE_CHOICE_STANDARDS
   - EXAMPLE_QUESTIONS
   - QUESTION_SPECIFIC_QUALITY_STANDARDS
   - CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS
   - INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION
   - ANSWER_FEEDBACK_QUALITY_STANDARDS
   - Matched course content
   ```

   **d) API Call:**

   ```python
   - Model: User-selected (default: gpt-5)
   - Temperature: User-selected (if supported by model)
   - Response format: MultipleChoiceQuestion
   - Returns: Question with 4 options, each with feedback
   ```

   **e) Post-Processing:**

   ```python
   - Set question ID = learning_objective ID
   - Verify all options have feedback
   - Add default feedback if missing
   ```

   **Step 3.2.2: Quality Assessment** (`quiz_generator/question_improvement.py`)

   ```python
   judge_question_quality()
   ```

   **Quality Judging Process:**

   ```python
   1. Creates evaluation prompt with:
      - Question text and all options
      - Quality criteria from prompts
      - Evaluation instructions
   2. LLM evaluates question for:
      - Clarity and unambiguity
      - Alignment with learning objective
      - Quality of incorrect options
      - Feedback quality
      - Appropriate difficulty
   3. Returns:
      - approved: bool
      - feedback: str (reasoning for judgment)
   4. Updates question:
      question.approved = approved
      question.judge_feedback = feedback
   ```

3. **Results Collection:**

   ```python
   - Questions collected as futures complete
   - IDs assigned sequentially across runs
   - All questions aggregated into single list
   ```

**Example:** 3 objectives × 1 run = 3 questions generated in parallel
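The fan-out pattern above can be sketched with a thread pool; the hypothetical `generate_question` callable stands in for the real per-objective generation-plus-judging call:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def generate_questions_in_parallel(objectives, generate_question):
    """Run one generation call per objective, capped at 5 workers,
    collecting results as their futures complete."""
    max_workers = min(len(objectives), 5)
    questions = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(generate_question, obj) for obj in objectives]
        for future in as_completed(futures):
            questions.append(future.result())
    return questions
```

Because `as_completed` yields in finish order, results arrive unordered; that is why the handler reassigns IDs sequentially after collection.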
#### Step 3.3: Grouping Questions

**Process:** `quiz_generator/question_ranking.py → group_questions()`

```python
1. Creates prompt with:
   - All generated questions
   - Grouping instructions
   - Example format
2. LLM identifies:
   - Questions testing the same concept (same learning_objective_id)
   - Groups of similar questions
   - Best question in each group
3. Model: gpt-5-mini
   Response format: GroupedMultipleChoiceQuestionsResponse
4. Returns:
   {
       "grouped": [all questions with group metadata],
       "best_in_group": [best questions from each group]
   }
```

#### Step 3.4: Ranking Questions

**Process:** `quiz_generator/question_ranking.py → rank_questions()`

**Only ranks best-in-group questions:**

```python
1. Creates prompt with:
   - RANK_QUESTIONS_PROMPT
   - All quality standards
   - Best-in-group questions only
   - Course content for context
2. Ranking Criteria:
   - Question clarity and unambiguity
   - Alignment with learning objective
   - Quality of incorrect options
   - Feedback quality
   - Appropriate difficulty (prefers simple English)
   - Adherence to all guidelines
   - Avoidance of absolute terms
3. Special Instructions:
   - NEVER change the question with ID=1
   - Each question gets a unique rank (2, 3, 4, ...)
   - Rank 1 is reserved
   - All questions must be returned
4. Model: User-selected
   Response format: RankedMultipleChoiceQuestionsResponse
5. Returns:
   {
       "ranked": [questions with rank and ranking_reasoning]
   }
```

#### Step 3.5: Format Results

**Process:** `question_handlers._format_question_results()`

**Three outputs:**

1. **Best-in-Group Ranked Questions:**

   ```python
   - Sorted by rank
   - Includes all question data
   - Includes rank and ranking_reasoning
   - Includes group metadata
   - Formatted as JSON
   ```

2. **All Grouped Questions:**

   ```python
   - All questions with group metadata
   - No ranking information
   - Shows which questions are in groups
   - Formatted as JSON
   ```

3. **Formatted Quiz:**

   ```python
   format_quiz_for_ui() creates a human-readable format:

   **Question 1 [Rank: 2]:** What is...
   Ranking Reasoning: ...
   • A [Correct]: Option text
     ◦ Feedback: Correct feedback
   • B: Option text
     ◦ Feedback: Incorrect feedback
   [continues for all questions]
   ```
### Phase 4: Custom Question Generation (Optional)

**Tab 3 Workflow:**

#### Step 4.1: User Input

User provides:

- Free-form guidance/feedback text
- Model selection
- Temperature setting

#### Step 4.2: Generation

**Process:** `feedback_handlers.propose_question_handler()`

```python
QuizGenerator.generate_multiple_choice_question_from_feedback()
    ↓
quiz_generator/feedback_questions.py
```

**Workflow:**

```python
1. Retrieves processed file contents from state
2. Creates prompt combining:
   - User feedback/guidance
   - All quality standards
   - Course content
   - Generation criteria
3. Model generates:
   - A single question
   - With a learning objective inferred from the guidance
   - 4 options with feedback
   - Source references
4. Returns: MultipleChoiceQuestionFromFeedback object
   (includes user feedback as metadata)
5. Formatted as JSON for display
```

### Phase 5: Assessment Export (Automated)

The final assessment can be saved using:

```python
QuizGenerator.save_assessment_to_json()
    ↓
quiz_generator/assessment.py → save_assessment_to_json()
```

**Process:**

```python
1. Convert Assessment object to dictionary:
   assessment_dict = assessment.model_dump()
2. Write to JSON file with indent=2
   Default filename: "assessment.json"
3. Contains:
   - All learning objectives (best-in-group)
   - All ranked questions
   - Complete metadata
```
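The export step is a standard Pydantic dump followed by a JSON write; a minimal sketch, with the `Assessment` fields simplified to plain lists for illustration:

```python
import json
from pydantic import BaseModel

class Assessment(BaseModel):
    # Simplified stand-in for the Assessment model described above
    learning_objectives: list = []
    questions: list = []

def save_assessment_to_json(assessment: Assessment,
                            filename: str = "assessment.json") -> None:
    """Serialize the full assessment with 2-space indentation."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(assessment.model_dump(), f, indent=2)
```

`model_dump()` recurses through nested models, so the ranked questions and their options serialize without extra handling.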
---

## Detailed Component Functionality

### Content Processor (`ui/content_processor.py`)

**Class: `ContentProcessor`**

**Methods:**

1. **`process_files(file_paths: List[str]) -> List[str]`**
   - Main entry point for processing multiple files
   - Returns list of XML-tagged content strings
   - Stores results in `self.file_contents`
2. **`process_file(file_path: str) -> List[str]`**
   - Routes to appropriate handler based on file extension
   - Returns single-item list with tagged content
3. **`_process_subtitle_file(file_path: str) -> List[str]`**
   - Filters out timestamps and metadata
   - Preserves actual subtitle text
   - Wraps in `<source file='...'>` tags
4. **`_process_notebook_file(file_path: str) -> List[str]`**
   - Validates JSON structure
   - Parses with nbformat
   - Extracts markdown and code cells
   - Falls back to raw text on parsing errors
   - Wraps in `<source file='...'>` tags

### Learning Objective Generator (`learning_objective_generator/`)

#### **generator.py - LearningObjectiveGenerator Class**

**Orchestrator that delegates to specialized modules:**

**Methods:**

1. **`generate_base_learning_objectives()`**
   - Delegates to `base_generation.py`
   - Returns base objectives with correct answers
2. **`group_base_learning_objectives()`**
   - Delegates to `grouping_and_ranking.py`
   - Groups similar objectives
   - Identifies the best in each group
3. **`generate_incorrect_answer_options()`**
   - Delegates to `enhancement.py`
   - Adds 5 incorrect answer suggestions per objective
4. **`regenerate_incorrect_answers()`**
   - Delegates to `suggestion_improvement.py`
   - Quality-checks and improves incorrect answers
5. **`generate_and_group_learning_objectives()`**
   - Complete workflow method
   - Combines: base generation → grouping → incorrect answers
   - Returns dict with `all_grouped` and `best_in_group`

#### **base_generation.py**

**Key Functions:**

**`generate_base_learning_objectives()`**

- Wrapper that calls two separate functions
- First: Generate objectives without correct answers
- Second: Generate correct answers for those objectives

**`generate_base_learning_objectives_without_correct_answers()`**

**Process:**

```python
1. Extract source filenames from XML tags
2. Combine all file contents
3. Create prompt with:
   - BASE_LEARNING_OBJECTIVES_PROMPT
   - BLOOMS_TAXONOMY_LEVELS
   - LEARNING_OBJECTIVE_EXAMPLES_WITHOUT_ANSWERS
   - Course content
4. API call:
   - Model: User-selected
   - Temperature: User-selected (if supported)
   - Response format: BaseLearningObjectivesWithoutCorrectAnswerResponse
5. Post-process:
   - Assign sequential IDs
   - Normalize source_reference (extract basenames)
6. Returns: List[BaseLearningObjectiveWithoutCorrectAnswer]
```

**`generate_correct_answers_for_objectives()`**

**Process:**

```python
1. For each objective without an answer:
   - Create prompt with objective + course content
   - Call OpenAI API (text response, not structured)
   - Extract correct answer
   - Create BaseLearningObjective with answer
2. Error handling: Add "[Error generating correct answer]" on failure
3. Returns: List[BaseLearningObjective]
```

**Quality Guidelines in Prompt:**

- Objectives must be assessable via multiple choice
- Start with action verbs (identify, describe, define, list, compare)
- One goal per objective
- Derived directly from course content
- Tool/framework agnostic (focus on principles, not specific implementations)
- The first objective should be a relatively easy recall question
- Avoid objectives about "building" or "creating" (not MC-assessable)
#### **grouping_and_ranking.py**

**Key Functions:**

**`group_base_learning_objectives()`**

**Process:**

```python
1. Format objectives for display in prompt
2. Create grouping prompt with:
   - Original generation criteria
   - All base objectives
   - Course content
   - Grouping instructions
3. Special rule:
   - All objectives with IDs ending in 1 grouped together
   - Best one selected from this group
   - Will become primary objective (ID=1)
4. API call:
   - Model: "gpt-5-mini" (hardcoded for efficiency)
   - Response format: GroupedBaseLearningObjectivesResponse
5. Post-process:
   - Normalize best_in_group to Python bool
   - Filter for best-in-group objectives
6. Returns:
   {
       "all_grouped": List[GroupedBaseLearningObjective],
       "best_in_group": List[GroupedBaseLearningObjective]
   }
```

**Grouping Criteria:**

- Topic overlap
- Similarity of concepts
- Quality based on original generation criteria
- Clarity and specificity
- Alignment with course content

#### **enhancement.py**

**Key Function: `generate_incorrect_answer_options()`**

**Process:**

```python
1. For each base objective:
   - Create prompt with:
     - Learning objective and correct answer
     - INCORRECT_ANSWER_PROMPT (detailed guidelines)
     - INCORRECT_ANSWER_EXAMPLES
     - Course content
   - Request 5 plausible incorrect options
2. API call:
   - Model: model_override or default
   - Temperature: User-selected (if supported)
   - Response format: LearningObjective (includes incorrect_answer_options)
3. Returns: List[LearningObjective] with all fields populated
```

**Incorrect Answer Quality Principles:**

- Reflect common misunderstandings
- Maintain identical structure to the correct answer
- Use course terminology correctly but in wrong contexts
- Include partially correct information
- Avoid obviously wrong answers
- Mirror the detail level and style of the correct answer
- Avoid absolute terms ("always", "never", "exclusively")
- Avoid contradictory second clauses

#### **suggestion_improvement.py**

**Key Function: `regenerate_incorrect_answers()`**

**Process:**

```python
1. For each learning objective:
   - Call should_regenerate_incorrect_answers()
2. should_regenerate_incorrect_answers():
   - Creates evaluation prompt with:
     - Objective and all incorrect options
     - IMMEDIATE_RED_FLAGS checklist
     - RULES_FOR_SECOND_CLAUSES
   - LLM evaluates each option
   - Returns: needs_regeneration: bool
3. If regeneration is needed:
   - Logs to incorrect_suggestion_debug/{id}.txt
   - Creates new prompt with additional constraints
   - Regenerates incorrect answers
   - Validates again
4. Returns: List[LearningObjective] with improved incorrect answers
```

**Red Flags Checked:**

- Contradictory second clauses ("but not necessarily")
- Explicit negations ("without automating")
- Opposite descriptions ("fixed steps" for flexible systems)
- Absolute/comparative terms
- Hedging that creates limitations
- Trade-off language creating false dichotomies
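A first-pass lexical screen for those red flags could look like the following (the real check is LLM-based and context-aware; this sketch only catches literal phrase matches):

```python
# Assumed subset of the red-flag phrases listed above
RED_FLAG_PHRASES = [
    "but not necessarily",
    "at the expense of",
    "rather than",
    "always",
    "never",
    "exclusively",
]

def has_red_flags(option_text: str) -> bool:
    """Return True if an incorrect-answer option contains a phrase
    known to make distractors easy for test-takers to eliminate."""
    lowered = option_text.lower()
    return any(phrase in lowered for phrase in RED_FLAG_PHRASES)
```

A cheap check like this could short-circuit obvious cases before spending an LLM call, though only the LLM judge can catch semantic problems such as opposite descriptions.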
### Quiz Generator (`quiz_generator/`)

#### **generator.py - QuizGenerator Class**

**Orchestrator with an embedded LearningObjectiveGenerator:**

**Initialization:**

```python
def __init__(self, api_key, model="gpt-5", temperature=1.0):
    self.client = OpenAI(api_key=api_key)
    self.model = model
    self.temperature = temperature
    self.learning_objective_generator = LearningObjectiveGenerator(
        api_key=api_key, model=model, temperature=temperature
    )
```

**Methods (delegate to specialized modules):**

1. **`generate_base_learning_objectives()`** → delegates to LearningObjectiveGenerator
2. **`generate_lo_incorrect_answer_options()`** → delegates to LearningObjectiveGenerator
3. **`group_base_learning_objectives()`** → delegates to grouping_and_ranking.py
4. **`generate_multiple_choice_question()`** → delegates to question_generation.py
5. **`generate_questions_in_parallel()`** → delegates to assessment.py
6. **`group_questions()`** → delegates to question_ranking.py
7. **`rank_questions()`** → delegates to question_ranking.py
8. **`judge_question_quality()`** → delegates to question_improvement.py
9. **`regenerate_incorrect_answers()`** → delegates to question_improvement.py
10. **`generate_multiple_choice_question_from_feedback()`** → delegates to feedback_questions.py
11. **`save_assessment_to_json()`** → delegates to assessment.py

#### **question_generation.py**

**Key Function: `generate_multiple_choice_question()`**

**Detailed Process:**

**1. Source Content Matching:**
```python
source_references = learning_objective.source_reference
if isinstance(source_references, str):
    source_references = [source_references]

combined_content = ""
for source_file in source_references:
    found = False
    # Try exact match on the XML tag: <source file='filename'>
    for file_content in file_contents:
        if f"<source file='{source_file}'>" in file_content:
            combined_content += file_content
            found = True
            break
    # Fallback: partial filename match
    if not found:
        for file_content in file_contents:
            if source_file in file_content:
                combined_content += file_content
                break

# Last resort: use all content
if not combined_content:
    combined_content = "\n\n".join(file_contents)
```
| **2. Multi-Source Instruction:** | |
```python
if len(source_references) > 1:
    # Append a special instruction to the prompt:
    prompt += (
        "\nThis learning objective spans multiple sources. "
        "Your question should:\n"
        "1. Synthesize information across these sources\n"
        "2. Test understanding of overarching themes\n"
        "3. Require knowledge from multiple sources"
    )
```
| **3. Prompt Construction:** | |
| Combines extensive quality standards: | |
| ```python | |
| - Learning objective | |
| - Correct answer | |
| - Incorrect answer options from objective | |
| - GENERAL_QUALITY_STANDARDS | |
| - MULTIPLE_CHOICE_STANDARDS | |
| - EXAMPLE_QUESTIONS | |
| - QUESTION_SPECIFIC_QUALITY_STANDARDS | |
| - CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS | |
| - INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION | |
| - ANSWER_FEEDBACK_QUALITY_STANDARDS | |
| - Multi-source instruction (if applicable) | |
| - Matched course content | |
| ``` | |
| **4. API Call:** | |
```python
params = {
    "model": model,
    "messages": [
        {"role": "system", "content": "Expert educational assessment creator"},
        {"role": "user", "content": prompt},
    ],
    "response_format": MultipleChoiceQuestion,
}
# Reasoning models (and, by default, unknown models) reject the
# temperature parameter, so it is set only when known to be supported
if not TEMPERATURE_UNAVAILABLE.get(model, True):
    params["temperature"] = temperature
response = client.beta.chat.completions.parse(**params)
```
| **5. Post-Processing:** | |
| ```python | |
| - Set response.id = learning_objective.id | |
| - Set response.learning_objective_id = learning_objective.id | |
| - Set response.learning_objective = learning_objective.learning_objective | |
| - Set response.source_reference = learning_objective.source_reference | |
| - Verify all options have feedback | |
| - Add default feedback if missing | |
| ``` | |
| **6. Error Handling:** | |
| ```python | |
| On exception: | |
| - Create fallback question with 4 generic options | |
| - Include error message in question_text | |
| - Mark as questionable quality | |
| ``` | |
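The fallback behavior might look roughly like this; `Option`, `FallbackQuestion`, and the field names are simplified stand-ins for the project's Pydantic models:

```python
from dataclasses import dataclass, field

@dataclass
class Option:
    text: str
    is_correct: bool = False
    feedback: str = ""

@dataclass
class FallbackQuestion:
    question_text: str
    options: list = field(default_factory=list)
    approved: bool = False  # marked as questionable quality

def build_fallback_question(error: Exception) -> FallbackQuestion:
    """Create a placeholder question when generation fails."""
    return FallbackQuestion(
        question_text=f"[GENERATION ERROR: {error}] Placeholder question.",
        options=[
            Option("Option A", is_correct=True, feedback="Placeholder feedback."),
            Option("Option B", feedback="Placeholder feedback."),
            Option("Option C", feedback="Placeholder feedback."),
            Option("Option D", feedback="Placeholder feedback."),
        ],
    )
```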
| #### **question_ranking.py** | |
| **Key Functions:** | |
| **`group_questions(questions, file_contents)`** | |
| **Process:** | |
| ```python | |
| 1. Create prompt with: | |
| - GROUP_QUESTIONS_PROMPT | |
| - All questions with complete data | |
| - Grouping instructions | |
| 2. Grouping Logic: | |
| - Questions with same learning_objective_id are similar | |
| - Group by topic overlap | |
| - Mark best_in_group within each group | |
| - Single-member groups: best_in_group = true by default | |
| 3. API call: | |
| - Model: User-selected | |
| - Response format: GroupedMultipleChoiceQuestionsResponse | |
| 4. Critical Instructions: | |
| - MUST return ALL questions | |
| - Each question must have group metadata | |
| - best_in_group set appropriately | |
| 5. Returns: | |
| { | |
| "grouped": List[GroupedMultipleChoiceQuestion], | |
| "best_in_group": [questions where best_in_group=true] | |
| } | |
| ``` | |
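The return structure can be illustrated with plain dicts (`group_id` and `best_in_group` are assumed field names):

```python
from collections import Counter

def split_grouped_questions(grouped):
    """Split grouped questions into the structure described above:
    all questions with group metadata, plus the best-in-group subset.
    Single-member groups default to best_in_group = True."""
    sizes = Counter(q["group_id"] for q in grouped)
    for q in grouped:
        if sizes[q["group_id"]] == 1:
            q["best_in_group"] = True
    return {
        "grouped": grouped,
        "best_in_group": [q for q in grouped if q["best_in_group"]],
    }
```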
| **`rank_questions(questions, file_contents)`** | |
| **Process:** | |
| ```python | |
| 1. Create prompt with: | |
| - RANK_QUESTIONS_PROMPT | |
| - ALL quality standards (comprehensive) | |
| - Best-in-group questions only | |
| - Course content | |
| 2. Ranking Criteria (from prompt): | |
| - Question clarity and unambiguity | |
| - Alignment with learning objective | |
| - Quality of incorrect options | |
| - Feedback quality | |
| - Appropriate difficulty (simple English preferred) | |
| - Adherence to all guidelines | |
| - Avoidance of problematic words/phrases | |
| 3. Special Instructions: | |
| - DO NOT change question with ID=1 | |
| - Rank starting from 2 (rank 1 reserved) | |
| - Each question gets unique rank | |
| - Must return ALL questions | |
| 4. API call: | |
| - Model: User-selected | |
| - Response format: RankedMultipleChoiceQuestionsResponse | |
| 5. Returns: | |
| { | |
| "ranked": List[RankedMultipleChoiceQuestion] | |
| (includes rank and ranking_reasoning for each) | |
| } | |
| ``` | |
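A small helper illustrating how the critical ranking instructions could be validated downstream (an assumption for illustration, not code from the repository):

```python
def validate_ranking(ranked):
    """Check the critical instructions: ranks unique, ranks for all
    questions other than ID=1 start at 2 (rank 1 is reserved), then
    return questions ordered by rank."""
    ranks = [q["rank"] for q in ranked if q["id"] != 1]
    if len(ranks) != len(set(ranks)):
        raise ValueError("duplicate ranks")
    if ranks and min(ranks) < 2:
        raise ValueError("rank 1 is reserved for the ID=1 question")
    return sorted(ranked, key=lambda q: q["rank"])
```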
| **Simple vs Complex English Examples (from ranking criteria):** | |
| ``` | |
| Simple: "AI engineers create computer programs that can learn from data" | |
| Complex: "AI engineering practitioners architect computational paradigms | |
| exhibiting autonomous erudition capabilities" | |
| ``` | |
| #### **question_improvement.py** | |
| **Key Functions:** | |
| **`judge_question_quality(client, model, temperature, question)`** | |
| **Process:** | |
| ```python | |
| 1. Create evaluation prompt with: | |
| - Question text | |
| - All options with feedback | |
| - Quality criteria | |
| - Evaluation instructions | |
| 2. LLM evaluates: | |
| - Clarity and lack of ambiguity | |
| - Alignment with learning objective | |
| - Quality of distractors (incorrect options) | |
| - Feedback quality and helpfulness | |
| - Appropriate difficulty level | |
| - Adherence to all standards | |
| 3. API call: | |
| - Unstructured text response | |
| - LLM returns: APPROVED or NOT APPROVED + reasoning | |
| 4. Parsing: | |
| approved = "APPROVED" in response.upper() | |
| feedback = full response text | |
| 5. Returns: (approved: bool, feedback: str) | |
| ``` | |
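The verdict parsing can be sketched as below. Note that the substring check shown above would also match "NOT APPROVED", so this variant (an assumption, not necessarily the shipped code) tests the negative form first:

```python
def parse_verdict(response_text: str):
    """Parse an APPROVED / NOT APPROVED verdict from free-form LLM text.
    The negative form is checked first because the substring "APPROVED"
    also appears inside "NOT APPROVED"."""
    upper = response_text.upper()
    if "NOT APPROVED" in upper:
        approved = False
    else:
        approved = "APPROVED" in upper
    return approved, response_text
```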
| **`should_regenerate_incorrect_answers(client, question, file_contents, model_name)`** | |
| **Process:** | |
| ```python | |
| 1. Extract incorrect options from question | |
| 2. Create evaluation prompt with: | |
| - Each incorrect option | |
| - IMMEDIATE_RED_FLAGS checklist | |
| - Course content for context | |
| 3. LLM checks each option for: | |
| - Contradictory second clauses | |
| - Explicit negations | |
| - Absolute terms | |
| - Opposite descriptions | |
| - Trade-off language | |
| 4. Returns: needs_regeneration: bool | |
| 5. If true: | |
| - Log to wrong_answer_debug/ directory | |
| - Provides detailed feedback on issues | |
| ``` | |
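The actual check is LLM-based; as a purely illustrative stand-in, a deterministic substring pre-filter over a subset of the red-flag phrases listed later in this report might look like:

```python
# Illustrative subset of the IMMEDIATE_RED_FLAGS phrases; the real
# checklist lives in prompts/incorrect_answers.py.
RED_FLAG_PHRASES = [
    "but not necessarily", "at the expense of", "rather than",
    "without necessarily", "but has no impact on", "but cannot",
    "but prevents", "but limits",
]

def find_red_flags(option_text: str):
    """Return the red-flag phrases present in an incorrect option."""
    lowered = option_text.lower()
    return [p for p in RED_FLAG_PHRASES if p in lowered]

def needs_regeneration(options):
    """True if any incorrect option trips the checklist."""
    return any(find_red_flags(text) for text in options)
```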
| **`regenerate_incorrect_answers(client, model, temperature, questions, file_contents)`** | |
| **Process:** | |
| ```python | |
| 1. For each question: | |
| - Check if regeneration needed | |
| - If yes: | |
| a. Create new prompt with stricter constraints | |
| b. Include original question for context | |
| c. Add specific rules about avoiding red flags | |
| d. Regenerate options | |
| e. Validate again | |
| - If no: keep original | |
| 2. Returns: List of questions with improved incorrect answers | |
| ``` | |
| #### **feedback_questions.py** | |
| **Key Function: `generate_multiple_choice_question_from_feedback()`** | |
| **Process:** | |
| ```python | |
| 1. Accept user feedback/guidance as free-form text | |
| 2. Create prompt combining: | |
| - User feedback | |
| - All quality standards | |
| - Course content | |
| - Standard generation criteria | |
| 3. LLM infers: | |
| - Learning objective from feedback | |
| - Appropriate question | |
| - 4 options with feedback | |
| - Source references | |
| 4. API call: | |
| - Model: User-selected | |
| - Response format: MultipleChoiceQuestionFromFeedback | |
| 5. Includes user feedback as metadata in response | |
| 6. Returns: Single question object | |
| ``` | |
| #### **assessment.py** | |
| **Key Functions:** | |
| **`generate_questions_in_parallel()`** | |
| **Parallel Processing Details:** | |
| ```python | |
| 1. Setup: | |
| max_workers = min(len(learning_objectives), 5) | |
| # Limits to 5 concurrent threads | |
| 2. Thread Pool Executor: | |
| with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor: | |
| 3. For each objective (in separate thread): | |
| Worker function: | |
| def generate_question_for_objective(objective, idx): | |
| - Generate question | |
| - Judge quality | |
| - Update with approval and feedback | |
| - Handle errors gracefully | |
| - Return complete question | |
| 4. Submit all tasks: | |
| future_to_idx = { | |
| executor.submit(generate_question_for_objective, obj, i): i | |
| for i, obj in enumerate(learning_objectives) | |
| } | |
| 5. Collect results as completed: | |
| for future in concurrent.futures.as_completed(future_to_idx): | |
| question = future.result() | |
| questions.append(question) | |
| print progress | |
| 6. Error handling: | |
| - Individual failures don't stop other threads | |
| - Placeholder questions created on error | |
| - All errors logged | |
| 7. Returns: List[MultipleChoiceQuestion] with quality judgments | |
| ``` | |
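Steps 1-6 above follow the standard `ThreadPoolExecutor` pattern, sketched here with a generic `generate_fn` standing in for the per-objective worker:

```python
import concurrent.futures

def generate_questions_in_parallel(objectives, generate_fn, max_threads=5):
    """Run generate_fn over objectives concurrently: bounded worker
    pool, per-task error handling, results collected as they complete."""
    max_workers = min(len(objectives), max_threads)
    questions = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_idx = {
            executor.submit(generate_fn, obj): i
            for i, obj in enumerate(objectives)
        }
        for future in concurrent.futures.as_completed(future_to_idx):
            try:
                questions.append(future.result())
            except Exception as exc:
                # One failed task must not stop the others:
                # record a placeholder instead
                questions.append({"error": str(exc)})
    return questions
```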
| **`save_assessment_to_json(assessment, output_path)`** | |
| ```python | |
| 1. Convert Pydantic model to dict: | |
| assessment_dict = assessment.model_dump() | |
| 2. Write to JSON file: | |
| with open(output_path, "w") as f: | |
| json.dump(assessment_dict, f, indent=2) | |
| 3. File contains: | |
| { | |
| "learning_objectives": [...], | |
| "questions": [...] | |
| } | |
| ``` | |
| ### State Management (`ui/state.py`) | |
| **Global State Variables:** | |
| ```python | |
| processed_file_contents = [] # List of XML-tagged content strings | |
| generated_learning_objectives = [] # List of learning objective objects | |
| ``` | |
| **Functions:** | |
| - `get_processed_contents()` → retrieves file contents | |
| - `set_processed_contents(contents)` → stores file contents | |
| - `get_learning_objectives()` → retrieves objectives | |
| - `set_learning_objectives(objectives)` → stores objectives | |
| - `clear_state()` → resets both variables | |
| **Purpose:** | |
| - Persists data between UI tabs | |
| - Allows Tab 2 to access content processed in Tab 1 | |
| - Allows Tab 3 to access content for custom questions | |
| - Enables regeneration with feedback | |
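A minimal sketch of this module-level state pattern (names simplified from `ui/state.py`):

```python
# Module-level state shared across UI tabs
_processed_file_contents = []
_generated_learning_objectives = []

def set_processed_contents(contents):
    global _processed_file_contents
    _processed_file_contents = list(contents)

def get_processed_contents():
    return _processed_file_contents

def set_learning_objectives(objectives):
    global _generated_learning_objectives
    _generated_learning_objectives = list(objectives)

def get_learning_objectives():
    return _generated_learning_objectives

def clear_state():
    set_processed_contents([])
    set_learning_objectives([])
```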
| ### UI Handlers | |
| #### **objective_handlers.py** | |
| **`process_files(files, num_objectives, num_runs, model_name, incorrect_answer_model_name, temperature)`** | |
| **Complete Workflow:** | |
| ```python | |
| 1. Validate inputs (files exist, API key present) | |
| 2. Extract file paths from Gradio file objects | |
| 3. Process files → get XML-tagged content | |
| 4. Store in state | |
| 5. Create QuizGenerator | |
| 6. Generate multiple runs of base objectives | |
| 7. Group and rank objectives | |
| 8. Generate incorrect answers for best-in-group | |
| 9. Improve incorrect answers | |
| 10. Reassign IDs (best from 001 group → ID=1) | |
| 11. Format results for display | |
| 12. Store in state | |
| 13. Return 4 outputs: status, best-in-group, all-grouped, raw | |
| ``` | |
| **`regenerate_objectives(objectives_json, feedback, num_objectives, num_runs, model_name, temperature)`** | |
| **Workflow:** | |
| ```python | |
| 1. Retrieve processed contents from state | |
| 2. Append feedback to content: | |
| file_contents_with_feedback.append(f"FEEDBACK: {feedback}") | |
| 3. Generate new objectives with feedback context | |
| 4. Group and rank | |
| 5. Return regenerated objectives | |
| ``` | |
| **`_reassign_objective_ids(grouped_objectives)`** | |
| **ID Assignment Logic:** | |
| ```python | |
| 1. Find all objectives with IDs ending in 001 (1001, 2001, etc.) | |
| 2. Identify their groups | |
| 3. Find best_in_group objective from these groups | |
| 4. Assign it ID = 1 | |
| 5. Assign all other objectives sequential IDs starting from 2 | |
| ``` | |
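A rough sketch of that logic, assuming each objective is a dict with `id`, `group_id`, and `best_in_group` fields (the real code operates on Pydantic objects):

```python
def reassign_objective_ids(objectives):
    """The best-in-group objective from a group containing an ...001 ID
    becomes ID 1; every other objective gets a sequential ID from 2."""
    groups_with_001 = {o["group_id"] for o in objectives if o["id"] % 1000 == 1}
    first = next(
        (o for o in objectives
         if o["group_id"] in groups_with_001 and o["best_in_group"]),
        None,
    )
    next_id = 2
    for o in objectives:
        if o is first:
            o["id"] = 1
        else:
            o["id"] = next_id
            next_id += 1
    return objectives
```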
| **`_format_objective_results(grouped_result, all_learning_objectives)`** | |
| **Formatting:** | |
| ```python | |
| 1. Sort by ID | |
| 2. Create dictionaries from Pydantic objects | |
| 3. Include all metadata fields | |
| 4. Convert to JSON with indent=2 | |
| 5. Return 3 formatted outputs + status message | |
| ``` | |
| #### **question_handlers.py** | |
| **`generate_questions(objectives_json, model_name, temperature, num_runs)`** | |
| **Complete Workflow:** | |
| ```python | |
| 1. Validate inputs | |
| 2. Parse objectives JSON → create LearningObjective objects | |
| 3. Retrieve processed contents from state | |
| 4. Create QuizGenerator | |
| 5. Generate questions (multiple runs in parallel) | |
| 6. Group questions by similarity | |
| 7. Rank best-in-group questions | |
| 8. Optionally improve incorrect answers (currently commented out) | |
| 9. Format results | |
| 10. Return 4 outputs: status, best-ranked, all-grouped, formatted | |
| ``` | |
| **`_generate_questions_multiple_runs()`** | |
| ```python | |
| For each run: | |
| 1. Call generate_questions_in_parallel() | |
| 2. Assign unique IDs across runs: | |
| start_id = len(all_questions) + 1 | |
| for i, q in enumerate(run_questions): | |
| q.id = start_id + i | |
| 3. Aggregate all questions | |
| ``` | |
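The ID-assignment loop above, as a self-contained sketch over dict-based questions:

```python
def assign_unique_ids(runs):
    """Give questions unique IDs across multiple generation runs,
    mirroring _generate_questions_multiple_runs()."""
    all_questions = []
    for run_questions in runs:
        start_id = len(all_questions) + 1
        for i, q in enumerate(run_questions):
            q["id"] = start_id + i
        all_questions.extend(run_questions)
    return all_questions
```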
| **`_group_and_rank_questions()`** | |
| ```python | |
| 1. Group all questions → get grouped and best_in_group | |
| 2. Rank only best_in_group questions | |
| 3. Return: | |
| { | |
| "grouped": all with group metadata, | |
| "best_in_group_ranked": best with ranks | |
| } | |
| ``` | |
| #### **feedback_handlers.py** | |
| **`propose_question_handler(guidance, model_name, temperature)`** | |
| **Workflow:** | |
| ```python | |
| 1. Validate state (processed contents available) | |
| 2. Create QuizGenerator | |
| 3. Call generate_multiple_choice_question_from_feedback() | |
| - Passes user guidance and course content | |
| - LLM infers learning objective | |
| - Generates complete question | |
| 4. Format as JSON | |
| 5. Return status and question JSON | |
| ``` | |
| ### Formatting Utilities (`ui/formatting.py`) | |
| **`format_quiz_for_ui(questions_json)`** | |
| **Process:** | |
| ```python | |
| 1. Parse JSON to list of question dictionaries | |
| 2. Sort by rank if available | |
| 3. For each question: | |
| - Add header: "**Question N [Rank: X]:** {question_text}" | |
| - Add ranking reasoning if available | |
| - For each option: | |
| - Add letter (A, B, C, D) | |
| - Mark correct option | |
| - Include option text | |
| - Include feedback indented | |
| 4. Return formatted string with markdown | |
| ``` | |
| **Output Example:** | |
| ``` | |
| **Question 1 [Rank: 2]:** What is the primary purpose of AI agents? | |
| Ranking Reasoning: Clear question that tests fundamental understanding... | |
| • A [Correct]: To automate tasks and make decisions | |
| ◦ Feedback: Correct! AI agents are designed to automate tasks... | |
| • B: To replace human workers entirely | |
| ◦ Feedback: While AI agents can automate tasks, they are not... | |
| [continues...] | |
| ``` | |
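A simplified formatter that produces this layout (dict-based for illustration, whereas the real code consumes question objects):

```python
def format_quiz_for_ui(questions):
    """Render questions as markdown: rank-sorted headers, lettered
    options with correct-answer markers, and indented feedback."""
    lines = []
    questions = sorted(questions, key=lambda q: q.get("rank", 0))
    for n, q in enumerate(questions, start=1):
        rank = q.get("rank")
        header = f"**Question {n}"
        if rank is not None:
            header += f" [Rank: {rank}]"
        lines.append(f"{header}:** {q['question_text']}")
        for letter, opt in zip("ABCD", q["options"]):
            marker = " [Correct]" if opt["is_correct"] else ""
            lines.append(f"• {letter}{marker}: {opt['text']}")
            lines.append(f"  ◦ Feedback: {opt['feedback']}")
    return "\n".join(lines)
```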
| --- | |
| ## Quality Standards and Prompts | |
| ### Learning Objectives Quality Standards | |
| **From `prompts/learning_objectives.py`:** | |
| **BASE_LEARNING_OBJECTIVES_PROMPT - Key Requirements:** | |
| 1. **Assessability:** | |
| - Must be testable via multiple-choice questions | |
| - Cannot be about "building", "creating", "developing" | |
| - Should use verbs like: identify, list, describe, define, compare | |
| 2. **Specificity:** | |
| - One goal per objective | |
| - Don't combine multiple action verbs | |
| - Example of what NOT to do: "identify X and explain Y" | |
| 3. **Source Alignment:** | |
| - Derived DIRECTLY from course content | |
| - No topics not covered in content | |
| - Appropriate difficulty level for course | |
| 4. **Independence:** | |
| - Each objective stands alone | |
| - No dependencies on other objectives | |
| - No context required from other objectives | |
| 5. **Focus:** | |
| - Address "why" over "what" when possible | |
| - Critical knowledge over trivial facts | |
| - Principles over specific implementation details | |
| 6. **Tool/Framework Agnosticism:** | |
| - Don't mention specific tools/frameworks | |
| - Focus on underlying principles | |
| - Example: Don't ask about "Pandas DataFrame methods", | |
| ask about "data filtering concepts" | |
| 7. **First Objective Rule:** | |
| - Should be relatively easy recall question | |
| - Address main topic/concept of course | |
| - Format: "Identify what X is" or "Explain why X is important" | |
| 8. **Answer Length:** | |
| - Aim for ≤20 words in correct answer | |
| - Avoid unnecessary elaboration | |
| - No compound sentences with extra consequences | |
| **BLOOMS_TAXONOMY_LEVELS:** | |
| Levels from lowest to highest: | |
| - **Recall:** Retention of key concepts (not trivialities) | |
| - **Comprehension:** Connect ideas, demonstrate understanding | |
| - **Application:** Apply concept to new but similar scenario | |
| - **Analysis:** Examine parts, determine relationships, make inferences | |
| - **Evaluation:** Make judgments requiring critical thinking | |
| **LEARNING_OBJECTIVE_EXAMPLES:** | |
| Includes 7 high-quality examples with: | |
| - Appropriate action verbs | |
| - Clear learning objectives | |
| - Concise correct answers (mostly <20 words) | |
| - Multiple source references | |
| - Framework-agnostic language | |
| ### Question Quality Standards | |
| **From `prompts/questions.py`:** | |
| **GENERAL_QUALITY_STANDARDS:** | |
| - Overall goal: Set learner up for success | |
| - Perfect score attainable for thoughtful students | |
| - Aligned with course content | |
| - Aligned with learning objective and correct answer | |
- No references to manual intervention (inappropriate for a software/AI course)
| **MULTIPLE_CHOICE_STANDARDS:** | |
| - **EXACTLY ONE** correct answer per question | |
| - Clear, unambiguous correct answer | |
| - Plausible distractors representing common misconceptions | |
| - Not obviously wrong distractors | |
| - All options similar length and detail | |
| - Mutually exclusive options | |
| - Avoid "all/none of the above" | |
| - Typically 4 options (A, B, C, D) | |
| - Don't start feedback with "Correct" or "Incorrect" | |
| **QUESTION_SPECIFIC_QUALITY_STANDARDS:** | |
| Questions must: | |
| - Match language and tone of course | |
| - Match difficulty level of course | |
| - Assess only course information | |
| - Not teach as part of quiz | |
| - Use clear, concise language | |
| - Not induce confusion | |
| - Provide slight (not major) challenge | |
| - Be easily interpreted and unambiguous | |
| - Have proper grammar and sentence structure | |
| - Be thoughtful and specific (not broad and ambiguous) | |
- Be complete in wording (understanding the question shouldn't be part of the assessment)
| **CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS:** | |
| Correct answers must: | |
| - Be factually correct and unambiguous | |
| - Match course language and tone | |
| - Be complete sentences | |
| - Match course difficulty level | |
| - Contain only course information | |
| - Not teach during quiz | |
| - Use clear, concise language | |
| - Be thoughtful and specific | |
- Be complete (identifying the correct answer shouldn't require interpretation)
| **INCORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS:** | |
| Incorrect answers should: | |
| - Represent reasonable potential misconceptions | |
| - Sound plausible to non-experts | |
| - Require thought even from diligent learners | |
| - Not be obviously wrong | |
| - Use incorrect_answer_suggestions from objective (as starting point) | |
| **Avoid:** | |
| - Obviously wrong options anyone can eliminate | |
| - Absolute terms: "always", "never", "only", "exclusively" | |
| - Phrases like "used exclusively for scenarios where..." | |
| **ANSWER_FEEDBACK_QUALITY_STANDARDS:** | |
| **For Incorrect Answers:** | |
| - Be informational and encouraging (not punitive) | |
| - Single sentence, concise | |
| - Do NOT say "Incorrect" or "Wrong" | |
| **For Correct Answers:** | |
| - Be informational and encouraging | |
| - Single sentence, concise | |
| - Do NOT say "Correct!" (redundant after "Correct: " prefix) | |
| ### Incorrect Answer Generation Guidelines | |
| **From `prompts/incorrect_answers.py`:** | |
| **Core Principles:** | |
| 1. **Create Common Misunderstandings:** | |
| - Represent how students actually misunderstand | |
| - Confuse related concepts | |
| - Mix up terminology | |
| 2. **Maintain Identical Structure:** | |
| - Match grammatical pattern of correct answer | |
| - Same length and complexity | |
| - Same formatting style | |
| 3. **Use Course Terminology Correctly but in Wrong Contexts:** | |
| - Apply correct terms incorrectly | |
| - Confuse with related concepts | |
| - Example: Describe backpropagation but actually describe forward propagation | |
| 4. **Include Partially Correct Information:** | |
| - First part correct, second part wrong | |
| - Correct process but wrong application | |
| - Correct concept but incomplete | |
| 5. **Avoid Obviously Wrong Answers:** | |
| - No contradictions with basic knowledge | |
| - Not immediately eliminable | |
| - Require course knowledge to reject | |
| 6. **Mirror Detail Level and Style:** | |
| - Match technical depth | |
| - Match tone | |
| - Same level of specificity | |
| 7. **For Lists, Maintain Consistency:** | |
| - Same number of items | |
| - Same format | |
| - Mix some correct with incorrect items | |
| 8. **AVOID ABSOLUTE TERMS:** | |
| - "always", "never", "exclusively", "primarily" | |
| - "all", "every", "none", "nothing", "only" | |
| - "must", "required", "impossible" | |
| - "rather than", "as opposed to", "instead of" | |
**IMMEDIATE_RED_FLAGS** (any of these triggers regeneration):
| **Contradictory Second Clauses:** | |
| - "but not necessarily" | |
| - "at the expense of" | |
| - "rather than [core concept]" | |
| - "ensuring X rather than Y" | |
| - "without necessarily" | |
| - "but has no impact on" | |
| - "but cannot", "but prevents", "but limits" | |
| **Explicit Negations:** | |
| - "without automating", "without incorporating" | |
| - "preventing [main benefit]" | |
| - "limiting [main capability]" | |
| **Opposite Descriptions:** | |
| - "fixed steps" (for flexible systems) | |
| - "manual intervention" (for automation) | |
| - "simple question answering" (for complex processing) | |
| **Hedging Creating Limitations:** | |
| - "sometimes", "occasionally", "might" | |
| - "to some extent", "partially", "somewhat" | |
| **INCORRECT_ANSWER_EXAMPLES:** | |
| Includes 10 detailed examples showing: | |
| - Learning objective | |
| - Correct answer | |
| - 3 plausible incorrect suggestions | |
| - Explanation of why each is plausible but wrong | |
| - Consistent formatting across all options | |
| ### Ranking and Grouping | |
| **RANK_QUESTIONS_PROMPT:** | |
| **Criteria:** | |
| 1. Question clarity and unambiguity | |
| 2. Alignment with learning objective | |
| 3. Quality of incorrect options | |
| 4. Quality of feedback | |
| 5. Appropriate difficulty (simple English preferred) | |
| 6. Adherence to all guidelines | |
| **Critical Instructions:** | |
| - DO NOT change question with ID=1 | |
| - Rank starting from 2 | |
| - Each question unique rank | |
| - Must return ALL questions | |
| - No omissions | |
| - No duplicate ranks | |
| **Simple vs Complex English:** | |
| ``` | |
| Simple: "AI engineers create computer programs that learn from data" | |
| Complex: "AI engineering practitioners architect computational paradigms | |
| exhibiting autonomous erudition capabilities" | |
| ``` | |
| **GROUP_QUESTIONS_PROMPT:** | |
| **Grouping Logic:** | |
| - Questions with same learning_objective_id are similar | |
| - Identify topic overlap | |
| - Mark best_in_group within each group | |
| - Single-member groups: best_in_group = true | |
| **Critical Instructions:** | |
| - Must return ALL questions | |
| - Each question needs group metadata | |
| - No omissions | |
| - Best in group marked appropriately | |
| --- | |
| ## Summary of Data Flow | |
| ### Complete End-to-End Flow | |
| ``` | |
| User Uploads Files | |
| ↓ | |
| ContentProcessor extracts and tags content | |
| ↓ | |
| [Stored in global state] | |
| ↓ | |
| Generate Base Objectives (multiple runs) | |
| ↓ | |
| Group Base Objectives (by similarity) | |
| ↓ | |
| Generate Incorrect Answers (for best-in-group only) | |
| ↓ | |
| Improve Incorrect Answers (quality check) | |
| ↓ | |
| Reassign IDs (best from 001 group → ID=1) | |
| ↓ | |
| [Objectives displayed in UI, stored in state] | |
| ↓ | |
| Generate Questions (parallel, multiple runs) | |
| ↓ | |
| Judge Question Quality (parallel) | |
| ↓ | |
| Group Questions (by similarity) | |
| ↓ | |
| Rank Questions (best-in-group only) | |
| ↓ | |
| [Questions displayed in UI] | |
| ↓ | |
| Format for Display | |
| ↓ | |
| Export to JSON (optional) | |
| ``` | |
| ### Key Optimization Strategies | |
| 1. **Multiple Generation Runs:** | |
| - Generates variety of objectives/questions | |
| - Grouping identifies best versions | |
| - Reduces risk of poor quality individual outputs | |
| 2. **Hierarchical Processing:** | |
| - Generate base → Group → Enhance → Improve | |
| - Only enhances best candidates (saves API calls) | |
| - Progressive refinement | |
| 3. **Parallel Processing:** | |
| - Questions generated concurrently (up to 5 threads) | |
| - Significant time savings for multiple objectives | |
| - Independent evaluations | |
| 4. **Quality Gating:** | |
| - LLM judges question quality | |
| - Checks for red flags in incorrect answers | |
| - Regenerates problematic content | |
| 5. **Source Tracking:** | |
| - XML tags preserve origin | |
| - Questions link back to source materials | |
| - Enables accurate content matching | |
| 6. **Modular Prompts:** | |
| - Reusable quality standards | |
| - Consistent across all generations | |
| - Easy to update centrally | |
| --- | |
| ## Configuration and Customization | |
| ### Available Models | |
| **Configured in `models/config.py`:** | |
| ```python | |
| MODELS = [ | |
| "o3-mini", "o1", # Reasoning models (no temperature) | |
| "gpt-4.1", "gpt-4o", # GPT-4 variants | |
| "gpt-4o-mini", "gpt-4", | |
| "gpt-3.5-turbo", # Legacy | |
| "gpt-5", # Latest (no temperature) | |
| "gpt-5-mini", # Efficient (no temperature) | |
| "gpt-5-nano" # Ultra-efficient (no temperature) | |
| ] | |
| ``` | |
| **Temperature Support:** | |
| - Models with reasoning (o1, o3-mini, gpt-5 variants): No temperature | |
| - Other models: Temperature 0.0 to 1.0 | |
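The temperature gate used in the API-call snippet earlier can be sketched as follows; the `TEMPERATURE_UNAVAILABLE` mapping here is an assumed reconstruction of `models/config.py`, with unknown models defaulting to "no temperature":

```python
# True means the model rejects the temperature parameter.
TEMPERATURE_UNAVAILABLE = {
    "o1": True, "o3-mini": True,
    "gpt-5": True, "gpt-5-mini": True, "gpt-5-nano": True,
    "gpt-4.1": False, "gpt-4o": False, "gpt-4o-mini": False,
    "gpt-4": False, "gpt-3.5-turbo": False,
}

def build_params(model, temperature):
    """Build request params, omitting temperature for reasoning models
    and (by default) for models not in the lookup table."""
    params = {"model": model}
    if not TEMPERATURE_UNAVAILABLE.get(model, True):
        params["temperature"] = temperature
    return params
```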
| **Model Selection Strategy:** | |
| - **Base objectives:** User-selected (default: gpt-5) | |
| - **Grouping:** Hardcoded gpt-5-mini (efficiency) | |
| - **Incorrect answers:** Separate user selection (default: gpt-5) | |
| - **Questions:** User-selected (default: gpt-5) | |
| - **Quality judging:** User-selected or gpt-5-mini | |
| ### Environment Variables | |
| **Required:** | |
| ``` | |
| OPENAI_API_KEY=your_api_key_here | |
| ``` | |
| **Configured via `.env` file in project root** | |
| ### Customization Points | |
| 1. **Quality Standards:** | |
| - Edit `prompts/learning_objectives.py` | |
| - Edit `prompts/questions.py` | |
| - Edit `prompts/incorrect_answers.py` | |
| - Changes apply to all future generations | |
| 2. **Example Questions/Objectives:** | |
| - Modify LEARNING_OBJECTIVE_EXAMPLES | |
| - Modify EXAMPLE_QUESTIONS | |
| - Modify INCORRECT_ANSWER_EXAMPLES | |
| - LLM learns from these examples | |
| 3. **Generation Parameters:** | |
| - Number of objectives per run | |
| - Number of runs (variety) | |
| - Temperature (creativity vs consistency) | |
| - Model selection (quality vs cost/speed) | |
| 4. **Parallel Processing:** | |
| - `max_workers` in assessment.py | |
| - Currently: min(len(objectives), 5) | |
| - Adjust for your rate limits | |
| 5. **Output Formats:** | |
| - Modify `formatting.py` for display | |
| - Assessment JSON structure in `models/assessment.py` | |
| --- | |
| ## Error Handling and Resilience | |
| ### Content Processing Errors | |
| - **Invalid JSON notebooks:** Falls back to raw text | |
| - **Parse failures:** Wraps in code blocks, continues | |
| - **Missing files:** Logged, skipped | |
| - **Encoding issues:** UTF-8 fallback | |
| ### Generation Errors | |
| - **API failures:** Logged with traceback | |
| - **Structured output parse errors:** Fallback responses created | |
| - **Missing required fields:** Default values assigned | |
| - **Validation errors:** Caught and logged | |
| ### Parallel Processing Errors | |
| - **Individual thread failures:** Don't stop other threads | |
| - **Placeholder questions:** Created on error | |
| - **Complete error details:** Logged for debugging | |
| - **Graceful degradation:** Partial results returned | |
| ### Quality Check Failures | |
| - **Regeneration failures:** Original kept with warning | |
| - **Judge unavailable:** Questions marked unapproved | |
| - **Validation failures:** Detailed logs in debug directories | |
| --- | |
| ## Debug and Logging | |
| ### Debug Directories | |
| 1. **`incorrect_suggestion_debug/`** | |
| - Created during objective enhancement | |
| - Contains logs of problematic incorrect answers | |
| - Format: `{objective_id}.txt` | |
| - Includes: Original suggestions, identified issues, regeneration attempts | |
| 2. **`wrong_answer_debug/`** | |
| - Created during question improvement | |
| - Logs question-level incorrect answer issues | |
| - Regeneration history | |
| ### Console Logging | |
| **Extensive logging throughout:** | |
| - File processing status | |
| - Generation progress (run numbers) | |
| - Parallel thread activity (thread IDs) | |
| - API call results | |
| - Error messages with tracebacks | |
| - Timing information (start/end times) | |
| **Example Log Output:** | |
| ``` | |
| DEBUG - Processing 3 files: ['file1.vtt', 'file2.ipynb', 'file3.srt'] | |
| DEBUG - Found source file: file1.vtt | |
| Generating 3 learning objectives from 3 files | |
| Successfully generated 3 learning objectives without correct answers | |
| Generated correct answer for objective 1 | |
| Grouping 9 base learning objectives | |
| Received 9 grouped results | |
| Generating incorrect answer options only for best-in-group objectives... | |
| PARALLEL: Starting ThreadPoolExecutor with 3 workers | |
| PARALLEL: Worker 1 (Thread ID: 12345): Starting work on objective... | |
| Question generation completed in 45.23 seconds | |
| ``` | |
| --- | |
| ## Performance Considerations | |
| ### API Call Optimization | |
| **Calls per Workflow:** | |
| For 3 objectives × 3 runs = 9 base objectives: | |
| 1. **Learning Objectives:** | |
| - Base generation: 3 calls (one per run) | |
| - Correct answers: 9 calls (one per objective) | |
| - Grouping: 1 call | |
| - Incorrect answers: ~3 calls (best-in-group only) | |
| - Improvement checks: ~3 calls | |
| - **Total: ~19 calls** | |
| 2. **Questions (for 3 objectives × 1 run):** | |
| - Question generation: 3 calls (parallel) | |
| - Quality judging: 3 calls (parallel) | |
| - Grouping: 1 call | |
| - Ranking: 1 call | |
| - **Total: ~8 calls** | |
| **Total for complete workflow: ~27 API calls** | |
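These counts can be reproduced with a little arithmetic (a sanity-check sketch, assuming one best-in-group objective per original objective):

```python
def objective_calls(num_objectives, num_runs):
    """API calls for the learning-objectives phase."""
    base = num_runs                       # one base-generation call per run
    answers = num_objectives * num_runs   # one correct-answer call per objective
    grouping = 1
    best_in_group = num_objectives        # assumes one best per objective group
    incorrect = best_in_group             # incorrect-answer generation
    improvement = best_in_group           # improvement checks
    return base + answers + grouping + incorrect + improvement

def question_calls(num_objectives, num_runs=1):
    """API calls for the question phase."""
    generation = num_objectives * num_runs
    judging = num_objectives * num_runs
    return generation + judging + 1 + 1   # + grouping + ranking
```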
| ### Time Estimates | |
| **Typical Execution Times:** | |
| - File processing: <1 second | |
| - Objective generation (3×3): 30-60 seconds | |
| - Question generation (3×1): 20-40 seconds (with parallelization) | |
| - **Total: 1-2 minutes for small course** | |
| **Factors Affecting Speed:** | |
| - Model selection (gpt-5 slower than gpt-5-mini) | |
| - Number of runs | |
| - Number of objectives/questions | |
| - API rate limits | |
| - Network latency | |
| - Parallel worker count | |
| ### Cost Optimization | |
| **Strategies:** | |
| 1. Use gpt-5-mini for grouping/ranking (hardcoded) | |
| 2. Reduce number of runs (trade-off: variety) | |
| 3. Generate fewer objectives initially | |
| 4. Use faster models for initial exploration | |
| 5. Use premium models for final production | |
| --- | |
| ## Conclusion | |
| The AI Course Assessment Generator is a sophisticated, multi-stage system that transforms raw course materials into high-quality educational assessments. It employs: | |
| - **Modular architecture** for maintainability | |
| - **Structured output generation** for reliability | |
| - **Quality-driven iterative refinement** for excellence | |
| - **Parallel processing** for efficiency | |
| - **Comprehensive error handling** for resilience | |
| The system successfully balances automation with quality control, producing assessments that align with educational best practices and Bloom's Taxonomy while maintaining complete traceability to source materials. | |