AI Course Assessment Generator - Functionality Report
Table of Contents
- Overview
- System Architecture
- Data Models
- Application Entry Point
- User Interface Structure
- Complete Workflow
- Detailed Component Functionality
- Quality Standards and Prompts
Overview
The AI Course Assessment Generator is a sophisticated educational tool that automates the creation of learning objectives and multiple-choice questions from course materials. It leverages OpenAI's language models with structured output generation to produce high-quality educational assessments that adhere to specified quality standards and Bloom's Taxonomy levels.
Key Capabilities
- Multi-format Content Processing: Accepts .vtt, .srt (subtitle files), and .ipynb (Jupyter notebooks)
- AI-Powered Generation: Uses OpenAI's GPT models with configurable parameters
- Quality Assurance: Implements LLM-based quality assessment and ranking
- Source Tracking: Maintains XML-tagged references from source materials to generated content
- Iterative Improvement: Supports feedback-based regeneration and enhancement
- Parallel Processing: Generates questions concurrently for improved performance
System Architecture
Architectural Patterns
1. Orchestrator Pattern
Both LearningObjectiveGenerator and QuizGenerator act as orchestrators that coordinate calls to specialized generation functions rather than implementing generation logic directly.
2. Modular Prompt System
The prompts/ directory contains reusable prompt components that are imported and combined in generation modules, allowing for consistent quality standards across different generation tasks.
3. Structured Output Generation
All LLM interactions use Pydantic models with the instructor library to ensure consistent, validated output formats using OpenAI's structured output API.
4. Source Tracking via XML Tags
Content is wrapped in XML tags (e.g., <source file="example.ipynb">content</source>) throughout the pipeline to maintain traceability from source files to generated questions.
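As a minimal sketch of this tagging scheme (the helper names here are hypothetical, not the project's actual functions), wrapping content and recovering filenames might look like:

```python
import re

def wrap_in_source_tag(filename: str, content: str) -> str:
    """Wrap processed file content in a <source> tag for traceability."""
    return f"<source file='{filename}'>{content}</source>"

def extract_source_filenames(tagged_contents: list[str]) -> list[str]:
    """Recover the originating filenames from tagged content strings."""
    pattern = re.compile(r"<source file='([^']+)'>")
    return [m.group(1) for c in tagged_contents for m in pattern.finditer(c)]
```

Because the tag survives every pipeline stage, any generated objective or question can be traced back to the file it came from.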
Technology Stack
- Python 3.8+
- Gradio 5.29.0+: Web-based UI framework
- Pydantic 2.8.0+: Data validation and schema management
- OpenAI 1.52.0+: LLM API integration
- Instructor 1.7.9+: Structured output generation
- nbformat 5.9.2: Jupyter notebook parsing
- python-dotenv 1.0.0: Environment variable management
Data Models
Learning Objectives Progression
The system uses a hierarchical progression of learning objective models:
1. BaseLearningObjectiveWithoutCorrectAnswer
- id: int
- learning_objective: str
- source_reference: Union[List[str], str]
Initial generation without correct answers.
2. BaseLearningObjective
- id: int
- learning_objective: str
- source_reference: Union[List[str], str]
- correct_answer: str
Base objectives with correct answers added.
3. LearningObjective
- id: int
- learning_objective: str
- source_reference: Union[List[str], str]
- correct_answer: str
- incorrect_answer_options: Union[List[str], str]
- in_group: Optional[bool]
- group_members: Optional[List[int]]
- best_in_group: Optional[bool]
Enhanced with incorrect answer suggestions and grouping metadata.
4. GroupedLearningObjective
(All fields from LearningObjective)
- in_group: bool (required)
- group_members: List[int] (required)
- best_in_group: bool (required)
Fully grouped and ranked objectives.
Question Models Progression
1. MultipleChoiceOption
- option_text: str
- is_correct: bool
- feedback: str
2. MultipleChoiceQuestion
- id: int
- question_text: str
- options: List[MultipleChoiceOption]
- learning_objective_id: int
- learning_objective: str
- correct_answer: str
- source_reference: Union[List[str], str]
- judge_feedback: Optional[str]
- approved: Optional[bool]
3. RankedMultipleChoiceQuestion
(All fields from MultipleChoiceQuestion)
- rank: int
- ranking_reasoning: str
- in_group: bool
- group_members: List[int]
- best_in_group: bool
4. Assessment
- learning_objectives: List[LearningObjective]
- questions: List[RankedMultipleChoiceQuestion]
Final output containing both objectives and questions.
Configuration Models
MODELS
Available OpenAI models: ["o3-mini", "o1", "gpt-4.1", "gpt-4o", "gpt-4o-mini", "gpt-4", "gpt-3.5-turbo", "gpt-5", "gpt-5-mini", "gpt-5-nano"]
TEMPERATURE_UNAVAILABLE
Dictionary mapping models to temperature availability (some models like o1, o3-mini, and gpt-5 variants don't support temperature settings).
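The gating this table enables (mirroring the `.get(model, True)` check shown later in question_generation.py) can be sketched as follows; the table values here are illustrative, not the configuration module's exact contents:

```python
# Illustrative excerpt: True means the temperature parameter is NOT supported.
TEMPERATURE_UNAVAILABLE = {
    "o1": True,
    "o3-mini": True,
    "gpt-5": True,
    "gpt-4o": False,
    "gpt-4o-mini": False,
}

def build_request_params(model: str, temperature: float, prompt: str) -> dict:
    """Attach temperature only when the model is known to support it."""
    params = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    # Unknown models default to "unavailable", so temperature is omitted.
    if not TEMPERATURE_UNAVAILABLE.get(model, True):
        params["temperature"] = temperature
    return params
```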
Application Entry Point
app.py
The root-level entry point that:
- Loads environment variables from .env file
- Checks for OPENAI_API_KEY presence
- Creates the Gradio UI via ui.app.create_ui()
- Launches the web interface at http://127.0.0.1:7860
# Workflow:
load_dotenv() → Check API key → create_ui() → app.launch()
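The startup guard can be isolated as a small pure function (a sketch mirroring the described check, not the actual app.py code):

```python
def api_key_present(env: dict) -> bool:
    """Mirror app.py's startup guard: a non-empty OPENAI_API_KEY must be set."""
    return bool(env.get("OPENAI_API_KEY"))

# Startup flow, mirroring the workflow above:
#   load_dotenv()                        # populate os.environ from .env
#   if not api_key_present(os.environ):  # abort with an error message
#   app = create_ui()                    # build the Gradio interface
#   app.launch()                         # serve at http://127.0.0.1:7860
```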
User Interface Structure
ui/app.py - Gradio Interface
The UI is organized into 3 main tabs:
Tab 1: Generate Learning Objectives
Input Components:
- File uploader (accepts .ipynb, .vtt, .srt)
- Number of objectives per run (slider: 1-20, default: 3)
- Number of generation runs (dropdown: 1-5, default: 3)
- Model selection (dropdown, default: "gpt-5")
- Incorrect answer model selection (dropdown, default: "gpt-5")
- Temperature setting (dropdown: 0.0-1.0, default: 1.0)
- Generate button
- Feedback input textbox
- Regenerate button
Output Components:
- Status textbox
- Best-in-Group Learning Objectives (JSON)
- All Grouped Learning Objectives (JSON)
- Raw Ungrouped Learning Objectives (JSON) - for debugging
Event Handler: process_files() from objective_handlers.py
Tab 2: Generate Questions
Input Components:
- Learning Objectives JSON (auto-populated from Tab 1)
- Model selection
- Temperature setting
- Number of question generation runs (slider: 1-5, default: 1)
- Generate Questions button
Output Components:
- Status textbox
- Ranked Best-in-Group Questions (JSON)
- All Grouped Questions (JSON)
- Formatted Quiz (human-readable format)
Event Handler: generate_questions() from question_handlers.py
Tab 3: Propose/Edit Question
Input Components:
- Question guidance/feedback textbox
- Model selection
- Temperature setting
- Generate Question button
Output Components:
- Status textbox
- Generated Question (JSON)
Event Handler: propose_question_handler() from feedback_handlers.py
Complete Workflow
Phase 1: File Upload and Content Processing
Step 1.1: File Upload
User uploads one or more files (.vtt, .srt, .ipynb) through the Gradio interface.
Step 1.2: File Path Extraction (objective_handlers._extract_file_paths())
# Handles different input formats:
- List of file paths
- Single file path string
- File objects with .name attribute
Step 1.3: Content Processing (ui/content_processor.py)
For Subtitle Files (.vtt, .srt):
1. Read file with UTF-8 encoding
2. Split into lines
3. Filter out:
- Empty lines
- Numeric timestamp indicators
- Lines containing '-->' (timestamps)
- 'WEBVTT' header lines
4. Combine remaining text lines
5. Wrap in XML tags: <source file='filename.vtt'>content</source>
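The filtering steps above can be sketched as a single function (a minimal illustration; joining the kept lines with spaces is an assumption about the actual implementation):

```python
def extract_subtitle_text(raw: str) -> str:
    """Strip WEBVTT headers, cue numbers, and timestamp lines from subtitle text."""
    kept = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue                    # empty lines
        if line.isdigit():
            continue                    # numeric cue indices
        if "-->" in line:
            continue                    # timestamp lines
        if line.startswith("WEBVTT"):
            continue                    # file header
        kept.append(line)
    return " ".join(kept)
```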
For Jupyter Notebooks (.ipynb):
1. Validate JSON format
2. Parse with nbformat.read()
3. Extract from cells:
- Markdown cells: [Markdown]\n{content}
- Code cells: [Code]\n```python\n{content}\n```
4. Combine all cell content
5. Wrap in XML tags: <source file='filename.ipynb'>content</source>
Error Handling:
- Invalid JSON: Wraps raw content in code blocks
- Parsing failures: Falls back to plain text extraction
- All errors logged to console
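The notebook path, including the invalid-JSON fallback, can be sketched with the standard json module standing in for nbformat (a simplification; the real code uses nbformat.read with additional fallbacks):

```python
import json

def extract_notebook_text(raw: str, filename: str) -> str:
    """Simplified cell extraction mirroring the steps above."""
    fence = "`" * 3
    try:
        nb = json.loads(raw)
        parts = []
        for cell in nb.get("cells", []):
            source = "".join(cell.get("source", []))
            if cell.get("cell_type") == "markdown":
                parts.append(f"[Markdown]\n{source}")
            elif cell.get("cell_type") == "code":
                parts.append(f"[Code]\n{fence}python\n{source}\n{fence}")
        content = "\n".join(parts)
    except json.JSONDecodeError:
        # Invalid JSON: wrap the raw content in a code block instead
        content = f"{fence}\n{raw}\n{fence}"
    return f"<source file='{filename}'>{content}</source>"
```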
Step 1.4: State Storage
Processed content stored in global state (ui/state.py):
processed_file_contents = [tagged_content_1, tagged_content_2, ...]
Phase 2: Learning Objective Generation
Step 2.1: Multi-Run Base Generation
Process: objective_handlers._generate_multiple_runs()
For each run (user-specified, typically 3 runs):
Call: QuizGenerator.generate_base_learning_objectives()

Workflow:
generate_base_learning_objectives()
  → generate_base_learning_objectives_without_correct_answers()
    → Creates prompt with:
      - BASE_LEARNING_OBJECTIVES_PROMPT
      - BLOOMS_TAXONOMY_LEVELS
      - LEARNING_OBJECTIVE_EXAMPLES_WITHOUT_ANSWERS
      - Combined file contents
    → Calls OpenAI API with structured output
    → Returns List[BaseLearningObjectiveWithoutCorrectAnswer]
  → generate_correct_answers_for_objectives()
    → For each objective:
      - Creates prompt with objective and course content
      - Calls OpenAI API (unstructured text response)
      - Extracts correct answer
    → Returns List[BaseLearningObjective]

ID Assignment (temporary IDs by run):
- Run 1: 1001, 1002, 1003
- Run 2: 2001, 2002, 2003
- Run 3: 3001, 3002, 3003

Aggregation: All objectives from all runs are combined into a single list.
Example: 3 runs × 3 objectives = 9 total base objectives
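The run-based ID scheme implies an assignment like the following (a hypothetical helper, assuming run N contributes IDs N*1000+1 onward):

```python
def assign_run_ids(runs: list[list[dict]]) -> list[dict]:
    """Give each objective a temporary ID of (run_number * 1000) + position."""
    flat = []
    for run_number, objectives in enumerate(runs, start=1):
        for position, objective in enumerate(objectives, start=1):
            objective["id"] = run_number * 1000 + position
            flat.append(objective)
    return flat
```

Because every run's first objective ends in 001, the grouping step can later identify and merge the "primary" objectives across runs.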
Step 2.2: Grouping and Ranking
Process: objective_handlers._group_base_objectives_add_incorrect_answers()
Step 2.2.1: Group Base Objectives
QuizGenerator.group_base_learning_objectives()
  → learning_objective_generator/grouping_and_ranking.py
  → group_base_learning_objectives()
Grouping Logic:
Creates prompt containing:
- Original generation criteria
- All base objectives with IDs
- Course content for context
- Grouping instructions
Special Rule: All objectives with IDs ending in 1 (1001, 2001, 3001) are grouped together and ONE is marked as best-in-group (this becomes the primary/first objective)
LLM Call:
- Model: gpt-5-mini
- Response format: GroupedBaseLearningObjectivesResponse
- Returns: Grouped objectives with metadata
Output Structure:
{
  "all_grouped": [all objectives with group metadata],
  "best_in_group": [objectives marked as best in their groups]
}
Step 2.2.2: ID Reassignment (_reassign_objective_ids())
1. Find best objective from the 001 group
2. Assign it ID = 1
3. Assign remaining objectives IDs starting from 2
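Assuming temporary IDs of the form N001 for each run's primary objective, the reassignment can be sketched as (a hypothetical helper mirroring the three steps above):

```python
def reassign_ids(grouped: list[dict]) -> list[dict]:
    """Promote the best objective from the 001 group to ID 1; renumber the rest from 2."""
    def is_primary(obj: dict) -> bool:
        return obj["id"] % 1000 == 1 and obj.get("best_in_group", False)

    primary = next(obj for obj in grouped if is_primary(obj))
    rest = [obj for obj in grouped if obj is not primary]
    primary["id"] = 1
    for new_id, obj in enumerate(rest, start=2):
        obj["id"] = new_id
    return [primary] + rest
```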
Step 2.2.3: Generate Incorrect Answer Options
Only for best-in-group objectives:
QuizGenerator.generate_lo_incorrect_answer_options()
  → learning_objective_generator/enhancement.py
  → generate_incorrect_answer_options()
Process:
For each best-in-group objective:
- Creates prompt with:
  - Objective and correct answer
  - INCORRECT_ANSWER_PROMPT guidelines
  - INCORRECT_ANSWER_EXAMPLES
  - Course content
- Calls OpenAI API (with optional model override)
- Generates 5 plausible incorrect answer options
Returns: List[LearningObjective] with incorrect_answer_options populated
Step 2.2.4: Improve Incorrect Answers
learning_objective_generator.regenerate_incorrect_answers()
  → learning_objective_generator/suggestion_improvement.py
Quality Check Process:
For each objective's incorrect answers:
- Checks for red flags (contradictory phrases, absolute terms)
- Examples of red flags:
- "but not necessarily"
- "at the expense of"
- "rather than"
- "always", "never", "exclusively"
If problems found:
- Logs issue to incorrect_suggestion_debug/ directory
- Regenerates incorrect answers with additional constraints
- Updates objective with improved answers
Step 2.2.5: Final Assembly
Creates final list where:
- Best-in-group objectives have enhanced incorrect answers
- Non-best-in-group objectives have an empty incorrect_answer_options: []
Step 2.3: Display Results
Three output formats:
Best-in-Group Objectives (primary output):
- Only objectives marked as best_in_group
- Includes incorrect answer options
- Sorted by ID
- Formatted as JSON
All Grouped Objectives:
- All objectives with grouping metadata
- Shows group_members arrays
- Best-in-group flags visible
Raw Ungrouped (debug):
- Original objectives from all runs
- No grouping metadata
- Original temporary IDs
Step 2.4: State Update
set_learning_objectives(grouped_result["all_grouped"])
set_processed_contents(file_contents) # Already set, but persisted
Phase 3: Question Generation
Step 3.1: Parse Learning Objectives
Process: question_handlers._parse_learning_objectives()
1. Parse JSON from Tab 1 output
2. Create LearningObjective objects from dictionaries
3. Validate required fields
4. Return List[LearningObjective]
Step 3.2: Multi-Run Question Generation
Process: question_handlers._generate_questions_multiple_runs()
For each run (user-specified, typically 1 run):
QuizGenerator.generate_questions_in_parallel()
  → quiz_generator/assessment.py
  → generate_questions_in_parallel()
Parallel Generation Process:
Thread Pool Setup:
max_workers = min(len(learning_objectives), 5)
ThreadPoolExecutor(max_workers=max_workers)

For Each Learning Objective (in parallel):

Step 3.2.1: Question Generation (quiz_generator/question_generation.py → generate_multiple_choice_question())

a) Source Content Matching:
- Extract source_reference from objective
- Search file_contents for matching XML tags
- Exact match: <source file='filename.vtt'>
- Fallback: Partial filename match
- Last resort: Use all file contents combined

b) Multi-Source Handling:
if len(source_references) > 1:
    Add special instruction: "Question should synthesize information across sources"

c) Prompt Construction:
Combines:
- Learning objective
- Correct answer
- Incorrect answer options from objective
- GENERAL_QUALITY_STANDARDS
- MULTIPLE_CHOICE_STANDARDS
- EXAMPLE_QUESTIONS
- QUESTION_SPECIFIC_QUALITY_STANDARDS
- CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS
- INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION
- ANSWER_FEEDBACK_QUALITY_STANDARDS
- Matched course content

d) API Call:
- Model: User-selected (default: gpt-5)
- Temperature: User-selected (if supported by model)
- Response format: MultipleChoiceQuestion
- Returns: Question with 4 options, each with feedback

e) Post-Processing:
- Set question ID = learning_objective ID
- Verify all options have feedback
- Add default feedback if missing

Step 3.2.2: Quality Assessment (quiz_generator/question_improvement.py → judge_question_quality())

Quality Judging Process:
1. Creates evaluation prompt with:
   - Question text and all options
   - Quality criteria from prompts
   - Evaluation instructions
2. LLM evaluates question for:
   - Clarity and unambiguity
   - Alignment with learning objective
   - Quality of incorrect options
   - Feedback quality
   - Appropriate difficulty
3. Returns:
   - approved: bool
   - feedback: str (reasoning for judgment)
4. Updates question:
   question.approved = approved
   question.judge_feedback = feedback

Results Collection:
- Questions collected as futures complete
- IDs assigned sequentially across runs
- All questions aggregated into single list

Example: 3 objectives × 1 run = 3 questions generated in parallel
Step 3.3: Grouping Questions
Process: quiz_generator/question_ranking.py → group_questions()
1. Creates prompt with:
- All generated questions
- Grouping instructions
- Example format
2. LLM identifies:
- Questions testing same concept (same learning_objective_id)
- Groups of similar questions
- Best question in each group
3. Model: gpt-5-mini
Response format: GroupedMultipleChoiceQuestionsResponse
4. Returns:
{
"grouped": [all questions with group metadata],
"best_in_group": [best questions from each group]
}
Step 3.4: Ranking Questions
Process: quiz_generator/question_ranking.py → rank_questions()
Only ranks best-in-group questions:
1. Creates prompt with:
- RANK_QUESTIONS_PROMPT
- All quality standards
- Best-in-group questions only
- Course content for context
2. Ranking Criteria:
- Question clarity and unambiguity
- Alignment with learning objective
- Quality of incorrect options
- Feedback quality
- Appropriate difficulty (prefers simple English)
- Adherence to all guidelines
- Avoidance of absolute terms
3. Special Instructions:
- NEVER change question with ID=1
- Each question gets unique rank (2, 3, 4, ...)
- Rank 1 is reserved
- All questions must be returned
4. Model: User-selected
Response format: RankedMultipleChoiceQuestionsResponse
5. Returns:
{
"ranked": [questions with rank and ranking_reasoning]
}
Step 3.5: Format Results
Process: question_handlers._format_question_results()
Three outputs:
Best-in-Group Ranked Questions:
- Sorted by rank
- Includes all question data
- Includes rank and ranking_reasoning
- Includes group metadata
- Formatted as JSON

All Grouped Questions:
- All questions with group metadata
- No ranking information
- Shows which questions are in groups
- Formatted as JSON

Formatted Quiz:
format_quiz_for_ui() creates a human-readable format:

**Question 1 [Rank: 2]:** What is...
Ranking Reasoning: ...
• A [Correct]: Option text
  Feedback: Correct feedback
• B: Option text
  Feedback: Incorrect feedback
[continues for all questions]
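A simplified re-implementation of this layout (not the project's format_quiz_for_ui, and assuming dict-shaped question records) could look like:

```python
def format_quiz_for_display(questions: list[dict]) -> str:
    """Render ranked questions in the human-readable layout shown above."""
    lines = []
    for i, q in enumerate(questions, start=1):
        lines.append(f"**Question {i} [Rank: {q['rank']}]:** {q['question_text']}")
        lines.append(f"Ranking Reasoning: {q['ranking_reasoning']}")
        for letter, opt in zip("ABCD", q["options"]):
            marker = " [Correct]" if opt["is_correct"] else ""
            lines.append(f"• {letter}{marker}: {opt['option_text']}")
            lines.append(f"  Feedback: {opt['feedback']}")
        lines.append("")
    return "\n".join(lines)
```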
Phase 4: Custom Question Generation (Optional)
Tab 3 Workflow:
Step 4.1: User Input
User provides:
- Free-form guidance/feedback text
- Model selection
- Temperature setting
Step 4.2: Generation
Process: feedback_handlers.propose_question_handler()
QuizGenerator.generate_multiple_choice_question_from_feedback()
  → quiz_generator/feedback_questions.py
Workflow:
1. Retrieves processed file contents from state
2. Creates prompt combining:
- User feedback/guidance
- All quality standards
- Course content
- Generation criteria
3. Model generates:
- Single question
- With learning objective inferred from guidance
- 4 options with feedback
- Source references
4. Returns: MultipleChoiceQuestionFromFeedback object
(includes user feedback as metadata)
5. Formatted as JSON for display
Phase 5: Assessment Export (Automated)
The final assessment can be saved using:
QuizGenerator.save_assessment_to_json()
  → quiz_generator/assessment.py → save_assessment_to_json()
Process:
1. Convert Assessment object to dictionary
assessment_dict = assessment.model_dump()
2. Write to JSON file with indent=2
Default filename: "assessment.json"
3. Contains:
- All learning objectives (best-in-group)
- All ranked questions
- Complete metadata
Detailed Component Functionality
Content Processor (ui/content_processor.py)
Class: ContentProcessor
Methods:
process_files(file_paths: List[str]) -> List[str]
- Main entry point for processing multiple files
- Returns list of XML-tagged content strings
- Stores results in self.file_contents

process_file(file_path: str) -> List[str]
- Routes to appropriate handler based on file extension
- Returns single-item list with tagged content

_process_subtitle_file(file_path: str) -> List[str]
- Filters out timestamps and metadata
- Preserves actual subtitle text
- Wraps in <source file='...'> tags

_process_notebook_file(file_path: str) -> List[str]
- Validates JSON structure
- Parses with nbformat
- Extracts markdown and code cells
- Falls back to raw text on parsing errors
- Wraps in <source file='...'> tags
Learning Objective Generator (learning_objective_generator/)
generator.py - LearningObjectiveGenerator Class
Orchestrator that delegates to specialized modules:
Methods:
generate_base_learning_objectives()
- Delegates to base_generation.py
- Returns base objectives with correct answers

group_base_learning_objectives()
- Delegates to grouping_and_ranking.py
- Groups similar objectives
- Identifies best in each group

generate_incorrect_answer_options()
- Delegates to enhancement.py
- Adds 5 incorrect answer suggestions per objective

regenerate_incorrect_answers()
- Delegates to suggestion_improvement.py
- Quality-checks and improves incorrect answers

generate_and_group_learning_objectives()
- Complete workflow method
- Combines: base generation → grouping → incorrect answers
- Returns dict with all_grouped and best_in_group
base_generation.py
Key Functions:
generate_base_learning_objectives()
- Wrapper that calls two separate functions
- First: Generate objectives without correct answers
- Second: Generate correct answers for those objectives
generate_base_learning_objectives_without_correct_answers()
Process:
1. Extract source filenames from XML tags
2. Combine all file contents
3. Create prompt with:
- BASE_LEARNING_OBJECTIVES_PROMPT
- BLOOMS_TAXONOMY_LEVELS
- LEARNING_OBJECTIVE_EXAMPLES_WITHOUT_ANSWERS
- Course content
4. API call:
- Model: User-selected
- Temperature: User-selected (if supported)
- Response format: BaseLearningObjectivesWithoutCorrectAnswerResponse
5. Post-process:
- Assign sequential IDs
- Normalize source_reference (extract basenames)
6. Returns: List[BaseLearningObjectiveWithoutCorrectAnswer]
generate_correct_answers_for_objectives()
Process:
1. For each objective without answer:
- Create prompt with objective + course content
- Call OpenAI API (text response, not structured)
- Extract correct answer
- Create BaseLearningObjective with answer
2. Error handling: Add "[Error generating correct answer]" on failure
3. Returns: List[BaseLearningObjective]
Quality Guidelines in Prompt:
- Objectives must be assessable via multiple-choice
- Start with action verbs (identify, describe, define, list, compare)
- One goal per objective
- Derived directly from course content
- Tool/framework agnostic (focus on principles, not specific implementations)
- First objective should be relatively easy recall question
- Avoid objectives about "building" or "creating" (not MC-assessable)
grouping_and_ranking.py
Key Functions:
group_base_learning_objectives()
Process:
1. Format objectives for display in prompt
2. Create grouping prompt with:
- Original generation criteria
- All base objectives
- Course content
- Grouping instructions
3. Special rule:
- All objectives with IDs ending in 1 grouped together
- Best one selected from this group
- Will become primary objective (ID=1)
4. API call:
- Model: "gpt-5-mini" (hardcoded for efficiency)
- Response format: GroupedBaseLearningObjectivesResponse
5. Post-process:
- Normalize best_in_group to Python bool
- Filter for best-in-group objectives
6. Returns:
{
"all_grouped": List[GroupedBaseLearningObjective],
"best_in_group": List[GroupedBaseLearningObjective]
}
Grouping Criteria:
- Topic overlap
- Similarity of concepts
- Quality based on original generation criteria
- Clarity and specificity
- Alignment with course content
enhancement.py
Key Function: generate_incorrect_answer_options()
Process:
1. For each base objective:
- Create prompt with:
- Learning objective and correct answer
- INCORRECT_ANSWER_PROMPT (detailed guidelines)
- INCORRECT_ANSWER_EXAMPLES
- Course content
- Request 5 plausible incorrect options
2. API call:
- Model: model_override or default
- Temperature: User-selected (if supported)
- Response format: LearningObjective (includes incorrect_answer_options)
3. Returns: List[LearningObjective] with all fields populated
Incorrect Answer Quality Principles:
- Create common misunderstandings
- Maintain identical structure to correct answer
- Use course terminology correctly but in wrong contexts
- Include partially correct information
- Avoid obviously wrong answers
- Mirror detail level and style of correct answer
- Avoid absolute terms ("always", "never", "exclusively")
- Avoid contradictory second clauses
suggestion_improvement.py
Key Function: regenerate_incorrect_answers()
Process:
1. For each learning objective:
- Call should_regenerate_incorrect_answers()
2. should_regenerate_incorrect_answers():
- Creates evaluation prompt with:
- Objective and all incorrect options
- IMMEDIATE_RED_FLAGS checklist
- RULES_FOR_SECOND_CLAUSES
- LLM evaluates each option
- Returns: needs_regeneration: bool
3. If regeneration needed:
- Logs to incorrect_suggestion_debug/{id}.txt
- Creates new prompt with additional constraints
- Regenerates incorrect answers
- Validates again
4. Returns: List[LearningObjective] with improved incorrect answers
Red Flags Checked:
- Contradictory second clauses ("but not necessarily")
- Explicit negations ("without automating")
- Opposite descriptions ("fixed steps" for flexible systems)
- Absolute/comparative terms
- Hedging that creates limitations
- Trade-off language creating false dichotomies
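While the pipeline itself asks an LLM to judge each option, the red-flag list above suggests a cheap lexical pre-screen (an illustrative shortcut, not the project's actual check):

```python
import re

RED_FLAG_PHRASES = [
    "but not necessarily", "at the expense of", "rather than",
    "always", "never", "exclusively",
]

def find_red_flags(option_text: str) -> list[str]:
    """Return any red-flag phrases appearing as whole words in an option."""
    lowered = option_text.lower()
    return [
        phrase for phrase in RED_FLAG_PHRASES
        if re.search(rf"\b{re.escape(phrase)}\b", lowered)
    ]
```

A non-empty result would mark the option as a regeneration candidate before spending an LLM call on it.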
Quiz Generator (quiz_generator/)
generator.py - QuizGenerator Class
Orchestrator with LearningObjectiveGenerator embedded:
Initialization:
def __init__(self, api_key, model="gpt-5", temperature=1.0):
self.client = OpenAI(api_key=api_key)
self.model = model
self.temperature = temperature
self.learning_objective_generator = LearningObjectiveGenerator(
api_key=api_key, model=model, temperature=temperature
)
Methods (delegates to specialized modules):
- generate_base_learning_objectives() → delegates to LearningObjectiveGenerator
- generate_lo_incorrect_answer_options() → delegates to LearningObjectiveGenerator
- group_base_learning_objectives() → delegates to grouping_and_ranking.py
- generate_multiple_choice_question() → delegates to question_generation.py
- generate_questions_in_parallel() → delegates to assessment.py
- group_questions() → delegates to question_ranking.py
- rank_questions() → delegates to question_ranking.py
- judge_question_quality() → delegates to question_improvement.py
- regenerate_incorrect_answers() → delegates to question_improvement.py
- generate_multiple_choice_question_from_feedback() → delegates to feedback_questions.py
- save_assessment_to_json() → delegates to assessment.py
question_generation.py
Key Function: generate_multiple_choice_question()
Detailed Process:
1. Source Content Matching:
source_references = learning_objective.source_reference
if isinstance(source_references, str):
    source_references = [source_references]

combined_content = ""
for source_file in source_references:
    found = False
    # Try exact match: <source file='filename'>
    for file_content in file_contents:
        if f"<source file='{source_file}'>" in file_content:
            combined_content += file_content
            found = True
            break
    # Fallback: partial match
    if not found:
        for file_content in file_contents:
            if source_file in file_content:
                combined_content += file_content
                break

# Last resort: use all content
if not combined_content:
    combined_content = "\n\n".join(file_contents)
2. Multi-Source Instruction:
if len(source_references) > 1:
Add special instruction:
"This learning objective spans multiple sources.
Your question should:
1. Synthesize information across these sources
2. Test understanding of overarching themes
3. Require knowledge from multiple sources"
3. Prompt Construction: Combines extensive quality standards:
- Learning objective
- Correct answer
- Incorrect answer options from objective
- GENERAL_QUALITY_STANDARDS
- MULTIPLE_CHOICE_STANDARDS
- EXAMPLE_QUESTIONS
- QUESTION_SPECIFIC_QUALITY_STANDARDS
- CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS
- INCORRECT_ANSWER_EXAMPLES_WITH_EXPLANATION
- ANSWER_FEEDBACK_QUALITY_STANDARDS
- Multi-source instruction (if applicable)
- Matched course content
4. API Call:
params = {
"model": model,
"messages": [
{"role": "system", "content": "Expert educational assessment creator"},
{"role": "user", "content": prompt}
],
"response_format": MultipleChoiceQuestion
}
if not TEMPERATURE_UNAVAILABLE.get(model, True):
params["temperature"] = temperature
response = client.beta.chat.completions.parse(**params)
5. Post-Processing:
- Set response.id = learning_objective.id
- Set response.learning_objective_id = learning_objective.id
- Set response.learning_objective = learning_objective.learning_objective
- Set response.source_reference = learning_objective.source_reference
- Verify all options have feedback
- Add default feedback if missing
6. Error Handling:
On exception:
- Create fallback question with 4 generic options
- Include error message in question_text
- Mark as questionable quality
question_ranking.py
Key Functions:
group_questions(questions, file_contents)
Process:
1. Create prompt with:
- GROUP_QUESTIONS_PROMPT
- All questions with complete data
- Grouping instructions
2. Grouping Logic:
- Questions with same learning_objective_id are similar
- Group by topic overlap
- Mark best_in_group within each group
- Single-member groups: best_in_group = true by default
3. API call:
- Model: User-selected
- Response format: GroupedMultipleChoiceQuestionsResponse
4. Critical Instructions:
- MUST return ALL questions
- Each question must have group metadata
- best_in_group set appropriately
5. Returns:
{
"grouped": List[GroupedMultipleChoiceQuestion],
"best_in_group": [questions where best_in_group=true]
}
rank_questions(questions, file_contents)
Process:
1. Create prompt with:
- RANK_QUESTIONS_PROMPT
- ALL quality standards (comprehensive)
- Best-in-group questions only
- Course content
2. Ranking Criteria (from prompt):
- Question clarity and unambiguity
- Alignment with learning objective
- Quality of incorrect options
- Feedback quality
- Appropriate difficulty (simple English preferred)
- Adherence to all guidelines
- Avoidance of problematic words/phrases
3. Special Instructions:
- DO NOT change question with ID=1
- Rank starting from 2 (rank 1 reserved)
- Each question gets unique rank
- Must return ALL questions
4. API call:
- Model: User-selected
- Response format: RankedMultipleChoiceQuestionsResponse
5. Returns:
{
"ranked": List[RankedMultipleChoiceQuestion]
(includes rank and ranking_reasoning for each)
}
Simple vs Complex English Examples (from ranking criteria):
Simple: "AI engineers create computer programs that can learn from data"
Complex: "AI engineering practitioners architect computational paradigms
exhibiting autonomous erudition capabilities"
question_improvement.py
Key Functions:
judge_question_quality(client, model, temperature, question)
Process:
1. Create evaluation prompt with:
- Question text
- All options with feedback
- Quality criteria
- Evaluation instructions
2. LLM evaluates:
- Clarity and lack of ambiguity
- Alignment with learning objective
- Quality of distractors (incorrect options)
- Feedback quality and helpfulness
- Appropriate difficulty level
- Adherence to all standards
3. API call:
- Unstructured text response
- LLM returns: APPROVED or NOT APPROVED + reasoning
4. Parsing:
approved = "APPROVED" in response.upper()
feedback = full response text
5. Returns: (approved: bool, feedback: str)
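Note that a bare substring check also matches a response containing "NOT APPROVED"; a stricter parse (an illustrative alternative, not the project's code) could be:

```python
def parse_verdict(response_text: str) -> tuple[bool, str]:
    """Approve only when 'APPROVED' appears without a preceding 'NOT'."""
    upper = response_text.upper()
    approved = "APPROVED" in upper and "NOT APPROVED" not in upper
    return approved, response_text
```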
should_regenerate_incorrect_answers(client, question, file_contents, model_name)
Process:
1. Extract incorrect options from question
2. Create evaluation prompt with:
- Each incorrect option
- IMMEDIATE_RED_FLAGS checklist
- Course content for context
3. LLM checks each option for:
- Contradictory second clauses
- Explicit negations
- Absolute terms
- Opposite descriptions
- Trade-off language
4. Returns: needs_regeneration: bool
5. If true:
- Log to wrong_answer_debug/ directory
- Provides detailed feedback on issues
regenerate_incorrect_answers(client, model, temperature, questions, file_contents)
Process:
1. For each question:
- Check if regeneration needed
- If yes:
a. Create new prompt with stricter constraints
b. Include original question for context
c. Add specific rules about avoiding red flags
d. Regenerate options
e. Validate again
- If no: keep original
2. Returns: List of questions with improved incorrect answers
feedback_questions.py
Key Function: generate_multiple_choice_question_from_feedback()
Process:
1. Accept user feedback/guidance as free-form text
2. Create prompt combining:
- User feedback
- All quality standards
- Course content
- Standard generation criteria
3. LLM infers:
- Learning objective from feedback
- Appropriate question
- 4 options with feedback
- Source references
4. API call:
- Model: User-selected
- Response format: MultipleChoiceQuestionFromFeedback
5. Includes user feedback as metadata in response
6. Returns: Single question object
assessment.py
Key Functions:
generate_questions_in_parallel()
Parallel Processing Details:
1. Setup:
max_workers = min(len(learning_objectives), 5)
# Limits to 5 concurrent threads
2. Thread Pool Executor:
with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
3. For each objective (in separate thread):
Worker function:
def generate_question_for_objective(objective, idx):
- Generate question
- Judge quality
- Update with approval and feedback
- Handle errors gracefully
- Return complete question
4. Submit all tasks:
future_to_idx = {
executor.submit(generate_question_for_objective, obj, i): i
for i, obj in enumerate(learning_objectives)
}
5. Collect results as completed:
for future in concurrent.futures.as_completed(future_to_idx):
question = future.result()
questions.append(question)
print progress
6. Error handling:
- Individual failures don't stop other threads
- Placeholder questions created on error
- All errors logged
7. Returns: List[MultipleChoiceQuestion] with quality judgments
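The steps above can be condensed into a runnable sketch, where the `generate_one` callable stands in for question generation plus quality judging (the placeholder shape on error is an assumption):

```python
import concurrent.futures

def generate_all(learning_objectives: list[dict], generate_one) -> list[dict]:
    """Fan question generation out over a bounded thread pool; a single
    failure yields a placeholder instead of aborting the whole batch."""
    max_workers = min(len(learning_objectives), 5)
    questions = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_idx = {
            executor.submit(generate_one, obj): i
            for i, obj in enumerate(learning_objectives)
        }
        for future in concurrent.futures.as_completed(future_to_idx):
            idx = future_to_idx[future]
            try:
                questions.append(future.result())
            except Exception as exc:
                # Placeholder question on error; other threads keep running
                questions.append({"id": idx, "question_text": f"[Error: {exc}]"})
    return questions
```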
save_assessment_to_json(assessment, output_path)
1. Convert Pydantic model to dict:
assessment_dict = assessment.model_dump()
2. Write to JSON file:
with open(output_path, "w") as f:
json.dump(assessment_dict, f, indent=2)
3. File contains:
{
"learning_objectives": [...],
"questions": [...]
}
State Management (ui/state.py)
Global State Variables:
processed_file_contents = [] # List of XML-tagged content strings
generated_learning_objectives = [] # List of learning objective objects
Functions:
- get_processed_contents() → retrieves file contents
- set_processed_contents(contents) → stores file contents
- get_learning_objectives() → retrieves objectives
- set_learning_objectives(objectives) → stores objectives
- clear_state() → resets both variables
Purpose:
- Persists data between UI tabs
- Allows Tab 2 to access content processed in Tab 1
- Allows Tab 3 to access content for custom questions
- Enables regeneration with feedback
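The state module described above reduces to a small amount of code. The variable and function names come from the document; the bodies are assumptions consistent with the stated behavior.

```python
# ui/state.py sketch: module-level state shared across Gradio tabs.
processed_file_contents = []        # XML-tagged content strings
generated_learning_objectives = []  # learning objective objects

def get_processed_contents():
    return processed_file_contents

def set_processed_contents(contents):
    global processed_file_contents
    processed_file_contents = contents

def get_learning_objectives():
    return generated_learning_objectives

def set_learning_objectives(objectives):
    global generated_learning_objectives
    generated_learning_objectives = objectives

def clear_state():
    """Reset both variables, e.g. when a new set of files is uploaded."""
    global processed_file_contents, generated_learning_objectives
    processed_file_contents = []
    generated_learning_objectives = []
```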
UI Handlers
objective_handlers.py
process_files(files, num_objectives, num_runs, model_name, incorrect_answer_model_name, temperature)
Complete Workflow:
1. Validate inputs (files exist, API key present)
2. Extract file paths from Gradio file objects
3. Process files → get XML-tagged content
4. Store in state
5. Create QuizGenerator
6. Generate multiple runs of base objectives
7. Group and rank objectives
8. Generate incorrect answers for best-in-group
9. Improve incorrect answers
10. Reassign IDs (best from 001 group → ID=1)
11. Format results for display
12. Store in state
13. Return 4 outputs: status, best-in-group, all-grouped, raw
regenerate_objectives(objectives_json, feedback, num_objectives, num_runs, model_name, temperature)
Workflow:
1. Retrieve processed contents from state
2. Append feedback to content:
file_contents_with_feedback.append(f"FEEDBACK: {feedback}")
3. Generate new objectives with feedback context
4. Group and rank
5. Return regenerated objectives
_reassign_objective_ids(grouped_objectives)
ID Assignment Logic:
1. Find all objectives with IDs ending in 001 (1001, 2001, etc.)
2. Identify their groups
3. Find best_in_group objective from these groups
4. Assign it ID = 1
5. Assign all other objectives sequential IDs starting from 2
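Under the assumption that objectives are carried as dicts with `id`, `group_id`, and `best_in_group` fields (the field names are assumptions), the logic above might be implemented as below. Picking the first matching group when several contain a "001" ID is also an assumption.

```python
def reassign_objective_ids(grouped_objectives):
    """Assign ID=1 to the best-in-group objective whose group contains an
    original ID ending in 001 (1001, 2001, ...); number the rest from 2."""
    # Step 1-2: find the groups that contain an ID ending in 001
    first_groups = {o["group_id"] for o in grouped_objectives if o["id"] % 1000 == 1}
    # Step 3: the best_in_group objective from those groups becomes the first question
    first = next(
        (o for o in grouped_objectives
         if o["group_id"] in first_groups and o.get("best_in_group")),
        None,
    )
    # Steps 4-5: assign ID=1, then sequential IDs from 2
    next_id = 2 if first is not None else 1
    for obj in grouped_objectives:
        if obj is first:
            obj["id"] = 1
        else:
            obj["id"] = next_id
            next_id += 1
    return grouped_objectives
```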
_format_objective_results(grouped_result, all_learning_objectives)
Formatting:
1. Sort by ID
2. Create dictionaries from Pydantic objects
3. Include all metadata fields
4. Convert to JSON with indent=2
5. Return 3 formatted outputs + status message
question_handlers.py
generate_questions(objectives_json, model_name, temperature, num_runs)
Complete Workflow:
1. Validate inputs
2. Parse objectives JSON → create LearningObjective objects
3. Retrieve processed contents from state
4. Create QuizGenerator
5. Generate questions (multiple runs in parallel)
6. Group questions by similarity
7. Rank best-in-group questions
8. Optionally improve incorrect answers (currently commented out)
9. Format results
10. Return 4 outputs: status, best-ranked, all-grouped, formatted
_generate_questions_multiple_runs()
For each run:
1. Call generate_questions_in_parallel()
2. Assign unique IDs across runs:
start_id = len(all_questions) + 1
for i, q in enumerate(run_questions):
q.id = start_id + i
3. Aggregate all questions
_group_and_rank_questions()
1. Group all questions → get grouped and best_in_group
2. Rank only best_in_group questions
3. Return:
{
"grouped": all with group metadata,
"best_in_group_ranked": best with ranks
}
feedback_handlers.py
propose_question_handler(guidance, model_name, temperature)
Workflow:
1. Validate state (processed contents available)
2. Create QuizGenerator
3. Call generate_multiple_choice_question_from_feedback()
- Passes user guidance and course content
- LLM infers learning objective
- Generates complete question
4. Format as JSON
5. Return status and question JSON
Formatting Utilities (ui/formatting.py)
format_quiz_for_ui(questions_json)
Process:
1. Parse JSON to list of question dictionaries
2. Sort by rank if available
3. For each question:
- Add header: "**Question N [Rank: X]:** {question_text}"
- Add ranking reasoning if available
- For each option:
- Add letter (A, B, C, D)
- Mark correct option
- Include option text
- Include feedback indented
4. Return formatted string with markdown
Output Example:
**Question 1 [Rank: 2]:** What is the primary purpose of AI agents?
Ranking Reasoning: Clear question that tests fundamental understanding...
• A [Correct]: To automate tasks and make decisions
  ◦ Feedback: Correct! AI agents are designed to automate tasks...
• B: To replace human workers entirely
  ◦ Feedback: While AI agents can automate tasks, they are not...
[continues...]
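A minimal sketch of the formatter that produces output like the example above; the dictionary field names (`rank`, `ranking_reasoning`, `options`, `is_correct`, `feedback`) are assumptions inferred from the structure described.

```python
def format_quiz_for_ui(questions):
    """Render question dicts as markdown in the style shown above."""
    letters = "ABCD"
    lines = []
    # Sort by rank when available; unranked questions sort last.
    for n, q in enumerate(sorted(questions, key=lambda q: q.get("rank", 10**9)), 1):
        rank = f" [Rank: {q['rank']}]" if "rank" in q else ""
        lines.append(f"**Question {n}{rank}:** {q['question']}")
        if q.get("ranking_reasoning"):
            lines.append(f"Ranking Reasoning: {q['ranking_reasoning']}")
        for letter, opt in zip(letters, q["options"]):
            marker = " [Correct]" if opt.get("is_correct") else ""
            lines.append(f"• {letter}{marker}: {opt['text']}")
            lines.append(f"  ◦ Feedback: {opt['feedback']}")  # feedback indented
    return "\n".join(lines)
```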
Quality Standards and Prompts
Learning Objectives Quality Standards
From prompts/learning_objectives.py:
BASE_LEARNING_OBJECTIVES_PROMPT - Key Requirements:
Assessability:
- Must be testable via multiple-choice questions
- Cannot be about "building", "creating", "developing"
- Should use verbs like: identify, list, describe, define, compare
Specificity:
- One goal per objective
- Don't combine multiple action verbs
- Example of what NOT to do: "identify X and explain Y"
Source Alignment:
- Derived DIRECTLY from course content
- No topics not covered in content
- Appropriate difficulty level for course
Independence:
- Each objective stands alone
- No dependencies on other objectives
- No context required from other objectives
Focus:
- Address "why" over "what" when possible
- Critical knowledge over trivial facts
- Principles over specific implementation details
Tool/Framework Agnosticism:
- Don't mention specific tools/frameworks
- Focus on underlying principles
- Example: Don't ask about "Pandas DataFrame methods", ask about "data filtering concepts"
First Objective Rule:
- Should be relatively easy recall question
- Address main topic/concept of course
- Format: "Identify what X is" or "Explain why X is important"
Answer Length:
- Aim for ≤20 words in correct answer
- Avoid unnecessary elaboration
- No compound sentences with extra consequences
BLOOMS_TAXONOMY_LEVELS:
Levels from lowest to highest:
- Recall: Retention of key concepts (not trivialities)
- Comprehension: Connect ideas, demonstrate understanding
- Application: Apply concept to new but similar scenario
- Analysis: Examine parts, determine relationships, make inferences
- Evaluation: Make judgments requiring critical thinking
LEARNING_OBJECTIVE_EXAMPLES:
Includes 7 high-quality examples with:
- Appropriate action verbs
- Clear learning objectives
- Concise correct answers (mostly <20 words)
- Multiple source references
- Framework-agnostic language
Question Quality Standards
From prompts/questions.py:
GENERAL_QUALITY_STANDARDS:
- Overall goal: Set learner up for success
- Perfect score attainable for thoughtful students
- Aligned with course content
- Aligned with learning objective and correct answer
- No references to manual intervention (software/AI course)
MULTIPLE_CHOICE_STANDARDS:
- EXACTLY ONE correct answer per question
- Clear, unambiguous correct answer
- Plausible distractors representing common misconceptions
- Not obviously wrong distractors
- All options similar length and detail
- Mutually exclusive options
- Avoid "all/none of the above"
- Typically 4 options (A, B, C, D)
- Don't start feedback with "Correct" or "Incorrect"
QUESTION_SPECIFIC_QUALITY_STANDARDS:
Questions must:
- Match language and tone of course
- Match difficulty level of course
- Assess only course information
- Not teach as part of quiz
- Use clear, concise language
- Not induce confusion
- Provide slight (not major) challenge
- Be easily interpreted and unambiguous
- Have proper grammar and sentence structure
- Be thoughtful and specific (not broad and ambiguous)
- Be complete in wording (understanding the question shouldn't itself be part of the assessment)
CORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS:
Correct answers must:
- Be factually correct and unambiguous
- Match course language and tone
- Be complete sentences
- Match course difficulty level
- Contain only course information
- Not teach during quiz
- Use clear, concise language
- Be thoughtful and specific
- Be complete (identifying the correct answer shouldn't require interpretation)
INCORRECT_ANSWER_SPECIFIC_QUALITY_STANDARDS:
Incorrect answers should:
- Represent reasonable potential misconceptions
- Sound plausible to non-experts
- Require thought even from diligent learners
- Not be obviously wrong
- Use incorrect_answer_suggestions from objective (as starting point)
Avoid:
- Obviously wrong options anyone can eliminate
- Absolute terms: "always", "never", "only", "exclusively"
- Phrases like "used exclusively for scenarios where..."
ANSWER_FEEDBACK_QUALITY_STANDARDS:
For Incorrect Answers:
- Be informational and encouraging (not punitive)
- Single sentence, concise
- Do NOT say "Incorrect" or "Wrong"
For Correct Answers:
- Be informational and encouraging
- Single sentence, concise
- Do NOT say "Correct!" (redundant after "Correct: " prefix)
Incorrect Answer Generation Guidelines
From prompts/incorrect_answers.py:
Core Principles:
Create Common Misunderstandings:
- Represent how students actually misunderstand
- Confuse related concepts
- Mix up terminology
Maintain Identical Structure:
- Match grammatical pattern of correct answer
- Same length and complexity
- Same formatting style
Use Course Terminology Correctly but in Wrong Contexts:
- Apply correct terms incorrectly
- Confuse with related concepts
- Example: an option that claims to describe backpropagation but actually describes forward propagation
Include Partially Correct Information:
- First part correct, second part wrong
- Correct process but wrong application
- Correct concept but incomplete
Avoid Obviously Wrong Answers:
- No contradictions with basic knowledge
- Not immediately eliminable
- Require course knowledge to reject
Mirror Detail Level and Style:
- Match technical depth
- Match tone
- Same level of specificity
For Lists, Maintain Consistency:
- Same number of items
- Same format
- Mix some correct with incorrect items
AVOID ABSOLUTE TERMS:
- "always", "never", "exclusively", "primarily"
- "all", "every", "none", "nothing", "only"
- "must", "required", "impossible"
- "rather than", "as opposed to", "instead of"
IMMEDIATE_RED_FLAGS (triggers regeneration):
Contradictory Second Clauses:
- "but not necessarily"
- "at the expense of"
- "rather than [core concept]"
- "ensuring X rather than Y"
- "without necessarily"
- "but has no impact on"
- "but cannot", "but prevents", "but limits"
Explicit Negations:
- "without automating", "without incorporating"
- "preventing [main benefit]"
- "limiting [main capability]"
Opposite Descriptions:
- "fixed steps" (for flexible systems)
- "manual intervention" (for automation)
- "simple question answering" (for complex processing)
Hedging Creating Limitations:
- "sometimes", "occasionally", "might"
- "to some extent", "partially", "somewhat"
INCORRECT_ANSWER_EXAMPLES:
Includes 10 detailed examples showing:
- Learning objective
- Correct answer
- 3 plausible incorrect suggestions
- Explanation of why each is plausible but wrong
- Consistent formatting across all options
Ranking and Grouping
RANK_QUESTIONS_PROMPT:
Criteria:
- Question clarity and unambiguity
- Alignment with learning objective
- Quality of incorrect options
- Quality of feedback
- Appropriate difficulty (simple English preferred)
- Adherence to all guidelines
Critical Instructions:
- DO NOT change question with ID=1
- Rank starting from 2
- Each question unique rank
- Must return ALL questions
- No omissions
- No duplicate ranks
Simple vs Complex English:
Simple: "AI engineers create computer programs that learn from data"
Complex: "AI engineering practitioners architect computational paradigms
exhibiting autonomous erudition capabilities"
GROUP_QUESTIONS_PROMPT:
Grouping Logic:
- Questions with same learning_objective_id are similar
- Identify topic overlap
- Mark best_in_group within each group
- Single-member groups: best_in_group = true
Critical Instructions:
- Must return ALL questions
- Each question needs group metadata
- No omissions
- Best in group marked appropriately
Summary of Data Flow
Complete End-to-End Flow
User Uploads Files
↓
ContentProcessor extracts and tags content
↓
[Stored in global state]
↓
Generate Base Objectives (multiple runs)
↓
Group Base Objectives (by similarity)
↓
Generate Incorrect Answers (for best-in-group only)
↓
Improve Incorrect Answers (quality check)
↓
Reassign IDs (best from 001 group → ID=1)
↓
[Objectives displayed in UI, stored in state]
↓
Generate Questions (parallel, multiple runs)
↓
Judge Question Quality (parallel)
↓
Group Questions (by similarity)
↓
Rank Questions (best-in-group only)
↓
[Questions displayed in UI]
↓
Format for Display
↓
Export to JSON (optional)
Key Optimization Strategies
Multiple Generation Runs:
- Generates variety of objectives/questions
- Grouping identifies best versions
- Reduces risk of poor quality individual outputs
Hierarchical Processing:
- Generate base → Group → Enhance → Improve
- Only enhances best candidates (saves API calls)
- Progressive refinement
Parallel Processing:
- Questions generated concurrently (up to 5 threads)
- Significant time savings for multiple objectives
- Independent evaluations
Quality Gating:
- LLM judges question quality
- Checks for red flags in incorrect answers
- Regenerates problematic content
Source Tracking:
- XML tags preserve origin
- Questions link back to source materials
- Enables accurate content matching
Modular Prompts:
- Reusable quality standards
- Consistent across all generations
- Easy to update centrally
Configuration and Customization
Available Models
Configured in models/config.py:
MODELS = [
"o3-mini", "o1", # Reasoning models (no temperature)
"gpt-4.1", "gpt-4o", # GPT-4 variants
"gpt-4o-mini", "gpt-4",
"gpt-3.5-turbo", # Legacy
"gpt-5", # Latest (no temperature)
"gpt-5-mini", # Efficient (no temperature)
"gpt-5-nano" # Ultra-efficient (no temperature)
]
Temperature Support:
- Models with reasoning (o1, o3-mini, gpt-5 variants): No temperature
- Other models: Temperature 0.0 to 1.0
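One way to express the temperature rule in code; the model names come from the list above, but the helper itself is an assumption, not part of the documented codebase.

```python
# Reasoning models reject the temperature parameter entirely.
NO_TEMPERATURE_MODELS = ("o1", "o3-mini", "gpt-5", "gpt-5-mini", "gpt-5-nano")

def supports_temperature(model: str) -> bool:
    return model not in NO_TEMPERATURE_MODELS

def build_request_kwargs(model: str, temperature: float) -> dict:
    """Attach temperature only for models that accept it, clamped to [0, 1]."""
    kwargs = {"model": model}
    if supports_temperature(model):
        kwargs["temperature"] = max(0.0, min(1.0, temperature))
    return kwargs
```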
Model Selection Strategy:
- Base objectives: User-selected (default: gpt-5)
- Grouping: Hardcoded gpt-5-mini (efficiency)
- Incorrect answers: Separate user selection (default: gpt-5)
- Questions: User-selected (default: gpt-5)
- Quality judging: User-selected or gpt-5-mini
Environment Variables
Required:
OPENAI_API_KEY=your_api_key_here
Configured via a .env file in the project root
Customization Points
Quality Standards:
- Edit prompts/learning_objectives.py
- Edit prompts/questions.py
- Edit prompts/incorrect_answers.py
- Changes apply to all future generations
Example Questions/Objectives:
- Modify LEARNING_OBJECTIVE_EXAMPLES
- Modify EXAMPLE_QUESTIONS
- Modify INCORRECT_ANSWER_EXAMPLES
- LLM learns from these examples
Generation Parameters:
- Number of objectives per run
- Number of runs (variety)
- Temperature (creativity vs consistency)
- Model selection (quality vs cost/speed)
Parallel Processing:
- max_workers in assessment.py
- Currently: min(len(objectives), 5)
- Adjust for your rate limits
Output Formats:
- Modify formatting.py for display
- Assessment JSON structure in models/assessment.py
Error Handling and Resilience
Content Processing Errors
- Invalid JSON notebooks: Falls back to raw text
- Parse failures: Wraps in code blocks, continues
- Missing files: Logged, skipped
- Encoding issues: UTF-8 fallback
Generation Errors
- API failures: Logged with traceback
- Structured output parse errors: Fallback responses created
- Missing required fields: Default values assigned
- Validation errors: Caught and logged
Parallel Processing Errors
- Individual thread failures: Don't stop other threads
- Placeholder questions: Created on error
- Complete error details: Logged for debugging
- Graceful degradation: Partial results returned
Quality Check Failures
- Regeneration failures: Original kept with warning
- Judge unavailable: Questions marked unapproved
- Validation failures: Detailed logs in debug directories
Debug and Logging
Debug Directories
incorrect_suggestion_debug/
- Created during objective enhancement
- Contains logs of problematic incorrect answers
- Format: {objective_id}.txt
- Includes: Original suggestions, identified issues, regeneration attempts
wrong_answer_debug/
- Created during question improvement
- Logs question-level incorrect answer issues
- Regeneration history
Console Logging
Extensive logging throughout:
- File processing status
- Generation progress (run numbers)
- Parallel thread activity (thread IDs)
- API call results
- Error messages with tracebacks
- Timing information (start/end times)
Example Log Output:
DEBUG - Processing 3 files: ['file1.vtt', 'file2.ipynb', 'file3.srt']
DEBUG - Found source file: file1.vtt
Generating 3 learning objectives from 3 files
Successfully generated 3 learning objectives without correct answers
Generated correct answer for objective 1
Grouping 9 base learning objectives
Received 9 grouped results
Generating incorrect answer options only for best-in-group objectives...
PARALLEL: Starting ThreadPoolExecutor with 3 workers
PARALLEL: Worker 1 (Thread ID: 12345): Starting work on objective...
Question generation completed in 45.23 seconds
Performance Considerations
API Call Optimization
Calls per Workflow:
For 3 objectives × 3 runs = 9 base objectives:
Learning Objectives:
- Base generation: 3 calls (one per run)
- Correct answers: 9 calls (one per objective)
- Grouping: 1 call
- Incorrect answers: ~3 calls (best-in-group only)
- Improvement checks: ~3 calls
- Total: ~19 calls
Questions (for 3 objectives × 1 run):
- Question generation: 3 calls (parallel)
- Quality judging: 3 calls (parallel)
- Grouping: 1 call
- Ranking: 1 call
- Total: ~8 calls
Total for complete workflow: ~27 API calls
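The tallies above follow directly from the pipeline shape and can be reproduced as simple formulas (the best-in-group count of 3 is taken from this example; in general it depends on how the runs group):

```python
def estimate_objective_calls(n_objectives, n_runs, n_best_in_group):
    base = n_runs                      # one base-generation call per run
    correct = n_objectives * n_runs    # one correct-answer call per generated objective
    grouping = 1
    incorrect = n_best_in_group        # incorrect answers for best-in-group only
    improvement = n_best_in_group      # one improvement check each
    return base + correct + grouping + incorrect + improvement

def estimate_question_calls(n_objectives, n_runs=1):
    generation = n_objectives * n_runs  # parallel, one per objective per run
    judging = n_objectives * n_runs     # one quality judgment each
    return generation + judging + 2     # plus grouping + ranking
```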
Time Estimates
Typical Execution Times:
- File processing: <1 second
- Objective generation (3×3): 30-60 seconds
- Question generation (3×1): 20-40 seconds (with parallelization)
- Total: 1-2 minutes for small course
Factors Affecting Speed:
- Model selection (gpt-5 slower than gpt-5-mini)
- Number of runs
- Number of objectives/questions
- API rate limits
- Network latency
- Parallel worker count
Cost Optimization
Strategies:
- Use gpt-5-mini for grouping/ranking (hardcoded)
- Reduce number of runs (trade-off: variety)
- Generate fewer objectives initially
- Use faster models for initial exploration
- Use premium models for final production
Conclusion
The AI Course Assessment Generator is a sophisticated, multi-stage system that transforms raw course materials into high-quality educational assessments. It employs:
- Modular architecture for maintainability
- Structured output generation for reliability
- Quality-driven iterative refinement for excellence
- Parallel processing for efficiency
- Comprehensive error handling for resilience
The system successfully balances automation with quality control, producing assessments that align with educational best practices and Bloom's Taxonomy while maintaining complete traceability to source materials.