# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Development Commands

### Running the Application
```bash
python app.py
```
The application starts a Gradio web interface at http://127.0.0.1:7860.

### Environment Setup
```bash
# Using uv (recommended)
uv venv -p 3.12
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -r requirements.txt

# Using pip
pip install -r requirements.txt
```

### Environment Variables
Create a `.env` file with:
```
OPENAI_API_KEY=your_api_key_here
```
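Loading the `.env` file amounts to reading `KEY=value` pairs into the environment. The helper below is a stdlib-only sketch of that behavior (the project may use a library such as `python-dotenv` instead; the function name is illustrative):

```python
import os

def load_env_file(path: str = ".env") -> None:
    """Read KEY=value lines into os.environ without overwriting existing vars."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks and comments; keep everything after the first '='.
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
```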

## Architecture Overview

This is an AI Course Assessment Generator that creates learning objectives and multiple-choice questions from course materials. The system uses OpenAI's language models with structured output generation via the `instructor` library.

### Core Workflow
1. **Content Processing**: Upload course materials (.vtt, .srt, .ipynb) → Extract and tag content with XML source references
2. **Learning Objective Generation**: Generate base objectives → Group and rank → Enhance with incorrect answer suggestions
3. **Question Generation**: Create multiple-choice questions from objectives → Quality assessment → Ranking and grouping
4. **Assessment Export**: Save final assessment to JSON format

### Key Architecture Patterns

**Modular Prompt System**: The `prompts/` directory contains reusable prompt components that are imported and combined in generation modules. This allows for consistent quality standards across different generation tasks.

**Orchestrator Pattern**: Both `LearningObjectiveGenerator` and `QuizGenerator` act as orchestrators that coordinate calls to specialized generation functions rather than implementing generation logic directly.
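A minimal sketch of that pattern, with placeholder step functions standing in for the LLM-backed generation calls (all names and signatures here are illustrative, not the repository's actual ones):

```python
# Each step lives in its own function; the orchestrator only sequences them.

def generate_base_objectives(content: str) -> list[str]:
    # Placeholder for an LLM-backed generation step.
    return [f"Objective derived from: {content}"]

def group_and_rank(objectives: list[str]) -> list[str]:
    # Placeholder: deduplicate and keep the best of each group.
    return sorted(set(objectives))

def enhance_with_incorrect_answers(objectives: list[str]) -> list[dict]:
    # Placeholder: attach incorrect-answer suggestions to each objective.
    return [{"objective": o, "incorrect_answers": []} for o in objectives]

class LearningObjectiveOrchestrator:
    """Coordinates the specialized steps; holds no generation logic itself."""

    def run(self, content: str) -> list[dict]:
        base = generate_base_objectives(content)
        ranked = group_and_rank(base)
        return enhance_with_incorrect_answers(ranked)
```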

**Structured Output**: All LLM interactions use Pydantic models with the `instructor` library to ensure consistent, validated output formats.
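In outline, the pattern looks like the sketch below: a Pydantic response model passed to an `instructor`-patched client. The model name and fields are illustrative, not the repository's actual schemas:

```python
from pydantic import BaseModel, Field

class LearningObjective(BaseModel):
    objective: str = Field(description="A measurable learning objective")
    bloom_level: str = Field(description="Bloom's taxonomy level")

# With instructor, the call would look roughly like:
#
#   import instructor
#   from openai import OpenAI
#
#   client = instructor.from_openai(OpenAI())
#   result = client.chat.completions.create(
#       model="gpt-5",
#       response_model=LearningObjective,
#       messages=[{"role": "user", "content": course_content}],
#   )
#
# instructor validates the LLM output against the model and retries on failure.

# The model validates plain dicts the same way instructor validates LLM output:
obj = LearningObjective.model_validate(
    {"objective": "Explain backpropagation", "bloom_level": "Understand"}
)
```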

**Source Tracking**: Content is wrapped in XML tags (e.g., `<source file="example.ipynb">content</source>`) throughout the pipeline to maintain traceability from source files to generated questions.
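A minimal sketch of that tagging step, using the stdlib to escape file contents so they cannot break the XML structure (the function name is hypothetical):

```python
from xml.sax.saxutils import escape, quoteattr

def wrap_with_source(filename: str, content: str) -> str:
    """Wrap extracted content in a <source> tag carrying its origin file."""
    return f"<source file={quoteattr(filename)}>{escape(content)}</source>"

tagged = wrap_with_source("example.ipynb", "print('hello')")
```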

## Key Components

### Main Generators
- `LearningObjectiveGenerator` (`learning_objective_generator/generator.py`): Orchestrates learning objective generation, grouping, and enhancement
- `QuizGenerator` (`quiz_generator/generator.py`): Orchestrates question generation, quality assessment, and ranking

### Data Models (`models/`)
- Learning objectives progress from `BaseLearningObjective` → `LearningObjective` (with incorrect answers) → `GroupedLearningObjective`
- Questions progress from `MultipleChoiceQuestion` → `RankedMultipleChoiceQuestion` → `GroupedMultipleChoiceQuestion`
- Final output is an `Assessment` containing both objectives and questions

### Generation Pipeline
1. **Base Generation**: Create initial learning objectives from content
2. **Grouping & Ranking**: Group similar objectives and select best in each group  
3. **Enhancement**: Add incorrect answer suggestions to selected objectives
4. **Question Generation**: Create multiple-choice questions with feedback
5. **Quality Assessment**: Use LLM judge to evaluate question quality
6. **Final Ranking**: Rank and group questions for output

### UI Structure (`ui/`)
- `app.py`: Gradio interface with tabs for objectives, questions, and export
- Handler modules process user interactions and coordinate with generators
- State management tracks data between UI components

## Development Notes

### Model Configuration
- Default model: `gpt-5` with temperature `1.0`
- Separate model selection for incorrect answer generation (typically `o1`)
- Quality assessment often uses `gpt-5-mini` for cost efficiency

### Content Processing
- Supports `.vtt/.srt` subtitle files and `.ipynb` Jupyter notebooks
- All content is tagged with XML source references for traceability
- Content processor handles multiple file formats uniformly
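For subtitle files, extraction amounts to dropping headers, cue numbers, and timestamps and keeping the spoken text. The sketch below is a simplified stand-in for the content processor, not its actual implementation:

```python
import re

# Matches the start of a .vtt ("00:00:01.000 --> ...") or .srt
# ("00:00:01,000 --> ...") timestamp line.
TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2}[.,]\d{3}\s-->\s")

def extract_subtitle_text(raw: str) -> str:
    """Return the spoken text from a .vtt/.srt file as one string."""
    kept = []
    for line in raw.splitlines():
        line = line.strip()
        if not line or line == "WEBVTT" or line.isdigit() or TIMESTAMP.match(line):
            continue
        kept.append(line)
    return " ".join(kept)
```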

### Quality Standards
The system enforces educational quality through modular prompt components:
- General quality standards apply to all generated content
- Specific standards for questions, correct answers, and incorrect answers
- Bloom's taxonomy integration for appropriate learning levels
- Example-based prompting for consistency