---
title: Template Final Assignment
emoji: 🕵🏻‍♂️
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_expiration_minutes: 480
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
## Project Overview
Project Name: Final_Assignment_Template
Purpose: Course assignment template for building an AI agent that passes the GAIA benchmark (General AI Assistants). This project serves as a learning-focused workspace to support iterative agent development and experimentation.
Target Users: Students learning agent development through hands-on implementation
Key Objectives:
- Build production-ready code that passes GAIA test requirements
- Learn agent development through discovery-based implementation
- Develop systematic approach to complex AI task solving
- Document learning process and key decisions
## Project Architecture
Technology Stack:
- Platform: Hugging Face Spaces with OAuth integration
- Framework: Gradio (UI), Requests (API communication)
- Language: Python 3.x
Project Structure:
```
Final_Assignment_Template/
├── archive/            # Reference materials, previous solutions, static resources
├── input/              # Input files, configuration, raw data
├── output/             # Generated files, results, processed data
├── test/               # Testing files, test scripts, development records
├── dev/                # Development records (permanent knowledge packages)
├── app.py              # Main application file with BasicAgent and Gradio UI
├── requirements.txt    # Python dependencies
├── README.md           # Project overview, architecture, workflow, specification
├── CLAUDE.md           # Project-specific AI instructions
├── PLAN.md             # Active implementation plan (temporary workspace)
├── TODO.md             # Active task tracking (temporary workspace)
└── CHANGELOG.md        # Session changelog (temporary workspace)
```
Core Components:
- BasicAgent class: Student-customizable template for agent logic implementation
- run_and_submit_all function: Evaluation orchestration (question fetching, submission, scoring)
- Gradio UI: Login button + evaluation trigger + results display
- API integration: Connection to external scoring service
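The BasicAgent component can be sketched as below. The class name and `__call__` interface come from this README; the method body is a placeholder for student logic, not the shipped implementation:

```python
class BasicAgent:
    """Student-customizable agent template (interface as described above;
    the answer below is a stub, not the real implementation)."""

    def __init__(self):
        print("BasicAgent initialized.")

    def __call__(self, question: str) -> str:
        # Replace this stub with real reasoning, tool use, retrieval, etc.
        # The orchestrator calls the agent once per GAIA question.
        return "This is a default answer."
```

The orchestration layer only needs the agent to be callable with a question string and to return an answer string, so any implementation honoring that contract plugs in unchanged.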
System Architecture Diagram:
```mermaid
---
config:
  layout: elk
---
graph TB
    subgraph "Student Development"
        BasicAgent[BasicAgent Class<br/>__call__ method<br/>Custom logic here]
    end
    subgraph "Provided Infrastructure"
        GradioUI[Gradio UI<br/>Login + Run Button<br/>Results Display]
        Orchestrator[run_and_submit_all Function<br/>Workflow orchestration]
        OAuth[HF OAuth<br/>User authentication]
    end
    subgraph "External Services"
        API[Scoring API<br/>agents-course-unit4-scoring.hf.space]
        QEndpoint["/questions endpoint"]
        SEndpoint["/submit endpoint"]
    end
    subgraph "HF Space Environment"
        EnvVars[Environment Variables<br/>SPACE_ID, SPACE_HOST]
    end
    GradioUI --> OAuth
    OAuth -->|Authenticated| Orchestrator
    Orchestrator --> QEndpoint
    QEndpoint -->|GAIA questions| Orchestrator
    Orchestrator -->|For each question| BasicAgent
    BasicAgent -->|Answer| Orchestrator
    Orchestrator -->|All answers| SEndpoint
    SEndpoint -->|Score & results| Orchestrator
    Orchestrator --> GradioUI
    EnvVars -.->|Used by| Orchestrator
    style BasicAgent fill:#ffcccc
    style GradioUI fill:#cce5ff
    style Orchestrator fill:#cce5ff
    style API fill:#d9f2d9
```
## Project Specification
Project Context:
This is a course assignment template for building an AI agent that passes the GAIA benchmark (General AI Assistants). The project was recently started as a learning-focused workspace to support iterative agent development and experimentation.
Current State:
- Status: Early development phase (within first week)
- Purpose: Build production-ready code that passes GAIA test requirements
- Learning Objective: Discovery-based development where students design and implement agent capabilities themselves
Data & Workflows:
- Input Data: GAIA test questions fetched from the external scoring API (agents-course-unit4-scoring.hf.space)
- Processing: BasicAgent class processes questions and generates answers
- Output: Agent responses submitted to the scoring endpoint for evaluation
- Development Workflow:
  1. Local development and testing
  2. Deploy to Hugging Face Space
  3. Submit via the integrated evaluation UI
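The question-fetching step described above might look like this with Requests. The base URL comes from this README, but the response schema (a JSON list of question objects) is an assumption to verify against app.py:

```python
import requests

API_URL = "https://agents-course-unit4-scoring.hf.space"

def fetch_questions(timeout: float = 15.0) -> list:
    """GET the GAIA question set from the scoring API's /questions endpoint."""
    response = requests.get(f"{API_URL}/questions", timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()       # assumed: a JSON list of question objects
```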
User Workflow Diagram:
```mermaid
---
config:
  layout: fixed
---
flowchart TB
    Start(["Student starts assignment"]) --> Clone["Clone HF Space template"]
    Clone --> LocalDev["Local development:<br>Implement BasicAgent logic"]
    LocalDev --> LocalTest{"Test locally?"}
    LocalTest -- Yes --> RunLocal["Run app locally"]
    RunLocal --> Debug{"Works?"}
    Debug -- No --> LocalDev
    Debug -- Yes --> Deploy["Deploy to HF Space"]
    LocalTest -- Skip --> Deploy
    Deploy --> Login["Login with HF OAuth"]
    Login --> RunEval@{ label: "Click 'Run Evaluation'<br>button in UI" }
    RunEval --> FetchQ["System fetches GAIA<br>questions from API"]
    FetchQ --> RunAgent["Agent processes<br>each question"]
    RunAgent --> Submit["Submit answers<br>to scoring API"]
    Submit --> Display["Display score<br>and results"]
    Display --> Iterate{"Satisfied with<br>score?"}
    Iterate -- "No - improve agent" --> LocalDev
    Iterate -- Yes --> Complete(["Assignment complete"])
    RunEval@{ shape: rect}
    style Start fill:#e1f5e1
    style LocalDev fill:#fff4e1
    style Deploy fill:#e1f0ff
    style RunAgent fill:#ffe1f0
    style Complete fill:#e1f5e1
```
Technical Architecture:
- Platform: Hugging Face Spaces with OAuth integration
- Framework: Gradio for UI, Requests for API communication
- Core Component: BasicAgent class (student-customizable template)
- Evaluation Infrastructure: Pre-built orchestration (question fetching, submission, scoring display)
- Deployment: HF Space with environment variables (SPACE_ID, SPACE_HOST)
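Since SPACE_ID and SPACE_HOST exist only inside a running Space, they are best read defensively. The variable names come from this README; the example values and URL pattern are illustrative assumptions:

```python
import os

# Injected by the HF Space runtime; unset during local development.
space_id = os.getenv("SPACE_ID")      # e.g. "your-user/Final_Assignment_Template" (illustrative)
space_host = os.getenv("SPACE_HOST")  # e.g. "your-user-space.hf.space" (illustrative)

# A typical use: linking to the Space's source tree when deployed.
code_url = f"https://huggingface.co/spaces/{space_id}/tree/main" if space_id else None
```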
Requirements & Constraints:
- Constraint Type: Minimal at current stage
- Infrastructure: Must run on Hugging Face Spaces platform
- Integration: Fixed scoring API endpoints (cannot modify evaluation system)
- Flexibility: Students have full freedom to design agent capabilities
Integration Points:
- External API: https://agents-course-unit4-scoring.hf.space
  - `/questions` endpoint: Fetch GAIA test questions
  - `/submit` endpoint: Submit answers and receive scores
- Authentication: Hugging Face OAuth for student identification
- Deployment: HF Space runtime environment variables
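A sketch of the submission call follows. The endpoint is fixed by the course, but the payload field names (`username`, `agent_code`, `answers`, `task_id`, `submitted_answer`) are assumptions taken from the course template and should be verified against app.py:

```python
import requests

SUBMIT_URL = "https://agents-course-unit4-scoring.hf.space/submit"

def build_payload(username: str, agent_code_url: str, answers: list) -> dict:
    """Assemble the /submit payload; each answer pairs a task_id with the
    agent's submitted_answer (field names are assumptions, not guaranteed)."""
    return {
        "username": username,
        "agent_code": agent_code_url,
        "answers": answers,
    }

def submit_answers(payload: dict) -> dict:
    response = requests.post(SUBMIT_URL, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()  # expected to carry the score and per-question results
```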
Development Goals:
- Primary: Organized development environment supporting iterative experimentation
- Focus: Learning process - students discover optimal approaches through implementation
- Structure: Workspace that tracks experiments, tests, and development progress
- Documentation: Capture decisions and learnings throughout development cycle
## Workflow

### Dev Record Workflow
Philosophy: Dev records are the single source of truth. CHANGELOG/PLAN/TODO are temporary workspace files.
Dev Record Types:
- 🐛 Issue: Problem-solving, bug fixes, error resolution
- 🔨 Development: Feature development, enhancements, new functionality
### Session Start Workflow
Phase 1: Planning (Explicit)
- Create or identify a dev record: `dev/dev_YYMMDD_##_concise_title.md`
  - Choose type: 🐛 Issue or 🔨 Development
- Create PLAN.md ONLY: use the `/plan` command or write it directly
  - Document the implementation approach, steps, and files to modify
- DO NOT create TODO.md or CHANGELOG.md yet
Phase 2: Development (Automatic)
- Create TODO.md: automatically populate it as you start implementing
  - Track tasks in real time using the TodoWrite tool
  - Mark tasks in_progress/completed as you work
- Create CHANGELOG.md: automatically populate it as you make changes
  - Record file modifications/creations/deletions as they happen
- Work on the solution: update all three files during development
### Session End Workflow
Phase 3: Completion (Manual)
After the AI completes all work and updates PLAN/TODO/CHANGELOG:
- The AI stops and waits for user review (Checkpoint 3)
- The user reviews PLAN.md, TODO.md, and CHANGELOG.md
- The user manually runs `/update-dev dev_YYMMDD_##` when satisfied
When `/update-dev` runs, it:
- Distills PLAN decisions → dev record "Key Decisions" section
- Distills TODO deliverables → dev record "Outcome" section
- Distills CHANGELOG changes → dev record "Changelog" section
- Empties PLAN.md, TODO.md, and CHANGELOG.md back to their templates
- Marks the dev record status as ✅ Resolved
### AI Context Loading
When a new AI session starts:
- Read the last 2-3 dev records for recent context (NOT the CHANGELOG)
- Dev records sort newest first by date, so start with the most recent `dev_YYMMDD_##_title.md` files
- Read README.md for the project structure
- Read CLAUDE.md for coding standards
- Check PLAN.md/TODO.md for active work (if any)
Do NOT read entire CHANGELOG for context - it's a temporary workspace, not a historical record.
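The "newest first" ordering falls directly out of the `dev_YYMMDD_##` naming convention, since lexicographic order matches chronological order; a minimal sketch (the helper name is hypothetical):

```python
from pathlib import Path

def latest_dev_records(dev_dir: str = "dev", count: int = 3) -> list:
    """Return the newest dev records; the dev_YYMMDD_##_ prefix makes a plain
    reverse lexicographic sort equal newest-first chronological order."""
    return sorted(Path(dev_dir).glob("dev_*.md"), reverse=True)[:count]
```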