agentbee

mangubee

16.12.2025 - project analysis

fa94723 4 months ago

title: Template Final Assignment
emoji: 🕵🏻‍♂️
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_expiration_minutes: 480

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Project Overview

Project Name: Final_Assignment_Template

Purpose: Course assignment template for building an AI agent that passes the GAIA benchmark (General AI Assistants). This project serves as a learning-focused workspace to support iterative agent development and experimentation.

Target Users: Students learning agent development through hands-on implementation

Key Objectives:

Build production-ready code that passes GAIA test requirements
Learn agent development through discovery-based implementation
Develop a systematic approach to solving complex AI tasks
Document the learning process and key decisions

Project Architecture

Technology Stack:

Platform: Hugging Face Spaces with OAuth integration
Framework: Gradio (UI), Requests (API communication)
Language: Python 3.x

Project Structure:

Final_Assignment_Template/
├── archive/         # Reference materials, previous solutions, static resources
├── input/           # Input files, configuration, raw data
├── output/          # Generated files, results, processed data
├── test/            # Testing files, test scripts, development records
├── dev/             # Development records (permanent knowledge packages)
├── app.py           # Main application file with BasicAgent and Gradio UI
├── requirements.txt # Python dependencies
├── README.md        # Project overview, architecture, workflow, specification
├── CLAUDE.md        # Project-specific AI instructions
├── PLAN.md          # Active implementation plan (temporary workspace)
├── TODO.md          # Active task tracking (temporary workspace)
└── CHANGELOG.md     # Session changelog (temporary workspace)

Core Components:

BasicAgent class: Student-customizable template for agent logic implementation
run_and_submit_all function: Evaluation orchestration (question fetching, submission, scoring)
Gradio UI: Login button + evaluation trigger + results display
API integration: Connection to external scoring service
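
As a minimal sketch of the customization point, assuming the agent is a plain callable that takes one question string and returns one answer string. The exact `__call__` signature, the `calls` counter, and the placeholder answer are illustrative assumptions, not taken from app.py:

```python
class BasicAgent:
    """Hypothetical minimal agent; replace the body of __call__ with real logic."""

    def __init__(self) -> None:
        # Load models, tools, or API clients here (none are needed for this stub).
        self.calls = 0

    def __call__(self, question: str) -> str:
        # Assumed contract: one question string in, one answer string out.
        self.calls += 1
        return "placeholder answer"


agent = BasicAgent()
answer = agent("What is the capital of France?")
```

Keeping the agent a simple callable means the orchestration code does not need to change when the agent logic does.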

System Architecture Diagram:

---
config:
  layout: elk
---
graph TB
    subgraph "Student Development"
        BasicAgent[BasicAgent Class<br/>__call__ method<br/>Custom logic here]
    end

    subgraph "Provided Infrastructure"
        GradioUI[Gradio UI<br/>Login + Run Button<br/>Results Display]
        Orchestrator[run_and_submit_all Function<br/>Workflow orchestration]
        OAuth[HF OAuth<br/>User authentication]
    end

    subgraph "External Services"
        API[Scoring API<br/>agents-course-unit4-scoring.hf.space]
        QEndpoint["/questions endpoint"]
        SEndpoint["/submit endpoint"]
    end

    subgraph "HF Space Environment"
        EnvVars[Environment Variables<br/>SPACE_ID, SPACE_HOST]
    end

    GradioUI --> OAuth
    OAuth -->|Authenticated| Orchestrator
    Orchestrator --> QEndpoint
    QEndpoint -->|GAIA questions| Orchestrator
    Orchestrator -->|For each question| BasicAgent
    BasicAgent -->|Answer| Orchestrator
    Orchestrator -->|All answers| SEndpoint
    SEndpoint -->|Score & results| Orchestrator
    Orchestrator --> GradioUI
    EnvVars -.->|Used by| Orchestrator

    style BasicAgent fill:#ffcccc
    style GradioUI fill:#cce5ff
    style Orchestrator fill:#cce5ff
    style API fill:#d9f2d9

Project Specification

Project Context:

This is a course assignment template for building an AI agent that passes the GAIA benchmark (General AI Assistants). The project was recently started as a learning-focused workspace to support iterative agent development and experimentation.

Current State:

Status: Early development phase (within first week)
Purpose: Build production-ready code that passes GAIA test requirements
Learning Objective: Discovery-based development where students design and implement agent capabilities themselves

Data & Workflows:

Input Data: GAIA test questions fetched from external scoring API (agents-course-unit4-scoring.hf.space)
Processing: BasicAgent class processes questions and generates answers
Output: Agent responses submitted to scoring endpoint for evaluation
Development Workflow:
1. Local development and testing
2. Deploy to Hugging Face Space
3. Submit via integrated evaluation UI
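
The fetch, process, submit flow above can be sketched roughly as follows. This is not the actual run_and_submit_all implementation: the function name `run_evaluation`, the `task_id` / `question` / `submitted_answer` field names, and the injected `fetch_questions` / `submit_answers` callables are assumptions chosen so that the loop can be exercised offline.

```python
from typing import Callable, Dict, List


def run_evaluation(
    agent: Callable[[str], str],
    fetch_questions: Callable[[], List[Dict[str, str]]],
    submit_answers: Callable[[List[Dict[str, str]]], Dict[str, object]],
) -> Dict[str, object]:
    """Fetch questions, answer each one, and submit all answers in one batch."""
    answers = []
    for item in fetch_questions():
        try:
            answer = agent(item["question"])
        except Exception as exc:
            # Record the failure and keep going so one bad question
            # does not abort the whole run.
            answer = f"AGENT ERROR: {exc}"
        answers.append({"task_id": item["task_id"], "submitted_answer": answer})
    return submit_answers(answers)


# Stand-ins for the real /questions and /submit calls (hypothetical data).
def fake_fetch() -> List[Dict[str, str]]:
    return [
        {"task_id": "t1", "question": "2 + 2?"},
        {"task_id": "t2", "question": "Capital of France?"},
    ]


def fake_submit(answers: List[Dict[str, str]]) -> Dict[str, object]:
    return {"received": len(answers), "answers": answers}


result = run_evaluation(lambda q: "42", fake_fetch, fake_submit)
```

Injecting the network calls makes step 1 (local development and testing) possible without touching the scoring service.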

User Workflow Diagram:

---
config:
  layout: fixed
---
flowchart TB
    Start(["Student starts assignment"]) --> Clone["Clone HF Space template"]
    Clone --> LocalDev["Local development:<br>Implement BasicAgent logic"]
    LocalDev --> LocalTest{"Test locally?"}
    LocalTest -- Yes --> RunLocal["Run app locally"]
    RunLocal --> Debug{"Works?"}
    Debug -- No --> LocalDev
    Debug -- Yes --> Deploy["Deploy to HF Space"]
    LocalTest -- Skip --> Deploy
    Deploy --> Login["Login with HF OAuth"]
    Login --> RunEval@{ label: "Click 'Run Evaluation'<br>button in UI" }
    RunEval --> FetchQ["System fetches GAIA<br>questions from API"]
    FetchQ --> RunAgent["Agent processes<br>each question"]
    RunAgent --> Submit["Submit answers<br>to scoring API"]
    Submit --> Display["Display score<br>and results"]
    Display --> Iterate{"Satisfied with<br>score?"}
    Iterate -- "No - improve agent" --> LocalDev
    Iterate -- Yes --> Complete(["Assignment complete"])

    RunEval@{ shape: rect}
    style Start fill:#e1f5e1
    style LocalDev fill:#fff4e1
    style Deploy fill:#e1f0ff
    style RunAgent fill:#ffe1f0
    style Complete fill:#e1f5e1

Technical Architecture:

Platform: Hugging Face Spaces with OAuth integration
Framework: Gradio for UI, Requests for API communication
Core Component: BasicAgent class (student-customizable template)
Evaluation Infrastructure: Pre-built orchestration (question fetching, submission, scoring display)
Deployment: HF Space with environment variables (SPACE_ID, SPACE_HOST)

Requirements & Constraints:

Constraint Type: Minimal at current stage
Infrastructure: Must run on Hugging Face Spaces platform
Integration: Fixed scoring API endpoints (cannot modify evaluation system)
Flexibility: Students have full freedom to design agent capabilities

Integration Points:

External API: https://agents-course-unit4-scoring.hf.space
- /questions endpoint: Fetch GAIA test questions
- /submit endpoint: Submit answers and receive scores
Authentication: Hugging Face OAuth for student identification
Deployment: HF Space runtime environment variables
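
A small sketch of how these integration points might be wired together. The helper names `scoring_endpoints` and `space_code_url`, and the `/tree/main` link pattern built from SPACE_ID, are assumptions for illustration; only the base URL, the two endpoint paths, and the SPACE_ID variable come from this README.

```python
import os
from typing import Dict, Mapping, Optional

DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"


def scoring_endpoints(base_url: str = DEFAULT_API_URL) -> Dict[str, str]:
    """Return the full URLs of the two fixed scoring endpoints."""
    base = base_url.rstrip("/")
    return {"questions": f"{base}/questions", "submit": f"{base}/submit"}


def space_code_url(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Build a link to the Space's files from SPACE_ID, or None when it is unset."""
    env = os.environ if env is None else env
    space_id = env.get("SPACE_ID")
    if not space_id:
        # Typically the case when running locally, outside a Space.
        return None
    # Assumed URL pattern for a Space's file tree.
    return f"https://huggingface.co/spaces/{space_id}/tree/main"


endpoints = scoring_endpoints()
```

With Requests, the questions could then be fetched with something like `requests.get(endpoints["questions"], timeout=15)`; the timeout value is an arbitrary choice.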

Development Goals:

Primary: Organized development environment supporting iterative experimentation
Focus: Learning process - students discover optimal approaches through implementation
Structure: Workspace that tracks experiments, tests, and development progress
Documentation: Capture decisions and learnings throughout development cycle

Workflow

Dev Record Workflow

Philosophy: Dev records are the single source of truth. CHANGELOG/PLAN/TODO are temporary workspace files.

Dev Record Types:

🐞 Issue: Problem-solving, bug fixes, error resolution
🔨 Development: Feature development, enhancements, new functionality

Session Start Workflow

Phase 1: Planning (Explicit)

Create or identify dev record: dev/dev_YYMMDD_##_concise_title.md
- Choose type: 🐞 Issue or 🔨 Development
Create PLAN.md ONLY: Use the /plan command or write it directly
- Document implementation approach, steps, files to modify
- DO NOT create TODO.md or CHANGELOG.md yet
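
The dev record naming pattern can be generated mechanically. The sketch below assumes that `##` is a two-digit sequence number that restarts at 01 each day and that the title is lowercased with underscores; both are guesses at the convention, and `next_dev_record_name` is a hypothetical helper, not part of the project.

```python
import re
from datetime import date
from typing import Iterable


def next_dev_record_name(title: str, today: date, existing: Iterable[str]) -> str:
    """Return the next dev_YYMMDD_##_title.md file name for the given day."""
    stamp = today.strftime("%y%m%d")
    pattern = re.compile(rf"^dev_{stamp}_(\d{{2}})_")
    used = []
    for name in existing:
        match = pattern.match(name)
        if match:
            used.append(int(match.group(1)))
    # Assumed rule: sequence numbers count up from 01 within a single day.
    seq = max(used, default=0) + 1
    # Assumed rule: lowercase title, runs of other characters become "_".
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return f"dev_{stamp}_{seq:02d}_{slug}.md"


name = next_dev_record_name(
    "Fix OAuth login",
    date(2025, 12, 16),
    ["dev_251216_01_project_analysis.md"],
)
```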

Phase 2: Development (Automatic)

Create TODO.md: Automatically populate as you start implementing
- Track tasks in real time using the TodoWrite tool
- Mark in_progress/completed as you work
Create CHANGELOG.md: Automatically populate as you make changes
- Record file modifications/creations/deletions as they happen
Work on solution: Update all three files during development

Session End Workflow

Phase 3: Completion (Manual)

After AI completes all work and updates PLAN/TODO/CHANGELOG:

AI stops and waits for user review (Checkpoint 3)
User reviews PLAN.md, TODO.md, and CHANGELOG.md
User manually runs /update-dev dev_YYMMDD_## when satisfied

When /update-dev runs:

Distills PLAN decisions → dev record "Key Decisions" section
Distills TODO deliverables → dev record "Outcome" section
Distills CHANGELOG changes → dev record "Changelog" section
Empties PLAN.md, TODO.md, CHANGELOG.md back to templates
Marks dev record status as ✅ Resolved

AI Context Loading

When new AI session starts:

Read the last 2-3 dev records for recent context (NOT CHANGELOG)
- Dev records sorted by date: newest dev_YYMMDD_##_title.md files first
Read README.md for project structure
Read CLAUDE.md for coding standards
Check PLAN.md/TODO.md for active work (if any)

Do NOT read the entire CHANGELOG for context: it is a temporary workspace, not a historical record.
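
Selecting the most recent dev records could look like the sketch below. It assumes file names follow the dev_YYMMDD_##_title.md pattern exactly, so that sorting on the date and sequence fields gives newest first; `latest_dev_records` and the sample file names are hypothetical.

```python
import re
from typing import Iterable, List

DEV_RECORD = re.compile(r"^dev_(\d{6})_(\d{2})_.+\.md$")


def latest_dev_records(names: Iterable[str], limit: int = 3) -> List[str]:
    """Return up to `limit` dev record names, newest first."""
    keyed = []
    for name in names:
        match = DEV_RECORD.match(name)
        if match:
            # (YYMMDD, ##) sorts chronologically as strings within one century.
            keyed.append(((match.group(1), match.group(2)), name))
    keyed.sort(reverse=True)
    return [name for _, name in keyed[:limit]]


recent = latest_dev_records(
    [
        "dev_251214_01_a.md",
        "dev_251216_02_c.md",
        "README.md",
        "dev_251216_01_b.md",
        "dev_251215_01_d.md",
    ]
)
```

Files that do not match the pattern, such as README.md here, are ignored rather than treated as errors.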