16.12.2025 - project analysis
- CHANGELOG.md +22 -0
- CLAUDE.md +4 -0
- PLAN.md +25 -0
- README.md +240 -1
- TODO.md +14 -0
- app.py +50 -26
- requirements.txt +1 -0
CHANGELOG.md
ADDED
|
@@ -0,0 +1,22 @@
| 1 |
+
# Session Changelog
|
| 2 |
+
|
| 3 |
+
**Session Date:** [YYYY-MM-DD]
|
| 4 |
+
**Dev Record:** [link to dev/dev_YYMMDD_##_concise_title.md]
|
| 5 |
+
|
| 6 |
+
## Changes Made
|
| 7 |
+
|
| 8 |
+
### Created Files
|
| 9 |
+
|
| 10 |
+
- [file path] - [Purpose/description]
|
| 11 |
+
|
| 12 |
+
### Modified Files
|
| 13 |
+
|
| 14 |
+
- [file path] - [What was changed]
|
| 15 |
+
|
| 16 |
+
### Deleted Files
|
| 17 |
+
|
| 18 |
+
- [file path] - [Reason for deletion]
|
| 19 |
+
|
| 20 |
+
## Notes
|
| 21 |
+
|
| 22 |
+
[Any additional context about the session's work]
|
CLAUDE.md
ADDED
|
@@ -0,0 +1,4 @@
| 1 |
+
# Project-Specific Instructions
|
| 2 |
+
|
| 3 |
+
[Leave empty unless project requires special behavior]
|
| 4 |
+
[Inherits all rules from ~/.claude/CLAUDE.md]
|
PLAN.md
ADDED
|
@@ -0,0 +1,25 @@
| 1 |
+
# Implementation Plan
|
| 2 |
+
|
| 3 |
+
**Date:** [YYYY-MM-DD]
|
| 4 |
+
**Dev Record:** [link to dev/dev_YYMMDD_##_concise_title.md]
|
| 5 |
+
**Status:** [Planning | In Progress | Completed]
|
| 6 |
+
|
| 7 |
+
## Objective
|
| 8 |
+
|
| 9 |
+
[Clear goal statement from task description]
|
| 10 |
+
|
| 11 |
+
## Steps
|
| 12 |
+
|
| 13 |
+
1. [Step 1]
|
| 14 |
+
2. [Step 2]
|
| 15 |
+
3. [Step 3]
|
| 16 |
+
|
| 17 |
+
## Files to Modify
|
| 18 |
+
|
| 19 |
+
- [file1.py]
|
| 20 |
+
- [file2.md]
|
| 21 |
+
|
| 22 |
+
## Success Criteria
|
| 23 |
+
|
| 24 |
+
- [ ] [Criterion 1]
|
| 25 |
+
- [ ] [Criterion 2]
|
README.md
CHANGED
|
@@ -12,4 +12,243 @@ hf_oauth: true
|
|
| 12 |
hf_oauth_expiration_minutes: 480
|
| 13 |
---
|
| 14 |
|
| 15 |
-
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
| 12 |
hf_oauth_expiration_minutes: 480
|
| 13 |
---
|
| 14 |
|
| 15 |
+
Check out the configuration reference at <https://huggingface.co/docs/hub/spaces-config-reference>
|
| 16 |
+
|
| 17 |
+
## Project Overview
|
| 18 |
+
|
| 19 |
+
**Project Name:** Final_Assignment_Template
|
| 20 |
+
|
| 21 |
+
**Purpose:** Course assignment template for building an AI agent that passes the GAIA benchmark (General AI Assistants). This project serves as a learning-focused workspace to support iterative agent development and experimentation.
|
| 22 |
+
|
| 23 |
+
**Target Users:** Students learning agent development through hands-on implementation
|
| 24 |
+
|
| 25 |
+
**Key Objectives:**
|
| 26 |
+
|
| 27 |
+
- Build production-ready code that passes GAIA test requirements
|
| 28 |
+
- Learn agent development through discovery-based implementation
|
| 29 |
+
- Develop a systematic approach to complex AI task solving
|
| 30 |
+
- Document learning process and key decisions
|
| 31 |
+
|
| 32 |
+
## Project Architecture
|
| 33 |
+
|
| 34 |
+
**Technology Stack:**
|
| 35 |
+
|
| 36 |
+
- Platform: Hugging Face Spaces with OAuth integration
|
| 37 |
+
- Framework: Gradio (UI), Requests (API communication)
|
| 38 |
+
- Language: Python 3.x
|
| 39 |
+
|
| 40 |
+
**Project Structure:**
|
| 41 |
+
|
| 42 |
+
```
|
| 43 |
+
Final_Assignment_Template/
|
| 44 |
+
├── archive/ # Reference materials, previous solutions, static resources
|
| 45 |
+
├── input/ # Input files, configuration, raw data
|
| 46 |
+
├── output/ # Generated files, results, processed data
|
| 47 |
+
├── test/ # Testing files, test scripts, development records
|
| 48 |
+
├── dev/ # Development records (permanent knowledge packages)
|
| 49 |
+
├── app.py # Main application file with BasicAgent and Gradio UI
|
| 50 |
+
├── requirements.txt # Python dependencies
|
| 51 |
+
├── README.md # Project overview, architecture, workflow, specification
|
| 52 |
+
├── CLAUDE.md # Project-specific AI instructions
|
| 53 |
+
├── PLAN.md # Active implementation plan (temporary workspace)
|
| 54 |
+
├── TODO.md # Active task tracking (temporary workspace)
|
| 55 |
+
└── CHANGELOG.md # Session changelog (temporary workspace)
|
| 56 |
+
```
|
| 57 |
+
|
| 58 |
+
**Core Components:**
|
| 59 |
+
|
| 60 |
+
- BasicAgent class: Student-customizable template for agent logic implementation
|
| 61 |
+
- run_and_submit_all function: Evaluation orchestration (question fetching, submission, scoring)
|
| 62 |
+
- Gradio UI: Login button + evaluation trigger + results display
|
| 63 |
+
- API integration: Connection to external scoring service
|
| 64 |
+
|
| 65 |
+
**System Architecture Diagram:**
|
| 66 |
+
|
| 67 |
+
```mermaid
|
| 68 |
+
---
|
| 69 |
+
config:
|
| 70 |
+
layout: elk
|
| 71 |
+
---
|
| 72 |
+
graph TB
|
| 73 |
+
subgraph "Student Development"
|
| 74 |
+
BasicAgent[BasicAgent Class<br/>__call__ method<br/>Custom logic here]
|
| 75 |
+
end
|
| 76 |
+
|
| 77 |
+
subgraph "Provided Infrastructure"
|
| 78 |
+
GradioUI[Gradio UI<br/>Login + Run Button<br/>Results Display]
|
| 79 |
+
Orchestrator[run_and_submit_all Function<br/>Workflow orchestration]
|
| 80 |
+
OAuth[HF OAuth<br/>User authentication]
|
| 81 |
+
end
|
| 82 |
+
|
| 83 |
+
subgraph "External Services"
|
| 84 |
+
API[Scoring API<br/>agents-course-unit4-scoring.hf.space]
|
| 85 |
+
QEndpoint["/questions endpoint"]
|
| 86 |
+
SEndpoint["/submit endpoint"]
|
| 87 |
+
end
|
| 88 |
+
|
| 89 |
+
subgraph "HF Space Environment"
|
| 90 |
+
EnvVars[Environment Variables<br/>SPACE_ID, SPACE_HOST]
|
| 91 |
+
end
|
| 92 |
+
|
| 93 |
+
GradioUI --> OAuth
|
| 94 |
+
OAuth -->|Authenticated| Orchestrator
|
| 95 |
+
Orchestrator --> QEndpoint
|
| 96 |
+
QEndpoint -->|GAIA questions| Orchestrator
|
| 97 |
+
Orchestrator -->|For each question| BasicAgent
|
| 98 |
+
BasicAgent -->|Answer| Orchestrator
|
| 99 |
+
Orchestrator -->|All answers| SEndpoint
|
| 100 |
+
SEndpoint -->|Score & results| Orchestrator
|
| 101 |
+
Orchestrator --> GradioUI
|
| 102 |
+
EnvVars -.->|Used by| Orchestrator
|
| 103 |
+
|
| 104 |
+
style BasicAgent fill:#ffcccc
|
| 105 |
+
style GradioUI fill:#cce5ff
|
| 106 |
+
style Orchestrator fill:#cce5ff
|
| 107 |
+
style API fill:#d9f2d9
|
| 108 |
+
```
|
| 109 |
+
|
| 110 |
+
## Project Specification
|
| 111 |
+
|
| 112 |
+
**Project Context:**
|
| 113 |
+
|
| 114 |
+
This is a course assignment template for building an AI agent that passes the GAIA benchmark (General AI Assistants). The project was recently started as a learning-focused workspace to support iterative agent development and experimentation.
|
| 115 |
+
|
| 116 |
+
**Current State:**
|
| 117 |
+
|
| 118 |
+
- **Status:** Early development phase (within the first week)
|
| 119 |
+
- **Purpose:** Build production-ready code that passes GAIA test requirements
|
| 120 |
+
- **Learning Objective:** Discovery-based development where students design and implement agent capabilities themselves
|
| 121 |
+
|
| 122 |
+
**Data & Workflows:**
|
| 123 |
+
|
| 124 |
+
- **Input Data:** GAIA test questions fetched from external scoring API (`agents-course-unit4-scoring.hf.space`)
|
| 125 |
+
- **Processing:** BasicAgent class processes questions and generates answers
|
| 126 |
+
- **Output:** Agent responses submitted to scoring endpoint for evaluation
|
| 127 |
+
- **Development Workflow:**
|
| 128 |
+
1. Local development and testing
|
| 129 |
+
2. Deploy to Hugging Face Space
|
| 130 |
+
3. Submit via integrated evaluation UI
|
| 131 |
+
|
| 132 |
+
**User Workflow Diagram:**
|
| 133 |
+
|
| 134 |
+
```mermaid
|
| 135 |
+
---
|
| 136 |
+
config:
|
| 137 |
+
layout: fixed
|
| 138 |
+
---
|
| 139 |
+
flowchart TB
|
| 140 |
+
Start(["Student starts assignment"]) --> Clone["Clone HF Space template"]
|
| 141 |
+
Clone --> LocalDev["Local development:<br>Implement BasicAgent logic"]
|
| 142 |
+
LocalDev --> LocalTest{"Test locally?"}
|
| 143 |
+
LocalTest -- Yes --> RunLocal["Run app locally"]
|
| 144 |
+
RunLocal --> Debug{"Works?"}
|
| 145 |
+
Debug -- No --> LocalDev
|
| 146 |
+
Debug -- Yes --> Deploy["Deploy to HF Space"]
|
| 147 |
+
LocalTest -- Skip --> Deploy
|
| 148 |
+
Deploy --> Login["Login with HF OAuth"]
|
| 149 |
+
Login --> RunEval@{ label: "Click 'Run Evaluation'<br>button in UI" }
|
| 150 |
+
RunEval --> FetchQ["System fetches GAIA<br>questions from API"]
|
| 151 |
+
FetchQ --> RunAgent["Agent processes<br>each question"]
|
| 152 |
+
RunAgent --> Submit["Submit answers<br>to scoring API"]
|
| 153 |
+
Submit --> Display["Display score<br>and results"]
|
| 154 |
+
Display --> Iterate{"Satisfied with<br>score?"}
|
| 155 |
+
Iterate -- "No - improve agent" --> LocalDev
|
| 156 |
+
Iterate -- Yes --> Complete(["Assignment complete"])
|
| 157 |
+
|
| 158 |
+
RunEval@{ shape: rect}
|
| 159 |
+
style Start fill:#e1f5e1
|
| 160 |
+
style LocalDev fill:#fff4e1
|
| 161 |
+
style Deploy fill:#e1f0ff
|
| 162 |
+
style RunAgent fill:#ffe1f0
|
| 163 |
+
style Complete fill:#e1f5e1
|
| 164 |
+
```
|
| 165 |
+
|
| 166 |
+
**Technical Architecture:**
|
| 167 |
+
|
| 168 |
+
- **Platform:** Hugging Face Spaces with OAuth integration
|
| 169 |
+
- **Framework:** Gradio for UI, Requests for API communication
|
| 170 |
+
- **Core Component:** BasicAgent class (student-customizable template)
|
| 171 |
+
- **Evaluation Infrastructure:** Pre-built orchestration (question fetching, submission, scoring display)
|
| 172 |
+
- **Deployment:** HF Space with environment variables (SPACE_ID, SPACE_HOST)
|
| 173 |
+
|
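As a rough illustration of how the two deployment variables can be consumed, the sketch below derives the repo URLs that `app.py` prints at startup. The helper name `space_urls` and the keys of the returned dictionary are assumptions made for this example. Only `SPACE_ID`, `SPACE_HOST`, and the two `huggingface.co/spaces/...` URL patterns are taken from the diff.

```python
import os
from typing import Mapping, Optional


def space_urls(env: Optional[Mapping[str, str]] = None) -> dict:
    """Derive repo URLs from SPACE_ID / SPACE_HOST (hypothetical helper)."""
    # Fall back to the real process environment when no mapping is passed in.
    env = os.environ if env is None else env
    space_id = env.get("SPACE_ID")
    space_host = env.get("SPACE_HOST")

    # SPACE_HOST is only set inside a running Space, so it doubles as a
    # "running locally?" indicator, as the startup messages in app.py suggest.
    urls = {"running_on_space": bool(space_host)}
    if space_id:
        urls["repo"] = f"https://huggingface.co/spaces/{space_id}"
        urls["tree"] = f"https://huggingface.co/spaces/{space_id}/tree/main"
    return urls
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise locally, where neither variable is normally set.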
| 174 |
+
**Requirements & Constraints:**
|
| 175 |
+
|
| 176 |
+
- **Constraint Type:** Minimal at current stage
|
| 177 |
+
- **Infrastructure:** Must run on Hugging Face Spaces platform
|
| 178 |
+
- **Integration:** Fixed scoring API endpoints (cannot modify evaluation system)
|
| 179 |
+
- **Flexibility:** Students have full freedom to design agent capabilities
|
| 180 |
+
|
| 181 |
+
**Integration Points:**
|
| 182 |
+
|
| 183 |
+
- **External API:** `https://agents-course-unit4-scoring.hf.space`
|
| 184 |
+
- `/questions` endpoint: Fetch GAIA test questions
|
| 185 |
+
- `/submit` endpoint: Submit answers and receive scores
|
| 186 |
+
- **Authentication:** Hugging Face OAuth for student identification
|
| 187 |
+
- **Deployment:** HF Space runtime environment variables
|
| 188 |
+
|
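To make the two endpoints concrete, here is a minimal sketch of how the endpoint URLs and the `/submit` payload could be assembled. The field names (`username`, `agent_code`, `answers`, `task_id`, `submitted_answer`) mirror the `submission_data` dictionary in the `app.py` diff. The helper names `endpoint` and `build_submission` are hypothetical, and no network call is made here.

```python
DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"


def endpoint(path: str, base_url: str = DEFAULT_API_URL) -> str:
    """Join the base URL and an endpoint path with exactly one slash."""
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


def build_submission(username: str, agent_code: str, answers: list) -> dict:
    """Assemble the JSON body for the /submit endpoint (hypothetical helper)."""
    return {
        "username": username.strip(),
        "agent_code": agent_code,
        # Keep only the two fields used in the answers payload in app.py.
        "answers": [
            {"task_id": a["task_id"], "submitted_answer": a["submitted_answer"]}
            for a in answers
        ],
    }
```

Building the payload in a pure function keeps it testable without contacting the scoring service. The result would presumably be sent as a JSON body in a POST request to `endpoint("submit")`.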
| 189 |
+
**Development Goals:**
|
| 190 |
+
|
| 191 |
+
- **Primary:** Organized development environment supporting iterative experimentation
|
| 192 |
+
- **Focus:** Learning process - students discover optimal approaches through implementation
|
| 193 |
+
- **Structure:** Workspace that tracks experiments, tests, and development progress
|
| 194 |
+
- **Documentation:** Capture decisions and learnings throughout development cycle
|
| 195 |
+
|
| 196 |
+
## Workflow
|
| 197 |
+
|
| 198 |
+
### Dev Record Workflow
|
| 199 |
+
|
| 200 |
+
**Philosophy:** Dev records are the single source of truth. CHANGELOG/PLAN/TODO are temporary workspace files.
|
| 201 |
+
|
| 202 |
+
**Dev Record Types:**
|
| 203 |
+
|
| 204 |
+
- 🐞 **Issue:** Problem-solving, bug fixes, error resolution
|
| 205 |
+
- 🔨 **Development:** Feature development, enhancements, new functionality
|
| 206 |
+
|
| 207 |
+
### Session Start Workflow
|
| 208 |
+
|
| 209 |
+
#### Phase 1: Planning (Explicit)
|
| 210 |
+
|
| 211 |
+
1. **Create or identify dev record:** `dev/dev_YYMMDD_##_concise_title.md`
|
| 212 |
+
- Choose type: 🐞 Issue or 🔨 Development
|
| 213 |
+
2. **Create PLAN.md ONLY:** Use `/plan` command or write directly
|
| 214 |
+
- Document implementation approach, steps, files to modify
|
| 215 |
+
- DO NOT create TODO.md or CHANGELOG.md yet
|
| 216 |
+
|
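The naming pattern `dev/dev_YYMMDD_##_concise_title.md` can be generated mechanically. The sketch below is one possible reading of it, assuming that `##` is a two-digit sequence number within the day and that the title is written in lowercase with underscores. Neither detail is spelled out in the README, and `dev_record_path` is a hypothetical helper.

```python
import re
from datetime import date


def dev_record_path(day: date, seq: int, title: str) -> str:
    """Build a dev record path such as dev/dev_251216_01_project_analysis.md."""
    # Collapse anything that is not a lowercase letter or digit into "_".
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    # %y%m%d gives the YYMMDD stamp; :02d zero-pads the sequence number.
    return f"dev/dev_{day:%y%m%d}_{seq:02d}_{slug}.md"
```

For example, `dev_record_path(date(2025, 12, 16), 1, "Project Analysis")` yields `dev/dev_251216_01_project_analysis.md`.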
| 217 |
+
#### Phase 2: Development (Automatic)
|
| 218 |
+
|
| 219 |
+
3. **Create TODO.md:** Automatically populate as you start implementing
|
| 220 |
+
- Track tasks in real-time using TodoWrite tool
|
| 221 |
+
- Mark in_progress/completed as you work
|
| 222 |
+
4. **Create CHANGELOG.md:** Automatically populate as you make changes
|
| 223 |
+
- Record file modifications/creations/deletions as they happen
|
| 224 |
+
5. **Work on solution:** Update all three files during development
|
| 225 |
+
|
| 226 |
+
### Session End Workflow
|
| 227 |
+
|
| 228 |
+
#### Phase 3: Completion (Manual)
|
| 229 |
+
|
| 230 |
+
After the AI completes all work and updates PLAN/TODO/CHANGELOG:
|
| 231 |
+
|
| 232 |
+
- AI stops and waits for user review (Checkpoint 3)
|
| 233 |
+
- User reviews PLAN.md, TODO.md, and CHANGELOG.md
|
| 234 |
+
- User manually runs `/update-dev dev_YYMMDD_##` when satisfied
|
| 235 |
+
|
| 236 |
+
When /update-dev runs:
|
| 237 |
+
|
| 238 |
+
1. Distills PLAN decisions → dev record "Key Decisions" section
|
| 239 |
+
2. Distills TODO deliverables → dev record "Outcome" section
|
| 240 |
+
3. Distills CHANGELOG changes → dev record "Changelog" section
|
| 241 |
+
4. Empties PLAN.md, TODO.md, CHANGELOG.md back to templates
|
| 242 |
+
5. Marks dev record status as ✅ Resolved
|
| 243 |
+
|
| 244 |
+
### AI Context Loading
|
| 245 |
+
|
| 246 |
+
**When a new AI session starts:**
|
| 247 |
+
|
| 248 |
+
- Read last 2-3 dev records for recent context (NOT CHANGELOG)
|
| 249 |
+
- Dev records sorted by date: newest `dev_YYMMDD_##_title.md` files first
|
| 250 |
+
- Read README.md for project structure
|
| 251 |
+
- Read CLAUDE.md for coding standards
|
| 252 |
+
- Check PLAN.md/TODO.md for active work (if any)
|
| 253 |
+
|
| 254 |
+
**Do NOT read entire CHANGELOG for context** - it's a temporary workspace, not a historical record.
|
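The context-loading rule in the README (read the last 2-3 dev records, newest first) can be sketched as a small selection function. This is an illustration only: `newest_dev_records` is a hypothetical helper, and it assumes filenames follow `dev_YYMMDD_##_title.md` with a two-digit sequence number.

```python
import re

# YYMMDD date stamp, two-digit sequence number, then a free-form title.
_DEV_RE = re.compile(r"^dev_(\d{6})_(\d{2})_.+\.md$")


def newest_dev_records(filenames, limit=3):
    """Return up to `limit` dev record names, newest first."""
    keyed = []
    for name in filenames:
        match = _DEV_RE.match(name)
        if match:  # silently ignore files that are not dev records
            keyed.append(((match.group(1), match.group(2)), name))
    # Fixed-width digit strings sort chronologically as plain text
    # (valid while all records fall within the same century).
    keyed.sort(reverse=True)
    return [name for _, name in keyed[:limit]]
```

Sorting on the `(date, sequence)` pair rather than on file modification time keeps the order stable after a fresh clone, where timestamps are reset.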
TODO.md
ADDED
|
@@ -0,0 +1,14 @@
| 1 |
+
# TODO List
|
| 2 |
+
|
| 3 |
+
**Session Date:** [YYYY-MM-DD]
|
| 4 |
+
**Dev Record:** [link to dev/dev_YYMMDD_##_concise_title.md]
|
| 5 |
+
|
| 6 |
+
## Active Tasks
|
| 7 |
+
|
| 8 |
+
- [ ] [Task 1]
|
| 9 |
+
- [ ] [Task 2]
|
| 10 |
+
- [ ] [Task 3]
|
| 11 |
+
|
| 12 |
+
## Completed Tasks
|
| 13 |
+
|
| 14 |
+
- [x] [Completed task 1]
|
app.py
CHANGED
|
@@ -8,27 +8,30 @@ import pandas as pd
|
|
| 8 |
# --- Constants ---
|
| 9 |
DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
|
| 10 |
|
|
|
|
| 11 |
# --- Basic Agent Definition ---
|
| 12 |
# ----- THIS IS WHERE YOU CAN BUILD WHAT YOU WANT ------
|
| 13 |
class BasicAgent:
|
| 14 |
def __init__(self):
|
| 15 |
print("BasicAgent initialized.")
|
|
|
|
| 16 |
def __call__(self, question: str) -> str:
|
| 17 |
print(f"Agent received question (first 50 chars): {question[:50]}...")
|
| 18 |
fixed_answer = "This is a default answer."
|
| 19 |
print(f"Agent returning fixed answer: {fixed_answer}")
|
| 20 |
return fixed_answer
|
| 21 |
|
| 22 |
-
|
|
|
|
| 23 |
"""
|
| 24 |
Fetches all questions, runs the BasicAgent on them, submits all answers,
|
| 25 |
and displays the results.
|
| 26 |
"""
|
| 27 |
# --- Determine HF Space Runtime URL and Repo URL ---
|
| 28 |
-
space_id = os.getenv("SPACE_ID")
|
| 29 |
|
| 30 |
if profile:
|
| 31 |
-
username= f"{profile.username}"
|
| 32 |
print(f"User logged in: {username}")
|
| 33 |
else:
|
| 34 |
print("User not logged in.")
|
|
@@ -55,16 +58,16 @@ def run_and_submit_all( profile: gr.OAuthProfile | None):
|
|
| 55 |
response.raise_for_status()
|
| 56 |
questions_data = response.json()
|
| 57 |
if not questions_data:
|
| 58 |
-
|
| 59 |
-
|
| 60 |
print(f"Fetched {len(questions_data)} questions.")
|
| 61 |
except requests.exceptions.RequestException as e:
|
| 62 |
print(f"Error fetching questions: {e}")
|
| 63 |
return f"Error fetching questions: {e}", None
|
| 64 |
except requests.exceptions.JSONDecodeError as e:
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
|
| 68 |
except Exception as e:
|
| 69 |
print(f"An unexpected error occurred fetching questions: {e}")
|
| 70 |
return f"An unexpected error occurred fetching questions: {e}", None
|
|
@@ -81,18 +84,36 @@ def run_and_submit_all( profile: gr.OAuthProfile | None):
|
|
| 81 |
continue
|
| 82 |
try:
|
| 83 |
submitted_answer = agent(question_text)
|
| 84 |
-
answers_payload.append(
|
| 85 |
-
|
| 86 |
except Exception as e:
|
| 87 |
-
|
| 88 |
-
|
| 89 |
|
| 90 |
if not answers_payload:
|
| 91 |
print("Agent did not produce any answers to submit.")
|
| 92 |
return "Agent did not produce any answers to submit.", pd.DataFrame(results_log)
|
| 93 |
|
| 94 |
-
# 4. Prepare Submission
|
| 95 |
-
submission_data = {
|
| 96 |
status_update = f"Agent finished. Submitting {len(answers_payload)} answers for user '{username}'..."
|
| 97 |
print(status_update)
|
| 98 |
|
|
@@ -162,20 +183,19 @@ with gr.Blocks() as demo:
|
|
| 162 |
|
| 163 |
run_button = gr.Button("Run Evaluation & Submit All Answers")
|
| 164 |
|
| 165 |
-
status_output = gr.Textbox(
|
|
|
|
|
|
|
| 166 |
# Removed max_rows=10 from DataFrame constructor
|
| 167 |
results_table = gr.DataFrame(label="Questions and Agent Answers", wrap=True)
|
| 168 |
|
| 169 |
-
run_button.click(
|
| 170 |
-
fn=run_and_submit_all,
|
| 171 |
-
outputs=[status_output, results_table]
|
| 172 |
-
)
|
| 173 |
|
| 174 |
if __name__ == "__main__":
|
| 175 |
-
print("\n" + "-"*30 + " App Starting " + "-"*30)
|
| 176 |
# Check for SPACE_HOST and SPACE_ID at startup for information
|
| 177 |
space_host_startup = os.getenv("SPACE_HOST")
|
| 178 |
-
space_id_startup = os.getenv("SPACE_ID")
|
| 179 |
|
| 180 |
if space_host_startup:
|
| 181 |
print(f"✅ SPACE_HOST found: {space_host_startup}")
|
|
@@ -183,14 +203,18 @@ if __name__ == "__main__":
|
|
| 183 |
else:
|
| 184 |
print("ℹ️ SPACE_HOST environment variable not found (running locally?).")
|
| 185 |
|
| 186 |
-
if space_id_startup:
|
| 187 |
print(f"✅ SPACE_ID found: {space_id_startup}")
|
| 188 |
print(f" Repo URL: https://huggingface.co/spaces/{space_id_startup}")
|
| 189 |
-
print(
|
|
|
|
|
|
|
| 190 |
else:
|
| 191 |
-
print(
|
|
|
|
|
|
|
| 192 |
|
| 193 |
-
print("-"*(60 + len(" App Starting ")) + "\n")
|
| 194 |
|
| 195 |
print("Launching Gradio Interface for Basic Agent Evaluation...")
|
| 196 |
-
demo.launch(debug=True, share=False)
|
|
|
|
| 8 |
# --- Constants ---
|
| 9 |
DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
|
| 10 |
|
| 11 |
+
|
| 12 |
# --- Basic Agent Definition ---
|
| 13 |
# ----- THIS IS WHERE YOU CAN BUILD WHAT YOU WANT ------
|
| 14 |
class BasicAgent:
|
| 15 |
def __init__(self):
|
| 16 |
print("BasicAgent initialized.")
|
| 17 |
+
|
| 18 |
def __call__(self, question: str) -> str:
|
| 19 |
print(f"Agent received question (first 50 chars): {question[:50]}...")
|
| 20 |
fixed_answer = "This is a default answer."
|
| 21 |
print(f"Agent returning fixed answer: {fixed_answer}")
|
| 22 |
return fixed_answer
|
| 23 |
|
| 24 |
+
|
| 25 |
+
def run_and_submit_all(profile: gr.OAuthProfile | None):
|
| 26 |
"""
|
| 27 |
Fetches all questions, runs the BasicAgent on them, submits all answers,
|
| 28 |
and displays the results.
|
| 29 |
"""
|
| 30 |
# --- Determine HF Space Runtime URL and Repo URL ---
|
| 31 |
+
space_id = os.getenv("SPACE_ID") # Get the SPACE_ID for sending link to the code
|
| 32 |
|
| 33 |
if profile:
|
| 34 |
+
username = f"{profile.username}"
|
| 35 |
print(f"User logged in: {username}")
|
| 36 |
else:
|
| 37 |
print("User not logged in.")
|
|
|
|
| 58 |
response.raise_for_status()
|
| 59 |
questions_data = response.json()
|
| 60 |
if not questions_data:
|
| 61 |
+
print("Fetched questions list is empty.")
|
| 62 |
+
return "Fetched questions list is empty or invalid format.", None
|
| 63 |
print(f"Fetched {len(questions_data)} questions.")
|
| 64 |
except requests.exceptions.RequestException as e:
|
| 65 |
print(f"Error fetching questions: {e}")
|
| 66 |
return f"Error fetching questions: {e}", None
|
| 67 |
except requests.exceptions.JSONDecodeError as e:
|
| 68 |
+
print(f"Error decoding JSON response from questions endpoint: {e}")
|
| 69 |
+
print(f"Response text: {response.text[:500]}")
|
| 70 |
+
return f"Error decoding server response for questions: {e}", None
|
| 71 |
except Exception as e:
|
| 72 |
print(f"An unexpected error occurred fetching questions: {e}")
|
| 73 |
return f"An unexpected error occurred fetching questions: {e}", None
|
|
|
|
| 84 |
continue
|
| 85 |
try:
|
| 86 |
submitted_answer = agent(question_text)
|
| 87 |
+
answers_payload.append(
|
| 88 |
+
{"task_id": task_id, "submitted_answer": submitted_answer}
|
| 89 |
+
)
|
| 90 |
+
results_log.append(
|
| 91 |
+
{
|
| 92 |
+
"Task ID": task_id,
|
| 93 |
+
"Question": question_text,
|
| 94 |
+
"Submitted Answer": submitted_answer,
|
| 95 |
+
}
|
| 96 |
+
)
|
| 97 |
except Exception as e:
|
| 98 |
+
print(f"Error running agent on task {task_id}: {e}")
|
| 99 |
+
results_log.append(
|
| 100 |
+
{
|
| 101 |
+
"Task ID": task_id,
|
| 102 |
+
"Question": question_text,
|
| 103 |
+
"Submitted Answer": f"AGENT ERROR: {e}",
|
| 104 |
+
}
|
| 105 |
+
)
|
| 106 |
|
| 107 |
if not answers_payload:
|
| 108 |
print("Agent did not produce any answers to submit.")
|
| 109 |
return "Agent did not produce any answers to submit.", pd.DataFrame(results_log)
|
| 110 |
|
| 111 |
+
# 4. Prepare Submission
|
| 112 |
+
submission_data = {
|
| 113 |
+
"username": username.strip(),
|
| 114 |
+
"agent_code": agent_code,
|
| 115 |
+
"answers": answers_payload,
|
| 116 |
+
}
|
| 117 |
status_update = f"Agent finished. Submitting {len(answers_payload)} answers for user '{username}'..."
|
| 118 |
print(status_update)
|
| 119 |
|
|
|
|
| 183 |
|
| 184 |
run_button = gr.Button("Run Evaluation & Submit All Answers")
|
| 185 |
|
| 186 |
+
status_output = gr.Textbox(
|
| 187 |
+
label="Run Status / Submission Result", lines=5, interactive=False
|
| 188 |
+
)
|
| 189 |
# Removed max_rows=10 from DataFrame constructor
|
| 190 |
results_table = gr.DataFrame(label="Questions and Agent Answers", wrap=True)
|
| 191 |
|
| 192 |
+
run_button.click(fn=run_and_submit_all, outputs=[status_output, results_table])
|
|
|
|
|
|
|
|
|
|
| 193 |
|
| 194 |
if __name__ == "__main__":
|
| 195 |
+
print("\n" + "-" * 30 + " App Starting " + "-" * 30)
|
| 196 |
# Check for SPACE_HOST and SPACE_ID at startup for information
|
| 197 |
space_host_startup = os.getenv("SPACE_HOST")
|
| 198 |
+
space_id_startup = os.getenv("SPACE_ID") # Get SPACE_ID at startup
|
| 199 |
|
| 200 |
if space_host_startup:
|
| 201 |
print(f"✅ SPACE_HOST found: {space_host_startup}")
|
|
|
|
| 203 |
else:
|
| 204 |
print("ℹ️ SPACE_HOST environment variable not found (running locally?).")
|
| 205 |
|
| 206 |
+
if space_id_startup: # Print repo URLs if SPACE_ID is found
|
| 207 |
print(f"✅ SPACE_ID found: {space_id_startup}")
|
| 208 |
print(f" Repo URL: https://huggingface.co/spaces/{space_id_startup}")
|
| 209 |
+
print(
|
| 210 |
+
f" Repo Tree URL: https://huggingface.co/spaces/{space_id_startup}/tree/main"
|
| 211 |
+
)
|
| 212 |
else:
|
| 213 |
+
print(
|
| 214 |
+
"ℹ️ SPACE_ID environment variable not found (running locally?). Repo URL cannot be determined."
|
| 215 |
+
)
|
| 216 |
|
| 217 |
+
print("-" * (60 + len(" App Starting ")) + "\n")
|
| 218 |
|
| 219 |
print("Launching Gradio Interface for Basic Agent Evaluation...")
|
| 220 |
+
demo.launch(debug=True, share=False)
|
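The per-question loop restored in this diff can be isolated into a standalone function, which makes it easier to try a custom agent without the Gradio UI or the scoring API. The sketch below follows the `answers_payload` / `results_log` structure shown in the diff. The function name `run_agent`, the `EchoAgent` class, and the input key `question` are assumptions made for the example, since the hunk does not show how each question item is unpacked.

```python
from typing import Callable, Iterable


def run_agent(agent: Callable[[str], str], questions: Iterable[dict]):
    """Run `agent` on every question; collect submissions and a display log."""
    answers_payload, results_log = [], []
    for item in questions:
        task_id = item.get("task_id")
        question_text = item.get("question")  # assumed key name
        if not task_id or question_text is None:
            continue  # skip malformed items instead of failing the whole run
        try:
            answer = agent(question_text)
            answers_payload.append({"task_id": task_id, "submitted_answer": answer})
            results_log.append(
                {"Task ID": task_id, "Question": question_text, "Submitted Answer": answer}
            )
        except Exception as e:
            # A failing question is logged but never submitted.
            results_log.append(
                {"Task ID": task_id, "Question": question_text,
                 "Submitted Answer": f"AGENT ERROR: {e}"}
            )
    return answers_payload, results_log


class EchoAgent:
    """Toy stand-in for BasicAgent: upper-cases the question text."""

    def __call__(self, question: str) -> str:
        if not question:
            raise ValueError("empty question")
        return question.upper()
```

Any object with a `__call__(question) -> str` method can be passed in, so a real agent can replace `EchoAgent` without changes to the loop.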
requirements.txt
CHANGED
|
@@ -1,2 +1,3 @@
|
|
| 1 |
gradio
|
|
|
|
| 2 |
requests
|
|
|
|
| 1 |
gradio
|
| 2 |
+
gradio[oauth]
|
| 3 |
requests
|