# Architecture
## System Overview
```mermaid
graph LR
User[User via Gradio] --> App[app.py]
App -->|fetch questions| API[GAIA Scoring API]
App -->|"run per question (+ file_name)"| Agent[GAIAAgent]
Agent --> Supervisor["Supervisor (GPT-5-mini)"]
Supervisor -->|delegate| WebAgent[Web Research Agent]
Supervisor -->|delegate| CodeAgent[Code Execution Agent]
Supervisor -->|delegate| FileAgent[File Processing Agent]
Supervisor -->|delegate| MathAgent[Math Agent]
Agent -->|"extract FINAL ANSWER"| App
App -->|submit answers| API
App -->|save| Log[submission_payload.log]
```
## Supervisor Routing
The supervisor receives each question and routes it by fixed rules, checked in order: a file marker first, then a YouTube URL, then the question type. The `[IMPORTANT CONTEXT: ...]` marker in the prompt is the only signal for file-based routing.
```mermaid
flowchart TD
Q[Incoming Question] --> Analyze[Supervisor Analyzes Question]
Analyze --> HasMarker{"Has [IMPORTANT CONTEXT: file...] marker?"}
HasMarker -->|Yes| FileAgent[File Processing Agent]
HasMarker -->|No| HasYT{Contains YouTube URL?}
HasYT -->|Yes| WebAgent[Web Research Agent]
HasYT -->|No| Classify{Question Type?}
Classify -->|Facts / Search| WebAgent
Classify -->|Code / Algorithm| CodeAgent[Code Execution Agent]
Classify -->|Math / Calculation| MathAgent[Math Agent]
FileAgent --> NeedMore{Need further processing?}
NeedMore -->|Yes| Classify
NeedMore -->|No| Extract["Extract FINAL ANSWER"]
WebAgent --> Extract
CodeAgent --> Extract
MathAgent --> Extract
Extract --> Return[Return to App]
```
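The routing order in the flowchart can be approximated in plain Python. This is only a sketch: the actual supervisor is an LLM following prompt rules, not a function. The `route` function, the `question_type` labels, and the YouTube URL pattern are illustrative assumptions; only the `[IMPORTANT CONTEXT:` prefix comes from the text above.

```python
import re

# Prefix of the file marker described above; the full marker text is not shown here.
FILE_MARKER_PREFIX = "[IMPORTANT CONTEXT:"

# Assumed pattern for YouTube links (hypothetical, for illustration only).
YOUTUBE_RE = re.compile(
    r"https?://(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)[\w-]+"
)

# Assumed labels for the "Question Type?" branch of the flowchart.
TYPE_TO_AGENT = {"facts": "web", "code": "code", "math": "math"}


def route(question: str, question_type: str = "facts") -> str:
    """Return the first agent to try, following the flowchart's order."""
    if FILE_MARKER_PREFIX in question:
        return "file"  # the file marker wins over every other rule
    if YOUTUBE_RE.search(question):
        return "web"   # video questions go to the web research agent
    return TYPE_TO_AGENT.get(question_type, "web")
```

The sketch covers only the first hop. In the flowchart, the file agent can hand the question back for classification when further processing is needed.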
## Agent-Tool Mapping
Each sub-agent is built with `create_agent` and has access to specific tools.
```mermaid
graph TD
subgraph web ["Web Research Agent (GPT-5-mini)"]
Tavily[Tavily Search]
Wiki[Wikipedia]
Gemini["Gemini 2.5 Pro Video"]
end
subgraph code ["Code Execution Agent (GPT-5-mini)"]
PythonREPL1[Python REPL]
end
subgraph file ["File Processing Agent (GPT-5-mini)"]
Download["GAIA File Downloader (HF Dataset)"]
Excel[Excel/CSV Reader]
Audio[Whisper Transcription]
Vision["GPT-5-mini Vision (Responses API)"]
TextFile[Text File Reader]
PDF[PDF Reader]
PythonREPL2[Python REPL]
end
subgraph math ["Math Agent (GPT-5-mini)"]
Calc[Calculator]
PythonREPL3[Python REPL]
end
```
## Answer Flow
All agents use the GAIA answer format prompt: reason through the problem, then output `FINAL ANSWER: [answer]`. The extraction layer strips the prefix before submission.
```mermaid
flowchart LR
Prompt["GAIA Answer Format Prompt"] --> Agent["Sub-Agent Reasons"]
Agent --> FA["FINAL ANSWER: 42"]
FA --> Extract["_extract_answer()"]
Extract --> Clean["42"]
Clean --> Submit["POST /submit"]
```
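A minimal sketch of what the extraction step might look like is shown below. The function name `extract_answer`, the case-insensitive match, the choice to take the last occurrence, and the fallback to the stripped response are all assumptions; the real `_extract_answer()` may differ.

```python
import re

# Capture the rest of the line after the prefix ("." does not cross newlines).
_FINAL_ANSWER_RE = re.compile(r"FINAL ANSWER:[ \t]*(.*)", re.IGNORECASE)


def extract_answer(response: str) -> str:
    """Strip the 'FINAL ANSWER:' prefix and return the bare answer string."""
    matches = _FINAL_ANSWER_RE.findall(response)
    if not matches:
        # Assumed fallback: no prefix found, so return the whole response.
        return response.strip()
    # Assumed choice: use the last occurrence in case the model restates it.
    return matches[-1].strip()
```

For example, `extract_answer("The sum is 40 + 2.\nFINAL ANSWER: 42")` returns `"42"`.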
## Data Flow — Single Question
```mermaid
sequenceDiagram
participant App as app.py
participant GA as GAIAAgent
participant SV as Supervisor
participant SA as Sub-Agent
participant Tool as Tool
App->>GA: question + task_id + file_name
GA->>GA: Build prompt with file context if file_name present
GA->>SV: Invoke graph with messages
SV->>SV: Analyze question, pick agent
SV->>SA: Delegate with full context
SA->>Tool: Call tool (search, video, code, file, etc.)
Tool-->>SA: Tool result
SA->>SA: Reason and produce FINAL ANSWER
SA-->>SV: Response with FINAL ANSWER
SV-->>GA: Relay FINAL ANSWER
GA->>GA: Extract answer via regex
GA-->>App: Clean answer string
```
## Submission Flow — Full Evaluation
```mermaid
sequenceDiagram
participant User
participant Gradio as Gradio UI
participant App as app.py
participant API as GAIA Scoring API
participant Agent as GAIAAgent
User->>Gradio: Click "Run Evaluation"
Gradio->>App: run_and_submit_all(profile)
App->>API: GET /questions
API-->>App: 20 questions with file_name metadata
loop For each question
App->>Agent: agent(question, task_id, file_name)
Agent-->>App: concise answer
end
App->>App: Save submission_payload.log
App->>API: POST /submit (username, agent_code, answers)
API-->>App: score, correct_count, total_attempted
App-->>Gradio: Display results table + score
Gradio-->>User: Show results
```
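The save and submit steps can be sketched as follows. The top-level fields (`username`, `agent_code`, `answers`) come from the diagram; the per-answer keys `task_id` and `submitted_answer`, the JSON log format, and the helper names are assumptions made for illustration.

```python
import json
from pathlib import Path


def build_payload(username: str, agent_code: str, results: list[tuple[str, str]]) -> dict:
    """Assemble the body for POST /submit from (task_id, answer) pairs."""
    return {
        "username": username,
        "agent_code": agent_code,
        # Per-answer key names are assumed, not confirmed by this document.
        "answers": [
            {"task_id": task_id, "submitted_answer": answer}
            for task_id, answer in results
        ],
    }


def save_payload(payload: dict, path: str = "submission_payload.log") -> None:
    """Write the payload to disk before submitting (JSON format assumed)."""
    Path(path).write_text(json.dumps(payload, indent=2), encoding="utf-8")
```

Saving before the POST means the answers survive on disk even if the submission request fails.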
## File Processing Pipeline
The file agent downloads the file from the Hugging Face GAIA dataset (with API fallback) and picks a reader by file extension:
```mermaid
flowchart LR
Download["Download from HF Dataset"] --> Detect{File Extension?}
Detect -->|.xlsx .xls .csv| Pandas[Pandas Reader]
Detect -->|.mp3 .wav .m4a| Whisper[Whisper API]
Detect -->|.png .jpg .gif .webp| GPT5V["GPT-5-mini Vision"]
Detect -->|.pdf| PyPDF2[PyPDF2 Reader]
Detect -->|.txt .py .json .md| TextReader[Text Reader]
Pandas --> Analyze["Reason + FINAL ANSWER"]
Whisper --> Analyze
GPT5V --> Analyze
PyPDF2 --> Analyze
TextReader --> Analyze
```
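The extension dispatch in the diagram amounts to a lookup table. The sketch below mirrors the extensions listed above; the reader labels, the `pick_reader` helper, and raising `ValueError` for unlisted extensions are assumptions, since the document does not say how unknown types are handled.

```python
from pathlib import Path

# Extension -> reader label, taken from the diagram above.
READERS = {
    ".xlsx": "pandas", ".xls": "pandas", ".csv": "pandas",
    ".mp3": "whisper", ".wav": "whisper", ".m4a": "whisper",
    ".png": "vision", ".jpg": "vision", ".gif": "vision", ".webp": "vision",
    ".pdf": "pypdf2",
    ".txt": "text", ".py": "text", ".json": "text", ".md": "text",
}


def pick_reader(file_name: str) -> str:
    """Return the reader label for a file, matching extensions case-insensitively."""
    ext = Path(file_name).suffix.lower()
    if ext not in READERS:
        # Assumed behavior: fail loudly on extensions the diagram does not list.
        raise ValueError(f"unsupported file extension: {ext!r}")
    return READERS[ext]
```

Keeping the mapping in one table means a new modality is a one-line addition, with no extra branch in the agent logic.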
## Video Analysis Pipeline
YouTube video questions are handled by the web research agent using Gemini's native video understanding, so no download is required:
```mermaid
flowchart LR
Question["Question with YouTube URL"] --> WebAgent[Web Research Agent]
WebAgent --> GeminiTool["analyze_youtube_video tool"]
GeminiTool -->|"pass URL directly"| Gemini["Gemini 2.5 Pro"]
Gemini -->|watches video| Response[Video Analysis Result]
Response --> Answer["Reason + FINAL ANSWER"]
```