# Architecture

## System Overview

```mermaid
graph LR
    User[User via Gradio] --> App[app.py]
    App -->|fetch questions| API[GAIA Scoring API]
    App -->|"run per question (+ file_name)"| Agent[GAIAAgent]
    Agent --> Supervisor["Supervisor (GPT-5-mini)"]
    Supervisor -->|delegate| WebAgent[Web Research Agent]
    Supervisor -->|delegate| CodeAgent[Code Execution Agent]
    Supervisor -->|delegate| FileAgent[File Processing Agent]
    Supervisor -->|delegate| MathAgent[Math Agent]
    Agent -->|"extract FINAL ANSWER"| App
    App -->|submit answers| API
    App -->|save| Log[submission_payload.log]
```

## Supervisor Routing

The supervisor receives each question and routes it according to fixed rules, checked in order: a file marker first, then a YouTube URL, then the question type. The `[IMPORTANT CONTEXT: ...]` marker in the prompt is the only signal for file-based routing.

```mermaid
flowchart TD
    Q[Incoming Question] --> Analyze[Supervisor Analyzes Question]
    Analyze --> HasMarker{"Has [IMPORTANT CONTEXT: file...] marker?"}
    HasMarker -->|Yes| FileAgent[File Processing Agent]
    HasMarker -->|No| HasYT{Contains YouTube URL?}
    HasYT -->|Yes| WebAgent[Web Research Agent]
    HasYT -->|No| Classify{Question Type?}
    Classify -->|Facts / Search| WebAgent
    Classify -->|Code / Algorithm| CodeAgent[Code Execution Agent]
    Classify -->|Math / Calculation| MathAgent[Math Agent]
    FileAgent --> NeedMore{Need further processing?}
    NeedMore -->|Yes| Classify
    NeedMore -->|No| Extract["Extract FINAL ANSWER"]
    WebAgent --> Extract
    CodeAgent --> Extract
    MathAgent --> Extract
    Extract --> Return[Return to App]
```
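The rule order in the flowchart can be sketched as a plain function. This is only an illustration: the actual supervisor is an LLM following prompt rules, and the name `route`, the regular expressions, and the `question_type` labels are assumptions that do not appear in the project.

```python
import re

# Hypothetical patterns (assumptions): the exact marker wording and the
# accepted YouTube URL shapes are not specified in this document.
FILE_MARKER = re.compile(r"\[IMPORTANT CONTEXT:[^\]]*\]")
YOUTUBE_URL = re.compile(
    r"https?://(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)[\w-]+"
)

# Hypothetical question-type labels mapped to sub-agent names.
TYPE_TO_AGENT = {"facts": "web", "code": "code", "math": "math"}


def route(prompt: str, question_type: str = "facts") -> str:
    """Return the sub-agent name, applying the rules in flowchart order."""
    # Rule 1: a file marker always wins.
    if FILE_MARKER.search(prompt):
        return "file"
    # Rule 2: YouTube URLs go to the web research agent.
    if YOUTUBE_URL.search(prompt):
        return "web"
    # Rule 3: fall back to the question type (unknown types go to web).
    return TYPE_TO_AGENT.get(question_type, "web")
```

Checking the file marker before anything else mirrors the diagram: a prompt that carries both a marker and a YouTube URL still goes to the file agent first.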

## Agent-Tool Mapping

Each sub-agent is built with `create_agent` and is given the set of tools shown in its subgraph.

```mermaid
graph TD
    subgraph web ["Web Research Agent (GPT-5-mini)"]
        Tavily[Tavily Search]
        Wiki[Wikipedia]
        Gemini["Gemini 2.5 Pro Video"]
    end

    subgraph code ["Code Execution Agent (GPT-5-mini)"]
        PythonREPL1[Python REPL]
    end

    subgraph file ["File Processing Agent (GPT-5-mini)"]
        Download["GAIA File Downloader (HF Dataset)"]
        Excel[Excel/CSV Reader]
        Audio[Whisper Transcription]
        Vision["GPT-5-mini Vision (Responses API)"]
        TextFile[Text File Reader]
        PDF[PDF Reader]
        PythonREPL2[Python REPL]
    end

    subgraph math ["Math Agent (GPT-5-mini)"]
        Calc[Calculator]
        PythonREPL3[Python REPL]
    end
```

## Answer Flow

All agents use the GAIA answer format prompt: reason through the problem, then output `FINAL ANSWER: [answer]`. The extraction layer (`_extract_answer()`) strips the `FINAL ANSWER:` prefix before submission.

```mermaid
flowchart LR
    Prompt["GAIA Answer Format Prompt"] --> Agent["Sub-Agent Reasons"]
    Agent --> FA["FINAL ANSWER: 42"]
    FA --> Extract["_extract_answer()"]
    Extract --> Clean["42"]
    Clean --> Submit["POST /submit"]
```
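The body of `_extract_answer()` is not shown here, so the following is a minimal sketch of how such a prefix-stripping step might look. The function name `extract_answer`, the case-insensitive match, and the choice to take the last occurrence are all assumptions.

```python
import re

# Assumption: the prefix is matched case-insensitively.
_PREFIX = re.compile(r"FINAL ANSWER:\s*", re.IGNORECASE)


def extract_answer(response: str) -> str:
    """Return the text after the last 'FINAL ANSWER:' prefix.

    If the prefix is missing, fall back to the whole response, stripped.
    """
    matches = list(_PREFIX.finditer(response))
    if not matches:
        return response.strip()
    # Keep only what follows the last prefix, so earlier mentions of the
    # phrase in the reasoning text are ignored.
    tail = response[matches[-1].end():].strip()
    # Only the first line after the prefix is treated as the answer.
    return tail.splitlines()[0].strip() if tail else ""
```

For the example in the diagram, `extract_answer("FINAL ANSWER: 42")` returns `"42"`.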

## Data Flow — Single Question

```mermaid
sequenceDiagram
    participant App as app.py
    participant GA as GAIAAgent
    participant SV as Supervisor
    participant SA as Sub-Agent
    participant Tool as Tool

    App->>GA: question + task_id + file_name
    GA->>GA: Build prompt with file context if file_name present
    GA->>SV: Invoke graph with messages
    SV->>SV: Analyze question, pick agent
    SV->>SA: Delegate with full context
    SA->>Tool: Call tool (search, video, code, file, etc.)
    Tool-->>SA: Tool result
    SA->>SA: Reason and produce FINAL ANSWER
    SA-->>SV: Response with FINAL ANSWER
    SV-->>GA: Relay FINAL ANSWER
    GA->>GA: Extract answer via regex
    GA-->>App: Clean answer string
```
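The "build prompt with file context" step in the sequence above could look roughly like this. The helper name `build_prompt` and the exact wording inside the marker are assumptions; the document only states that an `[IMPORTANT CONTEXT: ...]` marker is added when `file_name` is present.

```python
def build_prompt(question: str, task_id: str, file_name: str = "") -> str:
    """Prepend a file-context marker when the task has an attached file."""
    if not file_name:
        # No attachment: the question is passed through unchanged.
        return question
    # Hypothetical marker text; only the "[IMPORTANT CONTEXT: ...]" shape
    # comes from the document.
    marker = (
        f"[IMPORTANT CONTEXT: file '{file_name}' is attached "
        f"to task {task_id}]"
    )
    return f"{marker}\n\n{question}"
```

Because the marker is the only signal for file-based routing, leaving the prompt untouched when `file_name` is empty keeps file-less questions away from the file agent.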

## Submission Flow — Full Evaluation

```mermaid
sequenceDiagram
    participant User
    participant Gradio as Gradio UI
    participant App as app.py
    participant API as GAIA Scoring API
    participant Agent as GAIAAgent

    User->>Gradio: Click "Run Evaluation"
    Gradio->>App: run_and_submit_all(profile)
    App->>API: GET /questions
    API-->>App: 20 questions with file_name metadata

    loop For each question
        App->>Agent: agent(question, task_id, file_name)
        Agent-->>App: concise answer
    end

    App->>App: Save submission_payload.log
    App->>API: POST /submit (username, agent_code, answers)
    API-->>App: score, correct_count, total_attempted
    App-->>Gradio: Display results table + score
    Gradio-->>User: Show results
```
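The loop and payload assembly in the sequence above can be sketched as follows. The top-level fields `username`, `agent_code`, and `answers` come from the diagram; the question keys (`task_id`, `question`, `file_name`) and the per-answer key `submitted_answer` are assumptions about the API's JSON shape.

```python
from typing import Callable

# The agent is called as agent(question, task_id, file_name) -> answer.
AgentFn = Callable[[str, str, str], str]


def build_submission(
    questions: list[dict],
    agent: AgentFn,
    username: str,
    agent_code: str,
) -> dict:
    """Run the agent on every question and assemble the submit payload."""
    answers = []
    for item in questions:
        answer = agent(
            item["question"],
            item["task_id"],
            item.get("file_name", ""),  # assumed empty when no attachment
        )
        # "submitted_answer" is an assumed field name.
        answers.append({"task_id": item["task_id"], "submitted_answer": answer})
    return {"username": username, "agent_code": agent_code, "answers": answers}
```

The returned dictionary is what would be written to `submission_payload.log` and posted to `/submit`.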

## File Processing Pipeline

The file agent downloads the attachment from the HuggingFace GAIA dataset (with an API fallback) and selects a reader by file extension:

```mermaid
flowchart LR
    Download["Download from HF Dataset"] --> Detect{File Extension?}
    Detect -->|.xlsx .xls .csv| Pandas[Pandas Reader]
    Detect -->|.mp3 .wav .m4a| Whisper[Whisper API]
    Detect -->|.png .jpg .gif .webp| GPT5V["GPT-5-mini Vision"]
    Detect -->|.pdf| PyPDF2[PyPDF2 Reader]
    Detect -->|.txt .py .json .md| TextReader[Text Reader]
    Pandas --> Analyze["Reason + FINAL ANSWER"]
    Whisper --> Analyze
    GPT5V --> Analyze
    PyPDF2 --> Analyze
    TextReader --> Analyze
```
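The extension dispatch in the diagram can be expressed as a lookup table. This is a sketch only: the reader labels and the `pick_reader` helper are hypothetical names, and the real file agent may choose its tool through the LLM instead of a table.

```python
from pathlib import Path

# Extension groups taken from the diagram; reader labels are hypothetical.
_READER_GROUPS = {
    "pandas": (".xlsx", ".xls", ".csv"),
    "whisper": (".mp3", ".wav", ".m4a"),
    "vision": (".png", ".jpg", ".gif", ".webp"),
    "pdf": (".pdf",),
    "text": (".txt", ".py", ".json", ".md"),
}

# Flatten to {extension: reader} for constant-time lookup.
READER_BY_EXTENSION = {
    ext: reader for reader, exts in _READER_GROUPS.items() for ext in exts
}


def pick_reader(file_name: str) -> str:
    """Return the reader label for a file, or 'unsupported'."""
    suffix = Path(file_name).suffix.lower()  # case-insensitive match
    return READER_BY_EXTENSION.get(suffix, "unsupported")
```

Returning an explicit `"unsupported"` label for unknown extensions is a design choice of this sketch; the document does not say how such files are handled.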

## Video Analysis Pipeline

The web research agent handles YouTube video questions by passing the URL directly to Gemini, which analyzes the video natively, so no download is required:

```mermaid
flowchart LR
    Question["Question with YouTube URL"] --> WebAgent[Web Research Agent]
    WebAgent --> GeminiTool["analyze_youtube_video tool"]
    GeminiTool -->|"pass URL directly"| Gemini["Gemini 2.5 Pro"]
    Gemini -->|watches video| Response[Video Analysis Result]
    Response --> Answer["Reason + FINAL ANSWER"]
```