# GAIA Multi-Agent Evaluation System

A multi-agent system built with **LangGraph** and **LangChain** to tackle the [GAIA benchmark](https://huggingface.co/spaces/gaia-benchmark/leaderboard), a set of real-world questions that test AI assistants on reasoning, tool use, and multimodal understanding.

## How It Works

A **supervisor agent** analyzes each incoming question and delegates it to one of four specialized sub-agents:

| Agent | Responsibility | Tools |
|---|---|---|
| **Web Research** | Factual lookups, current events, YouTube video analysis | Tavily Search, Wikipedia, Gemini 2.5 Pro Video |
| **Code Execution** | Python programming, algorithms, data processing | Python REPL |
| **File Processing** | Excel, CSV, PDF, audio, image analysis | GAIA File Downloader, Pandas, Whisper, GPT-5-mini Vision |
| **Math/Reasoning** | Arithmetic, algebra, calculus, statistics | Calculator, Python REPL |

See [ARCHITECTURE.md](ARCHITECTURE.md) for detailed diagrams and data flow.
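
To make the delegation step concrete, here is a minimal sketch of a routing decision. It is not the code in `agents/supervisor.py`: the real supervisor presumably lets an LLM choose the sub-agent, whereas this stand-in uses keyword matching. The function name `route_question`, the `has_attachment` flag, the keyword lists, and the agent labels are all hypothetical.

```python
# Illustrative stand-in for the supervisor's routing decision.
# Assumption: the real supervisor uses an LLM; keywords are used here
# only so the example runs without API keys.

def route_question(question: str, has_attachment: bool = False) -> str:
    """Return the (hypothetical) label of the sub-agent for a question."""
    q = question.lower()
    # Questions that ship with a file go to the file-processing agent.
    if has_attachment:
        return "file_processing"
    # Video links are handled by the web research agent.
    if any(k in q for k in ("youtube.com", "youtu.be", "video")):
        return "web_research"
    if any(k in q for k in ("python", "code", "algorithm", "script")):
        return "code_execution"
    if any(k in q for k in ("calculate", "integral", "derivative", "average")):
        return "math_reasoning"
    # Default: treat everything else as a factual lookup.
    return "web_research"


if __name__ == "__main__":
    print(route_question("Calculate the average of 3, 5 and 10"))
```

The checks are ordered so that an attached file takes priority over any keyword in the question text, which is one plausible design choice rather than a documented one.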

## Project Structure

```
├── app.py                  # Gradio UI + submission logic
├── agent.py                # GAIAAgent class (supervisor wrapper)
├── prompts.py              # Shared GAIA answer format prompt
├── agents/
│   ├── supervisor.py       # LangGraph supervisor graph
│   ├── web_research.py     # Web search + video agent
│   ├── code_agent.py       # Code execution agent
│   ├── file_agent.py       # File processing agent
│   └── math_agent.py       # Math/reasoning agent
├── tools/
│   ├── search_tools.py     # Tavily + Wikipedia
│   ├── video_tools.py      # Gemini YouTube video analysis
│   ├── code_tools.py       # Python REPL
│   ├── file_tools.py       # File download, Excel, audio, image, PDF
│   └── math_tools.py       # Calculator + Python REPL
├── requirements.txt
└── test_agent.py           # Local testing script
```

## Setup

### Environment Variables

Set these in a local `.env` file:

| Variable | Purpose |
|---|---|
| `OPENAI_API_KEY` | GPT-5-mini for reasoning, vision, and Whisper transcription |
| `TAVILY_API_KEY` | Web search via Tavily |
| `GOOGLE_API_KEY` | Gemini 2.5 Pro for YouTube video analysis |
| `HF_TOKEN` | HuggingFace token for downloading GAIA dataset files |
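
A quick way to confirm that all four variables are present before launching the app is a small check like the one below. This is a sketch, not part of the repository: the helper name `missing_env_vars` is hypothetical, and it assumes the `.env` file has already been loaded into the process environment (for example by a dotenv loader or by exporting the variables in the shell).

```python
import os

# Variable names taken from the table above.
REQUIRED_VARS = ("OPENAI_API_KEY", "TAVILY_API_KEY", "GOOGLE_API_KEY", "HF_TOKEN")


def missing_env_vars(env=None):
    """Return the required variables that are unset or empty.

    `env` defaults to os.environ; passing a dict makes the check testable.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```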

### Local Development

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python test_agent.py      # test on a random GAIA question
python app.py             # launch Gradio UI
```

## Scoring

The GAIA benchmark uses **exact match** scoring. The agent follows the official GAIA answer format prompt: it reasons through each question, then produces a concise `FINAL ANSWER` (a number, a few words, or a comma-separated list) with no articles, abbreviations, or units unless the question asks for them.
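
Because only the final answer is compared, the text after the `FINAL ANSWER` marker has to be separated from the reasoning that precedes it. The sketch below shows one way to do that. It assumes the model writes the marker as `FINAL ANSWER:` followed by the answer on the same line; the helper name `extract_final_answer` is hypothetical and may not match what `agent.py` does.

```python
import re


def extract_final_answer(response: str) -> str:
    """Return the text after the last 'FINAL ANSWER:' marker.

    Assumption: the answer sits on the same line as the marker.
    If no marker is found, the whole response is returned, stripped.
    """
    matches = re.findall(r"FINAL ANSWER:\s*(.+)", response)
    return matches[-1].strip() if matches else response.strip()


if __name__ == "__main__":
    reply = "The list has three items.\nFINAL ANSWER: apples, pears, plums\n"
    print(extract_final_answer(reply))
```

Taking the last match rather than the first guards against a response that mentions the marker while reasoning and then states the answer again at the end.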