# GAIA Multi-Agent Evaluation System
A multi-agent system built with **LangGraph** and **LangChain** to tackle the [GAIA benchmark](https://huggingface.co/spaces/gaia-benchmark/leaderboard), a set of real-world questions that test AI assistants on reasoning, tool use, and multimodal understanding.
## How It Works
A **supervisor agent** analyzes each incoming question and delegates it to one of four specialized sub-agents:
| Agent | Responsibility | Tools |
|---|---|---|
| **Web Research** | Factual lookups, current events, YouTube video analysis | Tavily Search, Wikipedia, Gemini 2.5 Pro Video |
| **Code Execution** | Python programming, algorithms, data processing | Python REPL |
| **File Processing** | Excel, CSV, PDF, audio, image analysis | GAIA File Downloader, Pandas, Whisper, GPT-5-mini Vision |
| **Math/Reasoning** | Arithmetic, algebra, calculus, statistics | Calculator, Python REPL |
See [ARCHITECTURE.md](ARCHITECTURE.md) for detailed diagrams and data flow.
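
The delegation step described above can be illustrated with a minimal sketch. It is not the project's supervisor: the real one is a LangGraph graph that presumably lets an LLM choose the sub-agent, whereas this toy version uses keyword matching. The names `route`, `answer`, `AGENTS`, the `has_file` flag, and the stub agent functions are all hypothetical.

```python
from typing import Callable, Dict


# Hypothetical stubs: the real sub-agents call LLMs and tools.
def web_research(question: str) -> str:
    return f"[web_research] {question}"


def code_execution(question: str) -> str:
    return f"[code_execution] {question}"


def file_processing(question: str) -> str:
    return f"[file_processing] {question}"


def math_reasoning(question: str) -> str:
    return f"[math_reasoning] {question}"


AGENTS: Dict[str, Callable[[str], str]] = {
    "web_research": web_research,
    "code_execution": code_execution,
    "file_processing": file_processing,
    "math_reasoning": math_reasoning,
}


def route(question: str, has_file: bool = False) -> str:
    """Pick a sub-agent name. Keyword matching stands in for the LLM decision."""
    q = question.lower()
    if has_file:
        return "file_processing"
    if any(k in q for k in ("write a program", "python", "algorithm")):
        return "code_execution"
    if any(k in q for k in ("calculate", "sum of", "derivative", "average")):
        return "math_reasoning"
    # Default: treat everything else as a factual lookup.
    return "web_research"


def answer(question: str, has_file: bool = False) -> str:
    """Delegate the question to exactly one sub-agent."""
    return AGENTS[route(question, has_file)](question)
```

Calling `answer("Calculate the derivative of x^2")` would dispatch to the math stub, while a question with an attached file goes to file processing regardless of its wording.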
## Project Structure
```
├── app.py                  # Gradio UI + submission logic
├── agent.py                # GAIAAgent class (supervisor wrapper)
├── prompts.py              # Shared GAIA answer format prompt
├── agents/
│   ├── supervisor.py       # LangGraph supervisor graph
│   ├── web_research.py     # Web search + video agent
│   ├── code_agent.py       # Code execution agent
│   ├── file_agent.py       # File processing agent
│   └── math_agent.py       # Math/reasoning agent
├── tools/
│   ├── search_tools.py     # Tavily + Wikipedia
│   ├── video_tools.py      # Gemini YouTube video analysis
│   ├── code_tools.py       # Python REPL
│   ├── file_tools.py       # File download, Excel, audio, image, PDF
│   └── math_tools.py       # Calculator + Python REPL
├── requirements.txt
└── test_agent.py           # Local testing script
```
## Setup
### Environment Variables
Set these in a local `.env` file:
| Variable | Purpose |
|---|---|
| `OPENAI_API_KEY` | GPT-5-mini for reasoning, vision, and Whisper transcription |
| `TAVILY_API_KEY` | Web search via Tavily |
| `GOOGLE_API_KEY` | Gemini 2.5 Pro for YouTube video analysis |
| `HF_TOKEN` | HuggingFace token for downloading GAIA dataset files |
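
Before launching, it can help to confirm that all four variables are present. The sketch below is an optional pre-flight check, not part of the project. It assumes the `.env` values have already been loaded into the process environment (how the project loads them is not stated here), and `REQUIRED_KEYS` and `missing_keys` are hypothetical names.

```python
import os
from typing import List, Mapping, Optional

# The four variables listed in the table above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "TAVILY_API_KEY", "GOOGLE_API_KEY", "HF_TOKEN")


def missing_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty."""
    if env is None:
        env = os.environ
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```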
### Local Development
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python test_agent.py # test on a random GAIA question
python app.py # launch Gradio UI
```
## Scoring
The GAIA benchmark uses **exact match** scoring. The agent uses the official GAIA answer format prompt: it reasons through each question before producing a concise `FINAL ANSWER` (a number, a few words, or a comma-separated list) with no articles, abbreviations, or units unless specified.
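
To show why the answer format matters under exact match, here is a rough sketch of pulling the final answer out of a model response and comparing it to a reference. It assumes the response marks the answer as `FINAL ANSWER: ...` on a single line. The comparison is a simplified case-insensitive match, not the official GAIA scorer, and `extract_final_answer` and `exact_match` are hypothetical helper names.

```python
import re


def extract_final_answer(response: str) -> str:
    """Return the text after the last 'FINAL ANSWER:' marker, or the whole response."""
    matches = re.findall(r"FINAL ANSWER:\s*(.*)", response, flags=re.IGNORECASE)
    return matches[-1].strip() if matches else response.strip()


def exact_match(predicted: str, expected: str) -> bool:
    """Simplified comparison: ignore case and surrounding whitespace only."""
    return predicted.strip().lower() == expected.strip().lower()


response = "The city hosted the games in 1924.\nFINAL ANSWER: Paris"
print(exact_match(extract_final_answer(response), "paris"))
```

Because the comparison is this strict, an extra article or unit (for example `the Paris` or `42 km` where `42` is expected) would count as a miss.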