# GAIA Multi-Agent Evaluation System
A multi-agent system built with **LangGraph** and **LangChain** to tackle the [GAIA benchmark](https://huggingface.co/spaces/gaia-benchmark/leaderboard), a set of real-world questions that test AI assistants on reasoning, tool use, and multimodal understanding.
## How It Works
A **supervisor agent** analyzes each incoming question and delegates it to one of four specialized sub-agents:
| Agent | Responsibility | Tools |
|---|---|---|
| **Web Research** | Factual lookups, current events, YouTube video analysis | Tavily Search, Wikipedia, Gemini 2.5 Pro Video |
| **Code Execution** | Python programming, algorithms, data processing | Python REPL |
| **File Processing** | Excel, CSV, PDF, audio, image analysis | GAIA File Downloader, Pandas, Whisper, GPT-5-mini Vision |
| **Math/Reasoning** | Arithmetic, algebra, calculus, statistics | Calculator, Python REPL |
See [ARCHITECTURE.md](ARCHITECTURE.md) for detailed diagrams and data flow.
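
The delegation step can be pictured as a small dispatch function. The sketch below is illustrative only: in the project the routing decision is made by the supervisor agent inside a LangGraph graph, whereas here a plain keyword heuristic stands in for it. The agent labels, keyword sets, and the `route` function are all hypothetical names, not identifiers taken from `agents/supervisor.py`.

```python
import re

# Hypothetical labels for the four sub-agents described in the table above.
AGENTS = ("web_research", "code_execution", "file_processing", "math_reasoning")

# Assumed keyword sets; a stand-in for the supervisor's own judgment.
CODE_WORDS = {"python", "code", "script", "algorithm"}
MATH_WORDS = {"calculate", "sum", "average", "integral", "equation"}


def route(question: str, has_attachment: bool = False) -> str:
    """Pick exactly one sub-agent label for a question."""
    # Compare whole words so that e.g. "summary" does not match "sum".
    words = set(re.findall(r"[a-z]+", question.lower()))
    if has_attachment:
        # Questions that ship with a file go to the file processing agent.
        return "file_processing"
    if words & CODE_WORDS:
        return "code_execution"
    if words & MATH_WORDS:
        return "math_reasoning"
    # Default: treat everything else as a factual lookup.
    return "web_research"


if __name__ == "__main__":
    print(route("Who won the 1998 World Cup?"))
    print(route("Calculate the average of 3, 5 and 10"))
```

The fixed priority order (attachment, then code, then math, then web) is a design choice made for this sketch; the actual supervisor may weigh these signals differently.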
## Project Structure
```
├── app.py                 # Gradio UI + submission logic
├── agent.py               # GAIAAgent class (supervisor wrapper)
├── prompts.py             # Shared GAIA answer format prompt
├── agents/
│   ├── supervisor.py      # LangGraph supervisor graph
│   ├── web_research.py    # Web search + video agent
│   ├── code_agent.py      # Code execution agent
│   ├── file_agent.py      # File processing agent
│   └── math_agent.py      # Math/reasoning agent
├── tools/
│   ├── search_tools.py    # Tavily + Wikipedia
│   ├── video_tools.py     # Gemini YouTube video analysis
│   ├── code_tools.py      # Python REPL
│   ├── file_tools.py      # File download, Excel, audio, image, PDF
│   └── math_tools.py      # Calculator + Python REPL
├── requirements.txt
└── test_agent.py          # Local testing script
```
## Setup
### Environment Variables
Set these in a local `.env` file:
| Variable | Purpose |
|---|---|
| `OPENAI_API_KEY` | GPT-5-mini for reasoning, vision, and Whisper transcription |
| `TAVILY_API_KEY` | Web search via Tavily |
| `GOOGLE_API_KEY` | Gemini 2.5 Pro for YouTube video analysis |
| `HF_TOKEN` | HuggingFace token for downloading GAIA dataset files |
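
Before launching the app, it can help to confirm that all four variables are actually set. The following is a minimal sketch of such a check, not code from this repository: `missing_keys` is a hypothetical helper, and it assumes the variables have already been loaded into the process environment (how the `.env` file gets loaded is not shown here).

```python
import os
from typing import List, Mapping

# Variable names taken from the table above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "TAVILY_API_KEY", "GOOGLE_API_KEY", "HF_TOKEN")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or blank in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required environment variables are set.")
```

Passing the environment in as a mapping keeps the helper easy to test with a plain dictionary.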
### Local Development
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python test_agent.py # test on a random GAIA question
python app.py # launch Gradio UI
```
## Scoring
The GAIA benchmark uses **exact match** scoring. The agent follows the official GAIA answer format prompt: it reasons through each question before producing a concise `FINAL ANSWER` (a number, a few words, or a comma-separated list) with no articles, abbreviations, or units unless specified.
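
To illustrate why the answer format matters under exact-match scoring, here is a simplified sketch of pulling the final answer out of a model response and comparing it to a reference. It assumes the response contains a line of the form `FINAL ANSWER: ...`; both helper names are hypothetical, and the case and whitespace normalization shown is an assumption, not the official GAIA scorer's logic.

```python
import re
from typing import Optional

# Assumed marker format: "FINAL ANSWER:" followed by the answer on the same line.
_FINAL = re.compile(r"FINAL ANSWER:\s*(.*)", re.IGNORECASE)


def extract_final_answer(output: str) -> Optional[str]:
    """Return the text after the last 'FINAL ANSWER:' marker, or None if absent."""
    matches = _FINAL.findall(output)
    return matches[-1].strip() if matches else None


def exact_match(predicted: Optional[str], expected: str) -> bool:
    """Compare answers after trimming whitespace and lowercasing (simplified)."""
    if predicted is None:
        return False
    return predicted.strip().lower() == expected.strip().lower()


if __name__ == "__main__":
    response = "The Eiffel Tower is in the French capital.\nFINAL ANSWER: Paris"
    answer = extract_final_answer(response)
    print(answer, exact_match(answer, "Paris"))
```

Under this kind of comparison, an answer such as `Paris, France` would not match a reference of `Paris`, which is why the prompt asks for the shortest possible answer.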