GAIA Multi-Agent Evaluation System

A multi-agent system built with LangGraph and LangChain to tackle the GAIA benchmark, a set of real-world questions that test AI assistants on reasoning, tool use, and multimodal understanding.

How It Works

A supervisor agent analyzes each incoming question and delegates it to one of four specialized sub-agents:

| Agent | Responsibility | Tools |
| --- | --- | --- |
| Web Research | Factual lookups, current events, YouTube video analysis | Tavily Search, Wikipedia, Gemini 2.5 Pro Video |
| Code Execution | Python programming, algorithms, data processing | Python REPL |
| File Processing | Excel, CSV, PDF, audio, image analysis | GAIA File Downloader, Pandas, Whisper, GPT-5-mini Vision |
| Math/Reasoning | Arithmetic, algebra, calculus, statistics | Calculator, Python REPL |

See ARCHITECTURE.md for detailed diagrams and data flow.
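
The delegation step described above can be illustrated with a small sketch. This is not the code in agents/supervisor.py: the real supervisor is a LangGraph graph that asks a model to choose the sub-agent, while the version below swaps in a plain keyword heuristic so that it runs without any API keys. The function name `route`, the agent labels, and the keyword lists are all hypothetical.

```python
# Toy stand-in for the supervisor's routing decision.
# ASSUMPTION: agent labels and keyword lists are illustrative only;
# the actual supervisor delegates via an LLM inside a LangGraph graph.

CODE_HINTS = ("python", "code", "script", "algorithm")
MATH_HINTS = ("calculate", "integral", "derivative", "probability", "sum of")


def route(question: str, has_attachment: bool = False) -> str:
    """Return the label of the sub-agent that should handle the question."""
    q = question.lower()
    if has_attachment:
        # Questions that ship with a file go to the file processing agent.
        return "file_processing"
    if "youtube.com" in q or "youtu.be" in q:
        # Video links are handled by the web research agent.
        return "web_research"
    if any(hint in q for hint in CODE_HINTS):
        return "code_execution"
    if any(hint in q for hint in MATH_HINTS):
        return "math_reasoning"
    # Default: treat the question as a factual lookup.
    return "web_research"


if __name__ == "__main__":
    print(route("What is in the attached spreadsheet?", has_attachment=True))
    print(route("Calculate the mean of 3, 5 and 10"))
```

A keyword heuristic like this is brittle (a question can mention "code" without needing the REPL), which is one reason to let a model make the routing decision instead.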

Project Structure

├── app.py                  # Gradio UI + submission logic
├── agent.py                # GAIAAgent class (supervisor wrapper)
├── prompts.py              # Shared GAIA answer format prompt
├── agents/
│   ├── supervisor.py       # LangGraph supervisor graph
│   ├── web_research.py     # Web search + video agent
│   ├── code_agent.py       # Code execution agent
│   ├── file_agent.py       # File processing agent
│   └── math_agent.py       # Math/reasoning agent
├── tools/
│   ├── search_tools.py     # Tavily + Wikipedia
│   ├── video_tools.py      # Gemini YouTube video analysis
│   ├── code_tools.py       # Python REPL
│   ├── file_tools.py       # File download, Excel, audio, image, PDF
│   └── math_tools.py       # Calculator + Python REPL
├── requirements.txt
└── test_agent.py           # Local testing script

Setup

Environment Variables

Set these in a local .env file:

| Variable | Purpose |
| --- | --- |
| OPENAI_API_KEY | GPT-5-mini for reasoning, vision, and Whisper transcription |
| TAVILY_API_KEY | Web search via Tavily |
| GOOGLE_API_KEY | Gemini 2.5 Pro for YouTube video analysis |
| HF_TOKEN | HuggingFace token for downloading GAIA dataset files |
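
Before launching the app, it can help to confirm that all four variables are set. The helper below is a minimal sketch and is not part of the repository. It assumes the values are already in the process environment, for example exported in the shell or loaded from .env by whatever loader the project uses. The name `missing_keys` is hypothetical.

```python
import os
from typing import List, Mapping

# The four variables listed in the table above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "TAVILY_API_KEY", "GOOGLE_API_KEY", "HF_TOKEN")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or blank, in table order."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```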

Local Development

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python test_agent.py      # test on a random GAIA question
python app.py             # launch Gradio UI

Scoring

The GAIA benchmark uses exact match scoring. The agent uses the official GAIA answer format prompt, reasoning through each question before producing a concise FINAL ANSWER (a number, a few words, or a comma-separated list) with no articles, abbreviations, or units unless specified.
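
To make the scoring rule concrete, here is a simplified sketch of how an answer might be pulled out of the agent's response and compared with the ground truth. It is not the official GAIA scorer, which applies its own normalization. The sketch assumes the response marks its answer with the literal prefix `FINAL ANSWER:` and compares strings after lowercasing and collapsing whitespace. Both function names are hypothetical.

```python
import re


def extract_final_answer(response: str) -> str:
    """Return the text after the last 'FINAL ANSWER:' marker, or the whole response."""
    # ASSUMPTION: the answer follows a literal 'FINAL ANSWER:' prefix.
    matches = re.findall(r"FINAL ANSWER:\s*(.*)", response)
    return matches[-1].strip() if matches else response.strip()


def is_exact_match(prediction: str, truth: str) -> bool:
    """Compare two answers, ignoring case and extra whitespace (simplified rule)."""
    def normalize(text: str) -> str:
        return " ".join(text.lower().split())

    return normalize(prediction) == normalize(truth)


if __name__ == "__main__":
    response = "The capital of France is Paris.\nFINAL ANSWER: Paris"
    answer = extract_final_answer(response)
    print(answer, is_exact_match(answer, "paris"))
```

Because the comparison is strict, an answer such as "42 km" does not match a ground truth of "42", which is why the prompt asks the agent to omit units unless the question requests them.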