GAIA Multi-Agent Evaluation System
A multi-agent system built with LangGraph and LangChain to tackle the GAIA benchmark, a set of real-world questions that test AI assistants on reasoning, tool use, and multimodal understanding.
How It Works
A supervisor agent analyzes each incoming question and delegates it to one of four specialized sub-agents:
| Agent | Responsibility | Tools |
|---|---|---|
| Web Research | Factual lookups, current events, YouTube video analysis | Tavily Search, Wikipedia, Gemini 2.5 Pro Video |
| Code Execution | Python programming, algorithms, data processing | Python REPL |
| File Processing | Excel, CSV, PDF, audio, image analysis | GAIA File Downloader, Pandas, Whisper, GPT-5-mini Vision |
| Math/Reasoning | Arithmetic, algebra, calculus, statistics | Calculator, Python REPL |
See ARCHITECTURE.md for detailed diagrams and data flow.
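The delegation step can be illustrated in plain Python. This is a minimal sketch, not the project's LangGraph supervisor: the real supervisor presumably lets a model pick the route, whereas the hypothetical `route_question` below uses simple keyword matching. The function names, agent keys, and keyword lists are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical stand-ins for the four sub-agents; the real ones call models and tools.
def web_research(question: str) -> str:
    return f"[web_research] {question}"


def code_execution(question: str) -> str:
    return f"[code_execution] {question}"


def file_processing(question: str) -> str:
    return f"[file_processing] {question}"


def math_reasoning(question: str) -> str:
    return f"[math_reasoning] {question}"


AGENTS: Dict[str, Callable[[str], str]] = {
    "web_research": web_research,
    "code_execution": code_execution,
    "file_processing": file_processing,
    "math_reasoning": math_reasoning,
}

# Illustrative keyword hints, checked in order; the first match wins.
ROUTING_HINTS: List[Tuple[str, Tuple[str, ...]]] = [
    ("file_processing", ("attached", "spreadsheet", ".xlsx", ".csv", ".pdf", ".mp3", "image")),
    ("code_execution", ("python", "code", "script", "algorithm")),
    ("math_reasoning", ("calculate", "sum of", "integral", "average", "probability")),
]


def route_question(question: str) -> str:
    """Pick a sub-agent name; fall back to web research when nothing matches."""
    text = question.lower()
    for agent_name, keywords in ROUTING_HINTS:
        if any(keyword in text for keyword in keywords):
            return agent_name
    return "web_research"


def answer(question: str) -> str:
    """Delegate the question to exactly one sub-agent and return its reply."""
    return AGENTS[route_question(question)](question)
```

The point of the sketch is the shape of the control flow: one routing decision per question, then a single sub-agent handles it end to end.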
Project Structure
├── app.py                 # Gradio UI + submission logic
├── agent.py               # GAIAAgent class (supervisor wrapper)
├── prompts.py             # Shared GAIA answer format prompt
├── agents/
│   ├── supervisor.py      # LangGraph supervisor graph
│   ├── web_research.py    # Web search + video agent
│   ├── code_agent.py      # Code execution agent
│   ├── file_agent.py      # File processing agent
│   └── math_agent.py      # Math/reasoning agent
├── tools/
│   ├── search_tools.py    # Tavily + Wikipedia
│   ├── video_tools.py     # Gemini YouTube video analysis
│   ├── code_tools.py      # Python REPL
│   ├── file_tools.py      # File download, Excel, audio, image, PDF
│   └── math_tools.py      # Calculator + Python REPL
├── requirements.txt
└── test_agent.py          # Local testing script
Setup
Environment Variables
Set these in a local .env file:
| Variable | Purpose |
|---|---|
| OPENAI_API_KEY | GPT-5-mini for reasoning, vision, and Whisper transcription |
| TAVILY_API_KEY | Web search via Tavily |
| GOOGLE_API_KEY | Gemini 2.5 Pro for YouTube video analysis |
| HF_TOKEN | HuggingFace token for downloading GAIA dataset files |
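A quick check that all four variables are present can save a failed run later. The sketch below uses only the standard library; `missing_env_vars` is a hypothetical helper, not part of this repository. It inspects the process environment, so it assumes the .env file has already been loaded (for example with python-dotenv, if the project uses it).

```python
import os
from typing import List, Mapping

# Variable names taken from the table above.
REQUIRED_VARS = ("OPENAI_API_KEY", "TAVILY_API_KEY", "GOOGLE_API_KEY", "HF_TOKEN")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or blank in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_env_vars()` before starting the app returns an empty list when everything is configured, and otherwise lists the names that still need a value.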
Local Development
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python test_agent.py # test on a random GAIA question
python app.py # launch Gradio UI
Scoring
The GAIA benchmark uses exact-match scoring. The agent uses the official GAIA answer format prompt: it reasons through each question, then produces a concise FINAL ANSWER (a number, a few words, or a comma-separated list) with no articles, abbreviations, or units unless the question asks for them.
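To make the consequence of exact-match scoring concrete, here is a simplified sketch of pulling the final answer out of a response and comparing it to a reference. Both helper names are hypothetical, the marker is assumed to be written as `FINAL ANSWER:`, and the comparison only ignores case and surrounding whitespace; the official scorer may normalize answers differently.

```python
import re

# Assumed marker format; captures the rest of the line after the marker.
FINAL_ANSWER_PATTERN = re.compile(r"FINAL ANSWER:\s*(.*)", re.IGNORECASE)


def extract_final_answer(response: str) -> str:
    """Return the text after the last marker, or the stripped response if none is found."""
    matches = FINAL_ANSWER_PATTERN.findall(response)
    return matches[-1].strip() if matches else response.strip()


def is_exact_match(predicted: str, expected: str) -> bool:
    """Simplified comparison: ignores letter case and surrounding whitespace only."""
    return predicted.strip().casefold() == expected.strip().casefold()
```

Under a comparison like this, an answer of "Paris, France" does not match a reference of "Paris", which is why the prompt pushes for the shortest answer that fits the question.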