# GAIA Multi-Agent Evaluation System

A multi-agent system built with **LangGraph** and **LangChain** to tackle the [GAIA benchmark](https://huggingface.co/spaces/gaia-benchmark/leaderboard), a set of real-world questions that test AI assistants on reasoning, tool use, and multimodal understanding.

## How It Works

A **supervisor agent** analyzes each incoming question and delegates it to one of four specialized sub-agents:

| Agent | Responsibility | Tools |
|---|---|---|
| **Web Research** | Factual lookups, current events, YouTube video analysis | Tavily Search, Wikipedia, Gemini 2.5 Pro Video |
| **Code Execution** | Python programming, algorithms, data processing | Python REPL |
| **File Processing** | Excel, CSV, PDF, audio, image analysis | GAIA File Downloader, Pandas, Whisper, GPT-5-mini Vision |
| **Math/Reasoning** | Arithmetic, algebra, calculus, statistics | Calculator, Python REPL |

See [ARCHITECTURE.md](ARCHITECTURE.md) for detailed diagrams and data flow.

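
The delegation pattern can be illustrated with a small plain-Python sketch. It is not the code in `agents/supervisor.py`: the real supervisor is a LangGraph graph that presumably lets an LLM pick the sub-agent, whereas here a keyword heuristic stands in for that decision. The agent functions, the `ROUTING_HINTS` table, and the fallback to web research are all hypothetical.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-ins for the four sub-agents. The real agents call LLMs
# and tools; these only tag the question so the routing is visible.
def web_research(question: str) -> str:
    return f"[web_research] {question}"

def code_execution(question: str) -> str:
    return f"[code_execution] {question}"

def file_processing(question: str) -> str:
    return f"[file_processing] {question}"

def math_reasoning(question: str) -> str:
    return f"[math_reasoning] {question}"

AGENTS: Dict[str, Callable[[str], str]] = {
    "web_research": web_research,
    "code_execution": code_execution,
    "file_processing": file_processing,
    "math_reasoning": math_reasoning,
}

# Assumed keyword hints, checked in order. This heuristic replaces the
# LLM-based supervisor decision purely for illustration.
ROUTING_HINTS: Tuple[Tuple[str, Tuple[str, ...]], ...] = (
    ("file_processing", ("attached", ".xlsx", ".csv", ".pdf", ".mp3", ".png")),
    ("code_execution", ("python", "code", "algorithm")),
    ("math_reasoning", ("calculate", "integral", "derivative", "probability")),
)

def route(question: str) -> str:
    """Return the name of the sub-agent that should handle the question."""
    text = question.lower()
    for agent_name, hints in ROUTING_HINTS:
        if any(hint in text for hint in hints):
            return agent_name
    # Assumed default: anything unmatched is treated as a factual lookup.
    return "web_research"

def answer(question: str) -> str:
    """Delegate the question to exactly one sub-agent and return its output."""
    return AGENTS[route(question)](question)
```

Calling `answer("Calculate the integral of x^2 from 0 to 3")` would dispatch to the math stand-in, while a question with no matching hint falls through to web research.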
## Project Structure

```
├── app.py                  # Gradio UI + submission logic
├── agent.py                # GAIAAgent class (supervisor wrapper)
├── prompts.py              # Shared GAIA answer format prompt
├── agents/
│   ├── supervisor.py       # LangGraph supervisor graph
│   ├── web_research.py     # Web search + video agent
│   ├── code_agent.py       # Code execution agent
│   ├── file_agent.py       # File processing agent
│   └── math_agent.py       # Math/reasoning agent
├── tools/
│   ├── search_tools.py     # Tavily + Wikipedia
│   ├── video_tools.py      # Gemini YouTube video analysis
│   ├── code_tools.py       # Python REPL
│   ├── file_tools.py       # File download, Excel, audio, image, PDF
│   └── math_tools.py       # Calculator + Python REPL
├── requirements.txt
└── test_agent.py           # Local testing script
```

## Setup

### Environment Variables

Set these in a local `.env` file:

| Variable | Purpose |
|---|---|
| `OPENAI_API_KEY` | GPT-5-mini for reasoning, vision, and Whisper transcription |
| `TAVILY_API_KEY` | Web search via Tavily |
| `GOOGLE_API_KEY` | Gemini 2.5 Pro for YouTube video analysis |
| `HF_TOKEN` | HuggingFace token for downloading GAIA dataset files |

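
A quick pre-flight check can catch a missing key before any agent runs. The sketch below is not part of the repository; `REQUIRED_VARS` and `missing_env_vars` are hypothetical names. It uses only the standard library and assumes the variables have already been loaded into the process environment (for example by exporting them in the shell, or by whatever `.env` loader the project uses).

```python
import os
from typing import List, Mapping, Optional

# The four variables listed in the table above.
REQUIRED_VARS = ("OPENAI_API_KEY", "TAVILY_API_KEY", "GOOGLE_API_KEY", "HF_TOKEN")

def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty."""
    # Default to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Note that this script does not read `.env` itself, so run it in the same environment the app will use.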
### Local Development

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python test_agent.py   # test on a random GAIA question
python app.py          # launch Gradio UI
```

## Scoring

The GAIA benchmark uses **exact match** scoring. The agent uses the official GAIA answer format prompt: it reasons through each question before producing a concise `FINAL ANSWER` (a number, a few words, or a comma-separated list) with no articles, abbreviations, or units unless specified.
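
To show why the concise format matters under exact-match scoring, here is a minimal sketch of pulling the answer out of a model response and comparing it to a reference. It assumes the response ends with a line of the form `FINAL ANSWER: ...`; the helper names are hypothetical. The comparison (trim whitespace, ignore case) is a deliberate simplification and not the official GAIA scorer, which may normalize numbers and lists differently.

```python
import re

# Assumes the answer follows the literal marker "FINAL ANSWER:".
FINAL_ANSWER_RE = re.compile(r"FINAL ANSWER:\s*(.+)", re.IGNORECASE)

def extract_final_answer(response: str) -> str:
    """Return the text after the last 'FINAL ANSWER:' marker.

    Falls back to the whole stripped response when no marker is found.
    """
    matches = FINAL_ANSWER_RE.findall(response)
    return matches[-1].strip() if matches else response.strip()

def is_exact_match(predicted: str, expected: str) -> bool:
    """Simplified exact match: equal after trimming whitespace and lowercasing."""
    return predicted.strip().lower() == expected.strip().lower()

if __name__ == "__main__":
    response = "The Eiffel Tower is located in Paris.\nFINAL ANSWER: Paris"
    answer = extract_final_answer(response)
    print(answer, is_exact_match(answer, "paris"))
```

Under this comparison an answer such as `the Paris` would not match `Paris`, which is why the prompt asks for answers without articles or extra words.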