title: Advanced Multi-Agent System for GAIA Benchmark
emoji: 🤖
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_expiration_minutes: 480
Advanced Multi-Agent System for GAIA Benchmark
This project implements a multi-agent system built on LangGraph to answer questions from the GAIA (General AI Assistants) benchmark. A supervisor agent classifies each task and routes it to a specialized worker agent for processing.
Architecture Overview
Multi-Agent Design Pattern
The system follows a supervisor pattern with specialized worker agents:
┌─────────────────┐
│   Supervisor    │ ← Routes tasks to appropriate agents
│     Agent       │
└─────────┬───────┘
          │
    ┌─────┴─────┐
    │           │
    ▼           ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│Research │ │Reasoning│ │  File   │
│ Agent   │ │ Agent   │ │ Agent   │
└─────────┘ └─────────┘ └─────────┘
Agent Specializations
Supervisor Agent
- Routes incoming tasks to appropriate specialized agents
- Manages workflow and coordination between agents
- Makes decisions based on task content and requirements
Research Agent
- Handles web searches and information gathering
- Processes Wikipedia queries and YouTube analysis
- Uses DuckDuckGo search for reliable information retrieval
Reasoning Agent
- Processes mathematical and logical problems
- Handles text analysis including reversed text puzzles
- Manages set theory and pattern recognition tasks
File Agent
- Analyzes various file types (images, audio, documents, code)
- Provides structured analysis for multimedia content
- Handles spreadsheets and code execution requirements
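The division of labour described above can be illustrated without any framework. The following is a minimal sketch, not the project's LangGraph code: the three agent functions, the `AGENTS` table, and the keyword rules inside `supervisor` are hypothetical stand-ins chosen only to show the routing idea.

```python
from typing import Callable

# Hypothetical stand-ins for the worker agents; each just tags the task it receives
def research_agent(task: str) -> str:
    return f"[research] {task}"

def reasoning_agent(task: str) -> str:
    return f"[reasoning] {task}"

def file_agent(task: str) -> str:
    return f"[file] {task}"

AGENTS: dict[str, Callable[[str], str]] = {
    "research": research_agent,
    "reasoning": reasoning_agent,
    "file": file_agent,
}

def supervisor(task: str, has_file: bool = False) -> str:
    """Pick a worker with simple rules and delegate the task to it."""
    lowered = task.lower()
    if has_file:
        choice = "file"
    elif any(word in lowered for word in ("calculate", "puzzle", "logic")):
        choice = "reasoning"
    else:
        choice = "research"  # default: look the answer up
    return AGENTS[choice](task)

print(supervisor("Calculate 17 * 3"))
print(supervisor("Who wrote Dune?"))
print(supervisor("Summarize the attached sheet", has_file=True))
```

In a real deployment the rules would presumably be replaced by an LLM-driven routing decision; the table-of-callables shape is what makes adding a fourth agent a one-line change.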
Technical Implementation
Core Technologies
- LangGraph: Multi-agent orchestration framework
- LangChain: LLM integration and tool management
- OpenAI GPT-4: Primary language model for reasoning
- Gradio: Web interface for interaction and submission
- DuckDuckGo: Web search capabilities
Key Features
1. Intelligent Task Classification
def _classify_task(self, question: str, file_name: str) -> str:
    """Classify a task from the question text and the presence of a file."""
    question_lower = question.lower()  # normalize once for keyword matching
    if file_name:
        return "file_analysis"
    elif any(keyword in question_lower for keyword in ["wikipedia", "search"]):
        return "research"
    elif any(keyword in question_lower for keyword in ["math", "logic"]):
        return "reasoning"
    # ... additional classification logic
2. Handoff Mechanism
The system uses LangGraph's Command primitive for seamless agent transitions:
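from typing import Annotated
from langchain_core.tools import InjectedToolCallId, tool
from langgraph.prebuilt import InjectedState
from langgraph.types import Command

def create_handoff_tool(*, agent_name: str, description: str | None = None):
    name = f"transfer_to_{agent_name}"  # assumed naming scheme
    @tool(name, description=description or f"Transfer to {agent_name}.")
    def handoff_tool(state: Annotated[dict, InjectedState],
                     tool_call_id: Annotated[str, InjectedToolCallId]) -> Command:
        tool_message = {"role": "tool", "name": name, "tool_call_id": tool_call_id,
                        "content": f"Transferred to {agent_name}"}  # assumed message shape
        return Command(
            goto=agent_name,
            update={"messages": state["messages"] + [tool_message]},
            graph=Command.PARENT,
        )
    return handoff_tool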
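To see what a handoff of this kind does to control flow, it may help to strip away the framework. The toy loop below is only an analogy under stated assumptions: `Handoff`, `run`, `END`, and the two node functions are hypothetical names, and nothing here is LangGraph's actual `Command` class or scheduler.

```python
from dataclasses import dataclass, field

END = "__end__"  # hypothetical sentinel meaning "stop the loop"

@dataclass
class Handoff:
    """Toy analogue of a handoff: where to go next and which messages to append."""
    goto: str
    update: list[str] = field(default_factory=list)

def supervisor(messages: list[str]) -> Handoff:
    # Stop once the research node has reported back; otherwise hand the task over
    if any(m.startswith("research:") for m in messages):
        return Handoff(goto=END)
    return Handoff(goto="research", update=["supervisor: transferred to research"])

def research(messages: list[str]) -> Handoff:
    return Handoff(goto="supervisor", update=["research: found an answer"])

NODES = {"supervisor": supervisor, "research": research}

def run(question: str, start: str = "supervisor", max_steps: int = 10) -> list[str]:
    """Follow handoffs until a node returns END or the step budget runs out."""
    messages, current = [question], start
    for _ in range(max_steps):
        if current == END:
            break
        command = NODES[current](messages)
        messages = messages + command.update  # append-only update of the shared history
        current = command.goto
    return messages

history = run("user: Who wrote Dune?")
print(history)
```

The `max_steps` budget is a design choice worth keeping in any real graph as well, since two agents that keep handing a task back to each other would otherwise loop forever.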
3. Fallback Processing
When the OpenAI API is unavailable, the system falls back to rule-based processing:
- Reversed text detection and processing
- Basic mathematical reasoning
- File type identification and guidance
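As an illustration of the first fallback, reversed text can be detected without an LLM by checking whether the flipped string contains more common English words than the original. This is a minimal sketch of one possible heuristic, not the project's implementation; `COMMON_WORDS`, `count_common_words`, and `unreverse_if_needed` are hypothetical names, and the word list is deliberately tiny.

```python
# Tiny hypothetical vocabulary; a real detector would want a larger word list
COMMON_WORDS = {
    "the", "of", "and", "to", "a", "in", "is", "you",
    "if", "this", "as", "write", "word", "answer",
}

def count_common_words(text: str) -> int:
    """Count tokens that match the vocabulary after stripping punctuation."""
    tokens = (token.strip('.,!?"\'').lower() for token in text.split())
    return sum(token in COMMON_WORDS for token in tokens)

def unreverse_if_needed(text: str) -> str:
    """Return the flipped string when it looks more like English than the input."""
    flipped = text[::-1]
    if count_common_words(flipped) > count_common_words(text):
        return flipped
    return text

question = '.rewsna eht sa "tfel" drow eht fo etisoppo eht etirw ,ecnetnes siht dnatsrednu uoy fI'
print(unreverse_if_needed(question))
print(unreverse_if_needed("What is the capital of France?"))
```

Comparing the two word counts, rather than testing against a fixed threshold, keeps ordinary questions untouched even when they contain few common words.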
GAIA Benchmark Performance
Question Types Handled
Research Questions
- Wikipedia information retrieval
- YouTube video analysis
- General web search queries
- Historical and factual questions
Logic & Reasoning
- Reversed text puzzles
- Mathematical calculations
- Set theory problems (commutativity, etc.)
- Pattern recognition
File Analysis
- Image analysis (chess positions, visual content)
- Audio processing (speech-to-text requirements)
- Code execution and analysis
- Spreadsheet data processing
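Before any of these file types can be analyzed, the file has to be recognized. One simple way to do that is an extension lookup, sketched below. The `FILE_CATEGORIES` table and `categorize_file` are hypothetical and cover only a handful of extensions as an assumption about what the benchmark attaches.

```python
from pathlib import Path

# Hypothetical extension table; extend it to match the files actually encountered
FILE_CATEGORIES = {
    "image": {".png", ".jpg", ".jpeg", ".gif"},
    "audio": {".mp3", ".wav", ".m4a"},
    "spreadsheet": {".xlsx", ".xls", ".csv"},
    "code": {".py"},
    "document": {".pdf", ".docx", ".txt"},
}

def categorize_file(file_name: str) -> str:
    """Map a file name to a coarse category using its extension."""
    suffix = Path(file_name).suffix.lower()
    for category, extensions in FILE_CATEGORIES.items():
        if suffix in extensions:
            return category
    return "unknown"

for name in ("position.PNG", "memo.mp3", "sales.xlsx", "script.py", "notes"):
    print(name, "->", categorize_file(name))
```

Returning `"unknown"` instead of raising lets the caller decide whether to ask for guidance or to try a generic handler.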
Multi-step Problems
- Complex queries requiring multiple agents
- Sequential reasoning tasks
- Cross-domain problem solving
Example Question Processing
Reversed Text Question:
Input: ".rewsna eht sa \"tfel\" drow eht fo etisoppo eht etirw ,ecnetnes siht dnatsrednu uoy fI"
Processing: Reasoning Agent → Text Analysis Tool → "right"
Research Question:
Input: "Who nominated the only Featured Article on English Wikipedia about a dinosaur promoted in November 2016?"
Processing: Supervisor → Research Agent → Web Search → Detailed Answer
Deployment
Hugging Face Spaces
The system is designed for deployment on Hugging Face Spaces with:
- Automatic dependency installation
- OAuth integration for user authentication
- Real-time processing and submission to GAIA API
- Comprehensive result tracking and display
Environment Variables
Required for full functionality:
OPENAI_API_KEY=your_openai_api_key_here
SPACE_ID=your_huggingface_space_id
Local Development
- Clone the repository
- Set up virtual environment:
python3 -m venv venv
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Run the application:
python app.py
Performance Optimization
Scoring Strategy
The system aims for 30%+ accuracy on the GAIA benchmark through:
- Intelligent Routing: Questions are automatically routed to the most appropriate specialist agent
- Tool Specialization: Each agent has access to tools optimized for their domain
- Fallback Mechanisms: Rule-based processing when LLM services are unavailable
- Error Handling: Robust error management and graceful degradation
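The last two points amount to wrapping the primary solver so that a failure degrades to a simpler one instead of aborting the run. A minimal sketch of that pattern follows; `answer_with_fallback`, `unavailable_llm`, and `rule_based` are hypothetical names, and the two solvers are trivial stand-ins rather than the project's agents.

```python
from typing import Callable

def answer_with_fallback(
    question: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> str:
    """Try the primary solver; on any failure, degrade to the fallback."""
    try:
        return primary(question)
    except Exception as error:  # broad on purpose: any failure should degrade, not crash
        print(f"primary solver failed ({error!r}); using fallback")
        return fallback(question)

def unavailable_llm(question: str) -> str:
    # Hypothetical stand-in for an LLM call made while the service is down
    raise ConnectionError("LLM service unavailable")

def rule_based(question: str) -> str:
    # Hypothetical rule-based solver that handles a single trivial case
    return "4" if "2 + 2" in question else "unknown"

print(answer_with_fallback("What is 2 + 2?", unavailable_llm, rule_based))
```

Logging the original error before falling back matters in practice, because a silent fallback can hide a misconfigured API key behind a steady stream of low-quality answers.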
Bonus Features
- LangSmith Integration: Ready for observability and monitoring
- Free Tools Only: Search and analysis tools are free or open source; only the LLM calls need an OpenAI API key
- Extensible Architecture: Easy to add new agents and capabilities
Configuration
Agent Prompts
Each agent has carefully crafted prompts for optimal performance:
- Supervisor: Focuses on task analysis and routing decisions
- Research: Emphasizes reliable source identification and factual accuracy
- Reasoning: Promotes step-by-step logical analysis
- File: Provides structured analysis frameworks for different file types
Tool Integration
Tools are integrated using LangChain's @tool decorator with proper error handling and type hints for reliable operation.
Usage
- Login: Authenticate with your Hugging Face account
- Submit: Click "Run Evaluation & Submit All Answers"
- Monitor: Watch real-time processing of questions
- Review: Examine results and scoring in the interface
Contributing
This implementation serves as a foundation for advanced multi-agent systems. Key areas for enhancement:
- Additional specialized agents (e.g., code execution, image analysis)
- Advanced reasoning capabilities
- Integration with more powerful models
- Enhanced tool ecosystem
Note: This system demonstrates advanced multi-agent coordination using LangGraph and represents a production-ready approach to complex AI task management.