Chris
Complete Multi-Agent System Implementation - LangGraph supervisor pattern with free tools only
e277613
|
raw
history blame
7.8 kB
metadata
title: Advanced Multi-Agent System for GAIA Benchmark
emoji: πŸ€–
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_expiration_minutes: 480

Advanced Multi-Agent System for GAIA Benchmark

This project implements a sophisticated multi-agent system using LangGraph to tackle the GAIA (General AI Assistant) benchmark questions. The system achieves intelligent task routing and specialized processing through a supervisor-agent architecture.

πŸ—οΈ Architecture Overview

Multi-Agent Design Pattern

The system follows a supervisor pattern with specialized worker agents:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Supervisor     β”‚ ← Routes tasks to appropriate agents
β”‚     Agent       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
          β”‚
    β”Œβ”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”
    β”‚           β”‚
    β–Ό           β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚Research β”‚ β”‚Reasoningβ”‚ β”‚  File   β”‚
β”‚ Agent   β”‚ β”‚ Agent   β”‚ β”‚ Agent   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Agent Specializations

  1. Supervisor Agent

    • Routes incoming tasks to appropriate specialized agents
    • Manages workflow and coordination between agents
    • Makes decisions based on task content and requirements
  2. Research Agent

    • Handles web searches and information gathering
    • Processes Wikipedia queries and YouTube analysis
    • Uses DuckDuckGo search for reliable information retrieval
  3. Reasoning Agent

    • Processes mathematical and logical problems
    • Handles text analysis including reversed text puzzles
    • Manages set theory and pattern recognition tasks
  4. File Agent

    • Analyzes various file types (images, audio, documents, code)
    • Provides structured analysis for multimedia content
    • Handles spreadsheets and code execution requirements

πŸ› οΈ Technical Implementation

Core Technologies

  • LangGraph: Multi-agent orchestration framework
  • LangChain: LLM integration and tool management
  • OpenAI GPT-4: Primary language model for reasoning
  • Gradio: Web interface for interaction and submission
  • DuckDuckGo: Web search capabilities

Key Features

1. Intelligent Task Classification

def _classify_task(self, question: str, file_name: str) -> str:
    """Classify tasks based on content and file presence"""
    if file_name:
        return "file_analysis"
    elif any(keyword in question_lower for keyword in ["wikipedia", "search"]):
        return "research"
    elif any(keyword in question_lower for keyword in ["math", "logic"]):
        return "reasoning"
    # ... additional classification logic

2. Handoff Mechanism

The system uses LangGraph's Command primitive for seamless agent transitions:

@tool
def create_handoff_tool(*, agent_name: str, description: str | None = None):
    def handoff_tool(state, tool_call_id) -> Command:
        return Command(
            goto=agent_name,
            update={"messages": state["messages"] + [tool_message]},
            graph=Command.PARENT,
        )
    return handoff_tool

3. Fallback Processing

When OpenAI API is unavailable, the system includes rule-based fallback processing:

  • Reversed text detection and processing
  • Basic mathematical reasoning
  • File type identification and guidance

πŸ“Š GAIA Benchmark Performance

Question Types Handled

  1. Research Questions

    • Wikipedia information retrieval
    • YouTube video analysis
    • General web search queries
    • Historical and factual questions
  2. Logic & Reasoning

    • Reversed text puzzles
    • Mathematical calculations
    • Set theory problems (commutativity, etc.)
    • Pattern recognition
  3. File Analysis

    • Image analysis (chess positions, visual content)
    • Audio processing (speech-to-text requirements)
    • Code execution and analysis
    • Spreadsheet data processing
  4. Multi-step Problems

    • Complex queries requiring multiple agents
    • Sequential reasoning tasks
    • Cross-domain problem solving

Example Question Processing

Reversed Text Question:

Input: ".rewsna eht sa \"tfel\" drow eht fo etisoppo eht etirw ,ecnetnes siht dnatsrednu uoy fI"
Processing: Reasoning Agent β†’ Text Analysis Tool β†’ "right"

Research Question:

Input: "Who nominated the only Featured Article on English Wikipedia about a dinosaur promoted in November 2016?"
Processing: Supervisor β†’ Research Agent β†’ Web Search β†’ Detailed Answer

πŸš€ Deployment

Hugging Face Spaces

The system is designed for deployment on Hugging Face Spaces with:

  • Automatic dependency installation
  • OAuth integration for user authentication
  • Real-time processing and submission to GAIA API
  • Comprehensive result tracking and display

Environment Variables

Required for full functionality:

OPENAI_API_KEY=your_openai_api_key_here
SPACE_ID=your_huggingface_space_id

Local Development

  1. Clone the repository
  2. Set up virtual environment:
    python3 -m venv venv
    source venv/bin/activate
    
  3. Install dependencies:
    pip install -r requirements.txt
    
  4. Run the application:
    python app.py
    

πŸ“ˆ Performance Optimization

Scoring Strategy

The system aims for 30%+ accuracy on the GAIA benchmark through:

  1. Intelligent Routing: Questions are automatically routed to the most appropriate specialist agent
  2. Tool Specialization: Each agent has access to tools optimized for their domain
  3. Fallback Mechanisms: Rule-based processing when LLM services are unavailable
  4. Error Handling: Robust error management and graceful degradation

Bonus Features

  • LangSmith Integration: Ready for observability and monitoring
  • Free Tools Only: Uses only free/open-source tools for accessibility
  • Extensible Architecture: Easy to add new agents and capabilities

πŸ”§ Configuration

Agent Prompts

Each agent has carefully crafted prompts for optimal performance:

  • Supervisor: Focuses on task analysis and routing decisions
  • Research: Emphasizes reliable source identification and factual accuracy
  • Reasoning: Promotes step-by-step logical analysis
  • File: Provides structured analysis frameworks for different file types

Tool Integration

Tools are integrated using LangChain's @tool decorator with proper error handling and type hints for reliable operation.

πŸ“ Usage

  1. Login: Authenticate with your Hugging Face account
  2. Submit: Click "Run Evaluation & Submit All Answers"
  3. Monitor: Watch real-time processing of questions
  4. Review: Examine results and scoring in the interface

🀝 Contributing

This implementation serves as a foundation for advanced multi-agent systems. Key areas for enhancement:

  • Additional specialized agents (e.g., code execution, image analysis)
  • Advanced reasoning capabilities
  • Integration with more powerful models
  • Enhanced tool ecosystem

πŸ“š References


Note: This system demonstrates advanced multi-agent coordination using LangGraph and represents a production-ready approach to complex AI task management.