title: Advanced Multi-Agent System for GAIA Benchmark
emoji: 🤖
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_expiration_minutes: 480

Advanced Multi-Agent System for GAIA Benchmark

This project implements a multi-agent system built with LangGraph to answer questions from the GAIA (General AI Assistants) benchmark. A supervisor agent routes each question to a specialized worker agent for processing.

🏗️ Architecture Overview

Multi-Agent Design Pattern

The system follows a supervisor pattern with specialized worker agents:

        ┌─────────────────┐
        │   Supervisor    │ ← Routes tasks to appropriate agents
        │      Agent      │
        └────────┬────────┘
                 │
     ┌───────────┼───────────┐
     │           │           │
     ▼           ▼           ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│Research │ │Reasoning│ │  File   │
│ Agent   │ │ Agent   │ │ Agent   │
└─────────┘ └─────────┘ └─────────┘

Agent Specializations

Supervisor Agent
- Routes incoming tasks to appropriate specialized agents
- Manages workflow and coordination between agents
- Makes decisions based on task content and requirements
Research Agent
- Handles web searches and information gathering
- Processes Wikipedia queries and YouTube analysis
- Uses DuckDuckGo search for reliable information retrieval
Reasoning Agent
- Processes mathematical and logical problems
- Handles text analysis including reversed text puzzles
- Manages set theory and pattern recognition tasks
File Agent
- Analyzes various file types (images, audio, documents, code)
- Provides structured analysis for multimedia content
- Handles spreadsheets and code execution requirements

🛠️ Technical Implementation

Core Technologies

- LangGraph: Multi-agent orchestration framework
- LangChain: LLM integration and tool management
- OpenAI GPT-4: Primary language model for reasoning
- Gradio: Web interface for interaction and submission
- DuckDuckGo: Web search capabilities

Key Features

1. Intelligent Task Classification

def _classify_task(self, question: str, file_name: str) -> str:
    """Classify tasks based on content and file presence"""
    question_lower = question.lower()  # normalize once for keyword matching
    if file_name:
        return "file_analysis"
    elif any(keyword in question_lower for keyword in ["wikipedia", "search"]):
        return "research"
    elif any(keyword in question_lower for keyword in ["math", "logic"]):
        return "reasoning"
    # ... additional classification logic
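The keyword routing above can be tried in isolation with a standalone function. This is a minimal sketch, not the project's actual classifier: the function name `classify_task`, the extra keywords, and the `general` default label are assumptions made for illustration.

```python
# Illustrative keyword lists; the real project may use different ones.
RESEARCH_KEYWORDS = ("wikipedia", "search", "youtube")
REASONING_KEYWORDS = ("math", "logic", "calculate")


def classify_task(question: str, file_name: str = "") -> str:
    """Return a coarse task label used to pick a worker agent."""
    # An attached file always takes priority over keyword matching.
    if file_name:
        return "file_analysis"
    question_lower = question.lower()
    if any(keyword in question_lower for keyword in RESEARCH_KEYWORDS):
        return "research"
    if any(keyword in question_lower for keyword in REASONING_KEYWORDS):
        return "reasoning"
    # Hypothetical default when no rule matches.
    return "general"


if __name__ == "__main__":
    print(classify_task("Search Wikipedia for the 2016 dinosaur article"))
    print(classify_task("What is shown here?", file_name="board.png"))
```

Because matching is by substring, a word such as "research" also triggers the `search` keyword; a stricter version could match on whole words instead.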

2. Handoff Mechanism

The system uses LangGraph's Command primitive for seamless agent transitions:

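from typing import Annotated
from langchain_core.tools import InjectedToolCallId, tool
from langgraph.prebuilt import InjectedState
from langgraph.types import Command

def create_handoff_tool(*, agent_name: str, description: str | None = None):
    name = f"transfer_to_{agent_name}"  # illustrative tool name
    @tool(name, description=description or f"Transfer to {agent_name}")  # decorate the inner tool, not the factory
    def handoff_tool(state: Annotated[dict, InjectedState],
                     tool_call_id: Annotated[str, InjectedToolCallId]) -> Command:
        tool_message = {"role": "tool", "name": name, "tool_call_id": tool_call_id,
                        "content": f"Successfully transferred to {agent_name}"}
        return Command(
            goto=agent_name,
            update={"messages": state["messages"] + [tool_message]},
            graph=Command.PARENT,
        )
    return handoff_tool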
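To make the control flow of a handoff easier to follow, the sketch below imitates it in plain Python with no LangGraph dependency. The `Handoff` dataclass, `make_handoff`, `run`, and the two toy agents are hypothetical stand-ins invented for this illustration; they are not LangGraph APIs and only mirror the "go to this agent and apply this state update" idea.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    """Plain-Python stand-in for a Command: next node plus a state update."""
    goto: str
    update: dict = field(default_factory=dict)


def make_handoff(agent_name: str):
    """Build a function that transfers control to `agent_name`."""
    def handoff(state: dict) -> Handoff:
        note = {"role": "tool", "content": f"Successfully transferred to {agent_name}"}
        return Handoff(goto=agent_name, update={"messages": state["messages"] + [note]})
    return handoff


def run(state: dict, start: str, agents: dict, max_steps: int = 5):
    """Follow handoffs until an agent returns a final answer instead of a Handoff."""
    current = start
    for _ in range(max_steps):
        result = agents[current](state)
        if not isinstance(result, Handoff):
            return result, state
        state = {**state, **result.update}  # apply the update, then jump
        current = result.goto
    raise RuntimeError("handoff limit reached")


# Hypothetical agents: the supervisor always delegates, the worker answers.
agents = {
    "supervisor": make_handoff("reasoning"),
    "reasoning": lambda state: "right",
}
answer, final_state = run({"messages": []}, "supervisor", agents)
print(answer, len(final_state["messages"]))
```

The step limit is a design choice in this sketch: it keeps two agents that hand off to each other from looping forever.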

3. Fallback Processing

When the OpenAI API is unavailable, the system falls back to rule-based processing:

- Reversed text detection and processing
- Basic mathematical reasoning
- File type identification and guidance
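As one possible shape for the reversed-text fallback, the sketch below guesses that a question is reversed when reversing it produces more common English words, then applies a tiny rule for "opposite of the word" questions. The word list, the `OPPOSITES` table, and the function names are assumptions for illustration; the project's own rules are not shown in this README.

```python
import re

# Small illustrative vocabulary; a real detector would use a larger list.
COMMON_WORDS = {"the", "of", "and", "to", "is", "this", "you", "if", "as", "a", "in"}
OPPOSITES = {"left": "right", "right": "left", "up": "down", "down": "up"}


def _common_word_count(text: str) -> int:
    """Count how many tokens in `text` are common English words."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(1 for word in words if word in COMMON_WORDS)


def looks_reversed(text: str) -> bool:
    """Guess that text is reversed if reversing it yields more common words."""
    return _common_word_count(text[::-1]) > _common_word_count(text)


def fallback_answer(question: str) -> str:
    """Rule-based answer used when no language model is available."""
    text = question[::-1] if looks_reversed(question) else question
    match = re.search(r'opposite of the word "(\w+)"', text)
    if match and match.group(1).lower() in OPPOSITES:
        return OPPOSITES[match.group(1).lower()]
    return "Unable to answer without the language model."


if __name__ == "__main__":
    puzzle = '.rewsna eht sa "tfel" drow eht fo etisoppo eht etirw ,ecnetnes siht dnatsrednu uoy fI'
    print(fallback_answer(puzzle))
```

A heuristic like this only covers narrow patterns, which is why it suits a fallback path and not the primary solver.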

📊 GAIA Benchmark Performance

Question Types Handled

Research Questions
- Wikipedia information retrieval
- YouTube video analysis
- General web search queries
- Historical and factual questions
Logic & Reasoning
- Reversed text puzzles
- Mathematical calculations
- Set theory problems (commutativity, etc.)
- Pattern recognition
File Analysis
- Image analysis (chess positions, visual content)
- Audio processing (speech-to-text requirements)
- Code execution and analysis
- Spreadsheet data processing
Multi-step Problems
- Complex queries requiring multiple agents
- Sequential reasoning tasks
- Cross-domain problem solving

Example Question Processing

Reversed Text Question:

Input: ".rewsna eht sa \"tfel\" drow eht fo etisoppo eht etirw ,ecnetnes siht dnatsrednu uoy fI"
Processing: Reasoning Agent → Text Analysis Tool → "right"

Research Question:

Input: "Who nominated the only Featured Article on English Wikipedia about a dinosaur promoted in November 2016?"
Processing: Supervisor → Research Agent → Web Search → Detailed Answer

🚀 Deployment

Hugging Face Spaces

The system is designed for deployment on Hugging Face Spaces with:

- Automatic dependency installation
- OAuth integration for user authentication
- Real-time processing and submission to GAIA API
- Comprehensive result tracking and display

Environment Variables

Required for full functionality:

OPENAI_API_KEY=your_openai_api_key_here
SPACE_ID=your_huggingface_space_id
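A startup check for these two variables might look like the sketch below. The function name `load_settings`, the returned dictionary keys, and the `use_fallback` flag are hypothetical; they only illustrate how a missing key could switch the app to rule-based processing.

```python
import os
from typing import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the two variables named above and decide whether the LLM can be used."""
    api_key = env.get("OPENAI_API_KEY", "").strip()
    space_id = env.get("SPACE_ID", "").strip()
    return {
        "openai_api_key": api_key or None,
        "space_id": space_id or None,
        # Hypothetical flag: without a key, only the rule-based path is available.
        "use_fallback": not api_key,
    }


if __name__ == "__main__":
    settings = load_settings()
    mode = "rule-based fallback" if settings["use_fallback"] else "OpenAI model"
    print(f"Answering with: {mode}")
```

Passing the environment as a parameter keeps the function easy to test with a plain dictionary.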

Local Development

Clone the repository

Set up a virtual environment:

python3 -m venv venv
source venv/bin/activate

Install dependencies:
```
pip install -r requirements.txt
```
Run the application:
```
python app.py
```

📈 Performance Optimization

Scoring Strategy

The system aims for 30%+ accuracy on the GAIA benchmark through:

- Intelligent Routing: Questions are automatically routed to the most appropriate specialist agent
- Tool Specialization: Each agent has access to tools optimized for their domain
- Fallback Mechanisms: Rule-based processing when LLM services are unavailable
- Error Handling: Robust error management and graceful degradation
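Graceful degradation of this kind can be sketched as a wrapper that tries the model-backed solver first and falls back to rules on any failure or empty answer. The function `answer_with_fallback` and the two toy solvers are illustrative assumptions, not code from this project.

```python
from typing import Callable


def answer_with_fallback(
    question: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> str:
    """Try the primary (LLM-backed) solver; on any failure use the rule-based one."""
    try:
        answer = primary(question)
        if answer and answer.strip():
            return answer.strip()
    except Exception as exc:  # broad on purpose: any failure should degrade, not crash
        print(f"Primary solver failed ({exc!r}); using fallback.")
    return fallback(question)


def broken_llm(question: str) -> str:
    """Toy primary solver that simulates an unavailable API."""
    raise ConnectionError("API unavailable")


def rule_based(question: str) -> str:
    """Toy fallback solver."""
    return "fallback answer"


if __name__ == "__main__":
    print(answer_with_fallback("2 + 2?", broken_llm, rule_based))
```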

Bonus Features

- LangSmith Integration: Ready for observability and monitoring
- Free Tools Only: Uses only free/open-source tools for accessibility
- Extensible Architecture: Easy to add new agents and capabilities

🔧 Configuration

Agent Prompts

Each agent has carefully crafted prompts for optimal performance:

- Supervisor: Focuses on task analysis and routing decisions
- Research: Emphasizes reliable source identification and factual accuracy
- Reasoning: Promotes step-by-step logical analysis
- File: Provides structured analysis frameworks for different file types

Tool Integration

Tools are integrated using LangChain's @tool decorator with proper error handling and type hints for reliable operation.

📝 Usage

1. Log in: Authenticate with your Hugging Face account
2. Submit: Click "Run Evaluation & Submit All Answers"
3. Monitor: Watch real-time processing of questions
4. Review: Examine results and scoring in the interface

🤝 Contributing

This implementation serves as a foundation for advanced multi-agent systems. Key areas for enhancement:

- Additional specialized agents (e.g., code execution, image analysis)
- Advanced reasoning capabilities
- Integration with more powerful models
- Enhanced tool ecosystem

Note: This system demonstrates advanced multi-agent coordination using LangGraph and represents a production-ready approach to complex AI task management.