LangGraph Multi-Agent System

A sophisticated multi-agent system built with LangGraph that follows best practices for state management, tracing, and iterative workflows.

Architecture Overview

The system implements an iterative research/code loop with specialized agents:

User Query → Lead Agent → Research Agent → Code Agent → Lead Agent (loop) → Answer Formatter → Final Answer

Key Components

  1. Lead Agent (agents/lead_agent.py)

    • Orchestrates the entire workflow
    • Makes routing decisions between research and code agents
    • Manages the iterative loop with a maximum of 3 iterations
    • Synthesizes information from specialists into draft answers
  2. Research Agent (agents/research_agent.py)

    • Handles information gathering from multiple sources
    • Uses web search (Tavily), Wikipedia, and ArXiv tools
    • Provides structured research results with citations
  3. Code Agent (agents/code_agent.py)

    • Performs mathematical calculations and code execution
    • Uses calculator tools for basic operations
    • Executes Python code in a sandboxed environment
    • Handles Hugging Face Hub statistics
  4. Answer Formatter (agents/answer_formatter.py)

    • Ensures GAIA benchmark compliance
    • Extracts final answers according to exact-match rules
    • Handles different answer types (numbers, strings, lists)
  5. Memory System (memory_system.py)

    • Vector store integration for long-term learning
    • Session-based caching for performance
    • Similar question retrieval for context

Core Features

State Management

  • Immutable State: Uses LangGraph's Command pattern for pure functions
  • Typed Schema: AgentState TypedDict ensures type safety
  • Accumulation: Research notes and code outputs accumulate across iterations
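The typed schema might look roughly like the sketch below. Field names are taken from the Direct Graph Access example later in this README; the exact `AgentState` definition in the repo may differ, so treat this as illustrative:

```python
import operator
from typing import Annotated, TypedDict

class AgentState(TypedDict):
    # Accumulating fields: LangGraph merges each node's update into the
    # existing value via operator.add (string concatenation here)
    research_notes: Annotated[str, operator.add]
    code_outputs: Annotated[str, operator.add]
    # Plain fields: each node's update simply replaces the old value
    draft_answer: str
    final_answer: str
    loop_counter: int
    done: bool
    next: str
```

The `Annotated[..., operator.add]` reducers are what let research notes and code outputs accumulate across iterations while everything else is overwritten.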

Observability (Langfuse v3)

  • OTEL-Native Integration: Uses Langfuse v3 with OpenTelemetry for automatic trace correlation
  • Single Callback Handler: One global handler passes traces seamlessly through LangGraph
  • Predictable Span Naming: agent/<role>, tool/<name>, llm/<model> patterns for cost/latency dashboards
  • Session Stitching: User and session tracking for conversation continuity
  • Background Flushing: Non-blocking trace export for optimal performance
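The predictable span-naming convention can be enforced with a small helper like the following. This function is hypothetical (it is not claimed to exist in the repo); it only illustrates the `agent/<role>`, `tool/<name>`, `llm/<model>` pattern:

```python
def span_name(kind: str, name: str) -> str:
    """Build a predictable span name such as 'agent/lead' or 'tool/tavily'.

    Consistent prefixes make it easy to group cost/latency dashboards
    by span kind in Langfuse.
    """
    allowed = {"agent", "tool", "llm"}
    if kind not in allowed:
        raise ValueError(f"unknown span kind: {kind!r}")
    return f"{kind}/{name}"
```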

Tools Integration

  • Web Search: Tavily API for current information
  • Knowledge Bases: Wikipedia and ArXiv for encyclopedic/academic content
  • Computation: Calculator tools and Python execution
  • Hub Statistics: Hugging Face model information
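The repo's actual calculator tool is not shown here, but a safe arithmetic evaluator along these lines could back it (in the real system it would be wrapped with LangChain's tool interface and added to the Code Agent's tool list). This sketch parses the expression with `ast` instead of calling `eval`, so only basic arithmetic is possible:

```python
import ast
import operator as op

# Binary operators the safe calculator is allowed to apply
_OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
        ast.Div: op.truediv, ast.Pow: op.pow}

def calculator(expression: str) -> float:
    """Safely evaluate a basic arithmetic expression like '2 * (3 + 4)'."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError("unsupported expression")
    return _eval(ast.parse(expression, mode="eval"))
```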

Setup

Environment Variables

Create an env.local file with:

```bash
# LLM API
GROQ_API_KEY=your_groq_api_key

# Search Tools
TAVILY_API_KEY=your_tavily_api_key

# Observability
LANGFUSE_PUBLIC_KEY=your_langfuse_public_key
LANGFUSE_SECRET_KEY=your_langfuse_secret_key
LANGFUSE_HOST=https://cloud.langfuse.com

# Memory (Optional)
SUPABASE_URL=your_supabase_url
SUPABASE_SERVICE_KEY=your_supabase_service_key
```
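The project depends on python-dotenv, whose `load_dotenv("env.local")` handles loading this file. For illustration, a minimal stdlib-only sketch of what that loading step does:

```python
import os

def load_env_file(path: str = "env.local") -> dict:
    """Parse KEY=value lines (skipping comments and blanks) into os.environ.

    A simplified stand-in for python-dotenv's load_dotenv().
    """
    loaded = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            loaded[key.strip()] = value.strip()
    os.environ.update(loaded)
    return loaded
```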

Dependencies

The system requires:

  • langgraph>=0.4.8
  • langchain>=0.3.0
  • langchain-groq
  • langfuse>=3.0.0
  • python-dotenv
  • tavily-python

Usage

Basic Usage

```python
import asyncio

from langgraph_agent_system import run_agent_system

async def main():
    result = await run_agent_system(
        query="What is the capital of Maharashtra?",
        user_id="user_123",
        session_id="session_456"
    )
    print(f"Answer: {result}")

asyncio.run(main())
```

Testing

Run the test suite to verify functionality:

```bash
python test_new_multi_agent_system.py
```

Test Langfuse v3 observability integration:

```bash
python test_observability.py
```

Direct Graph Access

```python
import asyncio

from langchain_core.messages import HumanMessage

from langgraph_agent_system import create_agent_graph

# Create and compile the workflow
workflow = create_agent_graph()
app = workflow.compile()

# Run with initial state
initial_state = {
    "messages": [HumanMessage(content="Your question")],
    "draft_answer": "",
    "research_notes": "",
    "code_outputs": "",
    "loop_counter": 0,
    "done": False,
    "next": "research",
    "final_answer": "",
    "user_id": "user_123",
    "session_id": "session_456"
}

# ainvoke is a coroutine; outside an async context, drive it with asyncio.run
final_state = asyncio.run(app.ainvoke(initial_state))
print(final_state["final_answer"])
```

Workflow Details

Iterative Loop

  1. The Lead Agent analyzes the query and decides on the next action
  2. If research is needed → the Research Agent gathers information
  3. If computation is needed → the Code Agent performs calculations
  4. Control returns to the Lead Agent for synthesis and the next decision
  5. When enough information has been gathered → the Answer Formatter produces the final answer

Routing Logic

The Lead Agent uses the following criteria:

  • Research: Factual information, current events, citations needed
  • Code: Mathematical calculations, data analysis, programming tasks
  • Formatter: Sufficient information gathered OR max iterations reached
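In the real system this decision is made by the Lead Agent's LLM call; a deterministic stand-in makes the termination guarantee explicit:

```python
MAX_ITERATIONS = 3  # matches the hard loop limit described above

def route(state: dict) -> str:
    """Pick the next node: 'research', 'code', or 'formatter'.

    Simplified stand-in for the Lead Agent's routing decision. The
    formatter is always reached once the work is done or the
    iteration budget is exhausted, so the loop cannot run forever.
    """
    if state.get("done") or state["loop_counter"] >= MAX_ITERATIONS:
        return "formatter"
    return state.get("next", "research")
```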

GAIA Compliance

The Answer Formatter ensures exact-match requirements:

  • Numbers: No commas, units, or extra symbols
  • Strings: Remove unnecessary articles and formatting
  • Lists: Comma and space separation
  • No surrounding text: No "Answer:", quotes, or brackets
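A simplified sketch of what such a normalizer might do (the repo's Answer Formatter is LLM-assisted and more thorough; function name and exact rules here are illustrative):

```python
import re

_ARTICLES = {"a", "an", "the"}

def format_final_answer(value) -> str:
    """Normalize an answer toward GAIA exact-match rules (simplified)."""
    if isinstance(value, list):
        # Lists: comma-and-space separated, each element normalized
        return ", ".join(format_final_answer(v) for v in value)
    if isinstance(value, (int, float)):
        # Numbers: plain digits, no thousands separators or units
        return str(value)
    text = str(value).strip()
    # Drop a leading "Answer:" prefix, then surrounding quotes
    text = re.sub(r"^answer\s*:\s*", "", text, flags=re.IGNORECASE)
    text = text.strip().strip('"\'')
    # Remove articles per the exact-match string rules
    words = [w for w in text.split() if w.lower() not in _ARTICLES]
    return " ".join(words)
```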

Best Practices Implemented

LangGraph Patterns

  • ✅ Pure functions (AgentState → Command)
  • ✅ Immutable state with explicit updates
  • ✅ Typed state schema with operator annotations
  • ✅ Clear routing separated from business logic

Langfuse v3 Observability

  • ✅ OTEL-native SDK with automatic trace correlation
  • ✅ Single global callback handler for seamless LangGraph integration
  • ✅ Predictable span naming (agent/<role>, tool/<name>, llm/<model>)
  • ✅ Session and user tracking with environment tagging
  • ✅ Background trace flushing for performance
  • ✅ Graceful degradation when observability unavailable

Memory Management

  • ✅ TTL-based caching for performance
  • ✅ Vector store integration for learning
  • ✅ Duplicate detection and prevention
  • ✅ Session cleanup for long-running instances

Error Handling

The system implements graceful degradation:

  • Tool failures: Continue with available tools
  • API timeouts: Retry with backoff
  • Memory errors: Degrade to LLM-only mode
  • Agent failures: Return informative error messages
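The retry-with-backoff behavior can be sketched as a small helper (the repo's actual retry logic is not shown here; this is one common shape):

```python
import time

def with_retry(fn, attempts: int = 3, base_delay: float = 0.5):
    """Call fn(), retrying on any exception with exponential backoff.

    Sleeps base_delay, then 2x, then 4x, ... between attempts;
    the last failure is re-raised so callers can degrade gracefully.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error
            time.sleep(base_delay * (2 ** attempt))
```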

Performance Considerations

  • Caching: Vector store searches cached for 5 minutes
  • Parallelization: Tools can be executed in parallel
  • Memory limits: Sandbox execution has resource constraints
  • Loop termination: Hard limit of 3 iterations prevents infinite loops
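The 5-minute caching of vector-store searches amounts to a small TTL cache; a minimal sketch (class name hypothetical):

```python
import time

class TTLCache:
    """Tiny time-based cache; 300 s matches the 5-minute window above."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expiry_timestamp, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expiry, value = entry
        if time.monotonic() > expiry:
            del self._store[key]  # expired: drop and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)
```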

Extending the System

Adding New Agents

  1. Create agent file in agents/ directory
  2. Implement agent function returning Command
  3. Add to workflow in create_agent_graph()
  4. Update routing logic in Lead Agent
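The steps above boil down to writing one pure function from state to a Command. The sketch below uses a local dataclass as a stand-in for `langgraph.types.Command` so it runs without dependencies; the agent name and logic are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Command:
    """Stand-in for langgraph.types.Command: next node plus state updates."""
    goto: str
    update: dict = field(default_factory=dict)

def summarizer_agent(state: dict) -> Command:
    """Hypothetical new agent: condenses research notes into a draft.

    Pure function of state; all effects are expressed in the returned
    Command, which is the pattern the existing agents follow.
    """
    summary = state.get("research_notes", "")[:200]
    return Command(goto="lead", update={"draft_answer": summary})
```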

Adding New Tools

  1. Implement tool following LangChain Tool interface
  2. Add to appropriate agent's tool list
  3. Update agent prompts to describe new capabilities

Custom Memory Backends

  1. Extend MemoryManager class
  2. Implement required interface methods
  3. Update initialization in memory_system.py
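Assuming the required interface covers storing Q&A pairs and retrieving similar questions (the exact MemoryManager method names in the repo may differ), a custom backend might look like:

```python
from abc import ABC, abstractmethod

class MemoryBackend(ABC):
    """Hypothetical interface a custom memory backend would implement."""

    @abstractmethod
    def store(self, question: str, answer: str) -> None: ...

    @abstractmethod
    def search_similar(self, question: str, limit: int = 3) -> list: ...

class InMemoryBackend(MemoryBackend):
    """Trivial backend for tests: substring match stands in for
    vector similarity search."""

    def __init__(self):
        self._items = []

    def store(self, question, answer):
        self._items.append((question, answer))

    def search_similar(self, question, limit=3):
        hits = [qa for qa in self._items if question.lower() in qa[0].lower()]
        return hits[:limit]
```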

Troubleshooting

Common Issues

  • Missing API keys: Check env.local file setup
  • Tool failures: Verify network connectivity and API quotas
  • Memory errors: Check Supabase configuration (optional)
  • Import errors: Ensure all dependencies are installed

Debug Mode

Set environment variable for detailed logging:

```bash
export LANGFUSE_DEBUG=true
```

This implementation follows the specified plan while incorporating LangGraph and Langfuse best practices for a robust, observable, and maintainable multi-agent system.