# LangGraph Multi-Agent System

A sophisticated multi-agent system built with LangGraph that follows best practices for state management, tracing, and iterative workflows.

## Architecture Overview

The system implements an iterative research/code loop with specialized agents:

```
User Query → Lead Agent → Research Agent → Code Agent → Lead Agent (loop) → Answer Formatter → Final Answer
```

### Key Components

1. **Lead Agent** (`agents/lead_agent.py`)
   - Orchestrates the entire workflow
   - Makes routing decisions between the research and code agents
   - Manages the iterative loop with a maximum of 3 iterations
   - Synthesizes information from specialists into draft answers
2. **Research Agent** (`agents/research_agent.py`)
   - Handles information gathering from multiple sources
   - Uses web search (Tavily), Wikipedia, and ArXiv tools
   - Provides structured research results with citations
3. **Code Agent** (`agents/code_agent.py`)
   - Performs mathematical calculations and code execution
   - Uses calculator tools for basic operations
   - Executes Python code in a sandboxed environment
   - Handles Hugging Face Hub statistics
4. **Answer Formatter** (`agents/answer_formatter.py`)
   - Ensures GAIA benchmark compliance
   - Extracts final answers according to exact-match rules
   - Handles different answer types (numbers, strings, lists)
5. **Memory System** (`memory_system.py`)
   - Vector store integration for long-term learning
   - Session-based caching for performance
   - Similar question retrieval for context

## Core Features

### State Management

- **Immutable State**: Uses LangGraph's Command pattern for pure functions
- **Typed Schema**: An AgentState TypedDict ensures type safety
- **Accumulation**: Research notes and code outputs accumulate across iterations

### Observability (Langfuse v3)

- **OTEL-Native Integration**: Uses Langfuse v3 with OpenTelemetry for automatic trace correlation
- **Single Callback Handler**: One global handler passes traces seamlessly through LangGraph
- **Predictable Span Naming**: `agent/`, `tool/`, `llm/` patterns for cost/latency dashboards
- **Session Stitching**: User and session tracking for conversation continuity
- **Background Flushing**: Non-blocking trace export for optimal performance

### Tools Integration

- **Web Search**: Tavily API for current information
- **Knowledge Bases**: Wikipedia and ArXiv for encyclopedic/academic content
- **Computation**: Calculator tools and Python execution
- **Hub Statistics**: Hugging Face model information

## Setup

### Environment Variables

Create an `env.local` file with:

```bash
# LLM API
GROQ_API_KEY=your_groq_api_key

# Search Tools
TAVILY_API_KEY=your_tavily_api_key

# Observability
LANGFUSE_PUBLIC_KEY=your_langfuse_public_key
LANGFUSE_SECRET_KEY=your_langfuse_secret_key
LANGFUSE_HOST=https://cloud.langfuse.com

# Memory (Optional)
SUPABASE_URL=your_supabase_url
SUPABASE_SERVICE_KEY=your_supabase_service_key
```

### Dependencies

The system requires:

- `langgraph>=0.4.8`
- `langchain>=0.3.0`
- `langchain-groq`
- `langfuse>=3.0.0`
- `python-dotenv`
- `tavily-python`

## Usage

### Basic Usage

```python
import asyncio

from langgraph_agent_system import run_agent_system

async def main():
    result = await run_agent_system(
        query="What is the capital of Maharashtra?",
        user_id="user_123",
        session_id="session_456"
    )
    print(f"Answer: {result}")

asyncio.run(main())
```
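State is threaded through the graph as a typed dictionary. The real schema lives in the source; as a rough sketch of what an `AgentState` with accumulating fields might look like (field names mirror the `initial_state` example in this README, but the choice of which fields carry `operator.add` reducers is an assumption):

```python
import operator
from typing import Annotated, TypedDict

class AgentState(TypedDict):
    # Annotated reducers tell LangGraph to merge node updates into the
    # existing value instead of overwriting it; exactly which fields
    # accumulate in the real system is an assumption here.
    messages: Annotated[list, operator.add]
    research_notes: Annotated[str, operator.add]
    code_outputs: Annotated[str, operator.add]
    draft_answer: str
    loop_counter: int
    done: bool
    next: str
    final_answer: str
    user_id: str
    session_id: str
```

Because each node returns only the fields it updates, annotated fields grow across iterations while the rest are simply replaced.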
### Testing

Run the test suite to verify functionality:

```bash
python test_new_multi_agent_system.py
```

Test the Langfuse v3 observability integration:

```bash
python test_observability.py
```

### Direct Graph Access

```python
import asyncio

from langchain_core.messages import HumanMessage
from langgraph_agent_system import create_agent_graph

# Create and compile the workflow
workflow = create_agent_graph()
app = workflow.compile()

# Run with an initial state
initial_state = {
    "messages": [HumanMessage(content="Your question")],
    "draft_answer": "",
    "research_notes": "",
    "code_outputs": "",
    "loop_counter": 0,
    "done": False,
    "next": "research",
    "final_answer": "",
    "user_id": "user_123",
    "session_id": "session_456"
}

final_state = asyncio.run(app.ainvoke(initial_state))
print(final_state["final_answer"])
```

## Workflow Details

### Iterative Loop

1. The **Lead Agent** analyzes the query and decides on the next action
2. If research is needed → the **Research Agent** gathers information
3. If computation is needed → the **Code Agent** performs calculations
4. Back to the **Lead Agent** for synthesis and the next decision
5. When sufficient information is gathered → the **Answer Formatter** creates the final answer
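In LangGraph terms, the loop above is a conditional edge keyed on the state. A simplified, illustrative routing function — the node names and the exact termination fields are assumptions, not the system's actual code:

```python
MAX_ITERATIONS = 3  # hard limit described under Performance Considerations

def route_from_lead(state: dict) -> str:
    """Decide where the Lead Agent hands off next (illustrative sketch)."""
    # Terminate when the Lead Agent marks the task done or the loop cap is hit.
    if state.get("done") or state.get("loop_counter", 0) >= MAX_ITERATIONS:
        return "answer_formatter"
    # "next" is set by the Lead Agent to either "research" or "code".
    return state.get("next", "research")
```

The real graph would register a function like this via `add_conditional_edges` on the lead node.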
### Routing Logic

The Lead Agent uses the following criteria:

- **Research**: factual information, current events, citations needed
- **Code**: mathematical calculations, data analysis, programming tasks
- **Formatter**: sufficient information gathered OR the maximum number of iterations reached

### GAIA Compliance

The Answer Formatter ensures exact-match requirements:

- **Numbers**: no commas, units, or extra symbols
- **Strings**: remove unnecessary articles and formatting
- **Lists**: comma-and-space separation
- **No surrounding text**: no "Answer:", quotes, or brackets

## Best Practices Implemented

### LangGraph Patterns

- ✅ Pure functions (AgentState → Command)
- ✅ Immutable state with explicit updates
- ✅ Typed state schema with operator annotations
- ✅ Clear routing separated from business logic

### Langfuse v3 Observability

- ✅ OTEL-native SDK with automatic trace correlation
- ✅ Single global callback handler for seamless LangGraph integration
- ✅ Predictable span naming (`agent/`, `tool/`, `llm/`)
- ✅ Session and user tracking with environment tagging
- ✅ Background trace flushing for performance
- ✅ Graceful degradation when observability is unavailable

### Memory Management

- ✅ TTL-based caching for performance
- ✅ Vector store integration for learning
- ✅ Duplicate detection and prevention
- ✅ Session cleanup for long-running instances

## Error Handling

The system implements graceful degradation:

- **Tool failures**: continue with the available tools
- **API timeouts**: retry with backoff
- **Memory errors**: degrade to LLM-only mode
- **Agent failures**: return informative error messages

## Performance Considerations

- **Caching**: vector store searches are cached for 5 minutes
- **Parallelization**: tools can be executed in parallel
- **Memory limits**: sandboxed execution has resource constraints
- **Loop termination**: a hard limit of 3 iterations prevents infinite loops

## Extending the System
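Each agent in this system is ultimately a pure function over the state. As a shape reference before adding your own, here is a minimal sketch — the real agents return LangGraph `Command` objects, whereas this simplified version returns a plain state update, and the agent name and logic are hypothetical:

```python
def summarizer_agent(state: dict) -> dict:
    """Hypothetical new agent: condenses accumulated research notes."""
    notes = state.get("research_notes", "")
    draft = notes[:500]  # stand-in for a real LLM call
    # Return only the fields this node updates; LangGraph merges them
    # into the shared state according to the schema's reducers.
    return {
        "draft_answer": draft,
        "loop_counter": state.get("loop_counter", 0) + 1,
    }
```

Once implemented, register the node with `workflow.add_node("summarizer", summarizer_agent)` inside `create_agent_graph()` and teach the Lead Agent's routing about it.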
### Adding New Agents

1. Create an agent file in the `agents/` directory
2. Implement an agent function returning a Command
3. Add it to the workflow in `create_agent_graph()`
4. Update the routing logic in the Lead Agent

### Adding New Tools

1. Implement a tool following the LangChain Tool interface
2. Add it to the appropriate agent's tool list
3. Update the agent prompts to describe the new capabilities

### Custom Memory Backends

1. Extend the MemoryManager class
2. Implement the required interface methods
3. Update the initialization in `memory_system.py`

## Troubleshooting

### Common Issues

- **Missing API keys**: check the `env.local` file setup
- **Tool failures**: verify network connectivity and API quotas
- **Memory errors**: check the Supabase configuration (optional)
- **Import errors**: ensure all dependencies are installed

### Debug Mode

Set an environment variable for detailed logging:

```bash
export LANGFUSE_DEBUG=true
```

This implementation follows the specified plan while incorporating LangGraph and Langfuse best practices for a robust, observable, and maintainable multi-agent system.