# LangGraph Multi-Agent System
A sophisticated multi-agent system built with LangGraph that follows best practices for state management, tracing, and iterative workflows.
## Architecture Overview
The system implements an iterative research/code loop with specialized agents:
```
User Query → Lead Agent → Research Agent → Code Agent → Lead Agent (loop) → Answer Formatter → Final Answer
```
### Key Components
1. **Lead Agent** (`agents/lead_agent.py`)
- Orchestrates the entire workflow
- Makes routing decisions between research and code agents
- Manages the iterative loop with a maximum of 3 iterations
- Synthesizes information from specialists into draft answers
2. **Research Agent** (`agents/research_agent.py`)
- Handles information gathering from multiple sources
- Uses web search (Tavily), Wikipedia, and ArXiv tools
- Provides structured research results with citations
3. **Code Agent** (`agents/code_agent.py`)
- Performs mathematical calculations and code execution
- Uses calculator tools for basic operations
- Executes Python code in a sandboxed environment
- Handles Hugging Face Hub statistics
4. **Answer Formatter** (`agents/answer_formatter.py`)
- Ensures GAIA benchmark compliance
- Extracts final answers according to exact-match rules
- Handles different answer types (numbers, strings, lists)
5. **Memory System** (`memory_system.py`)
- Vector store integration for long-term learning
- Session-based caching for performance
- Similar question retrieval for context
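The Lead Agent's routing role (item 1 above) can be sketched as a pure function that reads state and returns a `Command`. This is a minimal, self-contained sketch: `Command` here is a dataclass stand-in for `langgraph.types.Command`, and the state keys and heuristic are assumptions, not the actual logic in `agents/lead_agent.py`.

```python
from dataclasses import dataclass, field


@dataclass
class Command:  # stand-in for langgraph.types.Command
    goto: str
    update: dict = field(default_factory=dict)


MAX_ITERATIONS = 3  # hard loop limit from the workflow


def lead_agent(state: dict) -> Command:
    """Pure function: inspect state, return a routing Command with updates."""
    loops = state.get("loop_counter", 0)
    if state.get("done") or loops >= MAX_ITERATIONS:
        return Command(goto="answer_formatter")
    # Toy heuristic: gather research first, then move to computation
    goto = "research_agent" if not state.get("research_notes") else "code_agent"
    return Command(goto=goto, update={"loop_counter": loops + 1})
```

Returning a `Command` (rather than mutating state in place) is what keeps each node a pure function over the shared state.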
## Core Features
### State Management
- **Immutable State**: Uses LangGraph's Command pattern for pure functions
- **Typed Schema**: AgentState TypedDict ensures type safety
- **Accumulation**: Research notes and code outputs accumulate across iterations
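One plausible shape for the typed schema, matching the state keys used in the Direct Graph Access example below; the repo's actual `AgentState` may differ. The `operator.add` annotation is what tells LangGraph to accumulate a field across node updates instead of overwriting it.

```python
import operator
from typing import Annotated, TypedDict


class AgentState(TypedDict):
    messages: Annotated[list, operator.add]  # accumulated, never overwritten
    draft_answer: str
    research_notes: str
    code_outputs: str
    loop_counter: int
    done: bool
    next: str
    final_answer: str
    user_id: str
    session_id: str
```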
### Observability (Langfuse v3)
- **OTEL-Native Integration**: Uses Langfuse v3 with OpenTelemetry for automatic trace correlation
- **Single Callback Handler**: One global handler passes traces seamlessly through LangGraph
- **Predictable Span Naming**: `agent/<role>`, `tool/<name>`, `llm/<model>` patterns for cost/latency dashboards
- **Session Stitching**: User and session tracking for conversation continuity
- **Background Flushing**: Non-blocking trace export for optimal performance
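A minimal wiring sketch, assuming Langfuse v3's LangChain callback handler (`langfuse.langchain.CallbackHandler`) and that the credentials from the Setup section are already in the environment; the repo's actual setup may differ.

```python
# Assumed Langfuse v3 wiring: one global CallbackHandler reused across
# all graph invocations (requires the langfuse package plus the env vars
# listed under Setup).
from langfuse.langchain import CallbackHandler

langfuse_handler = CallbackHandler()

# Passed once per invocation; LangGraph propagates it to every node:
# final_state = await app.ainvoke(
#     initial_state,
#     config={"callbacks": [langfuse_handler]},
# )
```

Creating the handler once and reusing it is what keeps spans from the same session stitched into a single trace.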
### Tools Integration
- **Web Search**: Tavily API for current information
- **Knowledge Bases**: Wikipedia and ArXiv for encyclopedic/academic content
- **Computation**: Calculator tools and Python execution
- **Hub Statistics**: Hugging Face model information
## Setup
### Environment Variables
Create an `env.local` file with:
```bash
# LLM API
GROQ_API_KEY=your_groq_api_key
# Search Tools
TAVILY_API_KEY=your_tavily_api_key
# Observability
LANGFUSE_PUBLIC_KEY=your_langfuse_public_key
LANGFUSE_SECRET_KEY=your_langfuse_secret_key
LANGFUSE_HOST=https://cloud.langfuse.com
# Memory (Optional)
SUPABASE_URL=your_supabase_url
SUPABASE_SERVICE_KEY=your_supabase_service_key
```
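Note that python-dotenv defaults to loading `.env`, so the file name must be passed explicitly (e.g. `load_dotenv("env.local")`). As an illustration of what that call does, here is a dependency-free sketch of the same parsing; the real loader handles quoting and interpolation that this one does not.

```python
import os


def load_env_file(path: str = "env.local") -> None:
    """Load KEY=value lines into os.environ, skipping comments and blanks."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            # setdefault: real environment variables win over the file
            os.environ.setdefault(key.strip(), value.strip())
```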
### Dependencies
The system requires:
- `langgraph>=0.4.8`
- `langchain>=0.3.0`
- `langchain-groq`
- `langfuse>=3.0.0`
- `python-dotenv`
- `tavily-python`
## Usage
### Basic Usage
```python
import asyncio
from langgraph_agent_system import run_agent_system
async def main():
    result = await run_agent_system(
        query="What is the capital of Maharashtra?",
        user_id="user_123",
        session_id="session_456",
    )
    print(f"Answer: {result}")

asyncio.run(main())
```
### Testing
Run the test suite to verify functionality:
```bash
python test_new_multi_agent_system.py
```
Test Langfuse v3 observability integration:
```bash
python test_observability.py
```
### Direct Graph Access
```python
from langchain_core.messages import HumanMessage
from langgraph_agent_system import create_agent_graph

# Create and compile the workflow
workflow = create_agent_graph()
app = workflow.compile()

# Run with initial state (inside an async context)
initial_state = {
    "messages": [HumanMessage(content="Your question")],
    "draft_answer": "",
    "research_notes": "",
    "code_outputs": "",
    "loop_counter": 0,
    "done": False,
    "next": "research",
    "final_answer": "",
    "user_id": "user_123",
    "session_id": "session_456",
}
final_state = await app.ainvoke(initial_state)
print(final_state["final_answer"])
```
## Workflow Details
### Iterative Loop
1. **Lead Agent** analyzes the query and decides on next action
2. If research needed → **Research Agent** gathers information
3. If computation needed → **Code Agent** performs calculations
4. Back to **Lead Agent** for synthesis and next decision
5. When sufficient information → **Answer Formatter** creates final answer
### Routing Logic
The Lead Agent uses the following criteria:
- **Research**: Factual information, current events, citations needed
- **Code**: Mathematical calculations, data analysis, programming tasks
- **Formatter**: Sufficient information gathered OR max iterations reached
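The criteria above reduce to a conditional-edge function that maps the Lead Agent's decision to a node name. A sketch, with node names assumed rather than taken from the repo:

```python
def route_from_lead(state: dict) -> str:
    """Return the name of the next node based on the Lead Agent's decision."""
    # Formatter: enough information gathered OR the hard iteration limit hit
    if state.get("done") or state.get("loop_counter", 0) >= 3:
        return "answer_formatter"
    # Otherwise follow the Lead Agent's stated choice, defaulting to research
    return {"research": "research_agent", "code": "code_agent"}.get(
        state.get("next", "research"), "research_agent"
    )
```

Inside `create_agent_graph()`, a function like this would be registered with `workflow.add_conditional_edges(...)`, keeping routing separate from the agents' business logic.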
### GAIA Compliance
The Answer Formatter ensures exact-match requirements:
- **Numbers**: No commas, units, or extra symbols
- **Strings**: Remove unnecessary articles and formatting
- **Lists**: Comma and space separation
- **No surrounding text**: No "Answer:", quotes, or brackets
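A hypothetical normalizer illustrating the exact-match rules above; the actual Answer Formatter is LLM-driven and handles more cases than this sketch.

```python
import re


def normalize_answer(raw: str) -> str:
    """Apply the GAIA exact-match rules: strip prefixes/quotes, clean numbers,
    space-separate lists, drop leading articles."""
    ans = raw.strip().strip("\"'")
    ans = re.sub(r"^(final\s+)?answer\s*:\s*", "", ans, flags=re.IGNORECASE)
    if re.fullmatch(r"-?[\d,]+(\.\d+)?", ans):   # number: drop commas
        return ans.replace(",", "")
    if "," in ans:                               # list: comma + single space
        return ", ".join(part.strip() for part in ans.split(","))
    return re.sub(r"^(the|a|an)\s+", "", ans, flags=re.IGNORECASE)
```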
## Best Practices Implemented
### LangGraph Patterns
- ✅ Pure functions (AgentState → Command)
- ✅ Immutable state with explicit updates
- ✅ Typed state schema with operator annotations
- ✅ Clear routing separated from business logic
### Langfuse v3 Observability
- ✅ OTEL-native SDK with automatic trace correlation
- ✅ Single global callback handler for seamless LangGraph integration
- ✅ Predictable span naming (`agent/<role>`, `tool/<name>`, `llm/<model>`)
- ✅ Session and user tracking with environment tagging
- ✅ Background trace flushing for performance
- ✅ Graceful degradation when observability unavailable
### Memory Management
- ✅ TTL-based caching for performance
- ✅ Vector store integration for learning
- ✅ Duplicate detection and prevention
- ✅ Session cleanup for long-running instances
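The TTL-based caching above can be sketched with a small dictionary-backed cache; the real `MemoryManager` likely layers this on top of the vector store, and the 5-minute default mirrors the figure given under Performance Considerations.

```python
import time


class TTLCache:
    """Expiring key/value cache: entries vanish after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 300.0):  # 5-minute default
        self.ttl = ttl_seconds
        self._store: dict = {}

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: evict lazily on read
            return default
        return value
```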
## Error Handling
The system implements graceful degradation:
- **Tool failures**: Continue with available tools
- **API timeouts**: Retry with backoff
- **Memory errors**: Degrade to LLM-only mode
- **Agent failures**: Return informative error messages
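The retry-with-backoff behavior for API timeouts can be sketched as a small wrapper; the attempt count, delays, and exception types here are illustrative, not the system's actual settings.

```python
import time


def with_backoff(fn, attempts: int = 3, base_delay: float = 0.5,
                 exceptions: tuple = (TimeoutError,)):
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except exceptions:
            if attempt == attempts - 1:
                raise                                # out of retries
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```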
## Performance Considerations
- **Caching**: Vector store searches cached for 5 minutes
- **Parallelization**: Tools can be executed in parallel
- **Memory limits**: Sandbox execution has resource constraints
- **Loop termination**: Hard limit of 3 iterations prevents infinite loops
## Extending the System
### Adding New Agents
1. Create agent file in `agents/` directory
2. Implement agent function returning Command
3. Add to workflow in `create_agent_graph()`
4. Update routing logic in Lead Agent
### Adding New Tools
1. Implement tool following LangChain Tool interface
2. Add to appropriate agent's tool list
3. Update agent prompts to describe new capabilities
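A hypothetical example of step 1: a calculator-style tool body with the docstring an agent would read, ready to be wrapped with LangChain's `@tool` decorator (`from langchain_core.tools import tool`). The decorator line is commented out so the sketch stays dependency-free; the name and behavior are illustrative.

```python
import ast
import operator as op

_OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
        ast.Div: op.truediv, ast.Pow: op.pow, ast.USub: op.neg}


# @tool  # from langchain_core.tools import tool
def calculator(expression: str) -> float:
    """Safely evaluate a basic arithmetic expression like '2 * (3 + 4)'."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp):
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp):
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")
    return _eval(ast.parse(expression, mode="eval").body)
```

Walking the AST instead of calling `eval()` keeps arbitrary code out of the tool, in line with the sandboxing described for the Code Agent.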
### Custom Memory Backends
1. Extend MemoryManager class
2. Implement required interface methods
3. Update initialization in memory_system.py
## Troubleshooting
### Common Issues
- **Missing API keys**: Check env.local file setup
- **Tool failures**: Verify network connectivity and API quotas
- **Memory errors**: Check Supabase configuration (optional)
- **Import errors**: Ensure all dependencies are installed
### Debug Mode
Set environment variable for detailed logging:
```bash
export LANGFUSE_DEBUG=true
```
This implementation follows the specified plan while incorporating LangGraph and Langfuse best practices for a robust, observable, and maintainable multi-agent system.