# LangGraph Agent System Architecture
This document describes the architecture of the multi-agent system implemented using LangGraph 0.4.8+ and Langfuse 3.0.0.
## System Overview
The system implements a multi-agent architecture with memory, routing, specialized agents, and verification, as shown in the system diagram.
## Core Components
### 1. Memory Layer
- Short-Term Memory: Graph state managed by LangGraph checkpointing
- Checkpointer: SQLite-based persistence for conversation continuity
- Long-Term Memory: Supabase vector store with pgvector for Q&A storage
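In production the pgvector store performs similarity search server-side; as a rough in-process illustration of the threshold-based Q&A lookup, here is a pure-Python sketch. The `QAMemory` class and its 0.85 default threshold are illustrative assumptions, not the actual implementation:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class QAMemory:
    """Stores (embedding, question, answer) triples and returns the
    closest stored answer when similarity clears a threshold."""

    def __init__(self, threshold: float = 0.85):
        self.threshold = threshold
        self.entries: list[tuple[list[float], str, str]] = []

    def add(self, embedding: list[float], question: str, answer: str) -> None:
        self.entries.append((embedding, question, answer))

    def lookup(self, embedding: list[float]):
        best = max(
            self.entries,
            key=lambda e: cosine_similarity(e[0], embedding),
            default=None,
        )
        if best and cosine_similarity(best[0], embedding) >= self.threshold:
            return best[2]  # cached answer for a near-duplicate question
        return None
```

The same threshold check is what lets the plan node short-circuit to a cached answer instead of re-running the full agent pipeline.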
### 2. Plan + ReAct Loop
- Initial query analysis and planning
- Contextual prompt injection with system requirements
- Memory retrieval for similar past questions
### 3. Agent Router
- Intelligent routing based on query analysis
- Routes to specialized agents: Retrieval, Execution, or Critic
- Uses low-temperature LLM for consistent routing decisions
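Since the real router delegates the decision to a low-temperature LLM, its output reduces to a three-way choice. The keyword rules below are a hypothetical stand-in for that LLM call, showing only the shape of the decision:

```python
def route_query(query: str) -> str:
    """Pick a specialized agent for a query.

    Illustrative stand-in for the LLM-based router: the real system
    asks a low-temperature model to choose; simple keyword rules here
    demonstrate the three-way routing decision.
    """
    q = query.lower()
    if any(k in q for k in ("calculate", "compute", "run", "code")):
        return "execution"
    if any(k in q for k in ("review", "evaluate", "critique")):
        return "critic"
    return "retrieval"  # default: gather information first
```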
### 4. Specialized Agents
#### Retrieval Agent
- Information gathering from external sources
- Tools: Wikipedia, Arxiv, Tavily web search, vector store retrieval
- Handles attachment downloading for GAIA tasks
- Context-aware with memory integration
#### Execution Agent
- Computational tasks and code execution
- Integrates with the existing `code_agent.py` sandbox
- Python code execution with pandas, cv2, and standard libraries
- Step-by-step problem breakdown
#### Critic Agent
- Response quality evaluation and review
- Accuracy, completeness, and logical consistency checks
- Scoring system with pass/fail determination
- Constructive feedback generation
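The scoring step might be modeled as follows; the three dimensions mirror the checks listed above, but the `CriticVerdict` class, its equal weighting, and the 0.7 pass threshold are illustrative assumptions rather than the actual critic implementation:

```python
from dataclasses import dataclass

@dataclass
class CriticVerdict:
    """Structured result of one critic evaluation (all scores in 0..1)."""
    accuracy: float
    completeness: float
    consistency: float
    feedback: str = ""

    @property
    def score(self) -> float:
        # Equal weighting across the three dimensions (an assumption)
        return (self.accuracy + self.completeness + self.consistency) / 3

    def passed(self, threshold: float = 0.7) -> bool:
        return self.score >= threshold
```

In the real system these scores would be parsed out of the LLM response produced by `critic_prompt.txt`.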
### 5. Verification & Fallback
- Final quality control with system prompt compliance
- Format verification for exact-match requirements
- Retry logic with maximum attempt limits
- Graceful fallback pipeline for failed attempts
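The retry-then-fallback control flow can be sketched generically; `answer_with_retries`, its callback signatures, and the default limit of three attempts are assumptions for illustration:

```python
def answer_with_retries(generate, verify, max_attempts: int = 3,
                        fallback: str = "Unable to produce a verified answer."):
    """Run generate() until verify() accepts the draft, up to
    max_attempts; return a safe fallback response after that."""
    for attempt in range(1, max_attempts + 1):
        draft = generate(attempt)  # attempt number lets the caller vary the prompt
        if verify(draft):
            return draft
    return fallback
```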
### 6. Observability (Langfuse)
- End-to-end tracing of all agent interactions
- Performance monitoring and debugging
- User session tracking
- Error logging and analysis
## Data Flow
1. User Query → Plan Node (system prompt injection)
2. Plan Node → Router (agent selection)
3. Router → Specialized Agent (task execution)
4. Agent → Tools (if needed) → Agent (results)
5. Agent → Verification (quality check)
6. Verification → Output or Retry/Fallback
## Key Features
### Memory Management
- Caching of similarity searches (TTL-based)
- Duplicate detection and prevention
- Task-based attachment tracking
- Session-specific cache management
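A TTL-based cache of this kind can be sketched in a few lines; the `TTLCache` class and its 300-second default are illustrative, not the project's actual cache:

```python
import time

class TTLCache:
    """Caches similarity-search results for a fixed time-to-live."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired: drop the entry and miss
            return None
        return value

    def set(self, key: str, value: object) -> None:
        self._store[key] = (time.monotonic(), value)

    def clear(self) -> None:
        """Session-level cache reset."""
        self._store.clear()
```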
### Quality Control
- Multi-level verification (agent → critic → verification)
- Retry mechanism with attempt limits
- Format compliance checking
- Fallback responses for failures
### Tracing & Observability
- Langfuse integration for complete observability
- Agent-level span tracking
- Error monitoring and debugging
- Performance metrics collection
### Tool Integration
- Modular tool system for each agent
- Sandboxed code execution environment
- External API integration (search, knowledge bases)
- Attachment handling for complex tasks
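A modular per-agent tool system is essentially a two-level name-to-callable map. The `ToolRegistry` below is a hypothetical sketch of that idea, not the project's actual tool wiring:

```python
from typing import Callable

class ToolRegistry:
    """Maps tool names to callables, grouped per agent, so each
    specialized agent only sees the tools registered for it."""

    def __init__(self):
        self._tools: dict[str, dict[str, Callable]] = {}

    def register(self, agent: str, name: str, fn: Callable) -> None:
        self._tools.setdefault(agent, {})[name] = fn

    def tools_for(self, agent: str) -> dict[str, Callable]:
        # Copy so callers cannot mutate the registry
        return dict(self._tools.get(agent, {}))

    def call(self, agent: str, name: str, *args, **kwargs):
        try:
            fn = self._tools[agent][name]
        except KeyError:
            raise KeyError(f"Tool {name!r} not registered for agent {agent!r}")
        return fn(*args, **kwargs)
```

Registering Wikipedia, Arxiv, and Tavily under `"retrieval"` and the code sandbox under `"execution"` would then give each agent its own tool surface.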
## Configuration
### Environment Variables
See `env.template` for required configuration:
- LLM API keys (Groq, OpenAI, Google, HuggingFace)
- Search tools (Tavily)
- Vector store (Supabase)
- Observability (Langfuse)
- GAIA API endpoints
### System Prompts
Located in the `prompts/` directory:
- `system_prompt.txt`: Main system requirements
- `router_prompt.txt`: Agent routing instructions
- `retrieval_prompt.txt`: Information gathering guidelines
- `execution_prompt.txt`: Code execution instructions
- `critic_prompt.txt`: Quality evaluation criteria
- `verification_prompt.txt`: Final formatting rules
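Loading these templates is a matter of reading the matching `.txt` file; the `load_prompt` helper and `PROMPT_DIR` constant below are illustrative assumptions, not part of the actual codebase:

```python
from pathlib import Path

PROMPT_DIR = Path("prompts")  # assumed location of the prompt files

def load_prompt(name: str, prompt_dir: Path = PROMPT_DIR) -> str:
    """Read a prompt template such as 'system_prompt' from disk."""
    path = prompt_dir / f"{name}.txt"
    if not path.exists():
        raise FileNotFoundError(f"Missing prompt file: {path}")
    return path.read_text(encoding="utf-8").strip()
```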
## Usage
### Basic Usage

```python
from src import run_agent_system

result = run_agent_system(
    query="Your question here",
    user_id="user123",
    session_id="session456"
)
```
### With Memory Management

```python
from src import memory_manager

# Check if the query is similar to previous ones
similar = memory_manager.get_similar_qa(query)

# Clear the session cache
memory_manager.clear_session_cache()
```
### Direct Graph Access

```python
from src import create_agent_graph

workflow = create_agent_graph()
app = workflow.compile(checkpointer=checkpointer)
result = app.invoke(initial_state, config=config)
```
## Dependencies
### Core Framework
- `langgraph>=0.4.8`: Graph-based agent orchestration
- `langgraph-checkpoint-sqlite>=2.0.0`: Persistence layer
- `langchain>=0.3.0`: LLM and tool abstractions
### Observability
- `langfuse==3.0.0`: Tracing and monitoring
### Memory & Storage
- `supabase>=2.8.0`: Vector database backend
- `pgvector>=0.3.0`: Vector similarity search
### Tools & APIs
- `tavily-python>=0.5.0`: Web search
- `arxiv>=2.1.0`: Academic paper search
- `wikipedia>=1.4.0`: Knowledge base access
## Error Handling
The system implements comprehensive error handling:
- Graceful degradation when services are unavailable
- Fallback responses for critical failures
- Retry logic with exponential backoff
- Detailed error logging for debugging
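Exponential backoff doubles the wait between failed attempts. A generic sketch follows; the `call_with_backoff` helper and its defaults are assumptions, and `sleep` is injectable so the delay schedule can be tested without waiting:

```python
import time

def call_with_backoff(fn, max_attempts: int = 4, base_delay: float = 0.5,
                      sleep=time.sleep):
    """Retry fn(), doubling the delay after each failure."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as err:
            last_error = err
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise last_error
```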
## Performance Considerations
- Vector store caching reduces duplicate searches
- Checkpoint-based state management for conversation continuity
- Efficient tool routing based on query analysis
- Memory cleanup for long-running sessions
## Future Enhancements
- Additional specialized agents (e.g., Image Analysis, Code Review)
- Enhanced memory clustering and retrieval algorithms
- Real-time collaboration between agents
- Advanced tool composition and chaining