agent-mcp-sql / SYSTEM_OVERVIEW.md
Timothy Eastridge
commit streamlit
f831e98
#### Quick start prompt:
Scan the repo: list the directory tree, key config files, and required env vars. Summarize how to start the system using existing scripts. Do not modify files; just report and wait for confirmation before any changes.
Proceed with a fresh start by leveraging: powershell -ExecutionPolicy Bypass -File ops/scripts/fresh_start.ps1
# Graph-Driven Agentic System with Human-in-the-Loop Controls
## What This System Is
This is a **production-ready agentic workflow orchestration system** that demonstrates how to build AI agents with human oversight and complete audit trails. The system combines:
- **πŸ€– Autonomous AI Agent**: Processes natural language queries and generates SQL
- **πŸ“Š Graph Database**: Neo4j stores all workflow metadata and audit trails
- **⏸️ Human-in-the-Loop**: Configurable pause points for human review and intervention
- **🎯 Single API Gateway**: All operations routed through MCP (Model Context Protocol) server
- **🌐 Real-time Interface**: React frontend with live workflow visualization
- **πŸ” Complete Observability**: Every action logged with timestamps and relationships
## What It Does
### Core Workflow
1. **User asks a question** in natural language via the web interface
2. **System creates a workflow** with multiple instruction steps in Neo4j
3. **Agent discovers the question** and begins processing
4. **Pause for human review** (5 minutes by default, configurable)
5. **Human can edit instructions** during pause via Neo4j Browser
6. **Agent generates SQL** from natural language using LLM
7. **Agent executes SQL** against PostgreSQL database
8. **Results displayed** in formatted table with complete audit trail
### Architecture Components
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Frontend │────│ MCP Server │────│ Neo4j β”‚
β”‚ (Next.js) β”‚ β”‚ (FastAPI) β”‚ β”‚ (Graph) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Agent │────│ PostgreSQL β”‚
β”‚ (Python) β”‚ β”‚ (Data) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
- **Neo4j Graph Database**: Stores workflows, instructions, executions, and logs
- **MCP Server**: FastAPI gateway for all Neo4j operations with parameter fixing
- **Python Agent**: Polls for instructions, pauses for human input, executes tasks
- **PostgreSQL**: Sample data source for SQL generation and execution
- **Next.js Frontend**: Chat interface with Cytoscape.js graph visualization
## Why It's Valuable
### 🎯 **Demonstrates Production Patterns**
- **Human Oversight**: Shows how to build AI systems with meaningful human control
- **Audit Trails**: Complete graph-based logging of all operations and decisions
- **Error Recovery**: System continues gracefully after interruptions or edits
- **Scalable Architecture**: Clean separation of concerns, containerized deployment
### πŸ”„ **Agentic Workflow Orchestration**
- **Graph-Driven**: Workflows stored as connected nodes, not brittle state machines
- **Dynamic Editing**: Instructions can be modified during execution
- **Sequence Management**: Proper instruction chaining and dependency handling
- **Status Tracking**: Real-time visibility into workflow progress
### πŸ›‘οΈ **Human-in-the-Loop Controls**
- **Configurable Pauses**: Built-in review periods before critical operations
- **Live Editing**: Modify AI behavior during execution via graph database
- **Stop Controls**: Terminate workflows at any point
- **Parameter Updates**: Change questions, settings, or instructions mid-flight
### πŸ“Š **Complete Observability**
- **Graph Visualization**: Real-time workflow progress with color-coded status
- **Audit Logging**: Every MCP operation logged with timestamps
- **Execution Tracking**: Full history of what was generated and executed
- **Result Storage**: All outputs preserved in queryable graph format
### πŸš€ **Production Ready**
- **Containerized**: Full Docker Compose setup with health checks
- **Environment Configuration**: Flexible .env-based configuration
- **Error Handling**: Graceful failures and recovery mechanisms
- **Documentation**: Comprehensive setup, usage, and troubleshooting guides
## How to Make It Run
### Quick Start (5 minutes)
```bash
# 1. Clone and navigate to the repo
git clone <repository-url>
cd <repository-name>
# 2. Copy environment template
cp .env.example .env
# 3. Add your LLM API key to .env
# Edit .env and set: LLM_API_KEY=your-openai-or-anthropic-key-here
# 4. Start everything
docker-compose up -d
# 5. Seed Neo4j with demo data (IMPORTANT!)
docker-compose exec mcp python /app/ops/scripts/seed.py
# 6. Open the interface
# Frontend: http://localhost:3000
# Neo4j Browser: http://localhost:7474 (neo4j/password)
```
### Database Seeding Options
**Basic Seeding** (Quick demo):
```bash
docker-compose exec mcp python /app/ops/scripts/seed.py
```
Creates:
- **Demo Workflow**: A 3-step process (discover schema β†’ generate SQL β†’ review results)
- **Query Examples**: 3 basic SQL templates for testing
- **Graph Structure**: Proper relationships between components
**Comprehensive Seeding** (Full system):
```bash
docker-compose exec mcp python /app/ops/scripts/seed_comprehensive.py
```
Creates:
- **Workflow Templates**: Multiple workflow patterns (basic query, analysis, reporting)
- **Instruction Type Library**: 6 different instruction types with schemas
- **Query Library**: 6+ categorized SQL examples (basic, analytics, detailed)
- **Demo Workflows**: Ready-to-run and template workflows
- **System Configuration**: Default settings and supported features
**⚠️ Fresh Installation**: On a brand-new machine, Neo4j starts completely empty. You MUST run a seed script to have any workflows or instructions to interact with.
**πŸ’‘ Recommendation**: Use comprehensive seeding for full system exploration, basic seeding for quick demos.
### PowerShell Fresh Start (Windows)
```powershell
# Fresh deployment with API key
powershell -ExecutionPolicy Bypass -File ops/scripts/fresh_start.ps1 -ApiKey "your-api-key-here"
# Or run the demo (assumes system is already running)
powershell -ExecutionPolicy Bypass -File ops/scripts/demo.ps1
```
### Manual Health Check
```bash
# Check all services
docker-compose ps
# Validate system
docker-compose exec mcp python /app/ops/scripts/validate.py
# Monitor logs
docker-compose logs -f agent
```
### Test the System
1. **Open http://localhost:3000**
2. **Ask a question**: "How many customers do we have?"
3. **Watch the workflow**:
- Graph visualization shows progress
- Agent pauses for 5 minutes
- You can edit instructions in Neo4j Browser
- Results appear in formatted table
### Clean Reset
```bash
# Stop and clean everything
docker-compose down
docker-compose up -d
docker-compose exec mcp python /app/ops/scripts/seed.py
```
## Key Features for Developers
### Graph Database Schema
- **Workflow** nodes: High-level process containers
- **Instruction** nodes: Individual tasks with parameters and status
- **Execution** nodes: Results of instruction processing
- **Log** nodes: Audit trail of all MCP operations
- **Relationships**: `HAS_INSTRUCTION`, `EXECUTED_AS`, `NEXT_INSTRUCTION`
### Configuration Options
- **Pause Duration**: `PAUSE_DURATION` in .env (default: 300 seconds)
- **Polling Interval**: `AGENT_POLL_INTERVAL` in .env (default: 30 seconds)
- **LLM Model**: `LLM_MODEL` in .env (gpt-4, claude-3-sonnet, etc.)
### Extension Points
- **New Instruction Types**: Add handlers in `agent/main.py`
- **Custom Data Sources**: Extend MCP server with new connectors
- **Frontend Customization**: Modify React components in `frontend/app/`
- **Workflow Templates**: Create reusable instruction sequences
### Human Intervention Examples
```cypher
// Find pending instructions
MATCH (i:Instruction {status: 'pending'}) RETURN i
// Change a question
MATCH (i:Instruction {type: 'generate_sql', status: 'pending'})
SET i.parameters = '{"question": "Show me top 10 customers by revenue"}'
// Stop a workflow
MATCH (w:Workflow {status: 'active'})
SET w.status = 'stopped'
```
## Development Setup
### Prerequisites
- Docker & Docker Compose
- OpenAI or Anthropic API key
- Modern web browser
### Project Structure
```
β”œβ”€β”€ agent/ # Python agent that executes instructions
β”œβ”€β”€ frontend/ # Next.js chat interface
β”œβ”€β”€ mcp/ # FastAPI server for Neo4j operations
β”œβ”€β”€ neo4j/ # Neo4j configuration
β”œβ”€β”€ postgres/ # PostgreSQL setup with sample data
β”œβ”€β”€ ops/scripts/ # Operational scripts (seed, validate, demo)
β”œβ”€β”€ docker-compose.yml
β”œβ”€β”€ Makefile # Convenience commands
└── README.md # Detailed documentation
```
### Available Commands
```bash
# If you have make installed
make up # Start all services
make seed # Create demo data
make health # Check service health
make logs # View all logs
make clean # Reset everything
# Using docker-compose directly
docker-compose up -d
docker-compose exec mcp python /app/ops/scripts/seed.py
docker-compose ps
docker-compose logs -f
docker-compose down
```
## Use Cases
### 🏒 **Enterprise AI Governance**
- Audit trails for compliance
- Human oversight for critical decisions
- Risk management in AI operations
### πŸ”¬ **Research & Development**
- Experiment with agentic workflows
- Study human-AI collaboration patterns
- Prototype autonomous systems with safety controls
### πŸ“š **Educational Examples**
- Demonstrate production AI architecture
- Teach graph database concepts
- Show containerized deployment patterns
### πŸ› οΈ **Template for New Projects**
- Fork as starting point for agentic systems
- Adapt components for specific domains
- Scale architecture for production workloads
---
**This system demonstrates that AI agents can be both autonomous and controllable, providing the benefits of automation while maintaining human oversight and complete transparency.**