agent-mcp-sql / SYSTEM_OVERVIEW.md

Timothy Eastridge


	#### Quick start prompt:
	Scan the repo: list the directory tree, key config files, and required env vars. Summarize how to start the system using existing scripts. Do not modify files; just report and wait for confirmation before any changes.
	Proceed with a fresh start by running: `powershell -ExecutionPolicy Bypass -File ops/scripts/fresh_start.ps1`


	# Graph-Driven Agentic System with Human-in-the-Loop Controls

	## What This System Is

	This is a production-ready agentic workflow orchestration system that demonstrates how to build AI agents with human oversight and complete audit trails. The system combines:

	- 🤖 Autonomous AI Agent: Processes natural language queries and generates SQL
	- 📊 Graph Database: Neo4j stores all workflow metadata and audit trails
	- ⏸️ Human-in-the-Loop: Configurable pause points for human review and intervention
	- 🎯 Single API Gateway: All operations routed through MCP (Model Context Protocol) server
	- 🌐 Real-time Interface: Next.js (React) frontend with live workflow visualization
	- 🔍 Complete Observability: Every action logged with timestamps and relationships

	## What It Does

	### Core Workflow
	1. User asks a question in natural language via the web interface
	2. System creates a workflow with multiple instruction steps in Neo4j
	3. Agent discovers the question and begins processing
	4. Agent pauses for human review (5 minutes by default, configurable via `PAUSE_DURATION`)
	5. Human can edit instructions during the pause via Neo4j Browser
	6. Agent generates SQL from natural language using an LLM
	7. Agent executes the SQL against the PostgreSQL database
	8. Results are displayed in a formatted table with a complete audit trail
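The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the agent's actual code: `process_question`, the dict standing in for an Instruction node, and the injected `generate_sql`, `execute_sql`, and `pause` callables are all hypothetical names. The one point it tries to show is that the question is re-read after the pause, so an edit made during review is what the LLM sees.

```python
def process_question(instruction, generate_sql, execute_sql, pause):
    """Run one question through pause -> SQL generation -> execution.

    `instruction` is a plain dict standing in for an Instruction node
    (hypothetical shape); the callables are injected so the flow can run
    without Neo4j, PostgreSQL, or an LLM.
    """
    events = ["discovered"]

    pause()  # review window: a human may edit or stop the instruction here

    if instruction.get("status") == "stopped":
        events.append("stopped")
        return None, events

    # Re-read the question after the pause so human edits take effect.
    question = instruction["question"]
    sql = generate_sql(question)
    events.append("sql_generated")

    rows = execute_sql(sql)
    events.append("sql_executed")
    return rows, events


if __name__ == "__main__":
    instruction = {"question": "How many customers do we have?", "status": "pending"}
    rows, events = process_question(
        instruction,
        generate_sql=lambda q: "SELECT COUNT(*) FROM customers",  # stub for the LLM call
        execute_sql=lambda sql: [(42,)],                           # stub for PostgreSQL
        pause=lambda: None,                                        # no real wait in the demo
    )
    print(rows, events)
```

Injecting the three callables keeps the control flow testable in isolation; in the real system each would presumably go through the MCP server.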

	### Architecture Components
	```
	┌─────────────┐    ┌─────────────┐    ┌─────────────┐
	│  Frontend   │────│ MCP Server  │────│    Neo4j    │
	│  (Next.js)  │    │  (FastAPI)  │    │   (Graph)   │
	└─────────────┘    └─────────────┘    └─────────────┘
	                          │
	                   ┌─────────────┐    ┌─────────────┐
	                   │    Agent    │────│ PostgreSQL  │
	                   │  (Python)   │    │   (Data)    │
	                   └─────────────┘    └─────────────┘
	```

	- Neo4j Graph Database: Stores workflows, instructions, executions, and logs
	- MCP Server: FastAPI gateway for all Neo4j operations with parameter fixing
	- Python Agent: Polls for instructions, pauses for human input, executes tasks
	- PostgreSQL: Sample data source for SQL generation and execution
	- Next.js Frontend: Chat interface with Cytoscape.js graph visualization

	## Why It's Valuable

	### 🎯 Demonstrates Production Patterns
	- Human Oversight: Shows how to build AI systems with meaningful human control
	- Audit Trails: Complete graph-based logging of all operations and decisions
	- Error Recovery: System continues gracefully after interruptions or edits
	- Scalable Architecture: Clean separation of concerns, containerized deployment

	### 🔄 Agentic Workflow Orchestration
	- Graph-Driven: Workflows stored as connected nodes, not brittle state machines
	- Dynamic Editing: Instructions can be modified during execution
	- Sequence Management: Proper instruction chaining and dependency handling
	- Status Tracking: Real-time visibility into workflow progress

	### 🛡️ Human-in-the-Loop Controls
	- Configurable Pauses: Built-in review periods before critical operations
	- Live Editing: Modify AI behavior during execution via graph database
	- Stop Controls: Terminate workflows at any point
	- Parameter Updates: Change questions, settings, or instructions mid-flight
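One way a configurable pause can coexist with stop controls is to sleep in short slices and check for a stop signal between them. The sketch below assumes a hypothetical `is_stopped` callable (in the real system this would presumably query the workflow status through the MCP server) and a hypothetical `check_every` interval; neither name comes from the repository.

```python
import time


def pause_for_review(duration, is_stopped, check_every=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Wait up to `duration` seconds, returning early if the workflow is stopped.

    Returns True if the full review window elapsed, False if a stop was seen.
    `sleep` and `clock` are injectable so the function can be tested without
    actually waiting.
    """
    deadline = clock() + duration
    while True:
        if is_stopped():
            return False
        remaining = deadline - clock()
        if remaining <= 0:
            return True
        # Sleep in short slices so a stop request is noticed promptly.
        sleep(min(check_every, remaining))
```

With the default 300-second pause and a 5-second slice, a stop request would be noticed within about five seconds rather than at the end of the window.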

	### 📊 Complete Observability
	- Graph Visualization: Real-time workflow progress with color-coded status
	- Audit Logging: Every MCP operation logged with timestamps
	- Execution Tracking: Full history of what was generated and executed
	- Result Storage: All outputs preserved in queryable graph format
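Logging every operation with a timestamp is often done by wrapping each operation in a decorator. The following is a sketch of that general pattern, not the MCP server's implementation: `audited` and the in-memory `AUDIT_LOG` list are hypothetical, and the real system stores its log entries as nodes in Neo4j rather than in a Python list.

```python
import functools
from datetime import datetime, timezone

AUDIT_LOG = []  # hypothetical in-memory stand-in for Log nodes


def audited(operation):
    """Record one log entry per call, whether the call succeeds or fails."""
    def wrap(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            entry = {
                "operation": operation,
                "started_at": datetime.now(timezone.utc).isoformat(),
                "status": "ok",
            }
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                entry["status"] = f"error: {exc}"
                raise  # the caller still sees the original exception
            finally:
                AUDIT_LOG.append(entry)
        return inner
    return wrap


@audited("read_instruction")
def read_instruction(instruction_id):
    # Placeholder body; a real version would query the graph.
    return {"id": instruction_id, "status": "pending"}
```

Appending in `finally` means failed operations are logged too, which matters for an audit trail.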

	### 🚀 Production Ready
	- Containerized: Full Docker Compose setup with health checks
	- Environment Configuration: Flexible .env-based configuration
	- Error Handling: Graceful failures and recovery mechanisms
	- Documentation: Comprehensive setup, usage, and troubleshooting guides

	## How to Make It Run

	### Quick Start (5 minutes)

	```bash
	# 1. Clone and navigate to the repo
	git clone <repository-url>
	cd <repository-name>

	# 2. Copy environment template
	cp .env.example .env

	# 3. Add your LLM API key to .env
	# Edit .env and set: LLM_API_KEY=your-openai-or-anthropic-key-here

	# 4. Start everything
	docker-compose up -d

	# 5. Seed Neo4j with demo data (IMPORTANT!)
	docker-compose exec mcp python /app/ops/scripts/seed.py

	# 6. Open the interface
	# Frontend: http://localhost:3000
	# Neo4j Browser: http://localhost:7474 (neo4j/password)
	```

	### Database Seeding Options

	Basic Seeding (Quick demo):
	```bash
	docker-compose exec mcp python /app/ops/scripts/seed.py
	```
	Creates:
	- Demo Workflow: A 3-step process (discover schema → generate SQL → review results)
	- Query Examples: 3 basic SQL templates for testing
	- Graph Structure: Proper relationships between components

	Comprehensive Seeding (Full system):
	```bash
	docker-compose exec mcp python /app/ops/scripts/seed_comprehensive.py
	```
	Creates:
	- Workflow Templates: Multiple workflow patterns (basic query, analysis, reporting)
	- Instruction Type Library: 6 different instruction types with schemas
	- Query Library: 6+ categorized SQL examples (basic, analytics, detailed)
	- Demo Workflows: Ready-to-run and template workflows
	- System Configuration: Default settings and supported features

	⚠️ Fresh Installation: On a brand-new machine, Neo4j starts completely empty. You MUST run a seed script to have any workflows or instructions to interact with.

	💡 Recommendation: Use comprehensive seeding for full system exploration, basic seeding for quick demos.

	### PowerShell Fresh Start (Windows)
	```powershell
	# Fresh deployment with API key
	powershell -ExecutionPolicy Bypass -File ops/scripts/fresh_start.ps1 -ApiKey "your-api-key-here"

	# Or run the demo (assumes system is already running)
	powershell -ExecutionPolicy Bypass -File ops/scripts/demo.ps1
	```

	### Manual Health Check
	```bash
	# Check all services
	docker-compose ps

	# Validate system
	docker-compose exec mcp python /app/ops/scripts/validate.py

	# Monitor logs
	docker-compose logs -f agent
	```

	### Test the System

	1. Open http://localhost:3000
	2. Ask a question: "How many customers do we have?"
	3. Watch the workflow:
	   - Graph visualization shows progress
	   - Agent pauses for 5 minutes
	   - You can edit instructions in Neo4j Browser
	   - Results appear in formatted table

	### Clean Reset
	```bash
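	# Stop everything and remove volumes so stored data is discarded
	# (-v deletes Compose-managed volumes; omit it to keep existing data)
	docker-compose down -v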
	docker-compose up -d
	docker-compose exec mcp python /app/ops/scripts/seed.py
	```

	## Key Features for Developers

	### Graph Database Schema
	- Workflow nodes: High-level process containers
	- Instruction nodes: Individual tasks with parameters and status
	- Execution nodes: Results of instruction processing
	- Log nodes: Audit trail of all MCP operations
	- Relationships: `HAS_INSTRUCTION`, `EXECUTED_AS`, `NEXT_INSTRUCTION`
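To make the schema concrete, here is an in-memory model of a workflow whose instructions are chained in order. It is an illustrative sketch only: the `id` and `next_id` fields, the class layout, and the `discover_schema` and `review_results` type names are assumptions (only `generate_sql` appears in this document), and the real data lives in Neo4j nodes and relationships.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Instruction:
    id: str
    type: str
    status: str = "pending"
    next_id: Optional[str] = None  # models a NEXT_INSTRUCTION relationship


@dataclass
class Workflow:
    id: str
    first_id: Optional[str] = None
    instructions: dict = field(default_factory=dict)  # models HAS_INSTRUCTION

    def add(self, instruction: Instruction) -> None:
        """Attach an instruction and chain it after the current last one."""
        if self.first_id is None:
            self.first_id = instruction.id
        else:
            self.ordered()[-1].next_id = instruction.id
        self.instructions[instruction.id] = instruction

    def ordered(self) -> list:
        """Walk the chain from the first instruction to the last."""
        result, current = [], self.first_id
        while current is not None:
            node = self.instructions[current]
            result.append(node)
            current = node.next_id
        return result

    def next_pending(self) -> Optional[Instruction]:
        """Return the first instruction in chain order that is still pending."""
        return next((i for i in self.ordered() if i.status == "pending"), None)


workflow = Workflow(id="demo")
for n, kind in enumerate(["discover_schema", "generate_sql", "review_results"]):
    workflow.add(Instruction(id=f"step-{n}", type=kind))
```

Because order comes from the chain rather than from a stored index, inserting or editing a step only touches its neighbours.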

	### Configuration Options
	- Pause Duration: `PAUSE_DURATION` in .env (default: 300 seconds)
	- Polling Interval: `AGENT_POLL_INTERVAL` in .env (default: 30 seconds)
	- LLM Model: `LLM_MODEL` in .env (gpt-4, claude-3-sonnet, etc.)
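A small loader shows how these settings might be read with their documented defaults. `load_config` is a hypothetical helper, not a function from the repository, and the `gpt-4` fallback for `LLM_MODEL` is an assumption taken from the first example value above; the document does not state a default model.

```python
import os


def load_config(env=os.environ):
    """Read agent settings from an environment mapping, applying defaults."""
    return {
        # Review window in seconds (documented default: 300)
        "pause_duration": int(env.get("PAUSE_DURATION", "300")),
        # Seconds between polls for new instructions (documented default: 30)
        "poll_interval": int(env.get("AGENT_POLL_INTERVAL", "30")),
        # Fallback model is an assumption, not a documented default
        "llm_model": env.get("LLM_MODEL", "gpt-4"),
    }
```

Passing a plain dict instead of `os.environ` makes it easy to try overrides, for example `load_config({"PAUSE_DURATION": "60"})` for a one-minute review window.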

	### Extension Points
	- New Instruction Types: Add handlers in `agent/main.py`
	- Custom Data Sources: Extend MCP server with new connectors
	- Frontend Customization: Modify React components in `frontend/app/`
	- Workflow Templates: Create reusable instruction sequences
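A common way to make instruction types pluggable is a registry that maps each type name to a handler function. The sketch below illustrates that pattern only; it is not known to match how `agent/main.py` is organized, and `HANDLERS`, `handler`, and `dispatch` are hypothetical names.

```python
HANDLERS = {}  # instruction type -> handler function


def handler(instruction_type):
    """Decorator that registers a function for one instruction type."""
    def register(func):
        HANDLERS[instruction_type] = func
        return func
    return register


@handler("generate_sql")
def handle_generate_sql(parameters):
    # Placeholder: a real handler would call the LLM here.
    return f"-- SQL for: {parameters['question']}"


def dispatch(instruction_type, parameters):
    """Look up and run the handler for an instruction, failing loudly if absent."""
    try:
        func = HANDLERS[instruction_type]
    except KeyError:
        raise ValueError(f"No handler registered for {instruction_type!r}") from None
    return func(parameters)
```

Under this design, adding an instruction type means writing one decorated function, with no changes to the dispatch loop.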

	### Human Intervention Examples
	```cypher
	// Find pending instructions
	MATCH (i:Instruction {status: 'pending'}) RETURN i

	// Change a question (updates every pending generate_sql instruction that matches)
	MATCH (i:Instruction {type: 'generate_sql', status: 'pending'})
	SET i.parameters = '{"question": "Show me top 10 customers by revenue"}'

	// Stop a workflow (stops every active workflow that matches)
	MATCH (w:Workflow {status: 'active'})
	SET w.status = 'stopped'
	```
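The Cypher above stores `parameters` as a JSON string, so when the same edit is scripted it helps to build that string with `json.dumps` and pass values as query parameters instead of pasting them into the query text. The helper below is a sketch: `build_question_update` is a hypothetical name, and the `id` property used to target a single instruction is an assumption about the schema.

```python
import json


def build_question_update(instruction_id, question):
    """Return a parameterized Cypher query and its parameters.

    Assumes Instruction nodes carry an `id` property (not confirmed by this
    document). Values are passed as parameters, so quotes in the question
    cannot break the query.
    """
    query = (
        "MATCH (i:Instruction {id: $id, status: 'pending'}) "
        "SET i.parameters = $parameters "
        "RETURN i"
    )
    params = {
        "id": instruction_id,
        "parameters": json.dumps({"question": question}),
    }
    return query, params


query, params = build_question_update("step-1", "Show me top 10 customers by revenue")
print(query)
print(params)
```

The returned pair can then be handed to whatever client executes Cypher; filtering on a single instruction avoids rewriting every pending `generate_sql` step at once.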

	## Development Setup

	### Prerequisites
	- Docker & Docker Compose
	- OpenAI or Anthropic API key
	- Modern web browser

	### Project Structure
	```
	├── agent/ # Python agent that executes instructions
	├── frontend/ # Next.js chat interface
	├── mcp/ # FastAPI server for Neo4j operations
	├── neo4j/ # Neo4j configuration
	├── postgres/ # PostgreSQL setup with sample data
	├── ops/scripts/ # Operational scripts (seed, validate, demo)
	├── docker-compose.yml
	├── Makefile # Convenience commands
	└── README.md # Detailed documentation
	```

	### Available Commands
	```bash
	# If you have make installed
	make up # Start all services
	make seed # Create demo data
	make health # Check service health
	make logs # View all logs
	make clean # Reset everything

	# Using docker-compose directly
	docker-compose up -d
	docker-compose exec mcp python /app/ops/scripts/seed.py
	docker-compose ps
	docker-compose logs -f
	docker-compose down
	```

	## Use Cases

	### 🏢 Enterprise AI Governance
	- Audit trails for compliance
	- Human oversight for critical decisions
	- Risk management in AI operations

	### 🔬 Research & Development
	- Experiment with agentic workflows
	- Study human-AI collaboration patterns
	- Prototype autonomous systems with safety controls

	### 📚 Educational Examples
	- Demonstrate production AI architecture
	- Teach graph database concepts
	- Show containerized deployment patterns

	### 🛠️ Template for New Projects
	- Fork as starting point for agentic systems
	- Adapt components for specific domains
	- Scale architecture for production workloads

	---

	This system demonstrates that AI agents can be both autonomous and controllable, providing the benefits of automation while maintaining human oversight and complete transparency.