RobotPai

Build error

App Files Files Community

RobotPai / docs /guides /SUPABASE_SQL_SETUP.md

atr0p05

Upload 291 files

8a682b5 verified 8 months ago

preview code

raw

history blame contribute delete

12.3 kB

	# Supabase SQL Setup Guide for AI Agent

	This guide provides all the SQL commands needed to set up your Supabase database for the AI Agent with resilience patterns implementation.

	## Prerequisites

	1. A Supabase project (create one at https://supabase.com)
	2. Access to the SQL Editor in your Supabase dashboard
	3. Your Supabase URL and API keys

	## Required Environment Variables

	Add these to your `.env` file:

	```bash
	SUPABASE_URL=https://your-project-id.supabase.co
	SUPABASE_KEY=your-anon-public-key
	SUPABASE_DB_PASSWORD=your-database-password
	```

	## SQL Tables Setup

	Execute these SQL commands in your Supabase SQL Editor in the following order:

	### 1. Enable Required Extensions

	```sql
	-- Enable pgvector for semantic search
	CREATE EXTENSION IF NOT EXISTS vector;

	-- Enable UUID generation
	CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
	```

	### 2. Core Knowledge Base Table

	```sql
	-- Create the table to store document chunks and their embeddings
	CREATE TABLE knowledge_base (
	id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
	node_id TEXT UNIQUE NOT NULL,
	embedding VECTOR(1536) NOT NULL, -- OpenAI 'text-embedding-3-small' produces 1536-dim vectors
	text TEXT,
	metadata_ JSONB DEFAULT '{}'::jsonb,
	created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
	updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
	);

	-- Create an HNSW index for efficient similarity search
	CREATE INDEX ON knowledge_base USING hnsw (embedding vector_cosine_ops);

	-- Create a function for similarity search
	CREATE OR REPLACE FUNCTION match_documents (
	query_embedding VECTOR(1536),
	match_count INT,
	filter JSONB DEFAULT '{}'
	) RETURNS TABLE (
	id UUID,
	node_id TEXT,
	text TEXT,
	metadata_ JSONB,
	similarity FLOAT
	)
	LANGUAGE plpgsql
	AS $$
	BEGIN
	RETURN QUERY
	SELECT
	id,
	node_id,
	text,
	metadata_,
	1 - (knowledge_base.embedding <=> query_embedding) AS similarity
	FROM knowledge_base
	WHERE metadata_ @> filter
	ORDER BY knowledge_base.embedding <=> query_embedding
	LIMIT match_count;
	END;
	$$;
	```

	### 3. Agent Trajectory Logging Table

	```sql
	-- Create the table for logging agent trajectories
	CREATE TABLE agent_trajectory_logs (
	log_id BIGSERIAL PRIMARY KEY,
	run_id UUID NOT NULL,
	correlation_id UUID,
	timestamp TIMESTAMPTZ DEFAULT NOW(),
	step_type TEXT NOT NULL, -- e.g., 'REASON', 'ACTION', 'OBSERVATION', 'FINAL_ANSWER'
	fsm_state TEXT, -- Current FSM state
	payload JSONB,
	error_category TEXT,
	recovery_strategy TEXT,
	created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
	);

	-- Create indexes for efficient querying
	CREATE INDEX idx_agent_trajectory_logs_run_id ON agent_trajectory_logs(run_id);
	CREATE INDEX idx_agent_trajectory_logs_correlation_id ON agent_trajectory_logs(correlation_id);
	CREATE INDEX idx_agent_trajectory_logs_timestamp ON agent_trajectory_logs(timestamp);
	CREATE INDEX idx_agent_trajectory_logs_step_type ON agent_trajectory_logs(step_type);
	```

	### 4. Tool Reliability Metrics Table

	```sql
	-- Track tool performance and reliability
	CREATE TABLE tool_reliability_metrics (
	tool_name TEXT PRIMARY KEY,
	success_count INTEGER DEFAULT 0,
	failure_count INTEGER DEFAULT 0,
	total_calls INTEGER DEFAULT 0,
	average_latency_ms REAL DEFAULT 0.0,
	last_used_at TIMESTAMP WITH TIME ZONE,
	last_error TEXT,
	error_patterns JSONB DEFAULT '[]'::jsonb,
	fallback_tools JSONB DEFAULT '[]'::jsonb,
	created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
	updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
	);

	-- Create index for last_used_at for cleanup queries
	CREATE INDEX idx_tool_reliability_last_used ON tool_reliability_metrics(last_used_at);
	```

	### 5. Clarification Patterns Table

	```sql
	-- Store patterns of clarification requests for learning
	CREATE TABLE clarification_patterns (
	id TEXT PRIMARY KEY,
	original_query TEXT NOT NULL,
	query_embedding VECTOR(1536), -- For similarity search
	clarification_question TEXT NOT NULL,
	user_response TEXT NOT NULL,
	query_category TEXT NOT NULL,
	frequency INTEGER DEFAULT 1,
	effectiveness_score REAL DEFAULT 0.5,
	created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
	last_seen_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
	);

	-- Create indexes for efficient pattern matching
	CREATE INDEX idx_clarification_patterns_category ON clarification_patterns(query_category);
	CREATE INDEX idx_clarification_patterns_embedding ON clarification_patterns USING hnsw (query_embedding vector_cosine_ops);
	```

	### 6. Plan Corrections Table

	```sql
	-- Record user corrections to agent plans for improvement
	CREATE TABLE plan_corrections (
	id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
	query TEXT NOT NULL,
	original_plan JSONB NOT NULL,
	corrected_plan JSONB NOT NULL,
	correction_type TEXT NOT NULL, -- 'steps_added', 'steps_removed', 'parameters_changed', etc.
	user_feedback TEXT,
	applied_to_future_plans BOOLEAN DEFAULT FALSE,
	created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
	);

	-- Create index for query similarity matching
	CREATE INDEX idx_plan_corrections_query ON plan_corrections USING gin(to_tsvector('english', query));
	```

	### 7. Knowledge Lifecycle Table

	```sql
	-- Track document freshness and validation needs
	CREATE TABLE knowledge_lifecycle (
	document_id TEXT PRIMARY KEY,
	source_url TEXT,
	document_type TEXT NOT NULL, -- 'news', 'documentation', 'research', etc.
	content_hash TEXT NOT NULL,
	ingested_at TIMESTAMP WITH TIME ZONE NOT NULL,
	last_validated_at TIMESTAMP WITH TIME ZONE NOT NULL,
	expires_at TIMESTAMP WITH TIME ZONE NOT NULL,
	validation_status TEXT NOT NULL, -- 'valid', 'stale', 'expired', 'source_unavailable'
	update_frequency_days INTEGER NOT NULL,
	importance_score REAL DEFAULT 0.5,
	validation_failures INTEGER DEFAULT 0,
	created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
	updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
	);

	-- Create indexes for lifecycle management
	CREATE INDEX idx_knowledge_lifecycle_expires ON knowledge_lifecycle(expires_at);
	CREATE INDEX idx_knowledge_lifecycle_validation ON knowledge_lifecycle(last_validated_at);
	CREATE INDEX idx_knowledge_lifecycle_status ON knowledge_lifecycle(validation_status);
	```

	### 8. Resilience Tracking Tables

	```sql
	-- Track GraphRecursionError occurrences and resolutions
	CREATE TABLE recursion_error_logs (
	id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
	correlation_id UUID,
	query TEXT NOT NULL,
	state_hash TEXT NOT NULL,
	loop_count INTEGER,
	stagnation_score INTEGER,
	resolution_strategy TEXT, -- 'force_termination', 'alternative_plan', 'user_clarification'
	final_answer TEXT,
	created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
	);

	-- Track state corruption incidents
	CREATE TABLE state_corruption_logs (
	id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
	correlation_id UUID,
	corrupting_node TEXT NOT NULL,
	failed_field TEXT NOT NULL,
	bad_value TEXT,
	expected_type TEXT,
	stack_trace TEXT,
	created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
	);
	```

	### 9. Human-in-the-Loop Approvals

	```sql
	-- Track human approval requests and decisions
	CREATE TABLE human_approval_requests (
	id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
	correlation_id UUID,
	action_type TEXT NOT NULL, -- 'send_email', 'execute_code', 'modify_data', 'api_call'
	action_description TEXT NOT NULL,
	action_parameters JSONB NOT NULL,
	risk_level TEXT NOT NULL, -- 'low', 'medium', 'high', 'critical'
	reasoning TEXT,
	alternatives JSONB,
	approval_status TEXT DEFAULT 'pending', -- 'pending', 'approved', 'rejected', 'timeout'
	approver_id TEXT,
	approval_timestamp TIMESTAMP WITH TIME ZONE,
	created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
	);

	-- Create index for pending approvals
	CREATE INDEX idx_human_approval_pending ON human_approval_requests(approval_status) WHERE approval_status = 'pending';
	```

	### 10. Session Management Table

	```sql
	-- Track user sessions and conversation history
	CREATE TABLE user_sessions (
	session_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
	user_id TEXT,
	conversation_history JSONB DEFAULT '[]'::jsonb,
	total_queries INTEGER DEFAULT 0,
	successful_queries INTEGER DEFAULT 0,
	failed_queries INTEGER DEFAULT 0,
	average_steps_per_query REAL DEFAULT 0.0,
	last_active_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
	created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
	updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
	);

	-- Create index for user lookup
	CREATE INDEX idx_user_sessions_user_id ON user_sessions(user_id);
	CREATE INDEX idx_user_sessions_last_active ON user_sessions(last_active_at);
	```

	### 11. Create Update Triggers

	```sql
	-- Auto-update updated_at timestamps
	CREATE OR REPLACE FUNCTION update_updated_at_column()
	RETURNS TRIGGER AS $$
	BEGIN
	NEW.updated_at = NOW();
	RETURN NEW;
	END;
	$$ language 'plpgsql';

	-- Apply trigger to tables with updated_at
	CREATE TRIGGER update_knowledge_base_updated_at BEFORE UPDATE ON knowledge_base
	FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();

	CREATE TRIGGER update_tool_reliability_updated_at BEFORE UPDATE ON tool_reliability_metrics
	FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();

	CREATE TRIGGER update_knowledge_lifecycle_updated_at BEFORE UPDATE ON knowledge_lifecycle
	FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();

	CREATE TRIGGER update_user_sessions_updated_at BEFORE UPDATE ON user_sessions
	FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();
	```

	### 12. Row Level Security (RLS) Setup

	```sql
	-- Enable RLS for security
	ALTER TABLE knowledge_base ENABLE ROW LEVEL SECURITY;
	ALTER TABLE agent_trajectory_logs ENABLE ROW LEVEL SECURITY;
	ALTER TABLE tool_reliability_metrics ENABLE ROW LEVEL SECURITY;
	ALTER TABLE user_sessions ENABLE ROW LEVEL SECURITY;

	-- Create policies (adjust based on your authentication setup)
	-- Example: Allow authenticated users to read knowledge base
	CREATE POLICY "Allow authenticated read access" ON knowledge_base
	FOR SELECT
	TO authenticated
	USING (true);

	-- Example: Allow service role full access
	CREATE POLICY "Service role full access" ON knowledge_base
	TO service_role
	USING (true)
	WITH CHECK (true);
	```

	## Verification Queries

	After running all the setup SQL, verify your tables are created correctly:

	```sql
	-- Check all tables are created
	SELECT table_name
	FROM information_schema.tables
	WHERE table_schema = 'public'
	ORDER BY table_name;

	-- Check pgvector extension is enabled
	SELECT * FROM pg_extension WHERE extname = 'vector';

	-- Test vector similarity function
	SELECT match_documents(
	array_fill(0.1, ARRAY[1536])::vector,
	5
	);
	```

	## Maintenance Queries

	### Clean up old logs (run periodically)

	```sql
	-- Delete logs older than 30 days
	DELETE FROM agent_trajectory_logs
	WHERE timestamp < NOW() - INTERVAL '30 days';

	-- Delete unused tool metrics
	DELETE FROM tool_reliability_metrics
	WHERE last_used_at < NOW() - INTERVAL '90 days'
	AND total_calls < 10;
	```

	### Performance monitoring

	```sql
	-- Check table sizes
	SELECT
	schemaname,
	tablename,
	pg_size_pretty(pg_total_relation_size(schemaname\|\|'.'\|\|tablename)) AS size
	FROM pg_tables
	WHERE schemaname = 'public'
	ORDER BY pg_total_relation_size(schemaname\|\|'.'\|\|tablename) DESC;

	-- Check index usage
	SELECT
	schemaname,
	tablename,
	indexname,
	idx_scan,
	idx_tup_read,
	idx_tup_fetch
	FROM pg_stat_user_indexes
	ORDER BY idx_scan DESC;
	```

	## Next Steps

	1. Run these SQL commands in your Supabase SQL Editor
	2. Update your `.env` file with your Supabase credentials
	3. Test the connection with:
	```python
	from src.database import get_supabase_client
	client = get_supabase_client()
	print("Connection successful!")
	```
	4. Consider setting up database backups in Supabase dashboard
	5. Monitor usage and costs in your Supabase project settings

	## Troubleshooting

	- pgvector not available: Make sure you're on a Supabase plan that supports pgvector
	- Permission denied: Check that your API key has the correct permissions
	- Connection errors: Verify your SUPABASE_URL format and network connectivity
	- Performance issues: Consider adding more specific indexes based on your query patterns