# Supabase SQL Setup Guide for AI Agent This guide provides all the SQL commands needed to set up your Supabase database for the AI Agent with resilience patterns implementation. ## Prerequisites 1. A Supabase project (create one at https://supabase.com) 2. Access to the SQL Editor in your Supabase dashboard 3. Your Supabase URL and API keys ## Required Environment Variables Add these to your `.env` file: ```bash SUPABASE_URL=https://your-project-id.supabase.co SUPABASE_KEY=your-anon-public-key SUPABASE_DB_PASSWORD=your-database-password ``` ## SQL Tables Setup Execute these SQL commands in your Supabase SQL Editor in the following order: ### 1. Enable Required Extensions ```sql -- Enable pgvector for semantic search CREATE EXTENSION IF NOT EXISTS vector; -- Enable UUID generation CREATE EXTENSION IF NOT EXISTS "uuid-ossp"; ``` ### 2. Core Knowledge Base Table ```sql -- Create the table to store document chunks and their embeddings CREATE TABLE knowledge_base ( id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), node_id TEXT UNIQUE NOT NULL, embedding VECTOR(1536) NOT NULL, -- OpenAI 'text-embedding-3-small' produces 1536-dim vectors text TEXT, metadata_ JSONB DEFAULT '{}'::jsonb, created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(), updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW() ); -- Create an HNSW index for efficient similarity search CREATE INDEX ON knowledge_base USING hnsw (embedding vector_cosine_ops); -- Create a function for similarity search CREATE OR REPLACE FUNCTION match_documents ( query_embedding VECTOR(1536), match_count INT, filter JSONB DEFAULT '{}' ) RETURNS TABLE ( id UUID, node_id TEXT, text TEXT, metadata_ JSONB, similarity FLOAT ) LANGUAGE plpgsql AS $$ BEGIN RETURN QUERY SELECT id, node_id, text, metadata_, 1 - (knowledge_base.embedding <=> query_embedding) AS similarity FROM knowledge_base WHERE metadata_ @> filter ORDER BY knowledge_base.embedding <=> query_embedding LIMIT match_count; END; $$; ``` ### 3. Agent Trajectory Logging Table ```sql -- Create the table for logging agent trajectories CREATE TABLE agent_trajectory_logs ( log_id BIGSERIAL PRIMARY KEY, run_id UUID NOT NULL, correlation_id UUID, timestamp TIMESTAMPTZ DEFAULT NOW(), step_type TEXT NOT NULL, -- e.g., 'REASON', 'ACTION', 'OBSERVATION', 'FINAL_ANSWER' fsm_state TEXT, -- Current FSM state payload JSONB, error_category TEXT, recovery_strategy TEXT, created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW() ); -- Create indexes for efficient querying CREATE INDEX idx_agent_trajectory_logs_run_id ON agent_trajectory_logs(run_id); CREATE INDEX idx_agent_trajectory_logs_correlation_id ON agent_trajectory_logs(correlation_id); CREATE INDEX idx_agent_trajectory_logs_timestamp ON agent_trajectory_logs(timestamp); CREATE INDEX idx_agent_trajectory_logs_step_type ON agent_trajectory_logs(step_type); ``` ### 4. Tool Reliability Metrics Table ```sql -- Track tool performance and reliability CREATE TABLE tool_reliability_metrics ( tool_name TEXT PRIMARY KEY, success_count INTEGER DEFAULT 0, failure_count INTEGER DEFAULT 0, total_calls INTEGER DEFAULT 0, average_latency_ms REAL DEFAULT 0.0, last_used_at TIMESTAMP WITH TIME ZONE, last_error TEXT, error_patterns JSONB DEFAULT '[]'::jsonb, fallback_tools JSONB DEFAULT '[]'::jsonb, created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(), updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW() ); -- Create index for last_used_at for cleanup queries CREATE INDEX idx_tool_reliability_last_used ON tool_reliability_metrics(last_used_at); ``` ### 5. Clarification Patterns Table ```sql -- Store patterns of clarification requests for learning CREATE TABLE clarification_patterns ( id TEXT PRIMARY KEY, original_query TEXT NOT NULL, query_embedding VECTOR(1536), -- For similarity search clarification_question TEXT NOT NULL, user_response TEXT NOT NULL, query_category TEXT NOT NULL, frequency INTEGER DEFAULT 1, effectiveness_score REAL DEFAULT 0.5, created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(), last_seen_at TIMESTAMP WITH TIME ZONE DEFAULT NOW() ); -- Create indexes for efficient pattern matching CREATE INDEX idx_clarification_patterns_category ON clarification_patterns(query_category); CREATE INDEX idx_clarification_patterns_embedding ON clarification_patterns USING hnsw (query_embedding vector_cosine_ops); ``` ### 6. Plan Corrections Table ```sql -- Record user corrections to agent plans for improvement CREATE TABLE plan_corrections ( id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), query TEXT NOT NULL, original_plan JSONB NOT NULL, corrected_plan JSONB NOT NULL, correction_type TEXT NOT NULL, -- 'steps_added', 'steps_removed', 'parameters_changed', etc. user_feedback TEXT, applied_to_future_plans BOOLEAN DEFAULT FALSE, created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW() ); -- Create index for query similarity matching CREATE INDEX idx_plan_corrections_query ON plan_corrections USING gin(to_tsvector('english', query)); ``` ### 7. Knowledge Lifecycle Table ```sql -- Track document freshness and validation needs CREATE TABLE knowledge_lifecycle ( document_id TEXT PRIMARY KEY, source_url TEXT, document_type TEXT NOT NULL, -- 'news', 'documentation', 'research', etc. content_hash TEXT NOT NULL, ingested_at TIMESTAMP WITH TIME ZONE NOT NULL, last_validated_at TIMESTAMP WITH TIME ZONE NOT NULL, expires_at TIMESTAMP WITH TIME ZONE NOT NULL, validation_status TEXT NOT NULL, -- 'valid', 'stale', 'expired', 'source_unavailable' update_frequency_days INTEGER NOT NULL, importance_score REAL DEFAULT 0.5, validation_failures INTEGER DEFAULT 0, created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(), updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW() ); -- Create indexes for lifecycle management CREATE INDEX idx_knowledge_lifecycle_expires ON knowledge_lifecycle(expires_at); CREATE INDEX idx_knowledge_lifecycle_validation ON knowledge_lifecycle(last_validated_at); CREATE INDEX idx_knowledge_lifecycle_status ON knowledge_lifecycle(validation_status); ``` ### 8. Resilience Tracking Tables ```sql -- Track GraphRecursionError occurrences and resolutions CREATE TABLE recursion_error_logs ( id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), correlation_id UUID, query TEXT NOT NULL, state_hash TEXT NOT NULL, loop_count INTEGER, stagnation_score INTEGER, resolution_strategy TEXT, -- 'force_termination', 'alternative_plan', 'user_clarification' final_answer TEXT, created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW() ); -- Track state corruption incidents CREATE TABLE state_corruption_logs ( id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), correlation_id UUID, corrupting_node TEXT NOT NULL, failed_field TEXT NOT NULL, bad_value TEXT, expected_type TEXT, stack_trace TEXT, created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW() ); ``` ### 9. Human-in-the-Loop Approvals ```sql -- Track human approval requests and decisions CREATE TABLE human_approval_requests ( id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), correlation_id UUID, action_type TEXT NOT NULL, -- 'send_email', 'execute_code', 'modify_data', 'api_call' action_description TEXT NOT NULL, action_parameters JSONB NOT NULL, risk_level TEXT NOT NULL, -- 'low', 'medium', 'high', 'critical' reasoning TEXT, alternatives JSONB, approval_status TEXT DEFAULT 'pending', -- 'pending', 'approved', 'rejected', 'timeout' approver_id TEXT, approval_timestamp TIMESTAMP WITH TIME ZONE, created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW() ); -- Create index for pending approvals CREATE INDEX idx_human_approval_pending ON human_approval_requests(approval_status) WHERE approval_status = 'pending'; ``` ### 10. Session Management Table ```sql -- Track user sessions and conversation history CREATE TABLE user_sessions ( session_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), user_id TEXT, conversation_history JSONB DEFAULT '[]'::jsonb, total_queries INTEGER DEFAULT 0, successful_queries INTEGER DEFAULT 0, failed_queries INTEGER DEFAULT 0, average_steps_per_query REAL DEFAULT 0.0, last_active_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(), created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(), updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW() ); -- Create index for user lookup CREATE INDEX idx_user_sessions_user_id ON user_sessions(user_id); CREATE INDEX idx_user_sessions_last_active ON user_sessions(last_active_at); ``` ### 11. Create Update Triggers ```sql -- Auto-update updated_at timestamps CREATE OR REPLACE FUNCTION update_updated_at_column() RETURNS TRIGGER AS $$ BEGIN NEW.updated_at = NOW(); RETURN NEW; END; $$ language 'plpgsql'; -- Apply trigger to tables with updated_at CREATE TRIGGER update_knowledge_base_updated_at BEFORE UPDATE ON knowledge_base FOR EACH ROW EXECUTE FUNCTION update_updated_at_column(); CREATE TRIGGER update_tool_reliability_updated_at BEFORE UPDATE ON tool_reliability_metrics FOR EACH ROW EXECUTE FUNCTION update_updated_at_column(); CREATE TRIGGER update_knowledge_lifecycle_updated_at BEFORE UPDATE ON knowledge_lifecycle FOR EACH ROW EXECUTE FUNCTION update_updated_at_column(); CREATE TRIGGER update_user_sessions_updated_at BEFORE UPDATE ON user_sessions FOR EACH ROW EXECUTE FUNCTION update_updated_at_column(); ``` ### 12. Row Level Security (RLS) Setup ```sql -- Enable RLS for security ALTER TABLE knowledge_base ENABLE ROW LEVEL SECURITY; ALTER TABLE agent_trajectory_logs ENABLE ROW LEVEL SECURITY; ALTER TABLE tool_reliability_metrics ENABLE ROW LEVEL SECURITY; ALTER TABLE user_sessions ENABLE ROW LEVEL SECURITY; -- Create policies (adjust based on your authentication setup) -- Example: Allow authenticated users to read knowledge base CREATE POLICY "Allow authenticated read access" ON knowledge_base FOR SELECT TO authenticated USING (true); -- Example: Allow service role full access CREATE POLICY "Service role full access" ON knowledge_base TO service_role USING (true) WITH CHECK (true); ``` ## Verification Queries After running all the setup SQL, verify your tables are created correctly: ```sql -- Check all tables are created SELECT table_name FROM information_schema.tables WHERE table_schema = 'public' ORDER BY table_name; -- Check pgvector extension is enabled SELECT * FROM pg_extension WHERE extname = 'vector'; -- Test vector similarity function SELECT match_documents( array_fill(0.1, ARRAY[1536])::vector, 5 ); ``` ## Maintenance Queries ### Clean up old logs (run periodically) ```sql -- Delete logs older than 30 days DELETE FROM agent_trajectory_logs WHERE timestamp < NOW() - INTERVAL '30 days'; -- Delete unused tool metrics DELETE FROM tool_reliability_metrics WHERE last_used_at < NOW() - INTERVAL '90 days' AND total_calls < 10; ``` ### Performance monitoring ```sql -- Check table sizes SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size FROM pg_tables WHERE schemaname = 'public' ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC; -- Check index usage SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch FROM pg_stat_user_indexes ORDER BY idx_scan DESC; ``` ## Next Steps 1. Run these SQL commands in your Supabase SQL Editor 2. Update your `.env` file with your Supabase credentials 3. Test the connection with: ```python from src.database import get_supabase_client client = get_supabase_client() print("Connection successful!") ``` 4. Consider setting up database backups in Supabase dashboard 5. Monitor usage and costs in your Supabase project settings ## Troubleshooting - **pgvector not available**: Make sure you're on a Supabase plan that supports pgvector - **Permission denied**: Check that your API key has the correct permissions - **Connection errors**: Verify your SUPABASE_URL format and network connectivity - **Performance issues**: Consider adding more specific indexes based on your query patterns