
# Supabase SQL Setup Guide for AI Agent

This guide provides all the SQL commands needed to set up your Supabase database for the AI Agent's resilience-patterns implementation.

## Prerequisites

1. A Supabase project (create one at https://supabase.com)
2. Access to the SQL Editor in your Supabase dashboard
3. Your Supabase URL and API keys

## Required Environment Variables

Add these to your .env file:

```env
SUPABASE_URL=https://your-project-id.supabase.co
SUPABASE_KEY=your-anon-public-key
SUPABASE_DB_PASSWORD=your-database-password
```

## SQL Tables Setup

Execute these SQL commands in your Supabase SQL Editor in the following order:

### 1. Enable Required Extensions

```sql
-- Enable pgvector for semantic search
CREATE EXTENSION IF NOT EXISTS vector;

-- Enable UUID generation
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
```

### 2. Core Knowledge Base Table

```sql
-- Create the table to store document chunks and their embeddings
CREATE TABLE knowledge_base (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    node_id TEXT UNIQUE NOT NULL,
    embedding VECTOR(1536) NOT NULL, -- OpenAI 'text-embedding-3-small' produces 1536-dim vectors
    text TEXT,
    metadata_ JSONB DEFAULT '{}'::jsonb,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Create an HNSW index for efficient similarity search
CREATE INDEX ON knowledge_base USING hnsw (embedding vector_cosine_ops);

-- Create a function for similarity search
-- (columns are table-qualified to avoid ambiguity with the RETURNS TABLE output parameters)
CREATE OR REPLACE FUNCTION match_documents (
  query_embedding VECTOR(1536),
  match_count INT,
  filter JSONB DEFAULT '{}'
) RETURNS TABLE (
  id UUID,
  node_id TEXT,
  text TEXT,
  metadata_ JSONB,
  similarity FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    knowledge_base.id,
    knowledge_base.node_id,
    knowledge_base.text,
    knowledge_base.metadata_,
    1 - (knowledge_base.embedding <=> query_embedding) AS similarity
  FROM knowledge_base
  WHERE knowledge_base.metadata_ @> filter
  ORDER BY knowledge_base.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;
```
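
The `filter` argument is matched against the `metadata_` column with the JSONB containment operator (`@>`), so a call can restrict the search to chunks whose metadata contains specific key/value pairs. A minimal sketch of such a call; the `source` key and the constant query embedding are purely illustrative:

```sql
-- Top 5 matches among chunks whose metadata_ contains {"source": "docs"} (illustrative filter)
SELECT node_id, text, similarity
FROM match_documents(
    array_fill(0.1, ARRAY[1536])::vector, -- placeholder for a real 1536-dim query embedding
    5,
    '{"source": "docs"}'::jsonb
);
```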

### 3. Agent Trajectory Logging Table

```sql
-- Create the table for logging agent trajectories
CREATE TABLE agent_trajectory_logs (
    log_id BIGSERIAL PRIMARY KEY,
    run_id UUID NOT NULL,
    correlation_id UUID,
    timestamp TIMESTAMPTZ DEFAULT NOW(),
    step_type TEXT NOT NULL, -- e.g., 'REASON', 'ACTION', 'OBSERVATION', 'FINAL_ANSWER'
    fsm_state TEXT, -- Current FSM state
    payload JSONB,
    error_category TEXT,
    recovery_strategy TEXT,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Create indexes for efficient querying
CREATE INDEX idx_agent_trajectory_logs_run_id ON agent_trajectory_logs(run_id);
CREATE INDEX idx_agent_trajectory_logs_correlation_id ON agent_trajectory_logs(correlation_id);
CREATE INDEX idx_agent_trajectory_logs_timestamp ON agent_trajectory_logs(timestamp);
CREATE INDEX idx_agent_trajectory_logs_step_type ON agent_trajectory_logs(step_type);
```
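
One row is expected per agent step, so a full run can be reconstructed by filtering on `run_id` and ordering by `timestamp`. A sketch with placeholder values:

```sql
-- Log a single reasoning step (all values are illustrative)
INSERT INTO agent_trajectory_logs (run_id, correlation_id, step_type, fsm_state, payload)
VALUES (
    uuid_generate_v4(),
    uuid_generate_v4(),
    'REASON',
    'PLANNING',
    '{"thought": "Search the knowledge base first"}'::jsonb
);

-- Reconstruct the trajectory of one run in order (replace the UUID with a real run_id)
SELECT timestamp, step_type, fsm_state, payload, error_category, recovery_strategy
FROM agent_trajectory_logs
WHERE run_id = '00000000-0000-0000-0000-000000000000'
ORDER BY timestamp;
```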

### 4. Tool Reliability Metrics Table

```sql
-- Track tool performance and reliability
CREATE TABLE tool_reliability_metrics (
    tool_name TEXT PRIMARY KEY,
    success_count INTEGER DEFAULT 0,
    failure_count INTEGER DEFAULT 0,
    total_calls INTEGER DEFAULT 0,
    average_latency_ms REAL DEFAULT 0.0,
    last_used_at TIMESTAMP WITH TIME ZONE,
    last_error TEXT,
    error_patterns JSONB DEFAULT '[]'::jsonb,
    fallback_tools JSONB DEFAULT '[]'::jsonb,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Create index for last_used_at for cleanup queries
CREATE INDEX idx_tool_reliability_last_used ON tool_reliability_metrics(last_used_at);
```
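
Because `tool_name` is the primary key, the metrics row for a tool can be maintained with a single upsert that increments the counters and folds the newest latency into the running average. A sketch (the tool name and latency are placeholders; the `updated_at` trigger defined later in this guide keeps that column current):

```sql
-- Record one successful call to a tool (placeholder name and latency)
INSERT INTO tool_reliability_metrics (tool_name, success_count, total_calls, average_latency_ms, last_used_at)
VALUES ('web_search', 1, 1, 420.0, NOW())
ON CONFLICT (tool_name) DO UPDATE SET
    success_count      = tool_reliability_metrics.success_count + 1,
    total_calls        = tool_reliability_metrics.total_calls + 1,
    average_latency_ms = (tool_reliability_metrics.average_latency_ms * tool_reliability_metrics.total_calls
                          + EXCLUDED.average_latency_ms) / (tool_reliability_metrics.total_calls + 1),
    last_used_at       = NOW();
```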

### 5. Clarification Patterns Table

```sql
-- Store patterns of clarification requests for learning
CREATE TABLE clarification_patterns (
    id TEXT PRIMARY KEY,
    original_query TEXT NOT NULL,
    query_embedding VECTOR(1536),  -- For similarity search
    clarification_question TEXT NOT NULL,
    user_response TEXT NOT NULL,
    query_category TEXT NOT NULL,
    frequency INTEGER DEFAULT 1,
    effectiveness_score REAL DEFAULT 0.5,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    last_seen_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Create indexes for efficient pattern matching
CREATE INDEX idx_clarification_patterns_category ON clarification_patterns(query_category);
CREATE INDEX idx_clarification_patterns_embedding ON clarification_patterns USING hnsw (query_embedding vector_cosine_ops);
```
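
The HNSW index on `query_embedding` lets the agent check whether a similar query has needed clarification before and reuse the question that worked. A sketch (the constant vector stands in for the embedding of the incoming query):

```sql
-- Closest past clarification patterns to an incoming query embedding (placeholder vector)
SELECT clarification_question, user_response, effectiveness_score,
       1 - (query_embedding <=> array_fill(0.1, ARRAY[1536])::vector) AS similarity
FROM clarification_patterns
ORDER BY query_embedding <=> array_fill(0.1, ARRAY[1536])::vector
LIMIT 3;
```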

### 6. Plan Corrections Table

```sql
-- Record user corrections to agent plans for improvement
CREATE TABLE plan_corrections (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    query TEXT NOT NULL,
    original_plan JSONB NOT NULL,
    corrected_plan JSONB NOT NULL,
    correction_type TEXT NOT NULL, -- 'steps_added', 'steps_removed', 'parameters_changed', etc.
    user_feedback TEXT,
    applied_to_future_plans BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Create index for query similarity matching
CREATE INDEX idx_plan_corrections_query ON plan_corrections USING gin(to_tsvector('english', query));
```
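
The GIN index on `to_tsvector('english', query)` supports full-text lookups, so prior corrections for similar queries can be retrieved before a new plan is produced. A sketch with an illustrative search phrase:

```sql
-- Prior corrections whose original query shares terms with the new one (illustrative phrase)
SELECT query, correction_type, corrected_plan
FROM plan_corrections
WHERE to_tsvector('english', query) @@ plainto_tsquery('english', 'weather forecast Paris')
ORDER BY created_at DESC
LIMIT 5;
```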

### 7. Knowledge Lifecycle Table

```sql
-- Track document freshness and validation needs
CREATE TABLE knowledge_lifecycle (
    document_id TEXT PRIMARY KEY,
    source_url TEXT,
    document_type TEXT NOT NULL, -- 'news', 'documentation', 'research', etc.
    content_hash TEXT NOT NULL,
    ingested_at TIMESTAMP WITH TIME ZONE NOT NULL,
    last_validated_at TIMESTAMP WITH TIME ZONE NOT NULL,
    expires_at TIMESTAMP WITH TIME ZONE NOT NULL,
    validation_status TEXT NOT NULL, -- 'valid', 'stale', 'expired', 'source_unavailable'
    update_frequency_days INTEGER NOT NULL,
    importance_score REAL DEFAULT 0.5,
    validation_failures INTEGER DEFAULT 0,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Create indexes for lifecycle management
CREATE INDEX idx_knowledge_lifecycle_expires ON knowledge_lifecycle(expires_at);
CREATE INDEX idx_knowledge_lifecycle_validation ON knowledge_lifecycle(last_validated_at);
CREATE INDEX idx_knowledge_lifecycle_status ON knowledge_lifecycle(validation_status);
```
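
A periodic job can use `expires_at` and `validation_status` to pick which documents need re-validation first, weighting by `importance_score`. A sketch:

```sql
-- Documents that have expired or were flagged as stale, most important first
SELECT document_id, source_url, validation_status, expires_at
FROM knowledge_lifecycle
WHERE expires_at < NOW() OR validation_status IN ('stale', 'expired')
ORDER BY importance_score DESC
LIMIT 50;
```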

### 8. Resilience Tracking Tables

```sql
-- Track GraphRecursionError occurrences and resolutions
CREATE TABLE recursion_error_logs (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    correlation_id UUID,
    query TEXT NOT NULL,
    state_hash TEXT NOT NULL,
    loop_count INTEGER,
    stagnation_score INTEGER,
    resolution_strategy TEXT, -- 'force_termination', 'alternative_plan', 'user_clarification'
    final_answer TEXT,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Track state corruption incidents
CREATE TABLE state_corruption_logs (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    correlation_id UUID,
    corrupting_node TEXT NOT NULL,
    failed_field TEXT NOT NULL,
    bad_value TEXT,
    expected_type TEXT,
    stack_trace TEXT,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
```
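
Once incidents are being logged, a simple aggregate shows which resolution strategies are actually used when the agent hits a recursion limit. A sketch:

```sql
-- Resolution strategies used for recursion errors in the last 7 days
SELECT resolution_strategy, COUNT(*) AS occurrences
FROM recursion_error_logs
WHERE created_at > NOW() - INTERVAL '7 days'
GROUP BY resolution_strategy
ORDER BY occurrences DESC;
```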

### 9. Human-in-the-Loop Approvals

```sql
-- Track human approval requests and decisions
CREATE TABLE human_approval_requests (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    correlation_id UUID,
    action_type TEXT NOT NULL, -- 'send_email', 'execute_code', 'modify_data', 'api_call'
    action_description TEXT NOT NULL,
    action_parameters JSONB NOT NULL,
    risk_level TEXT NOT NULL, -- 'low', 'medium', 'high', 'critical'
    reasoning TEXT,
    alternatives JSONB,
    approval_status TEXT DEFAULT 'pending', -- 'pending', 'approved', 'rejected', 'timeout'
    approver_id TEXT,
    approval_timestamp TIMESTAMP WITH TIME ZONE,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Create index for pending approvals
CREATE INDEX idx_human_approval_pending ON human_approval_requests(approval_status) WHERE approval_status = 'pending';
```
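
The partial index makes fetching the approval queue cheap, and recording a decision is a single update. A sketch with placeholder identifiers:

```sql
-- Oldest pending approval requests first
SELECT id, action_type, action_description, risk_level, created_at
FROM human_approval_requests
WHERE approval_status = 'pending'
ORDER BY created_at
LIMIT 10;

-- Record an approval decision (the id and approver are placeholders)
UPDATE human_approval_requests
SET approval_status = 'approved',
    approver_id = 'ops-team',
    approval_timestamp = NOW()
WHERE id = '00000000-0000-0000-0000-000000000000';
```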

### 10. Session Management Table

```sql
-- Track user sessions and conversation history
CREATE TABLE user_sessions (
    session_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    user_id TEXT,
    conversation_history JSONB DEFAULT '[]'::jsonb,
    total_queries INTEGER DEFAULT 0,
    successful_queries INTEGER DEFAULT 0,
    failed_queries INTEGER DEFAULT 0,
    average_steps_per_query REAL DEFAULT 0.0,
    last_active_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Create index for user lookup
CREATE INDEX idx_user_sessions_user_id ON user_sessions(user_id);
CREATE INDEX idx_user_sessions_last_active ON user_sessions(last_active_at);
```
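
After each query the session row can be updated in place; appending a turn to `conversation_history` uses JSONB array concatenation (`||`). A sketch with placeholder values (the `updated_at` trigger below keeps that column current):

```sql
-- Record a successful query against an existing session (placeholder session_id and message)
UPDATE user_sessions
SET total_queries        = total_queries + 1,
    successful_queries   = successful_queries + 1,
    conversation_history = conversation_history || '[{"role": "user", "content": "example question"}]'::jsonb,
    last_active_at       = NOW()
WHERE session_id = '00000000-0000-0000-0000-000000000000';
```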

### 11. Create Update Triggers

```sql
-- Auto-update updated_at timestamps
CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Apply trigger to tables with updated_at
CREATE TRIGGER update_knowledge_base_updated_at BEFORE UPDATE ON knowledge_base
    FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();

CREATE TRIGGER update_tool_reliability_updated_at BEFORE UPDATE ON tool_reliability_metrics
    FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();

CREATE TRIGGER update_knowledge_lifecycle_updated_at BEFORE UPDATE ON knowledge_lifecycle
    FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();

CREATE TRIGGER update_user_sessions_updated_at BEFORE UPDATE ON user_sessions
    FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();
```

### 12. Row Level Security (RLS) Setup

```sql
-- Enable RLS for security
ALTER TABLE knowledge_base ENABLE ROW LEVEL SECURITY;
ALTER TABLE agent_trajectory_logs ENABLE ROW LEVEL SECURITY;
ALTER TABLE tool_reliability_metrics ENABLE ROW LEVEL SECURITY;
ALTER TABLE user_sessions ENABLE ROW LEVEL SECURITY;

-- Create policies (adjust based on your authentication setup)
-- Example: Allow authenticated users to read the knowledge base
CREATE POLICY "Allow authenticated read access" ON knowledge_base
    FOR SELECT
    TO authenticated
    USING (true);

-- Example: Allow the service role full access
CREATE POLICY "Service role full access" ON knowledge_base
    FOR ALL
    TO service_role
    USING (true)
    WITH CHECK (true);
```
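
To confirm which policies are active on a table, query the built-in `pg_policies` view:

```sql
-- List the policies defined on knowledge_base
SELECT policyname, roles, cmd, qual, with_check
FROM pg_policies
WHERE schemaname = 'public' AND tablename = 'knowledge_base';
```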

## Verification Queries

After running all the setup SQL, verify your tables are created correctly:

```sql
-- Check all tables are created
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public'
ORDER BY table_name;

-- Check pgvector extension is enabled
SELECT * FROM pg_extension WHERE extname = 'vector';

-- Test the vector similarity function
SELECT * FROM match_documents(
    array_fill(0.1, ARRAY[1536])::vector,
    5
);
```

## Maintenance Queries

### Clean up old logs (run periodically)

```sql
-- Delete logs older than 30 days
DELETE FROM agent_trajectory_logs
WHERE timestamp < NOW() - INTERVAL '30 days';

-- Delete unused tool metrics
DELETE FROM tool_reliability_metrics
WHERE last_used_at < NOW() - INTERVAL '90 days'
AND total_calls < 10;
```

### Performance monitoring

```sql
-- Check table sizes
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

-- Check index usage (pg_stat_user_indexes exposes relname/indexrelname)
SELECT
    schemaname,
    relname AS tablename,
    indexrelname AS indexname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
```

## Next Steps

1. Run these SQL commands in your Supabase SQL Editor
2. Update your `.env` file with your Supabase credentials
3. Test the connection:

   ```python
   from src.database import get_supabase_client

   client = get_supabase_client()
   print("Connection successful!")
   ```

4. Consider setting up database backups in the Supabase dashboard
5. Monitor usage and costs in your Supabase project settings

## Troubleshooting

- **pgvector not available**: Make sure the `vector` extension is enabled for your project (Database → Extensions in the Supabase dashboard)
- **Permission denied**: Check that your API key has the correct permissions
- **Connection errors**: Verify your `SUPABASE_URL` format and network connectivity
- **Performance issues**: Consider adding more specific indexes based on your query patterns