# Database Migration Scripts This folder contains scripts to migrate your JSONL data to Supabase and set up the database tables. ## Prerequisites 1. **Set up your environment variables** in `.env`: ```bash SUPABASE_URL=your_supabase_project_url SUPABASE_ANON_KEY=your_supabase_anon_key SUPABASE_SERVICE_ROLE_KEY=your_supabase_service_role_key ``` 2. **Get your Service Role Key**: - Go to your Supabase project dashboard - Navigate to Settings → API - Copy the "service_role" key (NOT the "anon" key) - ⚠️ **Never expose this key in client-side code!** ## Step 1: Create Database Tables (Manual) **⚠️ IMPORTANT: You must create the database tables manually before running the migration.** 1. **Open your Supabase dashboard** 2. **Go to the SQL Editor** 3. **Copy and run the contents** of `create-tables.sql` 4. **Run the verification queries** at the bottom of the SQL file to confirm setup *Note: DDL statements (CREATE TABLE, etc.) must be run manually in the Supabase SQL Editor for security reasons.* ## Step 2: Migrate Your Data ### Basic Usage ```bash # Migrate speaking data node migrate-data.js --file=data/speaking.jsonl --type=speaking # Migrate writing data node migrate-data.js --file=data/writing.jsonl --type=writing ``` ### Advanced Options ```bash # Dry run (see what would be migrated without doing it) node migrate-data.js --file=data/speaking.jsonl --type=speaking --dry-run # Create tables and migrate in one step node migrate-data.js --file=data/speaking.jsonl --type=speaking --create-tables # Show help node migrate-data.js --help ``` ## Script Features ### migrate-data.js - ✅ **Duplicate Detection**: Uses `id + task_type` combination to prevent duplicates - ✅ **Batch Processing**: Processes records in batches of 100 for better performance - ✅ **Error Handling**: Retries failed records up to 3 times - ✅ **Progress Tracking**: Shows real-time progress and statistics - ✅ **Upsert Logic**: Inserts new records or updates existing ones - ✅ **Verification**: Automatically verifies migration completion ## Database Schema ### speaking_tasks table ```sql CREATE TABLE speaking_tasks ( id BIGINT NOT NULL, task_type TEXT NOT NULL, task_description TEXT, rephrased_task TEXT, vocabulary JSONB, response TEXT, response_alternative TEXT, brainstorm JSONB, brainstorm_alternative JSONB, created_at TIMESTAMPTZ DEFAULT NOW(), updated_at TIMESTAMPTZ DEFAULT NOW(), PRIMARY KEY (id, task_type) -- COMPOSITE PRIMARY KEY - ALLOWS SAME ID WITH DIFFERENT TASK_TYPES ); ``` ### writing_tasks table Same structure as `speaking_tasks` but for writing tasks. ## Migration Process 1. **Duplicate Prevention**: The script uses a unique constraint on `(id, task_type)` to prevent duplicates 2. **Batch Processing**: Records are processed in batches of 100 to avoid overwhelming the database 3. **Error Recovery**: Failed records are retried up to 3 times with exponential backoff 4. **Progress Tracking**: Real-time statistics show insertions, updates, and errors ## Example Output ``` 🚀 Starting data migration process... 📄 Reading JSONL file: data/speaking01.jsonl ✅ Loaded 25 records from JSONL file 🚀 Starting migration for speaking tasks... 📦 Processing batch 1: records 1-25 ✅ Inserted record: 24481-describing_scene ✅ Inserted record: 23674-expressing_opinions ... 📊 Batch completed: 25 successful, 0 failed 📈 Progress: 100.0% (25/25) ✅ Migration completed! 📊 Migration Statistics: Total records: 25 Processed: 25 Inserted: 25 Updated: 0 Skipped: 0 Errors: 0 Success rate: 100.0% 🔍 Verifying migration for table: speaking_tasks ✅ Table speaking_tasks contains 25 records 🎉 Migration process completed! ``` ## Troubleshooting ### Common Issues 1. **"Missing required environment variables"** - Make sure your `.env` file contains all required variables - Check that the service role key is correct 2. **"Tables may not exist"** - Run the SQL script manually in Supabase SQL editor - Check that tables were created successfully 3. **"File not found or not readable"** - Verify the file path is correct - Make sure the JSONL file is properly formatted 4. **Permission errors** - Ensure your service role key has the correct permissions - Check RLS policies in Supabase ### Verification Queries After migration, you can verify the data with these queries in Supabase SQL Editor: ```sql -- Check record counts SELECT 'speaking_tasks' as table_name, COUNT(*) as count FROM speaking_tasks UNION ALL SELECT 'writing_tasks' as table_name, COUNT(*) as count FROM writing_tasks; -- Check for duplicates SELECT id, task_type, COUNT(*) as duplicate_count FROM speaking_tasks GROUP BY id, task_type HAVING COUNT(*) > 1; -- Sample data SELECT * FROM speaking_tasks LIMIT 5; ``` ## Testing RLS Policies To verify that Row Level Security (RLS) policies are working correctly: ### 1. Test Anonymous Access ```sql -- Set role to anonymous user SET ROLE anon; -- Try to select data (should fail - no anonymous access) SELECT COUNT(*) FROM speaking_tasks; -- Try to insert data (should fail) INSERT INTO speaking_tasks (id, task_type, task_description) VALUES (999999, 'test', 'test description'); ``` ### 2. Test Authenticated User Access ```sql -- Set role to authenticated user SET ROLE authenticated; -- Try to select data (should work) SELECT COUNT(*) FROM speaking_tasks; -- Try to insert data (should fail - only service role can insert) INSERT INTO speaking_tasks (id, task_type, task_description) VALUES (999998, 'test', 'test description'); ``` ### 3. Test Service Role Access (Full Access) ```sql -- Set role to service role (bypasses RLS) SET ROLE service_role; -- Should have full access SELECT COUNT(*) FROM speaking_tasks; -- Can insert data INSERT INTO speaking_tasks (id, task_type, task_description) VALUES (999997, 'test', 'test description'); -- Clean up test data DELETE FROM speaking_tasks WHERE id IN (999997); ``` ### 4. Reset Role ```sql RESET ROLE; ``` **Current RLS Policy Summary:** - ✅ **Authenticated users**: Can READ only - ❌ **Anonymous users**: No access (cannot read or write) - ✅ **Service role**: Full access (read, write, update, delete) ## Security Notes - ⚠️ **Never commit your service role key to version control** - ⚠️ **Only use the service role key for server-side operations** - ✅ **Use the anon key for client-side operations** - ✅ **Keep your `.env` file secure and out of git**