# Database Migration Scripts

This folder contains scripts to migrate your JSONL data to Supabase and set up the database tables.

## Prerequisites

1. **Set up your environment variables** in `.env`:

   ```bash
   SUPABASE_URL=your_supabase_project_url
   SUPABASE_ANON_KEY=your_supabase_anon_key
   SUPABASE_SERVICE_ROLE_KEY=your_supabase_service_role_key
   ```

2. **Get your Service Role Key**:
   - Go to your Supabase project dashboard
   - Navigate to Settings → API
   - Copy the "service_role" key (NOT the "anon" key)
   - ⚠️ **Never expose this key in client-side code!**
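A startup guard can surface the "Missing required environment variables" error early, before any database work begins. The variable names below match the `.env` keys above, but the helper itself is only a sketch — not necessarily how `migrate-data.js` performs this check:

```javascript
// Sketch: fail fast if any required .env variable is missing.
// Variable names match the .env keys above; the helper is illustrative.
function checkEnv(env) {
  const required = [
    'SUPABASE_URL',
    'SUPABASE_ANON_KEY',
    'SUPABASE_SERVICE_ROLE_KEY',
  ];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  return true;
}

// Example: checkEnv(process.env);
```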
## Step 1: Create Database Tables (Manual)

**⚠️ IMPORTANT: You must create the database tables manually before running the migration.**

1. **Open your Supabase dashboard**
2. **Go to the SQL Editor**
3. **Copy and run the contents** of `create-tables.sql`
4. **Run the verification queries** at the bottom of the SQL file to confirm setup

*Note: DDL statements (CREATE TABLE, etc.) must be run manually in the Supabase SQL Editor for security reasons.*
## Step 2: Migrate Your Data

### Basic Usage

```bash
# Migrate speaking data
node migrate-data.js --file=data/speaking.jsonl --type=speaking

# Migrate writing data
node migrate-data.js --file=data/writing.jsonl --type=writing
```

### Advanced Options

```bash
# Dry run (see what would be migrated without doing it)
node migrate-data.js --file=data/speaking.jsonl --type=speaking --dry-run

# Create tables and migrate in one step
node migrate-data.js --file=data/speaking.jsonl --type=speaking --create-tables

# Show help
node migrate-data.js --help
```
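The flags above follow a simple `--name=value` / `--flag` convention. A minimal sketch of how such arguments might be parsed (the actual parsing inside `migrate-data.js` may differ):

```javascript
// Minimal sketch of --name=value / --flag parsing for the commands above.
// Illustrative only; migrate-data.js may use a different parser.
function parseArgs(argv) {
  const options = {};
  for (const arg of argv) {
    if (!arg.startsWith('--')) continue;
    const [name, value] = arg.slice(2).split('=');
    // Bare flags like --dry-run become boolean true
    options[name] = value === undefined ? true : value;
  }
  return options;
}
```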
## Script Features

### migrate-data.js

- ✅ **Duplicate Detection**: Uses the `id + task_type` combination to prevent duplicates
- ✅ **Batch Processing**: Processes records in batches of 100 for better performance
- ✅ **Error Handling**: Retries failed records up to 3 times
- ✅ **Progress Tracking**: Shows real-time progress and statistics
- ✅ **Upsert Logic**: Inserts new records or updates existing ones
- ✅ **Verification**: Automatically verifies migration completion
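Duplicate detection amounts to keying each record on its `(id, task_type)` pair. A hypothetical helper showing the idea (not necessarily how `migrate-data.js` implements it):

```javascript
// Sketch: collapse records that share the same (id, task_type) key,
// keeping the last occurrence, so each upsert target appears only once.
// Hypothetical helper; migrate-data.js may implement this differently.
function dedupeByKey(records) {
  const byKey = new Map();
  for (const record of records) {
    byKey.set(`${record.id}-${record.task_type}`, record);
  }
  return [...byKey.values()];
}
```

Keeping the last occurrence mirrors upsert semantics: a later record for the same key overwrites the earlier one.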
## Database Schema

### speaking_tasks table

```sql
CREATE TABLE speaking_tasks (
  id BIGINT NOT NULL,
  task_type TEXT NOT NULL,
  task_description TEXT,
  rephrased_task TEXT,
  vocabulary JSONB,
  response TEXT,
  response_alternative TEXT,
  brainstorm JSONB,
  brainstorm_alternative JSONB,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW(),
  PRIMARY KEY (id, task_type) -- composite primary key: allows the same id with different task_types
);
```

### writing_tasks table

Same structure as `speaking_tasks`, but for writing tasks.
## Migration Process

1. **Duplicate Prevention**: The script relies on the unique constraint on `(id, task_type)` to prevent duplicates
2. **Batch Processing**: Records are processed in batches of 100 to avoid overwhelming the database
3. **Error Recovery**: Failed records are retried up to 3 times with exponential backoff
4. **Progress Tracking**: Real-time statistics show insertions, updates, and errors
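The batching and backoff mechanics above can be sketched as two small helpers. The batch size of 100 comes from this README; the 500 ms base delay is an assumption for illustration:

```javascript
// Sketch: split records into batches of 100, matching the batch size
// described above. Illustrative; not the actual migrate-data.js code.
function toBatches(records, batchSize = 100) {
  const batches = [];
  for (let i = 0; i < records.length; i += batchSize) {
    batches.push(records.slice(i, i + batchSize));
  }
  return batches;
}

// Exponential-backoff delay before retry attempt n (0-based):
// 500 ms, 1000 ms, 2000 ms, ... (the 500 ms base is an assumption).
const backoffDelayMs = (attempt, baseMs = 500) => baseMs * 2 ** attempt;
```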
## Example Output

```
🚀 Starting data migration process...
📖 Reading JSONL file: data/speaking01.jsonl
✅ Loaded 25 records from JSONL file
🔄 Starting migration for speaking tasks...
📦 Processing batch 1: records 1-25
✅ Inserted record: 24481-describing_scene
✅ Inserted record: 23674-expressing_opinions
...
📊 Batch completed: 25 successful, 0 failed
📊 Progress: 100.0% (25/25)
✅ Migration completed!

📊 Migration Statistics:
   Total records: 25
   Processed: 25
   Inserted: 25
   Updated: 0
   Skipped: 0
   Errors: 0
   Success rate: 100.0%

🔍 Verifying migration for table: speaking_tasks
✅ Table speaking_tasks contains 25 records

🎉 Migration process completed!
```
## Troubleshooting

### Common Issues

1. **"Missing required environment variables"**
   - Make sure your `.env` file contains all required variables
   - Check that the service role key is correct
2. **"Tables may not exist"**
   - Run the SQL script manually in the Supabase SQL Editor
   - Check that the tables were created successfully
3. **"File not found or not readable"**
   - Verify that the file path is correct
   - Make sure the JSONL file is properly formatted
4. **Permission errors**
   - Ensure your service role key has the correct permissions
   - Check the RLS policies in Supabase
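A quick way to confirm that a JSONL file is properly formatted is to JSON-parse it line by line. A small illustrative helper (not part of `migrate-data.js`; the error message format is an assumption):

```javascript
// Sketch: validate JSONL content by JSON-parsing every non-empty line.
// Returns the parsed records, or throws naming the first malformed line.
// Illustrative helper; not part of migrate-data.js.
function parseJsonl(text) {
  const records = [];
  const lines = text.split('\n');
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].trim() === '') continue; // skip blank lines
    try {
      records.push(JSON.parse(lines[i]));
    } catch {
      throw new Error(`Invalid JSON on line ${i + 1}`);
    }
  }
  return records;
}
```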
### Verification Queries

After migration, you can verify the data with these queries in the Supabase SQL Editor:

```sql
-- Check record counts
SELECT 'speaking_tasks' as table_name, COUNT(*) as count FROM speaking_tasks
UNION ALL
SELECT 'writing_tasks' as table_name, COUNT(*) as count FROM writing_tasks;

-- Check for duplicates
SELECT id, task_type, COUNT(*) as duplicate_count
FROM speaking_tasks
GROUP BY id, task_type
HAVING COUNT(*) > 1;

-- Sample data
SELECT * FROM speaking_tasks LIMIT 5;
```
## Testing RLS Policies

To verify that Row Level Security (RLS) policies are working correctly:

### 1. Test Anonymous Access

```sql
-- Switch to the anonymous role
SET ROLE anon;

-- Try to select data (should fail - no anonymous access)
SELECT COUNT(*) FROM speaking_tasks;

-- Try to insert data (should fail)
INSERT INTO speaking_tasks (id, task_type, task_description)
VALUES (999999, 'test', 'test description');
```

### 2. Test Authenticated User Access

```sql
-- Switch to the authenticated role
SET ROLE authenticated;

-- Try to select data (should work)
SELECT COUNT(*) FROM speaking_tasks;

-- Try to insert data (should fail - only the service role can insert)
INSERT INTO speaking_tasks (id, task_type, task_description)
VALUES (999998, 'test', 'test description');
```

### 3. Test Service Role Access (Full Access)

```sql
-- Switch to the service role (bypasses RLS)
SET ROLE service_role;

-- Should have full access
SELECT COUNT(*) FROM speaking_tasks;

-- Can insert data
INSERT INTO speaking_tasks (id, task_type, task_description)
VALUES (999997, 'test', 'test description');

-- Clean up test data
DELETE FROM speaking_tasks WHERE id IN (999997);
```

### 4. Reset Role

```sql
RESET ROLE;
```
**Current RLS Policy Summary:**

- ✅ **Authenticated users**: Can READ only
- ❌ **Anonymous users**: No access (cannot read or write)
- ✅ **Service role**: Full access (read, write, update, delete)

## Security Notes

- ⚠️ **Never commit your service role key to version control**
- ⚠️ **Only use the service role key for server-side operations**
- ✅ **Use the anon key for client-side operations**
- ✅ **Keep your `.env` file secure and out of git**