# Database Migration Scripts
This folder contains scripts to migrate your JSONL data to Supabase and set up the database tables.
## Prerequisites

Set up your environment variables in `.env`:

```bash
SUPABASE_URL=your_supabase_project_url
SUPABASE_ANON_KEY=your_supabase_anon_key
SUPABASE_SERVICE_ROLE_KEY=your_supabase_service_role_key
```

Get your Service Role Key:

- Go to your Supabase project dashboard
- Navigate to Settings → API
- Copy the "service_role" key (NOT the "anon" key)
- ⚠️ Never expose this key in client-side code!
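With the variables above in place, a server-side script can fail fast when any of them is missing. A minimal sketch, assuming a `requireEnv` helper (illustrative, not part of `migrate-data.js`):

```javascript
// Illustrative helper (not part of migrate-data.js): fail fast if any
// required environment variable is missing.
function requireEnv(names, env = process.env) {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  return names.map((name) => env[name]);
}

// Usage (values come from your .env file):
// const [url, anonKey, serviceKey] = requireEnv([
//   'SUPABASE_URL', 'SUPABASE_ANON_KEY', 'SUPABASE_SERVICE_ROLE_KEY',
// ]);
```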
## Step 1: Create Database Tables (Manual)

⚠️ IMPORTANT: You must create the database tables manually before running the migration.

- Open your Supabase dashboard
- Go to the SQL Editor
- Copy and run the contents of `create-tables.sql`
- Run the verification queries at the bottom of the SQL file to confirm setup

Note: DDL statements (CREATE TABLE, etc.) must be run manually in the Supabase SQL Editor for security reasons.
## Step 2: Migrate Your Data

### Basic Usage

```bash
# Migrate speaking data
node migrate-data.js --file=data/speaking.jsonl --type=speaking

# Migrate writing data
node migrate-data.js --file=data/writing.jsonl --type=writing
```

### Advanced Options

```bash
# Dry run (preview what would be migrated without writing anything)
node migrate-data.js --file=data/speaking.jsonl --type=speaking --dry-run

# Create tables and migrate in one step
node migrate-data.js --file=data/speaking.jsonl --type=speaking --create-tables

# Show help
node migrate-data.js --help
```
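The `--file=`/`--type=` flags above follow a simple `--key=value` convention. A hedged sketch of how such flags can be parsed (`migrate-data.js` may well use a library or different logic):

```javascript
// Illustrative parser for "--key=value" and bare "--flag" arguments.
// Bare flags (e.g. --dry-run) become boolean true.
function parseFlags(argv) {
  const flags = {};
  for (const arg of argv) {
    if (arg.startsWith('--')) {
      const [key, value] = arg.slice(2).split('=');
      flags[key] = value === undefined ? true : value;
    }
  }
  return flags;
}
```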
## Script Features

### migrate-data.js

- ✅ Duplicate Detection: Uses the `id` + `task_type` combination to prevent duplicates
- ✅ Batch Processing: Processes records in batches of 100 for better performance
- ✅ Error Handling: Retries failed records up to 3 times
- ✅ Progress Tracking: Shows real-time progress and statistics
- ✅ Upsert Logic: Inserts new records or updates existing ones
- ✅ Verification: Automatically verifies migration completion
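The batch-of-100 processing can be pictured as a simple chunking step. A sketch under the assumption that the records are already loaded into an array (`toBatches` is an illustrative name):

```javascript
// Illustrative chunking step: split loaded JSONL records into
// fixed-size batches (the script uses batches of 100).
function toBatches(records, batchSize = 100) {
  const batches = [];
  for (let i = 0; i < records.length; i += batchSize) {
    batches.push(records.slice(i, i + batchSize));
  }
  return batches;
}
```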
## Database Schema

### speaking_tasks table

```sql
CREATE TABLE speaking_tasks (
  id BIGINT NOT NULL,
  task_type TEXT NOT NULL,
  task_description TEXT,
  rephrased_task TEXT,
  vocabulary JSONB,
  response TEXT,
  response_alternative TEXT,
  brainstorm JSONB,
  brainstorm_alternative JSONB,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW(),
  PRIMARY KEY (id, task_type) -- composite primary key: allows the same id with different task_types
);
```

### writing_tasks table

Same structure as `speaking_tasks`, but for writing tasks.
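Because the primary key is the `(id, task_type)` pair, a natural in-memory duplicate check combines both fields, which also matches the `24481-describing_scene` style keys in the example output below. An illustrative sketch (helper names are assumptions, not the script's actual internals):

```javascript
// Composite duplicate-detection key implied by the (id, task_type)
// primary key.
function recordKey(record) {
  return `${record.id}-${record.task_type}`;
}

// Return the keys of any records that appear more than once.
function findDuplicates(records) {
  const seen = new Set();
  const dupes = [];
  for (const record of records) {
    const key = recordKey(record);
    if (seen.has(key)) dupes.push(key);
    else seen.add(key);
  }
  return dupes;
}
```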
## Migration Process

- Duplicate Prevention: The script uses a unique constraint on `(id, task_type)` to prevent duplicates
- Batch Processing: Records are processed in batches of 100 to avoid overwhelming the database
- Error Recovery: Failed records are retried up to 3 times with exponential backoff
- Progress Tracking: Real-time statistics show insertions, updates, and errors
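The retry-with-exponential-backoff behaviour described above can be sketched as follows (names and delays are illustrative; the actual script may differ):

```javascript
// Illustrative retry wrapper: up to maxAttempts tries, doubling the
// delay between attempts (baseDelayMs, 2x, 4x, ...).
async function withRetries(fn, maxAttempts = 3, baseDelayMs = 500) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === maxAttempts) throw err;
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}
```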
## Example Output

```text
Starting data migration process...
Reading JSONL file: data/speaking01.jsonl
Loaded 25 records from JSONL file
Starting migration for speaking tasks...
Processing batch 1: records 1-25
Inserted record: 24481-describing_scene
Inserted record: 23674-expressing_opinions
...
Batch completed: 25 successful, 0 failed
Progress: 100.0% (25/25)
Migration completed!

Migration Statistics:
   Total records: 25
   Processed: 25
   Inserted: 25
   Updated: 0
   Skipped: 0
   Errors: 0
   Success rate: 100.0%

Verifying migration for table: speaking_tasks
Table speaking_tasks contains 25 records
Migration process completed!
```
## Troubleshooting

### Common Issues

"Missing required environment variables"
- Make sure your `.env` file contains all required variables
- Check that the service role key is correct

"Tables may not exist"
- Run the SQL script manually in the Supabase SQL Editor
- Check that the tables were created successfully

"File not found or not readable"
- Verify the file path is correct
- Make sure the JSONL file is properly formatted

Permission errors
- Ensure your service role key has the correct permissions
- Check the RLS policies in Supabase
## Verification Queries

After migration, you can verify the data with these queries in the Supabase SQL Editor:

```sql
-- Check record counts
SELECT 'speaking_tasks' AS table_name, COUNT(*) AS count FROM speaking_tasks
UNION ALL
SELECT 'writing_tasks' AS table_name, COUNT(*) AS count FROM writing_tasks;

-- Check for duplicates (should return no rows)
SELECT id, task_type, COUNT(*) AS duplicate_count
FROM speaking_tasks
GROUP BY id, task_type
HAVING COUNT(*) > 1;

-- Sample data
SELECT * FROM speaking_tasks LIMIT 5;
```
## Testing RLS Policies

To verify that the Row Level Security (RLS) policies are working correctly:

### 1. Test Anonymous Access

```sql
-- Switch to the anonymous role
SET ROLE anon;

-- Try to select data (should fail - no anonymous access)
SELECT COUNT(*) FROM speaking_tasks;

-- Try to insert data (should fail)
INSERT INTO speaking_tasks (id, task_type, task_description)
VALUES (999999, 'test', 'test description');
```

### 2. Test Authenticated User Access

```sql
-- Switch to the authenticated role
SET ROLE authenticated;

-- Try to select data (should work)
SELECT COUNT(*) FROM speaking_tasks;

-- Try to insert data (should fail - only the service role can insert)
INSERT INTO speaking_tasks (id, task_type, task_description)
VALUES (999998, 'test', 'test description');
```

### 3. Test Service Role Access (Full Access)

```sql
-- Switch to the service role (bypasses RLS)
SET ROLE service_role;

-- Should have full access
SELECT COUNT(*) FROM speaking_tasks;

-- Can insert data
INSERT INTO speaking_tasks (id, task_type, task_description)
VALUES (999997, 'test', 'test description');

-- Clean up test data
DELETE FROM speaking_tasks WHERE id IN (999997);
```

### 4. Reset Role

```sql
RESET ROLE;
```
Current RLS policy summary:

- ✅ Authenticated users: can READ only
- ❌ Anonymous users: no access (cannot read or write)
- ✅ Service role: full access (read, write, update, delete)
## Security Notes

- ⚠️ Never commit your service role key to version control
- ⚠️ Only use the service role key for server-side operations
- ✅ Use the anon key for client-side operations
- ✅ Keep your `.env` file secure and out of git