brainstorm-ui / scripts /README.md
hsila's picture
Migrate from JSONL files to Supabase database with secure RLS policies
2451d08
# Database Migration Scripts
This folder contains scripts to migrate your JSONL data to Supabase and set up the database tables.
## Prerequisites
1. **Set up your environment variables** in `.env`:
```bash
SUPABASE_URL=your_supabase_project_url
SUPABASE_ANON_KEY=your_supabase_anon_key
SUPABASE_SERVICE_ROLE_KEY=your_supabase_service_role_key
```
2. **Get your Service Role Key**:
- Go to your Supabase project dashboard
- Navigate to Settings β†’ API
- Copy the "service_role" key (NOT the "anon" key)
- ⚠️ **Never expose this key in client-side code!**
## Step 1: Create Database Tables (Manual)
**⚠️ IMPORTANT: You must create the database tables manually before running the migration.**
1. **Open your Supabase dashboard**
2. **Go to the SQL Editor**
3. **Copy and run the contents** of `create-tables.sql`
4. **Run the verification queries** at the bottom of the SQL file to confirm setup
*Note: DDL statements (CREATE TABLE, etc.) must be run manually in the Supabase SQL Editor for security reasons.*
## Step 2: Migrate Your Data
### Basic Usage
```bash
# Migrate speaking data
node migrate-data.js --file=data/speaking.jsonl --type=speaking
# Migrate writing data
node migrate-data.js --file=data/writing.jsonl --type=writing
```
### Advanced Options
```bash
# Dry run (see what would be migrated without doing it)
node migrate-data.js --file=data/speaking.jsonl --type=speaking --dry-run
# Create tables and migrate in one step
node migrate-data.js --file=data/speaking.jsonl --type=speaking --create-tables
# Show help
node migrate-data.js --help
```
## Script Features
### migrate-data.js
- βœ… **Duplicate Detection**: Uses `id + task_type` combination to prevent duplicates
- βœ… **Batch Processing**: Processes records in batches of 100 for better performance
- βœ… **Error Handling**: Retries failed records up to 3 times
- βœ… **Progress Tracking**: Shows real-time progress and statistics
- βœ… **Upsert Logic**: Inserts new records or updates existing ones
- βœ… **Verification**: Automatically verifies migration completion
## Database Schema
### speaking_tasks table
```sql
CREATE TABLE speaking_tasks (
id BIGINT NOT NULL,
task_type TEXT NOT NULL,
task_description TEXT,
rephrased_task TEXT,
vocabulary JSONB,
response TEXT,
response_alternative TEXT,
brainstorm JSONB,
brainstorm_alternative JSONB,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
PRIMARY KEY (id, task_type) -- COMPOSITE PRIMARY KEY - ALLOWS SAME ID WITH DIFFERENT TASK_TYPES
);
```
### writing_tasks table
Same structure as `speaking_tasks` but for writing tasks.
## Migration Process
1. **Duplicate Prevention**: The script uses a unique constraint on `(id, task_type)` to prevent duplicates
2. **Batch Processing**: Records are processed in batches of 100 to avoid overwhelming the database
3. **Error Recovery**: Failed records are retried up to 3 times with exponential backoff
4. **Progress Tracking**: Real-time statistics show insertions, updates, and errors
## Example Output
```
πŸš€ Starting data migration process...
πŸ“„ Reading JSONL file: data/speaking01.jsonl
βœ… Loaded 25 records from JSONL file
πŸš€ Starting migration for speaking tasks...
πŸ“¦ Processing batch 1: records 1-25
βœ… Inserted record: 24481-describing_scene
βœ… Inserted record: 23674-expressing_opinions
...
πŸ“Š Batch completed: 25 successful, 0 failed
πŸ“ˆ Progress: 100.0% (25/25)
βœ… Migration completed!
πŸ“Š Migration Statistics:
Total records: 25
Processed: 25
Inserted: 25
Updated: 0
Skipped: 0
Errors: 0
Success rate: 100.0%
πŸ” Verifying migration for table: speaking_tasks
βœ… Table speaking_tasks contains 25 records
πŸŽ‰ Migration process completed!
```
## Troubleshooting
### Common Issues
1. **"Missing required environment variables"**
- Make sure your `.env` file contains all required variables
- Check that the service role key is correct
2. **"Tables may not exist"**
- Run the SQL script manually in Supabase SQL editor
- Check that tables were created successfully
3. **"File not found or not readable"**
- Verify the file path is correct
- Make sure the JSONL file is properly formatted
4. **Permission errors**
- Ensure your service role key has the correct permissions
- Check RLS policies in Supabase
### Verification Queries
After migration, you can verify the data with these queries in Supabase SQL Editor:
```sql
-- Check record counts
SELECT 'speaking_tasks' as table_name, COUNT(*) as count FROM speaking_tasks
UNION ALL
SELECT 'writing_tasks' as table_name, COUNT(*) as count FROM writing_tasks;
-- Check for duplicates
SELECT id, task_type, COUNT(*) as duplicate_count
FROM speaking_tasks
GROUP BY id, task_type
HAVING COUNT(*) > 1;
-- Sample data
SELECT * FROM speaking_tasks LIMIT 5;
```
## Testing RLS Policies
To verify that Row Level Security (RLS) policies are working correctly:
### 1. Test Anonymous Access
```sql
-- Set role to anonymous user
SET ROLE anon;
-- Try to select data (should fail - no anonymous access)
SELECT COUNT(*) FROM speaking_tasks;
-- Try to insert data (should fail)
INSERT INTO speaking_tasks (id, task_type, task_description)
VALUES (999999, 'test', 'test description');
```
### 2. Test Authenticated User Access
```sql
-- Set role to authenticated user
SET ROLE authenticated;
-- Try to select data (should work)
SELECT COUNT(*) FROM speaking_tasks;
-- Try to insert data (should fail - only service role can insert)
INSERT INTO speaking_tasks (id, task_type, task_description)
VALUES (999998, 'test', 'test description');
```
### 3. Test Service Role Access (Full Access)
```sql
-- Set role to service role (bypasses RLS)
SET ROLE service_role;
-- Should have full access
SELECT COUNT(*) FROM speaking_tasks;
-- Can insert data
INSERT INTO speaking_tasks (id, task_type, task_description)
VALUES (999997, 'test', 'test description');
-- Clean up test data
DELETE FROM speaking_tasks WHERE id IN (999997);
```
### 4. Reset Role
```sql
RESET ROLE;
```
**Current RLS Policy Summary:**
- βœ… **Authenticated users**: Can READ only
- ❌ **Anonymous users**: No access (cannot read or write)
- βœ… **Service role**: Full access (read, write, update, delete)
## Security Notes
- ⚠️ **Never commit your service role key to version control**
- ⚠️ **Only use the service role key for server-side operations**
- βœ… **Use the anon key for client-side operations**
- βœ… **Keep your `.env` file secure and out of git**