# Database Migration Scripts
This folder contains scripts to migrate your JSONL data to Supabase and set up the database tables.
## Prerequisites
1. **Set up your environment variables** in `.env`:
```bash
SUPABASE_URL=your_supabase_project_url
SUPABASE_ANON_KEY=your_supabase_anon_key
SUPABASE_SERVICE_ROLE_KEY=your_supabase_service_role_key
```
2. **Get your Service Role Key**:
- Go to your Supabase project dashboard
   - Navigate to Settings → API
- Copy the "service_role" key (NOT the "anon" key)
   - ⚠️ **Never expose this key in client-side code!**
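
With these values in place, a server-side Supabase client can be constructed from the service role key. The following is a minimal sketch assuming the `dotenv` and `@supabase/supabase-js` packages are installed; `migrate-data.js` may wire this up differently:

```js
// init-client.js -- hypothetical helper; shown for illustration only
require('dotenv').config();
const { createClient } = require('@supabase/supabase-js');

const { SUPABASE_URL, SUPABASE_SERVICE_ROLE_KEY } = process.env;
if (!SUPABASE_URL || !SUPABASE_SERVICE_ROLE_KEY) {
  // Mirrors the "Missing required environment variables" error described below
  throw new Error('Missing required environment variables');
}

// Server-side only: the service role key bypasses RLS
const supabase = createClient(SUPABASE_URL, SUPABASE_SERVICE_ROLE_KEY);

module.exports = { supabase };
```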
## Step 1: Create Database Tables (Manual)
**⚠️ IMPORTANT: You must create the database tables manually before running the migration.**
1. **Open your Supabase dashboard**
2. **Go to the SQL Editor**
3. **Copy and run the contents** of `create-tables.sql`
4. **Run the verification queries** at the bottom of the SQL file to confirm setup
*Note: DDL statements (CREATE TABLE, etc.) must be run manually in the Supabase SQL Editor for security reasons.*
## Step 2: Migrate Your Data
### Basic Usage
```bash
# Migrate speaking data
node migrate-data.js --file=data/speaking.jsonl --type=speaking
# Migrate writing data
node migrate-data.js --file=data/writing.jsonl --type=writing
```
### Advanced Options
```bash
# Dry run (see what would be migrated without doing it)
node migrate-data.js --file=data/speaking.jsonl --type=speaking --dry-run
# Create tables and migrate in one step
node migrate-data.js --file=data/speaking.jsonl --type=speaking --create-tables
# Show help
node migrate-data.js --help
```
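
All of these flags follow a plain `--name=value` / `--flag` convention, so they can be parsed without a CLI library. A hypothetical sketch of such parsing (not an excerpt from `migrate-data.js`):

```js
// Hypothetical parser for --file=..., --type=..., --dry-run, --create-tables
function parseArgs(argv) {
  const args = { dryRun: false, createTables: false };
  for (const arg of argv.slice(2)) {
    if (arg === '--dry-run') args.dryRun = true;
    else if (arg === '--create-tables') args.createTables = true;
    else if (arg.startsWith('--file=')) args.file = arg.slice('--file='.length);
    else if (arg.startsWith('--type=')) args.type = arg.slice('--type='.length);
  }
  return args;
}

console.log(parseArgs(process.argv));
```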
## Script Features
### migrate-data.js
- ✅ **Duplicate Detection**: Uses the `id + task_type` combination to prevent duplicates
- ✅ **Batch Processing**: Processes records in batches of 100 for better performance
- ✅ **Error Handling**: Retries failed records up to 3 times
- ✅ **Progress Tracking**: Shows real-time progress and statistics
- ✅ **Upsert Logic**: Inserts new records or updates existing ones (see the sketch below)
- ✅ **Verification**: Automatically verifies migration completion
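
With a composite key of `(id, task_type)`, the upsert can be expressed in supabase-js by passing `onConflict`. A minimal sketch, reusing the `supabase` client from the prerequisites; the actual script adds retries and progress tracking around this call:

```js
// Upsert one batch; 'id,task_type' matches the composite primary key below
async function upsertBatch(supabase, table, batch) {
  const { error } = await supabase
    .from(table)
    .upsert(batch, { onConflict: 'id,task_type' });
  if (error) throw error;
}
```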
## Database Schema
### speaking_tasks table
```sql
CREATE TABLE speaking_tasks (
id BIGINT NOT NULL,
task_type TEXT NOT NULL,
task_description TEXT,
rephrased_task TEXT,
vocabulary JSONB,
response TEXT,
response_alternative TEXT,
brainstorm JSONB,
brainstorm_alternative JSONB,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
  PRIMARY KEY (id, task_type) -- composite primary key: allows the same id across different task_types
);
```
### writing_tasks table
Same structure as `speaking_tasks` but for writing tasks.
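
For orientation, a single JSONL line that fits this schema might look like the object below. The field values and the inner structure of the JSONB columns are invented for illustration; your data defines the real shapes:

```js
// Hypothetical record shape; JSONB column contents depend on your data
const exampleRecord = {
  id: 24481,
  task_type: 'describing_scene',
  task_description: '...',
  rephrased_task: '...',
  vocabulary: {},             // stored as JSONB
  response: '...',
  response_alternative: '...',
  brainstorm: {},             // stored as JSONB
  brainstorm_alternative: {}  // stored as JSONB
};
```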
## Migration Process
1. **Duplicate Prevention**: The script uses a unique constraint on `(id, task_type)` to prevent duplicates
2. **Batch Processing**: Records are processed in batches of 100 to avoid overwhelming the database
3. **Error Recovery**: Failed records are retried up to 3 times with exponential backoff (sketched below)
4. **Progress Tracking**: Real-time statistics show insertions, updates, and errors
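
A compact way to express the retry-with-backoff behavior in Node; an illustrative sketch, since the exact delays used by `migrate-data.js` are not documented here:

```js
// Retry an async operation up to 3 times with exponential backoff (1s, 2s, 4s)
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetries(fn, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === maxAttempts) throw err; // give up after the last attempt
      await sleep(1000 * 2 ** (attempt - 1));
    }
  }
}
```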
## Example Output
```
Starting data migration process...
Reading JSONL file: data/speaking01.jsonl
✅ Loaded 25 records from JSONL file
Starting migration for speaking tasks...
Processing batch 1: records 1-25
✅ Inserted record: 24481-describing_scene
✅ Inserted record: 23674-expressing_opinions
...
Batch completed: 25 successful, 0 failed
Progress: 100.0% (25/25)
✅ Migration completed!
Migration Statistics:
   Total records: 25
   Processed: 25
   Inserted: 25
   Updated: 0
   Skipped: 0
   Errors: 0
   Success rate: 100.0%
Verifying migration for table: speaking_tasks
✅ Table speaking_tasks contains 25 records
Migration process completed!
```
## Troubleshooting
### Common Issues
1. **"Missing required environment variables"**
- Make sure your `.env` file contains all required variables
- Check that the service role key is correct
2. **"Tables may not exist"**
- Run the SQL script manually in Supabase SQL editor
- Check that tables were created successfully
3. **"File not found or not readable"**
- Verify the file path is correct
   - Make sure the JSONL file is properly formatted (a quick validation sketch follows this list)
4. **Permission errors**
- Ensure your service role key has the correct permissions
- Check RLS policies in Supabase
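
For the file-format case, a small standalone check can pinpoint the first malformed line. A sketch using Node's built-in `fs` module; adjust the path to your file:

```js
// Quick JSONL sanity check: every non-empty line must be valid JSON
const fs = require('fs');

function validateJsonl(path) {
  const lines = fs.readFileSync(path, 'utf8').split('\n');
  lines.forEach((line, i) => {
    if (!line.trim()) return; // skip blank lines
    try {
      JSON.parse(line);
    } catch {
      console.error(`Invalid JSON on line ${i + 1}: ${line.slice(0, 80)}`);
    }
  });
}

validateJsonl('data/speaking.jsonl');
```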
### Verification Queries
After migration, you can verify the data with these queries in Supabase SQL Editor:
```sql
-- Check record counts
SELECT 'speaking_tasks' as table_name, COUNT(*) as count FROM speaking_tasks
UNION ALL
SELECT 'writing_tasks' as table_name, COUNT(*) as count FROM writing_tasks;
-- Check for duplicates
SELECT id, task_type, COUNT(*) as duplicate_count
FROM speaking_tasks
GROUP BY id, task_type
HAVING COUNT(*) > 1;
-- Sample data
SELECT * FROM speaking_tasks LIMIT 5;
```
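
The record counts can also be checked from Node rather than the SQL editor; a sketch using supabase-js's head-only count query with the service-role client from the prerequisites:

```js
// Head-only count query: returns the row count without fetching any rows
async function countRows(supabase, table) {
  const { count, error } = await supabase
    .from(table)
    .select('*', { count: 'exact', head: true });
  if (error) throw error;
  console.log(`${table} contains ${count} records`);
}
```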
## Testing RLS Policies
To verify that Row Level Security (RLS) policies are working correctly:
### 1. Test Anonymous Access
```sql
-- Set role to anonymous user
SET ROLE anon;
-- Try to select data (should fail - no anonymous access)
SELECT COUNT(*) FROM speaking_tasks;
-- Try to insert data (should fail)
INSERT INTO speaking_tasks (id, task_type, task_description)
VALUES (999999, 'test', 'test description');
```
### 2. Test Authenticated User Access
```sql
-- Set role to authenticated user
SET ROLE authenticated;
-- Try to select data (should work)
SELECT COUNT(*) FROM speaking_tasks;
-- Try to insert data (should fail - only service role can insert)
INSERT INTO speaking_tasks (id, task_type, task_description)
VALUES (999998, 'test', 'test description');
```
### 3. Test Service Role Access (Full Access)
```sql
-- Set role to service role (bypasses RLS)
SET ROLE service_role;
-- Should have full access
SELECT COUNT(*) FROM speaking_tasks;
-- Can insert data
INSERT INTO speaking_tasks (id, task_type, task_description)
VALUES (999997, 'test', 'test description');
-- Clean up test data
DELETE FROM speaking_tasks WHERE id IN (999997);
```
### 4. Reset Role
```sql
RESET ROLE;
```
**Current RLS Policy Summary:**
- ✅ **Authenticated users**: Can READ only
- ❌ **Anonymous users**: No access (cannot read or write)
- ✅ **Service role**: Full access (read, write, update, delete)
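
These policies can also be exercised from application code by comparing an anon-key client against a service-role client. A hedged sketch assuming supabase-js and the `.env` variables from the prerequisites; note that on SELECT, RLS typically filters rows silently rather than raising an error:

```js
require('dotenv').config();
const { createClient } = require('@supabase/supabase-js');

const anon = createClient(process.env.SUPABASE_URL, process.env.SUPABASE_ANON_KEY);
const service = createClient(process.env.SUPABASE_URL, process.env.SUPABASE_SERVICE_ROLE_KEY);

async function checkRls() {
  // Anonymous client: expect zero rows (or an error), since anon has no access
  const anonRes = await anon.from('speaking_tasks').select('id').limit(1);
  console.log('anon rows:', anonRes.data ? anonRes.data.length : 0,
              'error:', anonRes.error ? anonRes.error.message : 'none');

  // Service-role client bypasses RLS entirely: expect data
  const svcRes = await service.from('speaking_tasks').select('id').limit(1);
  console.log('service rows:', svcRes.data ? svcRes.data.length : 0,
              'error:', svcRes.error ? svcRes.error.message : 'none');
}

checkRls();
```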
## Security Notes
- ⚠️ **Never commit your service role key to version control**
- ⚠️ **Only use the service role key for server-side operations**
- ✅ **Use the anon key for client-side operations**
- ✅ **Keep your `.env` file secure and out of git**