Database Population Guide
This guide explains how to populate your Supabase database with synthetic data for testing the Cognexa ML pipeline.
Overview
The populate_database.py script creates realistic synthetic data that simulates real users and their behavior. This allows you to:
- Test the full ML pipeline from database → Spring Boot → FastAPI → predictions
- Verify feature calculation matches training data
- Test your backend API endpoints with realistic data
- Validate your ML service can predict on real database rows
Prerequisites
- Supabase Database: Ensure your Supabase database is created and running
- Database Credentials: Get your Supabase connection details from the Supabase dashboard
- Python Dependencies: Install required packages
Setup
1. Configure Database Credentials
Copy .env.example to .env and update with your Supabase credentials:
cd ML
cp .env.example .env
Edit .env with your Supabase connection details:
SUPABASE_DB_URL=jdbc:postgresql://aws-1-ap-southeast-2.pooler.supabase.com:6543/postgres?prepareThreshold=0&tcpKeepAlive=true&socketTimeout=0&connectTimeout=30
SUPABASE_DB_USERNAME=postgres.your-project-ref
SUPABASE_DB_PASSWORD=your-supabase-password
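The URL above is in JDBC form, which Spring Boot accepts but psycopg2 does not parse directly. How populate_database.py handles this is not shown in this guide; the following is a minimal sketch of one possible conversion, and the helper name `jdbc_to_pg_params` is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

def jdbc_to_pg_params(jdbc_url: str) -> dict:
    """Split a jdbc:postgresql:// URL into host, port, and database name."""
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix + "postgresql://"):
        raise ValueError("expected a jdbc:postgresql:// URL")
    # Drop the "jdbc:" prefix so the rest parses as an ordinary URL.
    parts = urlsplit(jdbc_url[len(prefix):])
    query = parse_qs(parts.query)
    params = {
        "host": parts.hostname,
        "port": parts.port or 5432,
        "dbname": parts.path.lstrip("/"),
    }
    # connectTimeout is a JDBC driver option; libpq calls it connect_timeout.
    if "connectTimeout" in query:
        params["connect_timeout"] = int(query["connectTimeout"][0])
    return params
```

The resulting dictionary, together with the username and password from .env, could then be passed as keyword arguments to `psycopg2.connect`.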
2. Install Dependencies
cd ML
pip install psycopg2-binary python-dotenv numpy
Usage
Basic Usage
cd ML
python populate_database.py
By default, this creates 50 users with 100 tasks each (5,000 tasks in total).
Custom Parameters
# Create 20 users with 50 tasks each
python populate_database.py --users 20 --tasks 50
# Create 100 users with 200 tasks each
python populate_database.py --users 100 --tasks 200
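For orientation, the two flags could be wired up roughly as follows. This is a sketch, not the script's actual source: only the flag names and the defaults (50 users, 100 tasks) come from this guide, and the `parse_args` helper is an assumption.

```python
import argparse

def parse_args(argv=None):
    """Parse --users and --tasks, falling back to the documented defaults."""
    parser = argparse.ArgumentParser(description="Populate the database with synthetic data")
    parser.add_argument("--users", type=int, default=50, help="number of users to create")
    parser.add_argument("--tasks", type=int, default=100, help="tasks to create per user")
    return parser.parse_args(argv)

args = parse_args(["--users", "20", "--tasks", "50"])
print(args.users * args.tasks)  # 1000 tasks in total
```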
What Gets Created
Per User
| Data Type | Quantity | Description |
|---|---|---|
| Users | 1 | User account with personality profile |
| Tasks | 100 | Realistic tasks with completion probabilities |
| Behavior Events | 50 | Task and focus session events |
| Habits | 5 | Habit tracking with completions |
| Analytics | 30 | Daily productivity metrics |
| Task Predictions | 100 | ML predictions for each task |
| Interventions | 20 | Proactive behavioral interventions |
| Coaching Insights | 15 | AI-generated coaching recommendations |
| Notifications | 30 | User notifications |
| Prediction Feedback | 40 | User feedback on predictions |
| Activity Logs | 50 | User activity tracking |
| User Streaks | 8 | Habit streak tracking |
| Achievements | 10 | Gamification achievements |
| Focus Sessions | 25 | Pomodoro session tracking |
| User Settings | 1 | User preferences |
| Wellbeing Data | 30 | Daily mood and energy tracking |
| Goals | 5 | User goals with progress |
| Recurring Tasks | 10 | Recurring task patterns |
| Saved Filters | 5 | Task filter configurations |
| Research Data | 1 | Research consent and participation |
| Export History | 3 | Data export records |
| ML Experiments | 1 | Model training records |
| AB Testing | 1 | Feature experiment records |
| EMA Prompts | 20 | Experience sampling prompts |
| Task Templates | 10 | Reusable task templates |
Data Generation Logic
Personality Profiles
Uses the Big Five (OCEAN) personality model with realistic correlations:
- Openness: Creativity and curiosity
- Conscientiousness: Organization and dependability
- Extraversion: Sociability and energy
- Agreeableness: Cooperativeness
- Neuroticism: Emotional instability (the inverse of emotional stability)
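One way to produce correlated trait scores is sketched below. The correlation value, the 0 to 1 scale, and the function name `sample_profile` are illustrative assumptions, not taken from the script. The script itself depends on numpy, but this sketch uses only the standard library to stay self-contained.

```python
import math
import random

TRAITS = ("openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism")

def sample_profile(rng: random.Random, rho: float = -0.3) -> dict:
    """Draw one OCEAN profile with scores clamped to [0, 1]."""
    z = {trait: rng.gauss(0.0, 1.0) for trait in TRAITS}
    # Make neuroticism correlate with conscientiousness at roughly rho
    # (rho is an illustrative value, not taken from the script).
    z["neuroticism"] = rho * z["conscientiousness"] + math.sqrt(1 - rho**2) * z["neuroticism"]
    # Map standard-normal draws onto a 0..1 scale centred on 0.5.
    return {trait: min(1.0, max(0.0, 0.5 + 0.15 * value)) for trait, value in z.items()}

profile = sample_profile(random.Random(42))
```

Passing a seeded `random.Random` keeps the generated profiles reproducible between runs.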
Task Completion Probability
Calculated based on:
- Personality-task interaction (conscientiousness is strongest predictor)
- Task priority and complexity
- Time pressure
- Category-specific effects
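The factors above can be combined in many ways. A common choice is a logistic model over a weighted sum, sketched here. All weights, the 1 to 5 scales for priority and complexity, and the function name are assumptions for illustration; the guide only states that conscientiousness is the strongest predictor.

```python
import math

def completion_probability(conscientiousness: float, priority: int, complexity: int,
                           hours_until_due: float, category_effect: float = 0.0) -> float:
    """Combine the factors into a log-odds score and squash it to (0, 1)."""
    # Time pressure is 1.0 when the task is due now and decays as the deadline recedes.
    time_pressure = 1.0 / (1.0 + max(hours_until_due, 0.0) / 24.0)
    score = (
        2.0 * (conscientiousness - 0.5)   # largest weight: strongest predictor
        + 0.3 * (priority - 3)            # priority assumed on a 1-5 scale
        - 0.4 * (complexity - 3)          # complexity assumed on a 1-5 scale
        + 0.5 * time_pressure
        + category_effect                 # per-category offset, 0.0 if unknown
    )
    return 1.0 / (1.0 + math.exp(-score))
```

A synthetic completion label could then be drawn by comparing this probability against a uniform random number.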
Behavioral Patterns
- Realistic task creation timestamps
- Historical behavior tracking
- Habit completion streaks
- Focus session metrics
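As an illustration of the streak pattern, a current streak can be derived from a set of completion dates by walking backwards one day at a time. The function name and the day-level granularity are assumptions; the script may compute streaks differently.

```python
from datetime import date, timedelta

def current_streak(completed_days: set, today: date) -> int:
    """Count consecutive completed days ending at `today`."""
    streak = 0
    day = today
    while day in completed_days:
        streak += 1
        day -= timedelta(days=1)
    return streak

days = {date(2024, 1, 1), date(2024, 1, 3), date(2024, 1, 4), date(2024, 1, 5)}
print(current_streak(days, date(2024, 1, 5)))  # 3
```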
Testing the Pipeline
1. Start Spring Boot Backend
cd server
mvn spring-boot:run
2. Start FastAPI ML Service
cd ML
python main.py
3. Test Predictions
-- Get a task ID from the database (run in psql or the Supabase SQL Editor)
SELECT id FROM tasks LIMIT 1;
# Test the prediction endpoint, replacing {taskId} with the ID returned above
curl -X POST http://localhost:8080/api/predictions/task/{taskId}
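The same request can be issued from Python, which is convenient when checking many tasks in a loop. This sketch uses only the standard library. The endpoint path and port come from the curl command; the assumption that the response body is JSON, and both helper names, are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # Spring Boot address used in the curl example

def build_prediction_request(task_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for one task's prediction."""
    url = f"{BASE_URL}/api/predictions/task/{task_id}"
    return urllib.request.Request(url, method="POST")

def request_prediction(task_id: str, timeout: float = 10.0):
    """Send the request and decode the body as JSON (response format is assumed)."""
    with urllib.request.urlopen(build_prediction_request(task_id), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

`request_prediction` only succeeds once both the Spring Boot backend and the FastAPI service are running.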
4. Verify Feature Calculation
Check that the features calculated by Spring Boot match what was used during training:
SELECT
t.id,
t.complexity,
t.priority,
t.category,
p.openness,
p.conscientiousness,
p.extraversion,
p.agreeableness,
p.neuroticism
FROM tasks t
JOIN personality_profiles p ON t.user_id = p.user_id
LIMIT 10;
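Once the rows from this query and the features reported by the backend are both available as dictionaries, they can be compared field by field. The sketch below is one way to do that. The function name, the tolerance, and the assumption that both sides use the same feature names are hypothetical.

```python
import math

def feature_mismatches(db_row: dict, service_features: dict, tol: float = 1e-6) -> dict:
    """Return {name: (db_value, service_value)} for every feature that differs."""
    mismatches = {}
    for name, expected in db_row.items():
        actual = service_features.get(name)
        both_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in (expected, actual)
        )
        if both_numeric:
            # Compare numbers with a tolerance to absorb float round-trips.
            same = math.isclose(expected, actual, abs_tol=tol)
        else:
            same = expected == actual
        if not same:
            mismatches[name] = (expected, actual)
    return mismatches
```

An empty result means every feature in the database row was matched by the service within the tolerance.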
Troubleshooting
Connection Issues
If you see connection errors:
- Verify Supabase credentials in .env
- Check Supabase dashboard for database status
- Ensure your IP is whitelisted in Supabase network settings
- Verify the database URL format
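A quick first check is whether all three settings are actually visible to the Python process. The sketch below reports any that are unset or empty. The key names come from the .env example in this guide; the helper name is hypothetical.

```python
import os

REQUIRED_KEYS = ("SUPABASE_DB_URL", "SUPABASE_DB_USERNAME", "SUPABASE_DB_PASSWORD")

def missing_settings(env=os.environ) -> list:
    """Return the required keys that are unset or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
```

Call `load_dotenv()` from python-dotenv before `missing_settings()` so that values from .env are loaded into the environment first.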
Permission Errors
If you see permission errors:
- Ensure the database user has INSERT permissions
- Check table constraints are satisfied
- Verify UUID generation extension is enabled
Data Already Exists
If data already exists, the script will skip duplicate entries using ON CONFLICT DO NOTHING.
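The effect of this clause can be seen in a small self-contained demo. It uses an in-memory SQLite database as a stand-in for Supabase, since SQLite 3.24+ accepts the same `ON CONFLICT DO NOTHING` clause. The table and columns are made up for illustration, and psycopg2 uses `%s` placeholders rather than `?`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY, name TEXT)")

rows = [("a@example.com", "Ada"), ("b@example.com", "Bo")]
insert_sql = "INSERT INTO users (email, name) VALUES (?, ?) ON CONFLICT DO NOTHING"

conn.executemany(insert_sql, rows)   # first run inserts both rows
conn.executemany(insert_sql, rows)   # second run hits the primary key and is skipped

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 2
```

Rows are only skipped when they violate a unique or primary key constraint, so tables without such a constraint would still accumulate duplicates on a re-run.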
Data Cleanup
To clear the database and start fresh:
-- In Supabase SQL Editor or psql
TRUNCATE TABLE user_achievements CASCADE;
TRUNCATE TABLE user_streaks CASCADE;
TRUNCATE TABLE habit_completions CASCADE;
TRUNCATE TABLE habits CASCADE;
TRUNCATE TABLE task_predictions CASCADE;
TRUNCATE TABLE behavior_events CASCADE;
TRUNCATE TABLE tasks CASCADE;
TRUNCATE TABLE personality_profiles CASCADE;
TRUNCATE TABLE users CASCADE;
Performance Notes
- Uses batch inserts (executemany) for optimal performance
- Connection pooling recommended for large datasets
- Consider running during off-peak hours for production databases
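For large datasets, one option is to send rows in fixed-size chunks so that no single batch grows too large. The helper below is a sketch; its name and the batch size of 1000 are assumptions rather than values used by the script.

```python
from itertools import islice

def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

batches = list(chunked(range(5000), 1000))
print(len(batches))  # 5
```

Each batch could then be passed to `cursor.executemany` in turn, with a commit after each one if shorter transactions are preferred.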
Next Steps
After populating the database:
- Test your Spring Boot API endpoints
- Verify ML service predictions
- Check feature calculation accuracy
- Validate frontend displays data correctly
- Run end-to-end tests