# Database Population Guide

This guide explains how to populate your Supabase database with synthetic data for testing the Cognexa ML pipeline.

## Overview

The `populate_database.py` script creates realistic synthetic data that simulates real users and their behavior. This allows you to:

- Test the full ML pipeline from database → Spring Boot → FastAPI → predictions
- Verify that feature calculation matches the training data
- Test your backend API endpoints with realistic data
- Validate that your ML service can make predictions on rows stored in the database

## Prerequisites

1. **Supabase Database**: Ensure your Supabase database is created and running
2. **Database Credentials**: Get your Supabase connection details from the Supabase dashboard
3. **Python Dependencies**: Install required packages

## Setup

### 1. Configure Database Credentials

Copy `.env.example` to `.env` and update with your Supabase credentials:

```bash
cd ML
cp .env.example .env
```

Edit `.env` with your Supabase connection details:

```env
SUPABASE_DB_URL=jdbc:postgresql://aws-1-ap-southeast-2.pooler.supabase.com:6543/postgres?prepareThreshold=0&tcpKeepAlive=true&socketTimeout=0&connectTimeout=30
SUPABASE_DB_USERNAME=postgres.your-project-ref
SUPABASE_DB_PASSWORD=your-supabase-password
```

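Note that `SUPABASE_DB_URL` is written in JDBC form. psycopg2 does not accept a `jdbc:` URL, and query parameters such as `prepareThreshold` and `socketTimeout` are PostgreSQL JDBC driver options rather than libpq ones. If you want to reuse this `.env` from Python, a minimal sketch of the conversion could look like the following. The helper name `jdbc_to_psycopg2_kwargs` is hypothetical, and this is not necessarily how `populate_database.py` handles the URL.

```python
from urllib.parse import parse_qs, urlsplit


def jdbc_to_psycopg2_kwargs(jdbc_url, user, password):
    """Convert a JDBC PostgreSQL URL into keyword arguments for psycopg2.connect().

    Hypothetical helper (not taken from populate_database.py). JDBC-only query
    parameters (prepareThreshold, tcpKeepAlive, socketTimeout) are dropped;
    connectTimeout is mapped to libpq's connect_timeout (both in seconds).
    """
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix):
        raise ValueError("expected a URL starting with 'jdbc:'")
    # Strip the JDBC prefix so urlsplit sees postgresql://host:port/db?...
    parts = urlsplit(jdbc_url[len(prefix):])
    if parts.scheme != "postgresql":
        raise ValueError(f"unsupported scheme: {parts.scheme}")
    kwargs = {
        "host": parts.hostname,
        "port": parts.port or 5432,
        "dbname": parts.path.lstrip("/") or "postgres",
        "user": user,
        "password": password,
    }
    query = parse_qs(parts.query)
    if "connectTimeout" in query:
        kwargs["connect_timeout"] = int(query["connectTimeout"][0])
    return kwargs
```

After loading `.env` with python-dotenv's `load_dotenv()`, the resulting dict can be passed as `psycopg2.connect(**kwargs)`.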
### 2. Install Dependencies

```bash
cd ML
pip install psycopg2-binary python-dotenv numpy
```

## Usage

### Basic Usage

```bash
cd ML
python populate_database.py
```

This creates 50 users with 100 tasks each (5000 total tasks).

### Custom Parameters

```bash
# Create 20 users with 50 tasks each
python populate_database.py --users 20 --tasks 50

# Create 100 users with 200 tasks each
python populate_database.py --users 100 --tasks 200
```

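The two documented flags map naturally onto `argparse`. A minimal sketch of how `--users` and `--tasks` could be parsed with the defaults stated above (50 and 100); `build_parser` and `total_tasks` are hypothetical names, and this is an illustration rather than the script's actual argument parser.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the documented --users/--tasks flags
    parser = argparse.ArgumentParser(
        description="Populate the database with synthetic users and tasks (sketch)."
    )
    parser.add_argument("--users", type=int, default=50, help="number of synthetic users")
    parser.add_argument("--tasks", type=int, default=100, help="tasks created per user")
    return parser


def total_tasks(args):
    # Total task rows = users x tasks per user
    return args.users * args.tasks
```

By this arithmetic, `--users 20 --tasks 50` yields 1,000 tasks and `--users 100 --tasks 200` yields 20,000.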
## What Gets Created

### Per User

| Data Type | Quantity | Description |
|-----------|----------|-------------|
| Users | 1 | User account with personality profile |
| Tasks | 100 | Realistic tasks with completion probabilities |
| Behavior Events | 50 | Task and focus session events |
| Habits | 5 | Habit tracking with completions |
| Analytics | 30 | Daily productivity metrics |
| Task Predictions | 100 | ML predictions for each task |
| Interventions | 20 | Proactive behavioral interventions |
| Coaching Insights | 15 | AI-generated coaching recommendations |
| Notifications | 30 | User notifications |
| Prediction Feedback | 40 | User feedback on predictions |
| Activity Logs | 50 | User activity tracking |
| User Streaks | 8 | Habit streak tracking |
| Achievements | 10 | Gamification achievements |
| Focus Sessions | 25 | Pomodoro session tracking |
| User Settings | 1 | User preferences |
| Wellbeing Data | 30 | Daily mood and energy tracking |
| Goals | 5 | User goals with progress |
| Recurring Tasks | 10 | Recurring task patterns |
| Saved Filters | 5 | Task filter configurations |
| Research Data | 1 | Research consent and participation |
| Export History | 3 | Data export records |
| ML Experiments | 1 | Model training records |
| A/B Testing | 1 | Feature experiment records |
| EMA Prompts | 20 | Experience sampling prompts |
| Task Templates | 10 | Reusable task templates |

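The table gives per-user counts, so the total size of a run scales with `--users`. A rough estimate follows, assuming only the **Tasks** and **Task Predictions** rows scale with `--tasks` and ignoring habit completion rows (whose count the table does not give). `estimate_rows` is a hypothetical helper, and the dictionary keys are labels for the table rows, not necessarily the real table names.

```python
# Per-user counts copied from the table above.
# Assumption: only Tasks and Task Predictions scale with --tasks.
# Keys are descriptive labels, not necessarily actual table names.
FIXED_ROWS_PER_USER = {
    "users": 1, "behavior_events": 50, "habits": 5, "analytics": 30,
    "interventions": 20, "coaching_insights": 15, "notifications": 30,
    "prediction_feedback": 40, "activity_logs": 50, "user_streaks": 8,
    "achievements": 10, "focus_sessions": 25, "user_settings": 1,
    "wellbeing_data": 30, "goals": 5, "recurring_tasks": 10,
    "saved_filters": 5, "research_data": 1, "export_history": 3,
    "ml_experiments": 1, "ab_testing": 1, "ema_prompts": 20,
    "task_templates": 10,
}


def estimate_rows(users, tasks_per_user):
    """Rough total row count for one run; habit completion rows are not included."""
    # One task row plus one task prediction row per task
    per_user = sum(FIXED_ROWS_PER_USER.values()) + 2 * tasks_per_user
    return users * per_user
```

With the defaults this comes to 571 rows per user, or 28,550 rows for 50 users.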
## Data Generation Logic

### Personality Profiles

Uses the Big Five (OCEAN) personality model with realistic correlations between traits:

- **Openness**: Creativity and curiosity
- **Conscientiousness**: Organization and dependability
- **Extraversion**: Sociability and energy
- **Agreeableness**: Cooperativeness
- **Neuroticism**: Emotional instability (the inverse of emotional stability)

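One common way to give synthetic profiles realistic inter-trait correlations is to multiply independent standard normal draws by the Cholesky factor of a target correlation matrix. A minimal standard-library sketch follows, assuming traits on a 0 to 1 scale. The correlation values, the `mean`/`sd` parameters, and the function names are hypothetical and are not taken from `populate_database.py`. With numpy installed, `numpy.random.Generator.multivariate_normal` serves the same purpose.

```python
import math
import random

TRAITS = ["openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism"]

# Hypothetical trait correlation matrix (illustrative values, not from the script).
# Rows/columns follow TRAITS; it must be symmetric and positive definite.
CORR = [
    [1.00, 0.10, 0.25, 0.10, -0.10],
    [0.10, 1.00, 0.15, 0.25, -0.30],
    [0.25, 0.15, 1.00, 0.15, -0.25],
    [0.10, 0.25, 0.15, 1.00, -0.25],
    [-0.10, -0.30, -0.25, -0.25, 1.00],
]


def cholesky(matrix):
    """Return lower-triangular L such that L times its transpose equals matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


CHOL = cholesky(CORR)  # computed once and reused for every profile


def sample_profile(rng, mean=0.5, sd=0.15):
    """Draw one correlated OCEAN profile with scores clipped to [0, 1]."""
    z = [rng.gauss(0.0, 1.0) for _ in TRAITS]  # independent standard normals
    correlated = [sum(CHOL[i][k] * z[k] for k in range(i + 1)) for i in range(len(TRAITS))]
    return {trait: min(1.0, max(0.0, mean + sd * value))
            for trait, value in zip(TRAITS, correlated)}
```

Clipping to [0, 1] slightly weakens the realized correlations in the tails, but with `sd=0.15` around a mean of 0.5 it is rare.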
### Task Completion Probability

Calculated from:

- Personality-task interaction (conscientiousness is the strongest predictor)
- Task priority and complexity
- Time pressure
- Category-specific effects

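These factors can be combined on the log-odds scale and passed through a logistic function. The sketch below is one hypothetical way to do that. The weights, the 1 to 3 priority and 1 to 5 complexity scales, the category names, and the function names are all illustrative assumptions, chosen only so that conscientiousness has the largest effect, as described above.

```python
import math

# Hypothetical category offsets on the log-odds scale (illustrative values)
CATEGORY_EFFECT = {"work": 0.2, "personal": 0.0, "health": -0.1, "learning": -0.05}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def completion_probability(conscientiousness, priority, complexity, days_until_due, category):
    """Sketch of a task completion probability (all weights are assumptions).

    conscientiousness: trait score in [0, 1]
    priority: 1 (low) to 3 (high)
    complexity: 1 (simple) to 5 (hard)
    days_until_due: days remaining before the deadline (0 = due today)
    """
    c = conscientiousness - 0.5  # center the trait
    pressure = 1.0 / (1.0 + max(days_until_due, 0))  # 1.0 when due today, shrinks with distance
    z = (
        -0.2
        + 2.0 * c                     # conscientiousness: largest effect
        + 0.3 * c * (complexity - 3)  # personality-task interaction
        + 0.3 * (priority - 2)        # higher priority -> more likely done
        - 0.25 * (complexity - 3)     # harder tasks -> less likely done
        + 0.5 * pressure              # nearer deadline -> more likely done
        + CATEGORY_EFFECT.get(category, 0.0)
    )
    return sigmoid(z)
```

A synthetic completion label can then be drawn as `random.random() < completion_probability(...)`.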
### Behavioral Patterns

- Realistic task creation timestamps
- Historical behavior tracking
- Habit completion streaks
- Focus session metrics

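Habit completion streaks can be derived from the set of days on which a habit was completed. A minimal sketch follows, assuming a streak counts consecutive calendar days and the current streak is still alive if the last completion was today or yesterday. `habit_streaks` is a hypothetical name, and this is not necessarily how the `User Streaks` rows are computed.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def habit_streaks(completion_days, today: date):
    """Return (current_streak, longest_streak) in days.

    completion_days: iterable of datetime.date values (duplicates allowed).
    Assumption: the current streak counts if the last completion was today or yesterday.
    """
    days = sorted(set(completion_days))
    longest = 0
    run = 0
    previous = None
    for day in days:
        # Extend the run only when this day directly follows the previous one
        run = run + 1 if previous is not None and day - previous == ONE_DAY else 1
        longest = max(longest, run)
        previous = day

    done = set(days)
    cursor = today if today in done else today - ONE_DAY
    current = 0
    while cursor in done:  # walk backwards through consecutive completed days
        current += 1
        cursor -= ONE_DAY
    return current, longest
```

For completions today, 1, 2, 5, 6, 7, and 8 days ago, this returns a current streak of 3 and a longest streak of 4.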
## Testing the Pipeline

### 1. Start Spring Boot Backend

```bash
cd server
mvn spring-boot:run
```

### 2. Start FastAPI ML Service

```bash
cd ML
python main.py
```

### 3. Test Predictions

```bash
# Get a task ID from the database (run this query in the Supabase SQL Editor or psql):
#   SELECT id FROM tasks LIMIT 1;
TASK_ID=paste-task-id-here

# Test the prediction endpoint
curl -X POST "http://localhost:8080/api/predictions/task/${TASK_ID}"
```

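The same call can be made from Python with only the standard library, for example inside an integration test. A minimal sketch follows. The function names are hypothetical, and the JSON response body is an assumption, since this guide does not document the endpoint's response format.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8080"  # Spring Boot address used in the curl example


def build_prediction_request(task_id, base_url=BASE_URL):
    """Build (but do not send) a POST to /api/predictions/task/{taskId}."""
    safe_id = urllib.parse.quote(str(task_id), safe="")
    return urllib.request.Request(f"{base_url}/api/predictions/task/{safe_id}", method="POST")


def request_prediction(task_id, base_url=BASE_URL, timeout=10.0):
    """Send the request and decode the body; assumes the endpoint returns JSON."""
    with urllib.request.urlopen(build_prediction_request(task_id, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the URL logic testable without a running backend.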
### 4. Verify Feature Calculation

Check that the features calculated by Spring Boot match what was used during training:

```sql
SELECT
    t.id,
    t.complexity,
    t.priority,
    t.category,
    p.openness,
    p.conscientiousness,
    p.extraversion,
    p.agreeableness,
    p.neuroticism
FROM tasks t
JOIN personality_profiles p ON t.user_id = p.user_id
LIMIT 10;
```

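Once you have the feature values Spring Boot sends to FastAPI for a task (for example, by logging the request body) and the corresponding values from the training data, the comparison itself is simple. A minimal sketch follows. How you capture both sides, the feature names in the test values, and the tolerance defaults are assumptions, and `compare_features` is a hypothetical helper.

```python
import math


def compare_features(backend, training, rel_tol=1e-6, abs_tol=1e-9):
    """Return (feature, backend_value, training_value) tuples that do not match.

    backend / training: dicts mapping feature name -> numeric value for one task.
    A feature missing on either side is reported with None for that side.
    """
    mismatches = []
    for name in sorted(set(backend) | set(training)):
        b = backend.get(name)
        t = training.get(name)
        if b is None or t is None or not math.isclose(b, t, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((name, b, t))
    return mismatches
```

An empty list means every feature matches within tolerance; each returned tuple names a feature to investigate.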
## Troubleshooting

### Connection Issues

If you see connection errors:

1. Verify Supabase credentials in `.env`
2. Check Supabase dashboard for database status
3. Ensure your IP is whitelisted in Supabase network settings
4. Verify the database URL format

### Permission Errors

If you see permission errors:

1. Ensure the database user has `INSERT` permissions
2. Check that table constraints are satisfied
3. Verify that the UUID generation extension is enabled

### Data Already Exists

If data already exists, the script will skip duplicate entries using `ON CONFLICT DO NOTHING`.

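`ON CONFLICT DO NOTHING` is what makes re-running the script safe: rows whose key already exists are skipped instead of raising a unique-violation error. The sketch below demonstrates the behavior with Python's built-in `sqlite3` module, which supports the same clause (SQLite 3.24 and later), so it runs without a Supabase connection. The `users` columns here are hypothetical, and with psycopg2 the placeholders would be `%s` rather than `?`.

```python
import sqlite3

# Hypothetical two-column layout for illustration; the real users table has more columns.
INSERT_SQL = "INSERT INTO users (id, email) VALUES (?, ?) ON CONFLICT DO NOTHING"


def insert_users(conn, rows):
    """Insert rows in one executemany call, skipping rows whose id already exists."""
    conn.executemany(INSERT_SQL, rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT NOT NULL)")
rows = [("u1", "a@example.com"), ("u2", "b@example.com")]
insert_users(conn, rows)
insert_users(conn, rows + [("u3", "c@example.com")])  # re-run: u1/u2 skipped, u3 added
user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(user_count)
```

The second call raises no `IntegrityError`; only `u3` is new, so the table ends up with three rows.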
## Data Cleanup

To clear the database and start fresh:

```sql
-- In Supabase SQL Editor or psql
TRUNCATE TABLE user_achievements CASCADE;
TRUNCATE TABLE user_streaks CASCADE;
TRUNCATE TABLE habit_completions CASCADE;
TRUNCATE TABLE habits CASCADE;
TRUNCATE TABLE task_predictions CASCADE;
TRUNCATE TABLE behavior_events CASCADE;
TRUNCATE TABLE tasks CASCADE;
TRUNCATE TABLE personality_profiles CASCADE;
TRUNCATE TABLE users CASCADE;
```

## Performance Notes

- Uses batch inserts (`executemany`) to speed up bulk loading
- Connection pooling is recommended for large datasets
- If you target a production database, consider running the script during off-peak hours

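For large datasets it helps to feed `executemany` bounded batches and commit after each one, so a single huge transaction does not hold locks or memory for the whole run. A minimal sketch of the chunking step follows. `batched_rows` is a hypothetical helper (Python 3.12+ ships `itertools.batched`, which yields tuples, for the same purpose), and committing per batch is a suggestion, not something this guide says the script does.

```python
from itertools import islice


def batched_rows(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))  # take the next chunk without materializing all rows
        if not batch:
            return
        yield batch
```

With psycopg2, each batch would be passed as `cursor.executemany(sql, batch)` followed by `conn.commit()`. `psycopg2.extras.execute_values` is usually faster for large inserts because it sends many rows per statement.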
## Next Steps

After populating the database:

1. Test your Spring Boot API endpoints
2. Verify ML service predictions
3. Check feature calculation accuracy
4. Validate frontend displays data correctly
5. Run end-to-end tests