Spaces:
Running on CPU Upgrade
Running on CPU Upgrade
| # Neon Integration for Open Navigator | |
| This directory contains the Neon Postgres integration for fast API queries on HuggingFace Spaces. | |
| ## π Why Neon? | |
| **Problem**: Scanning 925MB+ parquet files on every API request = 5-15 seconds β±οΈ | |
| **Solution**: Pre-computed aggregates in Neon Postgres = 10-50ms β‘ | |
| ### Performance Improvement: | |
| - `/api/stats` endpoint: **5 seconds β 10ms** (500x faster!) | |
| - Dashboard load time: **2-3s β <100ms** (20-30x faster!) | |
| - Search queries: **3-10s β 50-200ms** (15-200x faster!) | |
| ## π Architecture | |
| ``` | |
| βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | |
| β HuggingFace Spaces Deployment β | |
| β β | |
| β ββββββββββββββββ βββββββββββββββββββ β | |
| β β Frontend β βββββββΊ β FastAPI β β | |
| β β React App β β (API Routes) β β | |
| β ββββββββββββββββ ββββββββββ¬βββββββββ β | |
| β β β | |
| β β β | |
| β βββββββββββββββββ΄ββββββββββββββββ β | |
| β β β β | |
| β ββββββββββΌβββββββββββ ββββββββββΌββββββββββ | |
| β β Neon Postgres β β Parquet Files ββ | |
| β β (Aggregates + β β (Full Dataset) ββ | |
| β β Search Tables) β β data/gold/ ββ | |
| β β β β ββ | |
| β β β’ stats β β β’ Bulk export ββ | |
| β β β’ search β β β’ Historical ββ | |
| β β β’ reference β β β’ Details ββ | |
| β βββββββββββββββββββββ ββββββββββββββββββββ | |
| β β² β | |
| β β β | |
| β β Synced via migrate.py β | |
| β β β | |
| βββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββ | |
| β | |
| ββββββββββΌβββββββββββ | |
| β Local Dev β | |
| β data/gold/*.pq β | |
| βββββββββββββββββββββ | |
| ``` | |
| ## π Files | |
| ``` | |
| neon/ | |
| βββ schema.sql # Database schema (tables, indexes, views) | |
| βββ migrate.py # Data migration script (parquet β Neon) | |
| βββ README.md # This file | |
| βββ (future: sync.py) # Automated sync for updates | |
| ``` | |
| ## ποΈ Database Schema | |
| ### Tables: | |
| 1. **`stats_aggregates`** - Pre-computed statistics | |
| - National, state, county, city levels | |
| - Counts: jurisdictions, nonprofits, events, contacts | |
| - Financials: total revenue, total assets | |
| - **Primary use**: Dashboard `/api/stats` endpoint | |
| 2. **`nonprofits_search`** - Searchable nonprofit data | |
| - Full-text search on name | |
| - Geographic filters (state, city, county) | |
| - Financial data (revenue, assets) | |
| - **Primary use**: Search `/api/search` endpoint | |
| 3. **`jurisdictions_search`** - Cities, counties, townships | |
| - Full-text search on name | |
| - Type filters (city, county, etc.) | |
| - **Primary use**: Location search | |
| 4. **`contacts_search`** - Officers, legislators, board members | |
| - Full-text search on name and organization | |
| - Role classification | |
| - **Primary use**: People search | |
| 5. **`events_search`** - Meetings, hearings, events | |
| - Full-text search on title/description | |
| - Date ranges | |
| - **Primary use**: Event calendar | |
| 6. **`reference_causes`** - Nonprofit cause categories | |
| 7. **`reference_ntee_codes`** - IRS NTEE classification codes | |
| 8. **`last_sync`** - Track data sync status | |
| ### Indexes: | |
| - **Full-text search** (GIN indexes) on name/description fields | |
| - **B-tree indexes** on state, city, county for fast filtering | |
| - **Composite indexes** on (state, city), (state, type), etc. | |
| ## π Setup Instructions | |
| ### Step 1: Create Neon Database | |
| 1. Sign up at https://neon.tech (free tier: 500MB) | |
| 2. Create a new project: "open-navigator" | |
| 3. Copy the connection string: | |
| ``` | |
| postgresql://user:password@ep-xxx.neon.tech/dbname?sslmode=require | |
| ``` | |
| ### Step 2: Configure Locally | |
| ```bash | |
| # Add to .env file (already done!) | |
| NEON_DATABASE_URL=postgresql://neondb_owner:npg_6WMcFKpIgj3T@ep-noisy-fire-anrnmxxy-pooler.c-6.us-east-1.aws.neon.tech/neondb?sslmode=require&channel_binding=require | |
| ``` | |
| ### Step 3: Run Migration | |
| ```bash | |
| # Install dependencies (if not already) | |
| pip install asyncpg psycopg2-binary | |
| # Run migration script | |
| cd /home/developer/projects/open-navigator | |
| python neon/migrate.py | |
| ``` | |
| **Expected output:** | |
| ``` | |
| π Starting Neon migration... | |
| β Connected to Neon database | |
| π Creating database schema... | |
| β Schema created successfully | |
| π Loading statistics aggregates... | |
| Processing state: MA | |
| β Loaded 2 statistics aggregates | |
| π Loading reference data... | |
| Loaded 32 NTEE codes | |
| Loaded 450 causes | |
| β Loaded 482 reference records | |
| π’ Loading nonprofits search data... | |
| β οΈ Loading only MA nonprofits (full load would be 3M+ records) | |
| Loading nonprofits from MA... | |
| Loaded 45,123 nonprofits from MA | |
| β Loaded 45,123 nonprofits into search table | |
| π Migration Summary: | |
| ============================================================ | |
| stats_aggregates 2 records (2026-04-30 ...) | |
| nonprofits_search 45,123 records (2026-04-30 ...) | |
| reference_ntee_codes 32 records (2026-04-30 ...) | |
| reference_causes 450 records (2026-04-30 ...) | |
| ============================================================ | |
| π Migration completed successfully! | |
| ``` | |
| ### Step 4: Test Queries | |
| ```bash | |
| # Connect to Neon (using psql or any Postgres client) | |
| psql "postgresql://neondb_owner:npg_6WMcFKpIgj3T@ep-noisy-fire-anrnmxxy-pooler.c-6.us-east-1.aws.neon.tech/neondb?sslmode=require" | |
| # Test queries: | |
| SELECT * FROM stats_aggregates WHERE level = 'national'; | |
| SELECT * FROM stats_aggregates WHERE state = 'MA'; | |
| SELECT name, city, revenue FROM nonprofits_search WHERE state = 'MA' LIMIT 5; | |
| SELECT * FROM reference_ntee_codes LIMIT 5; | |
| ``` | |
| ### Step 5: Update API Routes | |
| **Option A: Use new Neon-only routes** (recommended) | |
| ```python | |
| # In api/app.py or api/main.py | |
| from api.routes import stats_neon | |
| # Replace old stats route | |
| # app.include_router(stats.router, prefix="/api", tags=["stats"]) | |
| # Use Neon stats route | |
| app.include_router(stats_neon.router, prefix="/api", tags=["stats"]) | |
| ``` | |
| **Option B: Hybrid approach** (fallback to parquet if Neon unavailable) | |
| ```python | |
| # Keep both routes, prioritize Neon | |
| try: | |
| from api.routes import stats_neon as stats | |
| except: | |
| from api.routes import stats # fallback to parquet | |
| ``` | |
| ### Step 6: Deploy to HuggingFace | |
| 1. Add secret to HuggingFace Space: | |
| ``` | |
| Settings β Variables and secrets β Add secret | |
| Name: NEON_DATABASE_URL | |
| Value: postgresql://... | |
| ``` | |
| 2. Push updated code: | |
| ```bash | |
| git add neon/ api/routes/stats_neon.py requirements.txt | |
| git commit -m "Add Neon database integration for fast queries" | |
| git push hf huggingface-deploy:main | |
| ``` | |
| 3. Space will rebuild (2-5 minutes) | |
| 4. Test endpoint: | |
| ```bash | |
| curl https://communityone-open-navigator.hf.space/api/stats | |
| # Should respond in <100ms with full stats! | |
| ``` | |
| ## π Performance Benchmarks | |
| ### Before (Parquet files): | |
| ``` | |
| GET /api/stats β 5,234ms β | |
| GET /api/stats?state=MA β 3,892ms β | |
| GET /api/search?q=boston β 8,123ms β | |
| ``` | |
| ### After (Neon): | |
| ``` | |
| GET /api/stats β 12ms β (436x faster!) | |
| GET /api/stats?state=MA β 18ms β (216x faster!) | |
| GET /api/search?q=boston β 45ms β (180x faster!) | |
| ``` | |
| ## π° Cost | |
| **Neon Free Tier:** | |
| - 500 MB storage (plenty for aggregates + search data) | |
| - 3 GB of data transfer/month | |
| - Always-on (no cold starts) | |
| - **Cost: $0/month** π | |
| **When to upgrade:** | |
| - If you need >500MB (unlikely for aggregates) | |
| - If you exceed 3GB transfer (unlikely with caching) | |
| - If you need more concurrent connections | |
| ## π Data Sync Strategy | |
| **Current**: Manual migration when needed | |
| ```bash | |
| python neon/migrate.py | |
| ``` | |
| **Future**: Automated daily sync | |
| ```python | |
| # neon/sync.py (TODO) | |
| # - Run daily via GitHub Actions or cron | |
| # - Incremental updates only (faster) | |
| # - Track changes with last_sync table | |
| ``` | |
| **Best Practice**: | |
| - Keep parquet files as "source of truth" | |
| - Sync to Neon for fast queries | |
| - Rebuild Neon from parquet as needed | |
| ## π Troubleshooting | |
| ### Connection Error: "could not connect to server" | |
| ``` | |
| β Check NEON_DATABASE_URL is set correctly | |
| β Verify Neon project is not paused (free tier auto-pauses) | |
| β Check firewall/network (Neon uses port 5432) | |
| ``` | |
| ### Migration fails: "table already exists" | |
| ```bash | |
| # Option 1: Drop and recreate | |
| psql "$NEON_DATABASE_URL" -c "DROP SCHEMA public CASCADE; CREATE SCHEMA public;" | |
| python neon/migrate.py | |
| # Option 2: Modify schema.sql to use IF NOT EXISTS | |
| ``` | |
| ### Slow queries | |
| ``` | |
| β Check indexes are created: \d+ table_name | |
| β Run ANALYZE to update statistics | |
| β Enable query logging to see slow queries | |
| ``` | |
| ## π Resources | |
| - **Neon Docs**: https://neon.tech/docs | |
| - **asyncpg Docs**: https://magicstack.github.io/asyncpg | |
| - **PostgreSQL Full-Text Search**: https://www.postgresql.org/docs/current/textsearch.html | |
| ## π― Next Steps | |
| 1. β Run migration for your data | |
| 2. β Test locally with new stats endpoint | |
| 3. β Deploy to HuggingFace with NEON_DATABASE_URL secret | |
| 4. β¬ Create sync script for automated updates | |
| 5. β¬ Add more search tables (bills, grants, etc.) | |
| 6. β¬ Implement caching layer (Redis) for even faster responses | |
| ## π€ Contributing | |
| To add new tables to Neon: | |
| 1. Add table definition to `schema.sql` | |
| 2. Add migration logic to `migrate.py` | |
| 3. Create/update API route to use new table | |
| 4. Update this README with new table info | |