Scripts Directory
All scripts are organized into logical subdirectories by function.
Directory Structure
scripts/
├── data/          # Data processing and migration
├── datasources/   # Data source integrations and connectors
├── deployment/    # Deployment and setup
├── development/   # Development and debugging tools
├── enrichment/    # Data enrichment (990s, nonprofits)
├── huggingface/   # HuggingFace dataset management
└── maintenance/   # Cleanup and maintenance
Folder Descriptions
data/
Data processing pipelines, migrations, and aggregations.
- Bill aggregations from PostgreSQL
- Gold table creation
- Contact extraction
- Data splits and partitioning
datasources/
Data source integrations and API connectors organized by external data source.
Available Sources:
- openstates/ - OpenStates legislative data (bills, legislators, votes) - 7 scripts
- census/ - US Census Bureau geographic and demographic data - 3 scripts
- irs/ - IRS nonprofit data (Form 990, Business Master File) - 3 scripts
- fec/ - Federal Election Commission campaign finance data - 2 scripts
- ballotpedia/ - Ballotpedia election and official data - 1 script
- google_civic/ - Google Civic Information API - 1 script
- grants_gov/ - Grants.gov federal grant data - 1 script
- localview/ - LocalView meeting transcripts - 1 script
- meetingbank/ - MeetingBank research dataset - 1 script
- nces/ - National Center for Education Statistics - 1 script
- wikidata/ - Wikidata structured knowledge - 1 script
- dbpedia/ - DBpedia structured Wikipedia data - 1 script
- voter_data/ - Voter registration and turnout data - 1 script
See datasources/README.md for detailed documentation.
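The per-source script counts above can drift as scripts are added or removed. The following is a minimal sketch of how they could be re-checked. It assumes each source folder holds its scripts as .py files directly inside it and that __init__.py should not be counted; neither assumption is confirmed here. The helper name count_scripts is hypothetical.

```python
from pathlib import Path


def count_scripts(datasources_dir: Path) -> dict[str, int]:
    """Count the .py files directly inside each source folder.

    Assumption: scripts are plain .py files and __init__.py is not a script.
    """
    return {
        source.name: sum(
            1 for f in source.glob("*.py") if f.name != "__init__.py"
        )
        for source in sorted(datasources_dir.iterdir())
        if source.is_dir()
    }


if __name__ == "__main__":
    root = Path("scripts/datasources")
    if root.is_dir():
        for name, count in count_scripts(root).items():
            print(f"{name}/: {count}")
```

Run it from the project root so that the relative path scripts/datasources resolves.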
deployment/
Setup scripts for local development and production deployment.
- Local environment setup
- Database initialization
- Databricks deployment
development/
Development and debugging tools.
- Debug scripts for scrapers
- Testing utilities
- Development helpers
enrichment/
Scripts to enrich nonprofit data with additional metadata.
- 990 form downloads and processing
- Nonprofit profile enrichment
- Multiple data source integrations
huggingface/
HuggingFace dataset preparation and upload.
- Dataset restructuring
- Multi-dataset uploads
- Finalization scripts
maintenance/
System maintenance and cleanup utilities.
- Disk space cleanup
- File management
- Development utilities
Finding Scripts
Use these commands to find scripts:
# List all scripts in a category
ls scripts/data/
# Search for a specific script
find scripts/ -name "*aggregate*"
# See what a script does
head -20 scripts/data/aggregate_bills_from_postgres.py
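As an alternative to running head on each file, a small Python sketch can print a one-line summary for every script in a category. It assumes the scripts start with a module docstring, which is not confirmed here; files without one are reported as such. The function name summarize_scripts is hypothetical.

```python
import ast
from pathlib import Path


def summarize_scripts(category_dir: Path) -> dict[str, str]:
    """Map each .py file name to the first line of its module docstring."""
    summaries = {}
    for script in sorted(category_dir.glob("*.py")):
        try:
            tree = ast.parse(script.read_text(encoding="utf-8"))
        except SyntaxError:
            summaries[script.name] = "(could not parse)"
            continue
        doc = ast.get_docstring(tree)
        summaries[script.name] = doc.splitlines()[0] if doc else "(no docstring)"
    return summaries


if __name__ == "__main__":
    # Run from the project root so the relative path resolves.
    for name, summary in summarize_scripts(Path("scripts/data")).items():
        print(f"{name}: {summary}")
```

Parsing with ast reads the docstring without importing or executing the script, so it has no side effects.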
⚠️ Important Note
All scripts should be run from the project root directory:
# ✅ Correct
cd /home/developer/projects/open-navigator
python scripts/data/aggregate_bills_from_postgres.py
# ❌ Wrong
cd scripts/data
python aggregate_bills_from_postgres.py
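A script can also fail early when it is started from the wrong directory. The sketch below is one possible guard, not something the existing scripts are known to contain. It assumes that a scripts/ folder in the current working directory is enough to identify the project root; the names is_project_root and require_project_root are hypothetical.

```python
from pathlib import Path


def is_project_root(path: Path) -> bool:
    """Return True if `path` contains a scripts/ folder (assumed root marker)."""
    return (path / "scripts").is_dir()


def require_project_root() -> None:
    """Exit with a clear message when the current directory is not the root."""
    if not is_project_root(Path.cwd()):
        raise SystemExit(
            "Run this script from the project root, for example: "
            "python scripts/data/aggregate_bills_from_postgres.py"
        )
```

Calling require_project_root() at the top of a script's entry point would turn a confusing relative-path error into an explicit message.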