open-navigator / scripts /README.md
jcbowyer's picture
Clean HuggingFace deployment without binary files
61d29fc
# Scripts Directory
All scripts are organized into logical subdirectories by function.
## πŸ“‚ Directory Structure
```
scripts/
β”œβ”€β”€ data/ # Data processing and migration
β”œβ”€β”€ datasources/ # Data source integrations and connectors
β”œβ”€β”€ deployment/ # Deployment and setup
β”œβ”€β”€ development/ # Development and debugging tools
β”œβ”€β”€ enrichment/ # Data enrichment (990s, nonprofits)
β”œβ”€β”€ huggingface/ # HuggingFace dataset management
└── maintenance/ # Cleanup and maintenance
```
## πŸ“– Folder Descriptions
### [data/](data/)
Data processing pipelines, migrations, and aggregations.
- Bill aggregations from PostgreSQL
- Gold table creation
- Contact extraction
- Data splits and partitioning
### [datasources/](datasources/)
Data source integrations and API connectors organized by external data source.
**Available Sources:**
- `openstates/` - OpenStates legislative data (bills, legislators, votes) - 7 scripts
- `census/` - US Census Bureau geographic and demographic data - 3 scripts
- `irs/` - IRS nonprofit data (Form 990, Business Master File) - 3 scripts
- `fec/` - Federal Election Commission campaign finance data - 2 scripts
- `ballotpedia/` - Ballotpedia election and official data - 1 script
- `google_civic/` - Google Civic Information API - 1 script
- `grants_gov/` - Grants.gov federal grant data - 1 script
- `localview/` - LocalView meeting transcripts - 1 script
- `meetingbank/` - MeetingBank research dataset - 1 script
- `nces/` - National Center for Education Statistics - 1 script
- `wikidata/` - Wikidata structured knowledge - 1 script
- `dbpedia/` - DBpedia structured Wikipedia data - 1 script
- `voter_data/` - Voter registration and turnout data - 1 script
See [datasources/README.md](datasources/README.md) for detailed documentation.
### [deployment/](deployment/)
Setup scripts for local development and production deployment.
- Local environment setup
- Database initialization
- Databricks deployment
### [development/](development/)
Development and debugging tools.
- Debug scripts for scrapers
- Testing utilities
- Development helpers
### [enrichment/](enrichment/)
Scripts to enrich nonprofit data with additional metadata.
- 990 form downloads and processing
- Nonprofit profile enrichment
- Multiple data source integrations
### [huggingface/](huggingface/)
HuggingFace dataset preparation and upload.
- Dataset restructuring
- Multi-dataset uploads
- Finalization scripts
### [maintenance/](maintenance/)
System maintenance and cleanup utilities.
- Disk space cleanup
- File management
- Development utilities
## πŸ” Finding Scripts
Use these commands to find scripts:
```bash
# List all scripts in a category
ls scripts/data/
# Search for a specific script
find scripts/ -name "*aggregate*"
# See what a script does
head -20 scripts/data/aggregate_bills_from_postgres.py
```
## ⚠️ Important Note
All scripts should be run from the project root directory:
```bash
# βœ… Correct
cd /home/developer/projects/open-navigator
python scripts/data/aggregate_bills_from_postgres.py
# ❌ Wrong
cd scripts/data
python aggregate_bills_from_postgres.py
```