# Scripts Directory
All scripts are organized into logical subdirectories by function.
## Directory Structure
```
scripts/
├── data/          # Data processing and migration
├── datasources/   # Data source integrations and connectors
├── deployment/    # Deployment and setup
├── development/   # Development and debugging tools
├── enrichment/    # Data enrichment (990s, nonprofits)
├── huggingface/   # HuggingFace dataset management
└── maintenance/   # Cleanup and maintenance
```
## Folder Descriptions
### [data/](data/)
Data processing pipelines, migrations, and aggregations.
- Bill aggregations from PostgreSQL
- Gold table creation
- Contact extraction
- Data splits and partitioning
### [datasources/](datasources/)
Data source integrations and API connectors organized by external data source.
**Available Sources:**
- `openstates/` - OpenStates legislative data (bills, legislators, votes) - 7 scripts
- `census/` - US Census Bureau geographic and demographic data - 3 scripts
- `irs/` - IRS nonprofit data (Form 990, Business Master File) - 3 scripts
- `fec/` - Federal Election Commission campaign finance data - 2 scripts
- `ballotpedia/` - Ballotpedia election and official data - 1 script
- `google_civic/` - Google Civic Information API - 1 script
- `grants_gov/` - Grants.gov federal grant data - 1 script
- `localview/` - LocalView meeting transcripts - 1 script
- `meetingbank/` - MeetingBank research dataset - 1 script
- `nces/` - National Center for Education Statistics - 1 script
- `wikidata/` - Wikidata structured knowledge - 1 script
- `dbpedia/` - DBpedia structured Wikipedia data - 1 script
- `voter_data/` - Voter registration and turnout data - 1 script
See [datasources/README.md](datasources/README.md) for detailed documentation.
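The per-source script counts listed above can drift as scripts are added or removed. A minimal sketch for recomputing them follows; the function name `count_scripts_per_source` is hypothetical, and it assumes every script is a `*.py` file, which may not hold for all sources.

```bash
# Hypothetical helper (not part of the repository): print "<source>: <count>"
# for each subdirectory of a datasources folder.
# Assumption: scripts are *.py files; adjust the pattern if other extensions are used.
count_scripts_per_source() {
  local base="${1:-scripts/datasources}"
  local dir
  for dir in "$base"/*/; do
    # Skip the unexpanded glob when the folder has no subdirectories.
    [ -d "$dir" ] || continue
    printf '%s: %s\n' "$(basename "$dir")" \
      "$(find "$dir" -type f -name '*.py' | wc -l | tr -d ' ')"
  done
}
```

Run from the project root, `count_scripts_per_source` with no argument scans `scripts/datasources/`; pass another path to scan a different folder.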
### [deployment/](deployment/)
Setup scripts for local development and production deployment.
- Local environment setup
- Database initialization
- Databricks deployment
### [development/](development/)
Development and debugging tools.
- Debug scripts for scrapers
- Testing utilities
- Development helpers
### [enrichment/](enrichment/)
Scripts to enrich nonprofit data with additional metadata.
- 990 form downloads and processing
- Nonprofit profile enrichment
- Multiple data source integrations
### [huggingface/](huggingface/)
HuggingFace dataset preparation and upload.
- Dataset restructuring
- Multi-dataset uploads
- Finalization scripts
### [maintenance/](maintenance/)
System maintenance and cleanup utilities.
- Disk space cleanup
- File management
- Development utilities
## Finding Scripts
Use these commands to find scripts:
```bash
# List all scripts in a category
ls scripts/data/
# Search for a specific script
find scripts/ -name "*aggregate*"
# See what a script does
head -20 scripts/data/aggregate_bills_from_postgres.py
```
## ⚠️ Important Note
All scripts should be run from the project root directory:
```bash
# ✅ Correct
cd /home/developer/projects/open-navigator
python scripts/data/aggregate_bills_from_postgres.py
# ❌ Wrong
cd scripts/data
python aggregate_bills_from_postgres.py
```
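One way to catch the wrong-directory mistake early is a small wrapper that refuses to run unless the script path resolves from the current directory. This is only a sketch: `run_script` and the `PYTHON` override variable are hypothetical names, not something the repository provides.

```bash
# Hypothetical wrapper (not part of the repository): run a script only when it
# is addressed relative to the project root, e.g. scripts/data/<name>.py.
# Assumption: PYTHON optionally names the interpreter; it defaults to "python".
run_script() {
  local script="$1"
  case "$script" in
    scripts/*) ;;
    *)
      echo "error: pass a path starting with scripts/ (got: $script)" >&2
      return 2
      ;;
  esac
  if [ ! -f "$script" ]; then
    echo "error: $script not found; are you in the project root?" >&2
    return 1
  fi
  shift
  "${PYTHON:-python}" "$script" "$@"
}
```

For example, `run_script scripts/data/aggregate_bills_from_postgres.py` runs the script when called from the project root, and fails with an error message when called from inside `scripts/data`.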