# Scripts Directory

All scripts are organized into logical subdirectories by function.

## 📂 Directory Structure

```
scripts/
├── data/                    # Data processing and migration
├── datasources/             # Data source integrations and connectors
├── deployment/              # Deployment and setup
├── development/             # Development and debugging tools
├── enrichment/              # Data enrichment (990s, nonprofits)
├── huggingface/             # HuggingFace dataset management
└── maintenance/             # Cleanup and maintenance
```

## 📖 Folder Descriptions

### [data/](data/)
Data processing pipelines, migrations, and aggregations.
- Bill aggregations from PostgreSQL
- Gold table creation
- Contact extraction
- Data splits and partitioning

### [datasources/](datasources/)
Data source integrations and API connectors organized by external data source.

**Available Sources:**
- `openstates/` - OpenStates legislative data (bills, legislators, votes) - 7 scripts
- `census/` - US Census Bureau geographic and demographic data - 3 scripts
- `irs/` - IRS nonprofit data (Form 990, Business Master File) - 3 scripts
- `fec/` - Federal Election Commission campaign finance data - 2 scripts
- `ballotpedia/` - Ballotpedia election and official data - 1 script
- `google_civic/` - Google Civic Information API - 1 script
- `grants_gov/` - Grants.gov federal grant data - 1 script
- `localview/` - LocalView meeting transcripts - 1 script
- `meetingbank/` - MeetingBank research dataset - 1 script
- `nces/` - National Center for Education Statistics - 1 script
- `wikidata/` - Wikidata structured knowledge - 1 script
- `dbpedia/` - DBpedia structured Wikipedia data - 1 script
- `voter_data/` - Voter registration and turnout data - 1 script

See [datasources/README.md](datasources/README.md) for detailed documentation.
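
The per-source script counts listed above can drift as scripts are added or removed. A minimal sketch for recounting them follows. It assumes every script is a `.py` file and that `__init__.py` files should not be counted; neither is confirmed by this README, and `count_scripts` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def count_scripts(datasources_dir):
    """Return {source_name: number of .py files} for each subdirectory.

    Assumption: scripts are Python files; __init__.py markers are skipped.
    """
    counts = {}
    for source_dir in sorted(Path(datasources_dir).iterdir()):
        if not source_dir.is_dir():
            continue  # ignore loose files such as README.md
        counts[source_dir.name] = sum(
            1 for path in source_dir.rglob("*.py") if path.name != "__init__.py"
        )
    return counts


# Example use from the project root:
#   for name, total in count_scripts("scripts/datasources").items():
#       print(f"{name}/ - {total} script(s)")
```

If some sources ship shell scripts or notebooks, the `*.py` pattern would need to be widened accordingly.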

### [deployment/](deployment/)
Setup scripts for local development and production deployment.
- Local environment setup
- Database initialization
- Databricks deployment

### [development/](development/)
Development and debugging tools.
- Debug scripts for scrapers
- Testing utilities
- Development helpers

### [enrichment/](enrichment/)
Scripts to enrich nonprofit data with additional metadata.
- 990 form downloads and processing
- Nonprofit profile enrichment
- Multiple data source integrations

### [huggingface/](huggingface/)
HuggingFace dataset preparation and upload.
- Dataset restructuring
- Multi-dataset uploads
- Finalization scripts

### [maintenance/](maintenance/)
System maintenance and cleanup utilities.
- Disk space cleanup
- File management
- Development utilities

## πŸ” Finding Scripts

Use these commands to find scripts:

```bash
# List all scripts in a category
ls scripts/data/

# Search for a specific script
find scripts/ -name "*aggregate*"

# See what a script does
head -20 scripts/data/aggregate_bills_from_postgres.py
```
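
As an alternative to running `head -20` on each file, the sketch below prints a one-line summary per script in a category. It assumes the scripts are Python files that open with a module docstring, which this README does not guarantee; `summarize_scripts` is a hypothetical helper name.

```python
import ast
from pathlib import Path


def summarize_scripts(category_dir):
    """Map each .py file name in category_dir to the first line of its docstring."""
    summaries = {}
    for path in sorted(Path(category_dir).glob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except SyntaxError:
            summaries[path.name] = "(could not parse)"
            continue
        doc = ast.get_docstring(tree)  # None when the module has no docstring
        summaries[path.name] = doc.splitlines()[0] if doc else "(no docstring)"
    return summaries


# Example use from the project root:
#   for name, summary in summarize_scripts("scripts/data").items():
#       print(f"{name}: {summary}")
```

Because the files are parsed rather than imported, no script code is executed while building the summary.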

## ⚠️ Important Note

All scripts should be run from the project root directory:

```bash
# ✅ Correct
cd /home/developer/projects/open-navigator
python scripts/data/aggregate_bills_from_postgres.py

# ❌ Wrong
cd scripts/data
python aggregate_bills_from_postgres.py
```
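
A script can also check this for itself and fail early with a clear message. The sketch below is one possible guard, not something the existing scripts are known to contain. It assumes the project root is the directory that directly contains `scripts/`; `in_project_root` and `require_project_root` are hypothetical names.

```python
import sys
from pathlib import Path


def in_project_root(cwd, marker="scripts"):
    """Return True if `cwd` directly contains the `marker` directory."""
    return (Path(cwd) / marker).is_dir()


def require_project_root(marker="scripts"):
    """Exit with a hint unless the current directory is the project root."""
    if not in_project_root(Path.cwd(), marker):
        sys.exit(
            f"Run this script from the project root "
            f"(the directory containing {marker}/)."
        )


# Example use at the top of a script:
#   require_project_root()
```

Checking for a marker directory keeps the guard independent of any one machine's absolute path.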