jcbowyer's picture
Deploy: Consolidated gold tables, fixed nginx docs routing
bf3e681 verified
# FEC Campaign Finance Scripts
Scripts for working with [Federal Election Commission](https://www.fec.gov/) campaign finance data.
## Data Source
- **Website**: https://www.fec.gov/
- **API**: https://api.open.fec.gov/developers/
- **Bulk Data**: https://www.fec.gov/data/browse-data/?tab=bulk-data
- **Coverage**: Federal candidates, committees, contributions
- **Data Types**: Candidates, committees, contributions, expenditures, filings
## Scripts
### `bulk_download_fec.py`
Download all FEC bulk data files and organize them by year and type.
**Features:**
- Downloads all bulk data from 1980-present
- Organizes files matching FEC website structure
- Resume interrupted downloads
- Progress tracking and logging
- Filter by year or file type
**Usage:**
```bash
# Download everything to D:/fec_data/
python bulk_download_fec.py
# Download to custom directory
python bulk_download_fec.py --base-dir /mnt/d/fec_data
# Download specific years only
python bulk_download_fec.py --years 2020,2022,2024
# Download specific file types only
python bulk_download_fec.py --types indiv,cn,cm
# Resume interrupted download
python bulk_download_fec.py --resume
# Dry run (show what would be downloaded)
python bulk_download_fec.py --dry-run
```
**File Types:**
- `cm` - Committee Master files
- `cn` - Candidate Master files
- `ccl` - Candidate-Committee Linkages
- `indiv` - Individual Contributions
- `pas2` - PAC Summary files
- `oth` - Other Transactions
- `oppexp` - Operating Expenditures
- `weball` - All Candidates
- `webk` / `webl` - Current House/Senate Campaigns
**Output Structure:**
```
/mnt/d/fec_data/
└── bulk-downloads/
β”œβ”€β”€ candidate-master/ (Candidate master files)
β”‚ β”œβ”€β”€ 1980/cn80.zip
β”‚ β”œβ”€β”€ 2024/cn24.zip
β”‚ └── ...
β”œβ”€β”€ all-candidates/ (All candidates files)
β”‚ β”œβ”€β”€ 1980/weball80.zip
β”‚ └── 2024/weball24.zip
β”œβ”€β”€ house-senate-campaigns/ (Current campaigns)
β”‚ └── 2024/
β”‚ β”œβ”€β”€ webk24.zip
β”‚ └── webl24.zip
β”œβ”€β”€ committee-master/ (Committee master files)
β”‚ └── 2024/cm24.zip
β”œβ”€β”€ pac-summary/ (PAC summary files)
β”‚ └── 2024/pas224.zip
β”œβ”€β”€ contributions-by-individuals/ (Individual contributions)
β”‚ └── 2024/indiv24.zip
β”œβ”€β”€ candidate-committee-linkages/ (Linkages)
β”‚ └── 2024/ccl24.zip
β”œβ”€β”€ committee-to-committee/ (Committee transactions)
β”‚ └── 2024/oth24.zip
β”œβ”€β”€ operating-expenditures/ (Operating expenses)
β”‚ └── 2024/oppexp24.zip
β”œβ”€β”€ summary-reports/ (Summary CSVs)
β”‚ └── 2024/
β”‚ β”œβ”€β”€ candidate_summary_2024.csv
β”‚ β”œβ”€β”€ independent_expenditure_2024.csv
β”‚ └── ...
β”œβ”€β”€ headers/ (Data dictionaries)
β”‚ β”œβ”€β”€ cm_header_file.csv
β”‚ └── ...
└── special-files/ (Lobbyist data, etc.)
β”œβ”€β”€ lobbyist.csv
└── ...
```
### `unzip_fec_data.py` (High-Performance Edition)
Unzip all FEC bulk data files with parallel processing and 7-Zip support for maximum speed.
**Performance Modes:**
- **Parallel Processing**: 4-8x faster with `--workers 8`
- **7-Zip Extraction**: 2-3x faster than Python zipfile
- **Combined**: 10-15x faster with `--method 7z --workers 8`
**Features:**
- Multiple extraction methods (Python zipfile, 7-Zip, auto-detect)
- Parallel processing with configurable worker count
- Maintains same folder hierarchy as source
- Resume support (skip already unzipped files)
- Progress tracking and logging
- Optional: Remove ZIP files after extraction
- Filter by category or year
**Usage:**
```bash
# RECOMMENDED: Unzip latest 2 years only with 8 workers (FAST & QUICK)
python unzip_fec_data.py --latest 2 --workers 8 --base-dir /mnt/d/fec_data
# FASTEST: Use 7-Zip with 8 parallel workers (10-15x faster, all years)
python unzip_fec_data.py --method 7z --workers 8 --base-dir /mnt/d/fec_data
# Fast: Use parallel workers only (4-8x faster)
python unzip_fec_data.py --workers 8 --base-dir /mnt/d/fec_data
# Moderate: Use 7-Zip single-threaded (2-3x faster)
python unzip_fec_data.py --method 7z --base-dir /mnt/d/fec_data
# Default: Python zipfile single-threaded (portable but slow)
python unzip_fec_data.py --base-dir /mnt/d/fec_data
# Auto-detect best method and optimal workers
python unzip_fec_data.py --method auto --workers 0 --base-dir /mnt/d/fec_data
# Unzip specific category with parallel workers
python unzip_fec_data.py --category candidate-master --workers 4
# Unzip specific years with parallel workers
python unzip_fec_data.py --years 2020,2022,2024 --workers 4
# Unzip latest 5 years only (auto-detects 2020-2024)
python unzip_fec_data.py --latest 5 --workers 8
# Resume interrupted extraction
python unzip_fec_data.py --resume --workers 8
# Dry run (show what would be unzipped)
python unzip_fec_data.py --dry-run
# Remove ZIP files after successful extraction (saves 50% disk space)
python unzip_fec_data.py --remove-zips --workers 8
```
**Installation for 7-Zip (optional but recommended):**
```bash
# Ubuntu/Debian
sudo apt-get install p7zip-full
# macOS
brew install p7zip
# Verify installation
7z --help
```
**Output Structure:**
```
/mnt/d/fec_data/
β”œβ”€β”€ bulk-downloads/ # Original ZIP files (source)
β”‚ β”œβ”€β”€ candidate-master/
β”‚ β”‚ β”œβ”€β”€ 1980/cn80.zip
β”‚ β”‚ └── 2024/cn24.zip
β”‚ └── ...
└── unzipped/ # Unzipped CSV/TXT files (destination)
β”œβ”€β”€ candidate-master/
β”‚ β”œβ”€β”€ 1980/
β”‚ β”‚ β”œβ”€β”€ cn80/
β”‚ β”‚ β”‚ β”œβ”€β”€ cn.txt
β”‚ β”‚ β”‚ β”œβ”€β”€ cn_header_file.csv
β”‚ β”‚ β”‚ └── ...
β”‚ └── 2024/
β”‚ └── cn24/
β”‚ β”œβ”€β”€ cn.txt
β”‚ └── ...
β”œβ”€β”€ contributions-by-individuals/
β”‚ └── 2024/
β”‚ └── indiv24/
β”‚ β”œβ”€β”€ indiv.txt
β”‚ β”œβ”€β”€ indiv_header_file.csv
β”‚ └── ...
└── ...
```
**Workflow:**
1. Download FEC bulk data: `python bulk_download_fec.py --base-dir /mnt/d/fec_data`
2. **QUICK START** - Unzip latest 2 years only: `python unzip_fec_data.py --latest 2 --workers 8 --base-dir /mnt/d/fec_data`
- OR **FULL** - Unzip all files (FAST): `python unzip_fec_data.py --method 7z --workers 8 --base-dir /mnt/d/fec_data`
3. (Optional) Remove ZIPs to save space: Add `--remove-zips` flag to step 2
**Performance Comparison:**
| Method | Workers | Speed | Time (100 files) |
|--------|---------|-------|------------------|
| Python zipfile | 1 | 1x | ~100 min |
| Python zipfile | 8 | 4-6x | ~15-20 min |
| 7-Zip | 1 | 2-3x | ~30-40 min |
| 7-Zip | 8 | 10-15x | ~7-10 min ⚑ |
**Recommended Settings:**
- **Maximum speed**: `--method 7z --workers 8` (requires 7z installed)
- **Good balance**: `--workers 4` (no additional software needed)
- **Portable**: Default (works everywhere, no setup)
### `fec_integration.py`
Integrate FEC API data for real-time queries.
## Key Datasets
### Candidates
- Federal candidates (President, Senate, House)
- Party affiliation
- Election years
### Committees
- Campaign committees
- PACs, Super PACs
- Party committees
### Financial Data
- Individual contributions
- Committee expenditures
- Independent expenditures
- Disbursements
## Usage Examples
```bash
# Download candidate data
python fec_integration.py --data-type candidates --state MA
# Download committee data
python fec_integration.py --data-type committees --cycle 2024
```
## API Key
Requires FEC API key. Set environment variable:
```bash
export FEC_API_KEY=your_key_here
```
Register at: https://api.data.gov/signup/
## Data License
FEC data is public domain.