Spaces:
Running on CPU Upgrade
Running on CPU Upgrade
| # FEC Campaign Finance Scripts | |
| Scripts for working with [Federal Election Commission](https://www.fec.gov/) campaign finance data. | |
| ## Data Source | |
| - **Website**: https://www.fec.gov/ | |
| - **API**: https://api.open.fec.gov/developers/ | |
| - **Bulk Data**: https://www.fec.gov/data/browse-data/?tab=bulk-data | |
| - **Coverage**: Federal candidates, committees, contributions | |
| - **Data Types**: Candidates, committees, contributions, expenditures, filings | |
| ## Scripts | |
| ### `bulk_download_fec.py` | |
| Download all FEC bulk data files and organize them by year and type. | |
| **Features:** | |
| - Downloads all bulk data from 1980-present | |
| - Organizes files matching FEC website structure | |
| - Resume interrupted downloads | |
| - Progress tracking and logging | |
| - Filter by year or file type | |
| **Usage:** | |
| ```bash | |
| # Download everything to D:/fec_data/ | |
| python bulk_download_fec.py | |
| # Download to custom directory | |
| python bulk_download_fec.py --base-dir /mnt/d/fec_data | |
| # Download specific years only | |
| python bulk_download_fec.py --years 2020,2022,2024 | |
| # Download specific file types only | |
| python bulk_download_fec.py --types indiv,cn,cm | |
| # Resume interrupted download | |
| python bulk_download_fec.py --resume | |
| # Dry run (show what would be downloaded) | |
| python bulk_download_fec.py --dry-run | |
| ``` | |
| **File Types:** | |
| - `cm` - Committee Master files | |
| - `cn` - Candidate Master files | |
| - `ccl` - Candidate-Committee Linkages | |
| - `indiv` - Individual Contributions | |
| - `pas2` - PAC Summary files | |
| - `oth` - Other Transactions | |
| - `oppexp` - Operating Expenditures | |
| - `weball` - All Candidates | |
| - `webk` / `webl` - Current House/Senate Campaigns | |
| **Output Structure:** | |
| ``` | |
| /mnt/d/fec_data/ | |
| βββ bulk-downloads/ | |
| βββ candidate-master/ (Candidate master files) | |
| β βββ 1980/cn80.zip | |
| β βββ 2024/cn24.zip | |
| β βββ ... | |
| βββ all-candidates/ (All candidates files) | |
| β βββ 1980/weball80.zip | |
| β βββ 2024/weball24.zip | |
| βββ house-senate-campaigns/ (Current campaigns) | |
| β βββ 2024/ | |
| β βββ webk24.zip | |
| β βββ webl24.zip | |
| βββ committee-master/ (Committee master files) | |
| β βββ 2024/cm24.zip | |
| βββ pac-summary/ (PAC summary files) | |
| β βββ 2024/pas224.zip | |
| βββ contributions-by-individuals/ (Individual contributions) | |
| β βββ 2024/indiv24.zip | |
| βββ candidate-committee-linkages/ (Linkages) | |
| β βββ 2024/ccl24.zip | |
| βββ committee-to-committee/ (Committee transactions) | |
| β βββ 2024/oth24.zip | |
| βββ operating-expenditures/ (Operating expenses) | |
| β βββ 2024/oppexp24.zip | |
| βββ summary-reports/ (Summary CSVs) | |
| β βββ 2024/ | |
| β βββ candidate_summary_2024.csv | |
| β βββ independent_expenditure_2024.csv | |
| β βββ ... | |
| βββ headers/ (Data dictionaries) | |
| β βββ cm_header_file.csv | |
| β βββ ... | |
| βββ special-files/ (Lobbyist data, etc.) | |
| βββ lobbyist.csv | |
| βββ ... | |
| ``` | |
| ### `unzip_fec_data.py` (High-Performance Edition) | |
| Unzip all FEC bulk data files with parallel processing and 7-Zip support for maximum speed. | |
| **Performance Modes:** | |
| - **Parallel Processing**: 4-8x faster with `--workers 8` | |
| - **7-Zip Extraction**: 2-3x faster than Python zipfile | |
| - **Combined**: 10-15x faster with `--method 7z --workers 8` | |
| **Features:** | |
| - Multiple extraction methods (Python zipfile, 7-Zip, auto-detect) | |
| - Parallel processing with configurable worker count | |
| - Maintains same folder hierarchy as source | |
| - Resume support (skip already unzipped files) | |
| - Progress tracking and logging | |
| - Optional: Remove ZIP files after extraction | |
| - Filter by category or year | |
| **Usage:** | |
| ```bash | |
| # RECOMMENDED: Unzip latest 2 years only with 8 workers (FAST & QUICK) | |
| python unzip_fec_data.py --latest 2 --workers 8 --base-dir /mnt/d/fec_data | |
| # FASTEST: Use 7-Zip with 8 parallel workers (10-15x faster, all years) | |
| python unzip_fec_data.py --method 7z --workers 8 --base-dir /mnt/d/fec_data | |
| # Fast: Use parallel workers only (4-8x faster) | |
| python unzip_fec_data.py --workers 8 --base-dir /mnt/d/fec_data | |
| # Moderate: Use 7-Zip single-threaded (2-3x faster) | |
| python unzip_fec_data.py --method 7z --base-dir /mnt/d/fec_data | |
| # Default: Python zipfile single-threaded (portable but slow) | |
| python unzip_fec_data.py --base-dir /mnt/d/fec_data | |
| # Auto-detect best method and optimal workers | |
| python unzip_fec_data.py --method auto --workers 0 --base-dir /mnt/d/fec_data | |
| # Unzip specific category with parallel workers | |
| python unzip_fec_data.py --category candidate-master --workers 4 | |
| # Unzip specific years with parallel workers | |
| python unzip_fec_data.py --years 2020,2022,2024 --workers 4 | |
| # Unzip latest 5 years only (auto-detects 2020-2024) | |
| python unzip_fec_data.py --latest 5 --workers 8 | |
| # Resume interrupted extraction | |
| python unzip_fec_data.py --resume --workers 8 | |
| # Dry run (show what would be unzipped) | |
| python unzip_fec_data.py --dry-run | |
| # Remove ZIP files after successful extraction (saves 50% disk space) | |
| python unzip_fec_data.py --remove-zips --workers 8 | |
| ``` | |
| **Installation for 7-Zip (optional but recommended):** | |
| ```bash | |
| # Ubuntu/Debian | |
| sudo apt-get install p7zip-full | |
| # macOS | |
| brew install p7zip | |
| # Verify installation | |
| 7z --help | |
| ``` | |
| **Output Structure:** | |
| ``` | |
| /mnt/d/fec_data/ | |
| βββ bulk-downloads/ # Original ZIP files (source) | |
| β βββ candidate-master/ | |
| β β βββ 1980/cn80.zip | |
| β β βββ 2024/cn24.zip | |
| β βββ ... | |
| βββ unzipped/ # Unzipped CSV/TXT files (destination) | |
| βββ candidate-master/ | |
| β βββ 1980/ | |
| β β βββ cn80/ | |
| β β β βββ cn.txt | |
| β β β βββ cn_header_file.csv | |
| β β β βββ ... | |
| β βββ 2024/ | |
| β βββ cn24/ | |
| β βββ cn.txt | |
| β βββ ... | |
| βββ contributions-by-individuals/ | |
| β βββ 2024/ | |
| β βββ indiv24/ | |
| β βββ indiv.txt | |
| β βββ indiv_header_file.csv | |
| β βββ ... | |
| βββ ... | |
| ``` | |
| **Workflow:** | |
| 1. Download FEC bulk data: `python bulk_download_fec.py --base-dir /mnt/d/fec_data` | |
| 2. **QUICK START** - Unzip latest 2 years only: `python unzip_fec_data.py --latest 2 --workers 8 --base-dir /mnt/d/fec_data` | |
| - OR **FULL** - Unzip all files (FAST): `python unzip_fec_data.py --method 7z --workers 8 --base-dir /mnt/d/fec_data` | |
| 3. (Optional) Remove ZIPs to save space: Add `--remove-zips` flag to step 2 | |
| **Performance Comparison:** | |
| | Method | Workers | Speed | Time (100 files) | | |
| |--------|---------|-------|------------------| | |
| | Python zipfile | 1 | 1x | ~100 min | | |
| | Python zipfile | 8 | 4-6x | ~15-20 min | | |
| | 7-Zip | 1 | 2-3x | ~30-40 min | | |
| | 7-Zip | 8 | 10-15x | ~7-10 min β‘ | | |
| **Recommended Settings:** | |
| - **Maximum speed**: `--method 7z --workers 8` (requires 7z installed) | |
| - **Good balance**: `--workers 4` (no additional software needed) | |
| - **Portable**: Default (works everywhere, no setup) | |
| ### `fec_integration.py` | |
| Integrate FEC API data for real-time queries. | |
| ## Key Datasets | |
| ### Candidates | |
| - Federal candidates (President, Senate, House) | |
| - Party affiliation | |
| - Election years | |
| ### Committees | |
| - Campaign committees | |
| - PACs, Super PACs | |
| - Party committees | |
| ### Financial Data | |
| - Individual contributions | |
| - Committee expenditures | |
| - Independent expenditures | |
| - Disbursements | |
| ## Usage Examples | |
| ```bash | |
| # Download candidate data | |
| python fec_integration.py --data-type candidates --state MA | |
| # Download committee data | |
| python fec_integration.py --data-type committees --cycle 2024 | |
| ``` | |
| ## API Key | |
| Requires FEC API key. Set environment variable: | |
| ```bash | |
| export FEC_API_KEY=your_key_here | |
| ``` | |
| Register at: https://api.data.gov/signup/ | |
| ## Data License | |
| FEC data is public domain. | |