Raw Data Catalog
This document provides a comprehensive list of all raw data sources identified and tracked for the TYT Paper Trail project.
Core Legislative & Campaign Finance Sources
1. Voteview (UCLA)
- Description: The authoritative historical record of U.S. Congressional roll-call voting.
- Data Provided: Legislator metadata (Bioguide IDs, ICPSR codes), individual vote records (yea/nay/present), and rollcall metadata (dates, descriptions, results).
- Raw CSV Download URLs:
- HSall_members.csv: Metadata for all historical and current Members of Congress.
- HSall_votes.csv: Transactional records of every individual vote cast by members.
- HSall_rollcalls.csv: Metadata for rollcall votes (dates, descriptions, results).
- Parquet (HuggingFace): Dustinhax/tyt/tyt/voteview
- HSall_members.parquet (1.5 MB): 51K members with NOMINATE scores
- HSall_rollcalls.parquet (6.6 MB): 113K roll call votes
- HSall_votes.parquet (31 MB): 26M individual vote records
2. DIME - Database on Ideology, Money in Politics, and Elections (Stanford)
- Description: A massive dataset linking campaign contributions to politician ideology and behavior.
- Data Provided: Transactional contribution records, contributor profiles, recipient metadata, and bill topic weights.
- Download URLs:
- Aggregate Recipients/Candidates (1979–2024): Metadata for all recipients in the DIME v4.0 dataset.
- Individual Donors/PACs (1979–2024): Aggregated metadata for individual and committee donors.
- Full SQLite Database (v4.0): The complete DIME v4.0 dataset in a single relational database file.
- Sparse Matrix (dime_mimp_1979_2024.rdata): A high-dimensional mapping of donor-recipient relationships, optimized for R.
Itemized Contribution Records (By Cycle)
Individual gzipped CSV files containing every itemized donation for a given two-year cycle:
View Annual Download Links (1980-2024)
- 1980 (contribDB_1980.csv.gz)
- 1982 (contribDB_1982.csv.gz)
- 1984 (contribDB_1984.csv.gz)
- 1986 (contribDB_1986.csv.gz)
- 1988 (contribDB_1988.csv.gz)
- 1990 (contribDB_1990.csv.gz)
- 1992 (contribDB_1992.csv.gz)
- 1994 (contribDB_1994.csv.gz)
- 1996 (contribDB_1996.csv.gz)
- 1998 (contribDB_1998.csv.gz)
- 2000 (contribDB_2000.csv.gz)
- 2002 (contribDB_2002.csv.gz)
- 2004 (contribDB_2004.csv.gz)
- 2006 (contribDB_2006.csv.gz)
- 2008 (contribDB_2008.csv.gz)
- 2010 (contribDB_2010.csv.gz)
- 2012 (contribDB_2012.csv.gz)
- 2014 (contribDB_2014.csv.gz)
- 2016 (contribDB_2016.csv.gz)
- 2018 (contribDB_2018.csv.gz)
- 2020 (contribDB_2020.csv.gz)
- 2022 (contribDB_2022.csv.gz)
- 2024 (contribDB_2024.csv.gz)
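The per-cycle file names above follow a regular pattern, one file per even-numbered year. A minimal sketch that generates them is shown here; the function name `dime_cycle_filenames` is hypothetical, and the sketch produces file names only, since the download host is not stated in this catalog.

```python
from typing import List


def dime_cycle_filenames(first_cycle: int = 1980, last_cycle: int = 2024) -> List[str]:
    """Return one contribDB file name per two-year cycle (even years only)."""
    if first_cycle % 2 or last_cycle % 2:
        raise ValueError("cycles are identified by even-numbered years")
    # Step by 2 so that only election-cycle years are produced.
    return [f"contribDB_{year}.csv.gz" for year in range(first_cycle, last_cycle + 2, 2)]


names = dime_cycle_filenames()
print(len(names), names[0], names[-1])
```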
Contribution Records Grouped by Office
Gzipped CSV files containing every itemized donation for specific offices (1979-2024):
- President (contribDB_president.csv.gz): All itemized contributions to presidential candidates.
- Governor (contribDB_governor.csv.gz): All itemized contributions to gubernatorial candidates.
- Judicial (contribDB_judicial.csv.gz): All itemized contributions to judicial candidates (state and local).
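Because these gzipped CSV files hold every itemized donation, it may be preferable to stream them row by row rather than decompress them fully. The sketch below uses only the Python standard library. It assumes each file starts with a header row and is UTF-8 encoded, neither of which is confirmed by this catalog, and the helper name `iter_gzipped_csv` is hypothetical.

```python
import csv
import gzip
from typing import Dict, Iterator


def iter_gzipped_csv(path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per CSV row, decompressing lazily as rows are read.

    Assumes a header row and UTF-8 text; adjust `encoding` if a file differs.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle)
```

A caller can iterate with `for row in iter_gzipped_csv("contribDB_president.csv.gz"): ...` and stop early without reading the rest of the file.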
Curated Datasets of Political Elites
Standardized datasets containing ideology scores and biographic metadata for specific elite populations:
View Curated Elite Datasets
- Federal Court Judges: Campaign finance profiles and ideology scores for federal judges (Updated 2024).
- Fortune 500 Directors and CEOs: Political contribution history and scores for top corporate leadership.
- State Supreme Court Justices: Database of state supreme court justice ideology and campaign finance.
- Executive Appointees to Federal Agencies: Data on contributions and scores for federal agency appointees.
- Medical Professionals: Curated dataset of political donations and ideology for healthcare professionals.
3. DIME PLUS (Legislative Voting Data)
- Description: A refined subset of voting data curated by Adam Bonica for alignment with DIME ideology scores (107th–114th Congress).
- Data Provided: Individual votes linked to specific bill IDs and Bonica recipient IDs.
- Download URLs:
- vote_db.csv: Transactional roll call vote records linked to DIME recipient IDs.
- bills_db.csv: Metadata for bills and amendments including sponsor and co-sponsor lists.
- text_db.csv: Parsed text from the Congressional Record for legislative speech analysis.
4. Congressional Bills Project (CBP)
- Description: Manually coded topic classifications for U.S. congressional bills (1947–2016).
- Data Provided: Bill IDs linked to the Policy Agendas Project (PAP) taxonomy.
- Download URL: Congressional Bills Dataset (CSV): Direct download for the master bill file (1947–2016) from the Comparative Agendas Project.
5. Congress.gov (Library of Congress / GPO)
- Description: Official source for federal legislative information and parliamentary subject indexing.
- Data Provided: Granular bill status metadata, CRS subject headings, and detailed sponsor/cosponsor info (2015–2024).
- Download URL: https://www.govinfo.gov/bulkdata/BILLSTATUS: GPO portal for bulk XML downloads.
Bill Status Bulk Data Structure
The GPO provides Bill Status data in XML format, organized by Congress and bill type. Each bill type directory contains individual XML files for each bill, as well as a consolidated bulk ZIP file.
Bulk ZIP URL Pattern:
https://www.govinfo.gov/bulkdata/BILLSTATUS/{Congress}/{BillType}/BILLSTATUS-{Congress}-{BillType}.zip
https://www.govinfo.gov/bulkdata/BILLSTATUS/119/hr/BILLSTATUS-119-hr.zip
Supported Bill Types:
- hr: House Bills
- hres: House Resolutions
- Other types: hjres, hconres, s, sres, sjres, sconres
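The bulk ZIP URL pattern and the supported bill types can be combined into a small helper. This is a sketch rather than an official client: the function name `billstatus_zip_url` is hypothetical, and it only builds the URL string without checking that the archive exists for a given Congress.

```python
BASE_URL = "https://www.govinfo.gov/bulkdata/BILLSTATUS"
BILL_TYPES = {"hr", "hres", "hjres", "hconres", "s", "sres", "sjres", "sconres"}


def billstatus_zip_url(congress: int, bill_type: str) -> str:
    """Fill in the {Congress} and {BillType} placeholders of the bulk ZIP pattern."""
    bill_type = bill_type.lower()
    if bill_type not in BILL_TYPES:
        raise ValueError(f"unsupported bill type: {bill_type!r}")
    return f"{BASE_URL}/{congress}/{bill_type}/BILLSTATUS-{congress}-{bill_type}.zip"


print(billstatus_zip_url(119, "hr"))
```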
View Bulk Data by Congress (108th–119th)
- 119th Congress (2025–2026)
- 118th Congress (2023–2024)
- 117th Congress (2021–2022)
- 116th Congress (2019–2020)
- 115th Congress (2017–2018)
- 114th Congress (2015–2016)
- 113th Congress (2013–2014)
- 112th Congress (2011–2012)
- 111th Congress (2009–2010)
- 110th Congress (2007–2008)
- 109th Congress (2005–2006)
- 108th Congress (2003–2004)
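The year labels in the list above follow simple arithmetic: the 1st Congress convened in 1789 and each Congress spans two calendar years. The sketch below reproduces those labels. The function name `congress_years` is hypothetical, and it returns calendar-year labels only, not exact session start and end dates.

```python
from typing import Tuple


def congress_years(congress: int) -> Tuple[int, int]:
    """Return the (first, second) calendar year labels for a Congress number."""
    if congress < 1:
        raise ValueError("Congress numbers start at 1")
    first_year = 1789 + 2 * (congress - 1)
    return first_year, first_year + 1


for number in (119, 114, 108):
    print(number, congress_years(number))
```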
Legislator & Entity Metadata
6. GitHub @unitedstates Project
- Description: Community-maintained repository of U.S. legislative metadata.
- Data Provided: Biographic profiles, term history, social media handles, and cross-reference IDs for current and historical Members of Congress.
- Download URLs:
- legislators-current.yaml: Metadata for all currently serving members.
- legislators-historical.yaml: Metadata for all members who have left office.
7. FEC (Federal Election Commission)
- Description: Official federal campaign finance disclosure data.
- Data Provided: Candidates, Committees, Contributions (Individuals, PACs), and Linkages.
- Download URL: https://www.fec.gov/data/browse-data/?tab=bulk-data: FEC portal for weekly-updated bulk campaign finance archives.
FEC Bulk Data Structure
The FEC provides bulk data as ZIP archives organized by election cycle (two-year periods ending in even-numbered years).
Bulk Download URL Pattern:
https://www.fec.gov/files/bulk-downloads/[YEAR]/[PREFIX][YY].zip
Supported Data Types:
- Candidate Master (cn): Basic candidate information and IDs.
- Committee Master (cm): Registered committee metadata and IDs.
- PAC to Candidate Contributions (pas2): Itemized contributions from committees to candidates.
- Committee to Committee Transactions (oth): Transfers and contributions between committees.
- Individual Contributions (indiv): Itemized receipts from individuals to committees.
- Candidate-Committee Linkage (ccl): Crosswalk linking candidates to their authorized committees.
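The URL pattern and the prefixes listed above can be turned into a small URL builder. In this sketch, `[YEAR]` is taken to be the four-digit cycle year and `[YY]` its last two digits. The function name `fec_bulk_url` is hypothetical, and no request is made to confirm that a given archive exists.

```python
FEC_PREFIXES = {
    "cn": "Candidate Master",
    "cm": "Committee Master",
    "pas2": "PAC to Candidate Contributions",
    "oth": "Committee to Committee Transactions",
    "indiv": "Individual Contributions",
    "ccl": "Candidate-Committee Linkage",
}


def fec_bulk_url(prefix: str, cycle: int) -> str:
    """Build a bulk ZIP URL for one data type and one even-year election cycle."""
    if prefix not in FEC_PREFIXES:
        raise ValueError(f"unknown data type prefix: {prefix!r}")
    if cycle % 2:
        raise ValueError("election cycles end in even-numbered years")
    # Two-digit suffix, zero-padded so that 2000 becomes "00".
    return f"https://www.fec.gov/files/bulk-downloads/{cycle}/{prefix}{cycle % 100:02d}.zip"


print(fec_bulk_url("pas2", 2024))
```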
View Bulk ZIP Links by Cycle (2012–2024)
| Cycle | Candidate Master | Committee Master | Individual Contribs | PAC to Candidate |
|---|---|---|---|---|
| 2024 | cn24.zip | cm24.zip | indiv24.zip | pas224.zip |
| 2022 | cn22.zip | cm22.zip | indiv22.zip | pas222.zip |
| 2020 | cn20.zip | cm20.zip | indiv20.zip | pas220.zip |
| 2018 | cn18.zip | cm18.zip | indiv18.zip | pas218.zip |
| 2016 | cn16.zip | cm16.zip | indiv16.zip | pas216.zip |
| 2014 | cn14.zip | cm14.zip | indiv14.zip | pas214.zip |
| 2012 | cn12.zip | cm12.zip | indiv12.zip | pas212.zip |
8. Open States
- Description: Comprehensive data on state-level legislators across all 50 states.
- Data Provided: Legislator metadata (current and historical), contact info, party affiliation, and district data.
- Download URL: https://github.com/openstates/people: Source repository for state legislator metadata files.
Open States Bulk Data Formats
Open States (now Plural Policy) provides data in three primary formats:
- YAML Repository (Full Archive): Download all as ZIP (Contains all current and historical YAML files).
- Nightly CSV Exports: Per-state CSV files updated nightly.
- PostgreSQL Database Dumps: Monthly full database snapshots available at data.openstates.org/postgres/monthly/.
CSV Download URL Pattern:
https://data.openstates.org/people/current/[ABBR].csv
View Nightly CSV Downloads by State
| State | CSV Download Link | State | CSV Download Link |
|---|---|---|---|
| Alabama | al.csv | Montana | mt.csv |
| Alaska | ak.csv | Nebraska | ne.csv |
| Arizona | az.csv | Nevada | nv.csv |
| Arkansas | ar.csv | New Hampshire | nh.csv |
| California | ca.csv | New Jersey | nj.csv |
| Colorado | co.csv | New Mexico | nm.csv |
| Connecticut | ct.csv | New York | ny.csv |
| Delaware | de.csv | North Carolina | nc.csv |
| Florida | fl.csv | North Dakota | nd.csv |
| Georgia | ga.csv | Ohio | oh.csv |
| Hawaii | hi.csv | Oklahoma | ok.csv |
| Idaho | id.csv | Oregon | or.csv |
| Illinois | il.csv | Pennsylvania | pa.csv |
| Indiana | in.csv | Rhode Island | ri.csv |
| Iowa | ia.csv | South Carolina | sc.csv |
| Kansas | ks.csv | South Dakota | sd.csv |
| Kentucky | ky.csv | Tennessee | tn.csv |
| Louisiana | la.csv | Texas | tx.csv |
| Maine | me.csv | Utah | ut.csv |
| Maryland | md.csv | Vermont | vt.csv |
| Massachusetts | ma.csv | Virginia | va.csv |
| Michigan | mi.csv | Washington | wa.csv |
| Minnesota | mn.csv | West Virginia | wv.csv |
| Mississippi | ms.csv | Wisconsin | wi.csv |
| Missouri | mo.csv | Wyoming | wy.csv |
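Each entry in the table corresponds to the CSV URL pattern with a lowercase two-letter state abbreviation in place of `[ABBR]`. A minimal sketch of that substitution follows. The function name `openstates_csv_url` is hypothetical, and it validates only the shape of the abbreviation, not whether the state exists.

```python
def openstates_csv_url(abbr: str) -> str:
    """Build the nightly CSV URL for a two-letter state abbreviation."""
    abbr = abbr.strip().lower()
    if len(abbr) != 2 or not abbr.isalpha():
        raise ValueError(f"expected a two-letter abbreviation, got {abbr!r}")
    return f"https://data.openstates.org/people/current/{abbr}.csv"


print(openstates_csv_url("CA"))
```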
9. United States Governors (1775–2020)
- Description: A comprehensive historical dataset of U.S. state and territorial governors, including biographical information, party affiliation, and tenure dates.
- Provider: Jacob Kaplan
- Data Provided: Governor names, state, party affiliation, and yearly served indicators (year-expanded CSV).
- Download URLs:
- Full CSV (Dataverse): Direct download for the year-expanded CSV file.
- Open ICPSR Project: Official project page for historical versions and metadata.
- Harvard Dataverse Project: Documentation and alternative access.
Citation:
Kaplan, Jacob. United States Governors 1775-2020. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2020-07-01. https://doi.org/10.3886/E102000V3
Supplemental Classification Sources (Entity Verification)
These specialized datasets are used to explicitly classify contribution donors as labor unions, corporations, or non-profits.
DOL (Department of Labor) - Union Disclosures
- Description: Annual financial disclosure reports (LM-2, LM-3, LM-4) filed by labor unions.
- Data Provided: Officer information, membership counts, and detailed financial receipts/disbursements.
- Download URL: OLMS Yearly Data Download: Direct manual portal for annual pipe-delimited ZIP files.
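Pipe-delimited files like these can be read with the standard `csv` module by setting the delimiter. The sketch below assumes a header row. The column names and the sample row are made up for illustration and do not reflect the actual OLMS file layout.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_pipe_delimited(stream: TextIO) -> List[Dict[str, str]]:
    """Parse pipe-delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(stream, delimiter="|"))


# Hypothetical sample; real OLMS columns will differ.
sample = io.StringIO("EXAMPLE_NAME|EXAMPLE_COUNT\nExample Local 1|250\n")
rows = read_pipe_delimited(sample)
print(rows[0]["EXAMPLE_NAME"], rows[0]["EXAMPLE_COUNT"])
```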
IRS (Non-profits) - EO BMF and Form 990
- Description: Metadata and financial snapshots for tax-exempt organizations.
- Data Provided: EINs (Tax IDs), organization names, and Form 990 filing indices.
- Download URLs:
- EO BMF (Northeast): eo1.csv
- EO BMF (Mid-Atlantic): eo2.csv
- EO BMF (Gulf Coast): eo3.csv
- EO BMF (Great Lakes): eo4.csv
- Form 990 Index (AWS Mirror): index.json
IRS Political (527 Organizations)
- Description: Registration and disclosure filings for 527 political organizations.
- Data Provided: Form 8871 (Registration) and Form 8872 (Contributions/Expenditures).
- Download URL: Full Bulk Data Download: A single compressed file containing all electronically filed data.
LDA (Lobbying Disclosure Act)
- Description: Federal lobbying activity reports filed with the Senate and House.
- Data Provided: Itemized political contributions (LD-203) made by lobbyists.
- Download URLs:
- Itemized Contributions API: https://lda.senate.gov/api/v1/contributions/
- Senate Bulk Filings API: https://lda.senate.gov/api/v1/filings/
SEC (Corporate) - CIK/Ticker Mappings
- Description: Official mappings between corporate entities, their SEC Central Index Keys (CIKs), and stock tickers.
- Data Provided: JSON and text-based crosswalks for major public filers.
- Download URLs:
GLEIF (Legal Entity Identifier)
- Description: Global standard for uniquely identifying legal entities participating in financial transactions.
- Data Provided: "Golden Copy" of all Legal Entity Identifiers (LEIs) and associated reference data.
- Download URL: Daily Golden Copy (CSV ZIP): The most recent daily snapshot.