Raw Data Catalog

This document provides a comprehensive list of all raw data sources identified and tracked for the TYT Paper Trail project.

Core Legislative & Campaign Finance Sources

1. Voteview (UCLA)

  • Description: The authoritative historical record of U.S. Congressional roll-call voting.
  • Data Provided: Legislator metadata (Bioguide IDs, ICPSR codes), individual vote records (yea/nay/present), and roll-call metadata (dates, descriptions, results).
  • Raw CSV Download URLs:
  • Parquet (HuggingFace): Dustinhax/tyt/tyt/voteview
    • HSall_members.parquet (1.5 MB) - 51K members with NOMINATE scores
    • HSall_rollcalls.parquet (6.6 MB) - 113K roll call votes
    • HSall_votes.parquet (31 MB) - 26M individual vote records

2. DIME - Database on Ideology, Money in Politics, and Elections (Stanford)

Itemized Contribution Records (By Cycle)

Individual gzipped CSV files containing every itemized donation for a given two-year cycle:

View Annual Download Links (1980-2024)

Contribution Records Grouped by Office

Gzipped CSV files containing every itemized donation for specific offices (1979-2024):

Curated Datasets of Political Elites

Standardized datasets containing ideology scores and biographic metadata for specific elite populations:

View Curated Elite Datasets

3. DIME PLUS (Legislative Voting Data)

  • Description: A refined subset of voting data curated by Adam Bonica for alignment with DIME ideology scores (107th–114th Congress).
  • Data Provided: Individual votes linked to specific bill IDs and Bonica recipient IDs.
  • Download URLs:
    • vote_db.csv: Transactional roll call vote records linked to DIME recipient IDs.
    • bills_db.csv: Metadata for bills and amendments including sponsor and co-sponsor lists.
    • text_db.csv: Parsed text from the Congressional Record for legislative speech analysis.

4. Congressional Bills Project (CBP)

  • Description: Manually coded topic classifications for U.S. congressional bills (1947–2016).
  • Data Provided: Bill IDs linked to the Policy Agendas Project (PAP) taxonomy.
  • Download URL: Congressional Bills Dataset (CSV): Direct download for the master bill file (1947–2016) from the Comparative Agendas Project.

5. Congress.gov (Library of Congress / GPO)

  • Description: Official source for federal legislative information and parliamentary subject indexing.
  • Data Provided: Granular bill status metadata, CRS subject headings, and detailed sponsor/cosponsor info (2015–2024).
  • Download URL: https://www.govinfo.gov/bulkdata/BILLSTATUS: GPO portal for bulk XML downloads.

Bill Status Bulk Data Structure

The GPO provides Bill Status data in XML format, organized by Congress and bill type. Each bill type directory contains individual XML files for each bill, as well as a consolidated bulk ZIP file.

Bulk ZIP URL Pattern: https://www.govinfo.gov/bulkdata/BILLSTATUS/{Congress}/{BillType}/BILLSTATUS-{Congress}-{BillType}.zip

Example (119th Congress, House bills): https://www.govinfo.gov/bulkdata/BILLSTATUS/119/hr/BILLSTATUS-119-hr.zip

Supported Bill Types:

  • hr: House Bills
  • hres: House Resolutions
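  • hjres: House Joint Resolutions
  • hconres: House Concurrent Resolutions
  • s: Senate Bills
  • sres: Senate Resolutions
  • sjres: Senate Joint Resolutions
  • sconres: Senate Concurrent Resolutions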
View Bulk Data by Congress (108th–119th)
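
The bulk ZIP URL pattern above can be expanded programmatically. The following is a minimal sketch, not part of the project's tooling: the function name `billstatus_zip_url` and the constants are hypothetical, and the only inputs taken from this catalog are the URL pattern and the list of bill types.

```python
# Hypothetical helper: builds a BILLSTATUS bulk ZIP URL from the pattern
# documented in this catalog. It only formats a string; it does not check
# that the file exists on govinfo.gov.
BILLSTATUS_BASE = "https://www.govinfo.gov/bulkdata/BILLSTATUS"
BILL_TYPES = ("hr", "hres", "hjres", "hconres", "s", "sres", "sjres", "sconres")


def billstatus_zip_url(congress: int, bill_type: str) -> str:
    """Return the bulk ZIP URL for one Congress and one bill type."""
    bill_type = bill_type.lower()
    if bill_type not in BILL_TYPES:
        raise ValueError(f"unsupported bill type: {bill_type!r}")
    return (
        f"{BILLSTATUS_BASE}/{congress}/{bill_type}/"
        f"BILLSTATUS-{congress}-{bill_type}.zip"
    )


if __name__ == "__main__":
    # Print one URL per bill type for the 119th Congress.
    for bt in BILL_TYPES:
        print(billstatus_zip_url(119, bt))
```

Validating the bill type up front keeps a typo from turning into a silent 404 later in a download loop.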

Legislator & Entity Metadata

6. GitHub @unitedstates Project

  • Description: Community-maintained repository of U.S. legislative metadata.
  • Data Provided: Biographic profiles, term history, social media handles, and cross-reference IDs for current and historical Members of Congress.
  • Download URLs:

7. FEC (Federal Election Commission)

  • Description: Official federal campaign finance disclosure data.
  • Data Provided: Candidates, Committees, Contributions (Individuals, PACs), and Linkages.
  • Download URL: https://www.fec.gov/data/browse-data/?tab=bulk-data: FEC portal for weekly-updated bulk campaign finance archives.

FEC Bulk Data Structure

The FEC provides bulk data as ZIP archives organized by election cycle (two-year periods ending in even-numbered years).

Bulk Download URL Pattern: https://www.fec.gov/files/bulk-downloads/[YEAR]/[PREFIX][YY].zip

Supported Data Types:

  • Candidate Master (cn): Basic candidate information and IDs.
  • Committee Master (cm): Registered committee metadata and IDs.
  • PAC to Candidate Contributions (pas2): Itemized contributions from committees to candidates.
  • Committee to Committee Transactions (oth): Transfers and contributions between committees.
  • Individual Contributions (indiv): Itemized receipts from individuals to committees.
  • Candidate-Committee Linkage (ccl): Crosswalk linking candidates to their authorized committees.
View Bulk ZIP Links by Cycle (2012–2024)
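
The FEC URL pattern and file prefixes above can be combined as in the sketch below. It assumes that `[YEAR]` is the four-digit even year that ends the cycle and `[YY]` is its last two digits; the helper names `election_cycle` and `fec_bulk_url` are hypothetical and not part of any FEC tooling.

```python
# Hypothetical helpers for the FEC bulk download pattern in this catalog.
# Assumption: [YEAR] is the even year ending the cycle, [YY] its last two digits.
FEC_BASE = "https://www.fec.gov/files/bulk-downloads"
FEC_PREFIXES = ("cn", "cm", "pas2", "oth", "indiv", "ccl")


def election_cycle(year: int) -> int:
    """Map a calendar year to the even year that ends its two-year cycle."""
    return year if year % 2 == 0 else year + 1


def fec_bulk_url(year: int, prefix: str) -> str:
    """Return the bulk ZIP URL for a data type prefix and calendar year."""
    if prefix not in FEC_PREFIXES:
        raise ValueError(f"unsupported FEC prefix: {prefix!r}")
    cycle = election_cycle(year)
    return f"{FEC_BASE}/{cycle}/{prefix}{cycle % 100:02d}.zip"


if __name__ == "__main__":
    # 2023 falls in the 2024 cycle, so this prints the .../2024/cn24.zip URL.
    print(fec_bulk_url(2023, "cn"))
```

Note that a prefix ending in a digit, such as `pas2`, runs directly into the two-digit year (for example `pas224.zip` for the 2024 cycle), so URLs are easier to build from parts than to parse back.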

8. Open States

  • Description: Comprehensive data on state-level legislators across all 50 states.
  • Data Provided: Legislator metadata (current and historical), contact info, party affiliation, and district data.
  • Download URL: https://github.com/openstates/people: Source repository for state legislator metadata files.

Open States Bulk Data Formats

Open States (now Plural Policy) provides data in three primary formats:

  1. YAML Repository (Full Archive): Download all as ZIP (Contains all current and historical YAML files).
  2. Nightly CSV Exports: Per-state CSV files updated nightly.
  3. PostgreSQL Database Dumps: Monthly full database snapshots available at data.openstates.org/postgres/monthly/.

CSV Download URL Pattern: https://data.openstates.org/people/current/[ABBR].csv

View Nightly CSV Downloads by State
| State | CSV Download Link | State | CSV Download Link |
|---|---|---|---|
| Alabama | al.csv | Montana | mt.csv |
| Alaska | ak.csv | Nebraska | ne.csv |
| Arizona | az.csv | Nevada | nv.csv |
| Arkansas | ar.csv | New Hampshire | nh.csv |
| California | ca.csv | New Jersey | nj.csv |
| Colorado | co.csv | New Mexico | nm.csv |
| Connecticut | ct.csv | New York | ny.csv |
| Delaware | de.csv | North Carolina | nc.csv |
| Florida | fl.csv | North Dakota | nd.csv |
| Georgia | ga.csv | Ohio | oh.csv |
| Hawaii | hi.csv | Oklahoma | ok.csv |
| Idaho | id.csv | Oregon | or.csv |
| Illinois | il.csv | Pennsylvania | pa.csv |
| Indiana | in.csv | Rhode Island | ri.csv |
| Iowa | ia.csv | South Carolina | sc.csv |
| Kansas | ks.csv | South Dakota | sd.csv |
| Kentucky | ky.csv | Tennessee | tn.csv |
| Louisiana | la.csv | Texas | tx.csv |
| Maine | me.csv | Utah | ut.csv |
| Maryland | md.csv | Vermont | vt.csv |
| Massachusetts | ma.csv | Virginia | va.csv |
| Michigan | mi.csv | Washington | wa.csv |
| Minnesota | mn.csv | West Virginia | wv.csv |
| Mississippi | ms.csv | Wisconsin | wi.csv |
| Missouri | mo.csv | Wyoming | wy.csv |
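
The per-state file names above follow the CSV download URL pattern, with `[ABBR]` replaced by the lowercase two-letter state abbreviation. A minimal sketch of that substitution follows; the function name `openstates_csv_url` is hypothetical, and the code only formats the URL without confirming that the file is published.

```python
# Hypothetical helper: fills the [ABBR] placeholder in the Open States
# nightly CSV pattern documented in this catalog.
OPENSTATES_CSV_BASE = "https://data.openstates.org/people/current"


def openstates_csv_url(abbr: str) -> str:
    """Return the nightly CSV URL for a two-letter state abbreviation."""
    abbr = abbr.strip().lower()
    if len(abbr) != 2 or not abbr.isalpha():
        raise ValueError(f"expected a two-letter abbreviation, got {abbr!r}")
    return f"{OPENSTATES_CSV_BASE}/{abbr}.csv"


if __name__ == "__main__":
    # A few abbreviations taken from the table above.
    for state in ("al", "NY", "wy"):
        print(openstates_csv_url(state))
```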

9. United States Governors (1775–2020)

  • Description: A comprehensive historical dataset of U.S. state and territorial governors, including biographical information, party affiliation, and tenure dates.
  • Provider: Jacob Kaplan
  • Data Provided: Governor names, state, party affiliation, and yearly served indicators (year-expanded CSV).
  • Download URLs:

Citation:

Kaplan, Jacob. United States Governors 1775-2020. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2020-07-01. https://doi.org/10.3886/E102000V3


Supplemental Classification Sources (Entity Verification)

These specialized datasets are used to classify contributors explicitly as labor unions, corporations, or non-profits.

DOL (Department of Labor) - Union Disclosures

  • Description: Annual financial disclosure reports (LM-2, LM-3, LM-4) filed by labor unions.
  • Data Provided: Officer information, membership counts, and detailed financial receipts/disbursements.
  • Download URL: OLMS Yearly Data Download: Portal for manually downloading annual ZIP archives of pipe-delimited files.

IRS (Non-profits) - EO BMF and Form 990

  • Description: Metadata and financial snapshots for tax-exempt organizations.
  • Data Provided: EINs (Tax IDs), organization names, and Form 990 filing indices.
  • Download URLs:

IRS Political (527 Organizations)

  • Description: Registration and disclosure filings for 527 political organizations.
  • Data Provided: Form 8871 (Registration) and Form 8872 (Contributions/Expenditures).
  • Download URL: Full Bulk Data Download: A single compressed file containing all electronically filed data.

LDA (Lobbying Disclosure Act)

SEC (Corporate) - CIK/Ticker Mappings

  • Description: Official mappings between corporate entities, their SEC Central Index Keys (CIKs), and stock tickers.
  • Data Provided: JSON and text-based crosswalks for major public filers.
  • Download URLs:

GLEIF (Legal Entity Identifier)

  • Description: Global standard for uniquely identifying legal entities participating in financial transactions.
  • Data Provided: "Golden Copy" of all Legal Entity Identifiers (LEIs) and associated reference data.
  • Download URL: Daily Golden Copy (CSV ZIP): The most recent daily snapshot.