Congress Public Records Slice

A neutral, review-oriented slice of House public-record linkages across financial disclosures, sector overlap, and relationships with community project funding recipients.

Key Counts

  • Members: 438
  • Scored events: 3918
  • Public graph links: 7765 (recipient plus sector relationship links below)
  • Recipient relationship links: 5367
  • Sector relationship links: 2398
  • Source artifacts in the public audit index: 48591
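The recipient and sector counts add up to the public graph link total (5367 + 2398 = 7765). Below is a minimal Python sketch for re-deriving that breakdown from graph_links.csv. It assumes a link-type column named `link_type` with values such as `recipient` and `sector`. That header is hypothetical, so check the published file before relying on it.

```python
import csv
import io
from collections import Counter
from typing import Iterable


def tally_link_types(lines: Iterable[str], type_column: str = "link_type") -> Counter:
    """Count graph_links.csv rows per link type.

    ASSUMPTION: `type_column` ("link_type") is a hypothetical header name;
    confirm the real column in graph_links.csv before relying on it.
    """
    return Counter(row[type_column] for row in csv.DictReader(lines))


def totals_match(counts: Counter, expected_total: int) -> bool:
    """True when the per-type counts add up to the published total."""
    return sum(counts.values()) == expected_total


if __name__ == "__main__":
    # Tiny in-memory stand-in for graph_links.csv (invented rows, not real data).
    sample = io.StringIO(
        "link_type,source,target\n"
        "recipient,m1,r1\n"
        "sector,m1,s1\n"
        "recipient,m2,r1\n"
    )
    counts = tally_link_types(sample)
    print(dict(counts))             # {'recipient': 2, 'sector': 1}
    print(totals_match(counts, 3))  # True
```

To check the real file, open it with `open("graph_links.csv", newline="")`, pass the handle to `tally_link_types`, and compare the result against 7765 with `totals_match`.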

Required Caveats

  • This release is a slice of public-record data, not a complete accounting of all potentially relevant data.
  • Future releases may update or expand this slice as source recovery, parsing, and evidence linkage improve.
  • This release does not assign guilt, wrongdoing, intent, or causality to any person or organization.
  • The release shows public-record overlaps, timing, and linkage strength, not proof of illegality or corruption.
  • Some rows remain review-tier or include unresolved official source references and should be read with those labels in mind.
  • The public package includes verification summaries and SHA-backed artifact indexes, but it does not include the full internal raw corpus, so external verification is bounded by what is published here.
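The artifact indexes are described as SHA-backed. A local copy can be spot-checked against them with a sketch like the one below. It assumes each index row stores a relative file path and a SHA-256 hex digest. The digest algorithm and the column names `path` and `sha256` are assumptions, not confirmed by this README. Only published files can be checked this way, so index entries that refer to material outside the package would show up as missing.

```python
import csv
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return its lowercase hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_index(index_csv: Path, root: Path,
                 path_col: str = "path", hash_col: str = "sha256") -> list:
    """List index entries whose file is missing under `root` or fails its hash.

    ASSUMPTIONS: digests are SHA-256 hex strings, and `path_col`/`hash_col`
    are hypothetical column names for source_artifact_index.csv.
    """
    problems = []
    with index_csv.open(newline="") as fh:
        for row in csv.DictReader(fh):
            rel = row[path_col]
            target = root / rel
            if not target.is_file():
                problems.append(f"missing: {rel}")
            elif sha256_of(target) != row[hash_col].strip().lower():
                problems.append(f"mismatch: {rel}")
    return problems


if __name__ == "__main__":
    # Self-contained demo with a throwaway file; no real artifacts involved.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.txt").write_bytes(b"hello")
        good = hashlib.sha256(b"hello").hexdigest()
        index = root / "index.csv"
        index.write_text(f"path,sha256\na.txt,{good}\nb.txt,{good}\n")
        print(verify_index(index, root))  # ['missing: b.txt']
```

Hashing in fixed-size chunks keeps memory use flat even for large artifacts.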

Current Review Notes

  • Recipient links still marked needs_review: 154
  • True parse failures still present in the source slice: 45
  • Source-unavailable rows still present in the source slice: 0
  • Public-facing source URLs are limited to stable artifact links; unresolved or unavailable references are represented only by counts and labels.
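Unresolved rows stay in the slice with their labels instead of being dropped. Downstream analysis can therefore keep them visible but separate. The minimal sketch below assumes each recipient link row carries a status field. The column name `link_quality` is hypothetical. The flag value `needs_review` matches the label named above, but its exact placement in the files is an assumption.

```python
import csv
import io
from typing import Iterable, List, Tuple


def split_by_review_status(
    rows: Iterable[dict],
    status_col: str = "link_quality",  # hypothetical column name
    flag: str = "needs_review",
) -> Tuple[List[dict], List[dict]]:
    """Partition rows into (settled, flagged); nothing is discarded."""
    settled, flagged = [], []
    for row in rows:
        if row.get(status_col) == flag:
            flagged.append(row)
        else:
            settled.append(row)
    return settled, flagged


if __name__ == "__main__":
    # Invented sample rows for illustration only.
    sample = io.StringIO(
        "member,recipient,link_quality\n"
        "m1,r1,confirmed\n"
        "m2,r2,needs_review\n"
    )
    settled, flagged = split_by_review_status(csv.DictReader(sample))
    print(len(settled), len(flagged))  # 1 1
```

If the assumed column is the right one, `len(flagged)` on the real recipient links should match the needs_review count reported here.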

Included Public Files

  • members.csv
  • scored_events.csv
  • graph_links.csv
  • recipient_link_quality_report.json
  • source_quality_report.json
  • provenance_coverage_report.json
  • sample_cases.json
  • network_graph/nodes.csv
  • network_graph/edges.csv
  • network_graph/graph_config.json
  • evidence_audit/source_artifact_index.csv
  • evidence_audit/scored_event_index.csv
  • evidence_audit/scored_event_provenance.jsonl
  • evidence_audit/claim_supporting_index.csv
  • evidence_audit/claim_supporting_provenance.jsonl
  • evidence_audit/consistency_report.json
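The two `.jsonl` provenance files use the JSON Lines layout, with one JSON object per line. The minimal stdlib sketch below streams one of them and groups records by event. The grouping key `event_id` is a hypothetical field name, so inspect a few records before using it.

```python
import io
import json
from typing import Dict, Iterable, Iterator, List, TextIO


def iter_jsonl(fh: TextIO) -> Iterator[dict]:
    """Yield one parsed record per non-blank line of a JSON Lines stream."""
    for lineno, line in enumerate(fh, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report the physical line number so bad rows are easy to find.
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc


def group_by(records: Iterable[dict], key: str = "event_id") -> Dict[object, List[dict]]:
    """Group records by `key` (hypothetical field); records lacking it go under None."""
    groups: Dict[object, List[dict]] = {}
    for rec in records:
        groups.setdefault(rec.get(key), []).append(rec)
    return groups


if __name__ == "__main__":
    # Invented stream standing in for scored_event_provenance.jsonl.
    sample = io.StringIO('{"event_id": 1, "source": "a"}\n\n{"event_id": 1, "source": "b"}\n')
    groups = group_by(iter_jsonl(sample))
    print({k: len(v) for k, v in groups.items()})  # {1: 2}
```

On the real file, assuming UTF-8 encoding: `with open("evidence_audit/scored_event_provenance.jsonl", encoding="utf-8") as fh: groups = group_by(iter_jsonl(fh))`.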

Hugging Face Publishing Shape

  • Dataset repo id: cjc0013/cmp-data
  • Space repo id: cjc0013/cmp
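The repo ids above are enough to locate individual files on the Hub. This sketch builds direct-download URLs with the standard Hub `resolve` URL pattern and wraps `huggingface_hub.hf_hub_download` for cached downloads. It assumes the files sit in the dataset repo at the same relative paths listed in this README and that the default branch is `main`. Neither assumption is confirmed here.

```python
from urllib.parse import quote

DATASET_REPO = "cjc0013/cmp-data"  # from this README
SPACE_REPO = "cjc0013/cmp"         # from this README


def dataset_file_url(filename: str, repo_id: str = DATASET_REPO,
                     revision: str = "main") -> str:
    """Direct-download URL for one file in a Hub dataset repo.

    ASSUMPTION: files sit at the relative paths listed in this README
    and the default branch is "main".
    """
    return f"https://huggingface.co/datasets/{repo_id}/resolve/{revision}/{quote(filename)}"


def space_url(repo_id: str = SPACE_REPO) -> str:
    """Browser URL for the companion Space."""
    return f"https://huggingface.co/spaces/{repo_id}"


def download(filename: str, repo_id: str = DATASET_REPO) -> str:
    """Fetch one file into the local cache; requires `pip install huggingface_hub`."""
    from huggingface_hub import hf_hub_download  # optional dependency, imported lazily

    return hf_hub_download(repo_id=repo_id, filename=filename, repo_type="dataset")


if __name__ == "__main__":
    print(dataset_file_url("network_graph/edges.csv"))
    print(space_url())
```

The URL helpers need only the standard library. `download` adds caching and authentication handling, at the cost of the extra dependency.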

This release is a slice of public-record data and may be updated in future releases.