# Congress Public Records Slice
A neutral, review-oriented slice of House public-record linkages covering financial disclosures, sector overlap, and relationships with community project funding recipients.
## Key Counts
- Members: `438`
- Scored events: `3918`
- Public graph links: `7765`
- Recipient relationship links: `5367`
- Sector relationship links: `2398`
- Source artifacts in the public audit index: `48591`
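The recipient and sector counts sum to the public graph link total (5367 + 2398 = 7765). Below is a minimal Python sketch for re-checking these counts against a local copy of `graph_links.csv`. It assumes a hypothetical `link_type` column whose values are `recipient` and `sector`. The real header may differ, so adjust `type_column` and the keys in `EXPECTED` to match.

```python
import csv
from collections import Counter
from pathlib import Path
from typing import List

# Published totals from the Key Counts section.
EXPECTED = {"recipient": 5367, "sector": 2398}
EXPECTED_TOTAL = 7765


def count_link_types(path: Path, type_column: str = "link_type") -> Counter:
    """Tally rows in a links CSV by a link-type column.

    ASSUMPTION: the column name `link_type` and its values
    ("recipient", "sector") are hypothetical; check the real header.
    """
    with path.open(newline="", encoding="utf-8") as f:
        return Counter(row[type_column] for row in csv.DictReader(f))


def check_counts(counts: Counter) -> List[str]:
    """Return human-readable mismatches against the published counts."""
    problems = []
    for link_type, expected in EXPECTED.items():
        found = counts.get(link_type, 0)
        if found != expected:
            problems.append(f"{link_type}: expected {expected}, found {found}")
    total = sum(counts.values())
    if total != EXPECTED_TOTAL:
        problems.append(f"total: expected {EXPECTED_TOTAL}, found {total}")
    return problems
```

If the assumed column layout holds, `check_counts(count_link_types(Path("graph_links.csv")))` returns an empty list.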
## Required Caveats
- This release is a slice of public-record data, not a complete accounting of all potentially relevant data.
- Future releases may update or expand this slice as source recovery, parsing, and evidence linkage improve.
- This release does not assign guilt, wrongdoing, intent, or causality to any person or organization.
- The release shows public-record overlaps, timing, and linkage strength, not proof of illegality or corruption.
- Some rows remain at review tier or cite unresolved official source references; read them with those labels in mind.
- The public package includes verification summaries and SHA-backed artifact indexes but not the full internal raw corpus, so external verification is limited to what is published here.
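Because external verification is bounded by the published files, one practical check is to recompute hashes for the artifacts you actually have and compare them with the artifact index. The sketch below assumes that `evidence_audit/source_artifact_index.csv` has hypothetical `path` and `sha256` columns, with paths relative to a local root directory and values that are SHA-256 hex digests. This README confirms neither the column names nor the exact hash algorithm. Artifacts that are not present locally are reported as missing rather than as failures.

```python
import csv
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_index(index_csv: Path, root: Path,
                 path_col: str = "path",
                 hash_col: str = "sha256") -> Dict[str, List[str]]:
    """Compare locally available artifacts with the hashes in an index.

    ASSUMPTION: `path` / `sha256` are hypothetical column names, paths are
    relative to `root`, and hashes are SHA-256 hex digests (any case).
    """
    result: Dict[str, List[str]] = {"ok": [], "mismatch": [], "missing": []}
    with index_csv.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            rel = row[path_col]
            target = root / rel
            if not target.is_file():
                # Not shipped in the public package (or not downloaded).
                result["missing"].append(rel)
            elif sha256_of(target) == row[hash_col].strip().lower():
                result["ok"].append(rel)
            else:
                result["mismatch"].append(rel)
    return result
```

Streaming in fixed-size chunks keeps memory use flat even for large artifacts.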
## Current Review Notes
- Recipient links still marked `needs_review`: `154`
- True parse failures still present in the source slice: `45`
- Source-unavailable rows still present in the source slice: `0`
- Public-facing source URLs are limited to stable artifact links; unresolved or unavailable references are represented only by counts and labels.
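To pull out the recipient links still marked `needs_review`, a sketch along these lines could work. It assumes that the label lives in a hypothetical `review_status` column of `graph_links.csv`, next to a hypothetical `link_type` column. The label may instead appear only in `recipient_link_quality_report.json`, so check the actual files first.

```python
import csv
from pathlib import Path
from typing import Dict, List, Tuple

Row = Dict[str, str]


def split_recipient_links(path: Path,
                          type_col: str = "link_type",
                          status_col: str = "review_status") -> Tuple[List[Row], List[Row]]:
    """Split recipient links into (needs_review, other).

    ASSUMPTION: `link_type` and `review_status` are hypothetical column
    names; only the `needs_review` label itself comes from the README.
    Rows that are not recipient links are skipped.
    """
    needs_review: List[Row] = []
    other: List[Row] = []
    with path.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row.get(type_col) != "recipient":
                continue
            if row.get(status_col) == "needs_review":
                needs_review.append(row)
            else:
                other.append(row)
    return needs_review, other
```

If the assumed layout matches, the length of the first returned list should equal the 154 reported above.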
## Included Public Files
- `members.csv`
- `scored_events.csv`
- `graph_links.csv`
- `recipient_link_quality_report.json`
- `source_quality_report.json`
- `provenance_coverage_report.json`
- `sample_cases.json`
- `network_graph/nodes.csv`
- `network_graph/edges.csv`
- `network_graph/graph_config.json`
- `evidence_audit/source_artifact_index.csv`
- `evidence_audit/scored_event_index.csv`
- `evidence_audit/scored_event_provenance.jsonl`
- `evidence_audit/claim_supporting_index.csv`
- `evidence_audit/claim_supporting_provenance.jsonl`
- `evidence_audit/consistency_report.json`
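The public files use three formats: CSV, JSON, and JSON Lines (`.jsonl`, one JSON object per line). Below is a minimal loader that dispatches on file extension. It assumes UTF-8 encoding and makes no assumptions about column or key names. `load_public_file` and `iter_jsonl` are illustrative helpers and are not part of the release.

```python
import csv
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def iter_jsonl(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one JSON object per non-blank line of a JSON Lines file."""
    with path.open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the offending line number to ease debugging.
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc


def load_public_file(path: Path) -> Any:
    """Load a CSV (list of row dicts), JSON, or JSONL (list of dicts) file."""
    suffix = path.suffix.lower()
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    if suffix == ".json":
        with path.open(encoding="utf-8") as f:
            return json.load(f)
    if suffix == ".jsonl":
        return list(iter_jsonl(path))
    raise ValueError(f"unsupported file type: {path.name}")
```

For large provenance files such as `evidence_audit/scored_event_provenance.jsonl`, iterating over `iter_jsonl` directly avoids holding every record in memory at once.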
## Hugging Face Repositories
- Dataset repo id: `cjc0013/cmp-data`
- Space repo id: `cjc0013/cmp`
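One way to fetch the dataset repo locally is the `huggingface_hub` library's `snapshot_download` function. The sketch below wraps that call and builds the standard Hub page URLs for both repo ids. The helper names and the default `local_dir` are illustrative assumptions. The import is deferred so that the URL helper still works when `huggingface_hub` is not installed.

```python
from typing import List, Optional

# Repository ids as published in this README.
DATASET_REPO_ID = "cjc0013/cmp-data"
SPACE_REPO_ID = "cjc0013/cmp"

# URL path prefixes the Hugging Face Hub uses for each repo type.
_HUB_PREFIX = {"model": "", "dataset": "datasets/", "space": "spaces/"}


def hub_url(repo_id: str, repo_type: str) -> str:
    """Build the Hub web URL for a repo of the given type."""
    return f"https://huggingface.co/{_HUB_PREFIX[repo_type]}{repo_id}"


def download_dataset(local_dir: str = "cmp-data",
                     allow_patterns: Optional[List[str]] = None) -> str:
    """Download a snapshot of the dataset repo and return the local path.

    Requires `pip install huggingface_hub`. The default local_dir is an
    arbitrary choice for this example.
    """
    # Deferred import so hub_url() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=DATASET_REPO_ID,
        repo_type="dataset",
        local_dir=local_dir,
        allow_patterns=allow_patterns,
    )
```

For example, `download_dataset(allow_patterns=["*.csv"])` restricts the download to CSV files.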
This release is a slice of public-record data and may be updated in future releases.