UPPSC PCS 2024 Statistical Audit
Problem
The UPPSC PCS 2024 examination results (Prelims → Mains → Final) are publicly available as PDFs, but no one had run a systematic numerical analysis of how selections are distributed across roll number series. A visual scan of the data suggested an unusual concentration of final seats among candidates whose roll numbers start with 00 or 01.
Solution
A two-script Python pipeline that extracts every roll number from the official UPPSC PDFs using regex, groups them by their first two digits (the series prefix), and tracks how each series group survived across all three exam stages. The output is a structured counts.json and a full statistical report.
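The extraction and grouping step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual scripts: it assumes the PDF text has already been extracted into plain strings, and the pattern `ROLL_RE`, the function names, and the sample roll numbers are all hypothetical.

```python
import re
from collections import Counter

# Assumed pattern: standalone runs of 6 or 7 digits count as roll numbers.
ROLL_RE = re.compile(r"\b\d{6,7}\b")

def extract_rolls(text):
    """Return the set of roll-number strings found in already-extracted PDF text."""
    return set(ROLL_RE.findall(text))

def count_by_prefix(rolls):
    """Count roll numbers by their first two digits (the series prefix)."""
    return Counter(roll[:2] for roll in rolls)

def stage_counts(stage_texts):
    """Map each stage name to its per-prefix counts."""
    return {
        stage: dict(count_by_prefix(extract_rolls(text)))
        for stage, text in stage_texts.items()
    }

# Made-up sample text standing in for the three result PDFs.
sample = {
    "prelims": "0012345 0100021 0234567 0312345 0456789",
    "mains": "0012345 0100021 0234567",
    "final": "0012345",
}
counts = stage_counts(sample)
print(counts["final"])
```

Using a set per stage means a roll number that appears twice in the same PDF is counted once, which is one reasonable choice when the aim is to count candidates rather than occurrences.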
Key Findings
- 00 & 01 Series: 4,927 candidates → 441 final seats (selection rate: 8.95%)
- 02–05 Series: 10,139 candidates → 492 final seats (selection rate: 4.85%)
- Candidates from 02–05 were more than double in number but secured nearly the same number of seats (492 vs. 441)
- Statistical excess: +136 seats above the proportional expectation for the 00 & 01 group
- The concentration compounded at every stage: 32.7% at prelims → 41.8% at mains → 47.3% of final selections
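The selection rates and the +136 excess can be rechecked from the four counts listed above. The sketch below assumes the two groups together make up the whole candidate pool and all final seats, which is consistent with the 32.7% and 47.3% shares quoted; the variable names are illustrative only.

```python
# Counts taken from the key findings above.
candidates = {"00-01": 4927, "02-05": 10139}
selected = {"00-01": 441, "02-05": 492}

total_candidates = sum(candidates.values())
total_selected = sum(selected.values())

summary = {}
for group in candidates:
    rate = selected[group] / candidates[group]
    # Seats the group would hold if seats were split in proportion to candidates.
    expected = total_selected * candidates[group] / total_candidates
    summary[group] = {
        "rate": rate,
        "expected": expected,
        "excess": selected[group] - expected,
    }
    print(f"{group}: rate={rate:.2%}, expected={expected:.1f}, "
          f"excess={selected[group] - expected:+.1f}")
```

Under this assumption the 00 & 01 group would be expected to hold about 305 of the 933 seats, so its 441 seats are roughly 136 above the proportional share, and the 02–05 group is short by the same amount.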
Features
- Fully verifiable: all scripts run directly on the official public PDFs
- Handles UPPSC-specific edge cases: 6-digit and 7-digit roll number formats, page-1 skip for mains PDF
- Outputs structured counts.json with per-stage, per-series breakdowns
- Includes a detailed report.md with stage-by-stage tables and statistical variance calculation
- Designed for public transparency: anyone can reproduce the numbers in minutes
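The two edge cases listed above (mixed 6- and 7-digit roll numbers, and skipping page 1 of the mains PDF) might be handled along these lines. This is a sketch under stated assumptions: the pages are assumed to be available as a list of text strings, the function name `rolls_from_pages` and its `skip_first_page` flag are hypothetical, and the sample pages are invented.

```python
import re

# Assumed pattern: accept both 6-digit and 7-digit roll numbers.
ROLL_RE = re.compile(r"\b\d{6,7}\b")

def rolls_from_pages(pages, skip_first_page=False):
    """Collect roll numbers from per-page text, optionally ignoring page 1."""
    start = 1 if skip_first_page else 0
    rolls = set()
    for page_text in pages[start:]:
        rolls.update(ROLL_RE.findall(page_text))
    return rolls

# Invented pages: the first holds a digit run that is not a roll number.
pages = ["Notice ref 1234567 dated 2024", "0012345 012345", "0234567"]
mains_rolls = rolls_from_pages(pages, skip_first_page=True)
print(sorted(mains_rolls))
```

Skipping the first page keeps digit runs in a cover notice from being counted as roll numbers; whether that is the reason behind the original page-1 skip is an assumption.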