A newer version of the Streamlit SDK is available: 1.58.0
title: Utility Efficiency & Rates
emoji: ⚡
colorFrom: blue
colorTo: yellow
sdk: streamlit
sdk_version: 1.56.0
app_file: streamlit_app.py
pinned: false
Electricity Utility Fairness Residential Rate Analysis
Research Question: Are system-level inefficiencies — high energy losses and poor load factors — statistically correlated with higher retail rates for residential consumers?
Overview
This project investigates a fundamental fairness question in the U.S. electricity sector: do residential customers end up paying more when their utility operates inefficiently? Using the CORGIS Electricity Dataset, this analysis builds a set of derived efficiency and equity metrics, then examines their statistical relationships through a suite of interactive visualizations.
New York State serves as the primary case study. NY was selected through a data-driven ranking process (get_state_variance) that scores all 50 states across five analytical criteria: number of utilities, number of ownership types, residential price standard deviation, maximum system loss percentage, and industrial revenue dependency. New York ranks at or near the top on all five: it has over 100 utilities across 6 distinct ownership models, exhibits high residential price variance, and sits within one of the most actively scrutinized regulatory environments in the U.S.
Key Metrics
| Metric | Description |
|---|---|
System Loss Percentage |
Energy lost in transmission/distribution as % of total supply |
Load Factor |
Ratio of actual energy delivered to theoretical maximum (demand efficiency) |
Residential Unit Price |
Residential rate in $/MWh |
Industrial Unit Price |
Industrial rate in $/MWh |
Price Spread |
Gap between residential and industrial rates (equity indicator) |
Visualizations
- State Selection Table — Ranking of top 10 states by analytical richness; justifies NY case study
- Price Spread Strip Plot — Residential premium over industrial rates by ownership model
- Correlation Heatmap — Statistical significance matrix across all key metrics
- Fairness Audit Scatter — System loss (%) and load factor (dual y-axis) vs. residential price by ownership model, with OLS trendline
- Rate Disparity Dumbbell — Top 10 utilities by residential–industrial price gap
- Energy Flow Sankey Diagram (State) — Per-state (or utility) breakdown of energy sources and uses as percentages
- Energy Flow Sankey Diagram (US) — National aggregate energy flow
Project Structure
.
├── utility_efficiency_fairness.ipynb
├── streamlit_app.py
├── requirements.txt
├── README.md
├── data/
│ └── electricity.py
│ └── app.py
│ └── electricity.data
│ └── electricity.data
├── src/
│ └── util/
The three modules work as a clean pipeline:
electricity— CORGIS data loader (unmodified third-party)data_util— all data preparation, filtering, and metric engineeringplot_util— all chart construction and SVG expor
util.py — Helper Functions
| Function | Purpose |
|---|---|
prepare_data(df) |
Calculates key metrics and other features from raw columns |
get_state_data(state, df) |
Filters and subsets data for a given state |
get_state_variance(df) |
Ranks states by residential price variance (for state selection) |
get_customer_utilities(df, customer) |
Filters utilities by customer type served |
get_residential_load_factor(df) |
Subset with valid load factor for residential utilities |
get_residential_sys_loss(df) |
Subset with valid loss % for residential utilities |
get_utility_usage(utility) |
Converts raw MWh values to % of total supply for Sankey |
Setup & Usage
- The dataset is pre-bundled — no external downloads required for it.
- Run
jupytr notebook utility_efficiency_fairness.ipynb. - Run
pip install pandas plotly scipy kaleido streamlit
Data Source
CORGIS Electricity Dataset — a cleaned and structured snapshot of U.S. Energy Information Administration (EIA) Form 861 data, covering utility-level electricity generation, sales, revenues, and customer counts across all U.S. states.
Potential Extensions
- Expand the analysis to all 50 states and compare regional patterns
- Incorporate time-series data to track changes in efficiency and pricing
- Apply regression modeling to control for utility size and customer density
- Explore the role of renewable energy mix on system loss rates