--- title: Utility Efficiency & Rates emoji: ⚡ colorFrom: blue colorTo: yellow sdk: streamlit sdk_version: "1.56.0" app_file: streamlit_app.py pinned: false --- # Electricity Utility Fairness Residential Rate Analysis > **Research Question:** Are system-level inefficiencies — high energy losses and poor load factors — statistically correlated with higher retail rates for residential consumers? --- ## Overview This project investigates a fundamental fairness question in the U.S. electricity sector: do residential customers end up paying more when their utility operates inefficiently? Using the [CORGIS Electricity Dataset](https://corgis-edu.github.io/corgis/), this analysis builds a set of derived efficiency and equity metrics, then examines their statistical relationships through a suite of interactive visualizations. New York State serves as the primary case study. NY was selected through a data-driven ranking process (get_state_variance) that scores all 50 states across five analytical criteria: number of utilities, number of ownership types, residential price standard deviation, maximum system loss percentage, and industrial revenue dependency. New York ranks at or near the top on all five: it has over 100 utilities across 6 distinct ownership models, exhibits high residential price variance, and sits within one of the most actively scrutinized regulatory environments in the U.S. [![Open in Streamlit](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://your-app-url.streamlit.app) --- ## Key Metrics | Metric | Description | |---|---| | `System Loss Percentage` | Energy lost in transmission/distribution as % of total supply | | `Load Factor` | Ratio of actual energy delivered to theoretical maximum (demand efficiency) | | `Residential Unit Price` | Residential rate in $/MWh | | `Industrial Unit Price` | Industrial rate in $/MWh | | `Price Spread` | Gap between residential and industrial rates (equity indicator) | --- ## Visualizations - **State Selection Table** — Ranking of top 10 states by analytical richness; justifies NY case study - **Price Spread Strip Plot** — Residential premium over industrial rates by ownership model - **Correlation Heatmap** — Statistical significance matrix across all key metrics - **Fairness Audit Scatter** — System loss (%) and load factor (dual y-axis) vs. residential price by ownership model, with OLS trendline - **Rate Disparity Dumbbell** — Top 10 utilities by residential–industrial price gap - **Energy Flow Sankey Diagram (State)** — Per-state (or utility) breakdown of energy sources and uses as percentages - **Energy Flow Sankey Diagram (US)** — National aggregate energy flow --- ## Project Structure ``` . ├── utility_efficiency_fairness.ipynb ├── streamlit_app.py ├── requirements.txt ├── README.md ├── data/ │ └── electricity.py │ └── app.py │ └── electricity.data │ └── electricity.data ├── src/ │ └── util/ ``` The three modules work as a clean pipeline: - **`electricity`** — CORGIS data loader (unmodified third-party) - **`data_util`** — all data preparation, filtering, and metric engineering - **`plot_util`** — all chart construction and SVG expor ### `util.py` — Helper Functions | Function | Purpose | |---|---| | `prepare_data(df)` | Calculates key metrics and other features from raw columns | | `get_state_data(state, df)` | Filters and subsets data for a given state | | `get_state_variance(df)` | Ranks states by residential price variance (for state selection) | | `get_customer_utilities(df, customer)` | Filters utilities by customer type served | | `get_residential_load_factor(df)` | Subset with valid load factor for residential utilities | | `get_residential_sys_loss(df)` | Subset with valid loss % for residential utilities | | `get_utility_usage(utility)` | Converts raw MWh values to % of total supply for Sankey | --- ## Setup & Usage - The dataset is pre-bundled — no external downloads required for it. - Run `jupytr notebook utility_efficiency_fairness.ipynb.` - Run `pip install pandas plotly scipy kaleido streamlit` --- ## Data Source **[CORGIS Electricity Dataset](https://corgis-edu.github.io/corgis/python/electricity/)** — a cleaned and structured snapshot of U.S. Energy Information Administration (EIA) [Form 861](https://www.eia.gov/electricity/data/eia861/) data, covering utility-level electricity generation, sales, revenues, and customer counts across all U.S. states. --- ## Potential Extensions - Expand the analysis to all 50 states and compare regional patterns - Incorporate time-series data to track changes in efficiency and pricing - Apply regression modeling to control for utility size and customer density - Explore the role of renewable energy mix on system loss rates