---
language:
- en
license: mit
tags:
- lightgbm
- tabular-regression
- energy
- epc
- uk
- property
datasets: []
metrics:
- mae
model-index:
- name: uk-epc-model
  results:
  - task:
      type: tabular-regression
    metrics:
    - type: mae
      value: 3.09
      name: MAE (SAP score, test set 2024–2026)
---
# UK EPC Rating Predictor
A LightGBM gradient-boosted tree model that predicts residential Energy Performance Certificate (EPC) ratings for properties in England and Wales.
Given property characteristics a homeowner already knows (wall type, heating system, floor area, age band, etc.), the model predicts:
- A numeric SAP 2012 efficiency score (1–100)
- A letter grade (A–G)
## Model details
| Detail | Value |
|--------|-------|
| Algorithm | LightGBM (gradient-boosted trees) |
| Objective | MAE regression (`regression_l1`) |
| Trees | 5,000 |
| Leaves per tree | up to 857 |
| Features | 40 |
| Training rows | 19,279,916 |
| Test rows | 4,045,192 |
| MAE (test set) | 3.09 SAP points |
| Exact grade accuracy | 77.4% (calibrated) |
| Within-1-band accuracy | 98.7% |
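As a rough illustration of how the settings in the table map onto LightGBM parameters, here is a minimal sketch. It assumes that "up to 857" leaves corresponds to `num_leaves=857` and that 5,000 trees corresponds to the number of boosting rounds; other hyperparameters (learning rate, regularisation, and so on) are not stated in this card and are left at LightGBM defaults here.

```python
# Assumed mapping of the table above onto LightGBM parameters.
# Only the objective, tree count, and leaf limit come from this card.
params = {
    "objective": "regression_l1",  # MAE regression
    "metric": "l1",                # report MAE during training
    "num_leaves": 857,             # assumption: "up to 857" leaves per tree
}
NUM_BOOST_ROUND = 5000             # one tree per boosting round

# With a prepared lightgbm.Dataset named train_set (hypothetical), training
# would look like:
#   import lightgbm as lgb
#   booster = lgb.train(params, train_set, num_boost_round=NUM_BOOST_ROUND)
```

The actual training configuration lives in the GitHub repository and may differ from this sketch.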
## Training data
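Trained on the public EPC register maintained by MHCLG (the Ministry of Housing, Communities and Local Government). The training set covers all domestic EPC assessments lodged in England and Wales from 2012 to 2023; later lodgements are held out as the test set.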
- **Source:** https://get-energy-performance-data.communities.gov.uk/
- **Train split:** 2012–2023 (19.3M records)
- **Test split:** 2024–2026 (4.0M records) — time-based split to simulate real deployment
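The time-based split described above can be sketched in a few lines. This is an illustration only: the `lodgement_date` key and the record layout are assumptions, not the repository's actual schema.

```python
from datetime import date

# Last day that still belongs to the training period (2012-2023)
TRAIN_END = date(2023, 12, 31)

def time_split(records, date_key="lodgement_date"):
    """Split records into (train, test) by lodgement date.

    `records` is an iterable of dicts; `date_key` is a hypothetical
    field name holding a datetime.date.
    """
    train, test = [], []
    for record in records:
        if record[date_key] <= TRAIN_END:
            train.append(record)
        else:
            test.append(record)
    return train, test

records = [
    {"id": 1, "lodgement_date": date(2015, 6, 1)},
    {"id": 2, "lodgement_date": date(2023, 12, 31)},
    {"id": 3, "lodgement_date": date(2024, 1, 1)},
]
train, test = time_split(records)
```

Splitting by date rather than at random keeps every test record later than every training record, which is closer to how the model is used after deployment.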
## Features
The model uses 40 features across four categories:
- **Component efficiency ratings** (9): walls, roof, floor, windows, main heating, heating controls, hot water, lighting, secondary heating
- **Binary flags** (5): mains gas, solar water heating, solar PV, low energy lighting, flat top storey
- **Numeric** (10): floor area, room counts, floor level, storey count, glazing proportion, age band, etc.
- **Categorical** (16): property type, built form, fuel type, heating system description, wall/roof/floor descriptions, etc.
Assessor-only fields not available to homeowners (transaction type, floor height, lighting outlet counts) are excluded from the feature set.
## Usage
```python
import json

import lightgbm as lgb

# Load the trained booster and the feature metadata
booster = lgb.Booster(model_file="lgbm_epc.txt")
with open("feature_meta.json") as f:
    meta = json.load(f)

# See the full inference pipeline at:
# https://github.com/kulbinderdio/uk-epc-model/blob/main/src/model/predict.py
```
The full inference pipeline (including categorical encoding and grade threshold calibration) is in [`src/model/predict.py`](https://github.com/kulbinderdio/uk-epc-model/blob/main/src/model/predict.py) in the GitHub repository.
## Grade calibration
After training, the grade boundaries are optimised with Nelder-Mead minimisation on the first 100,000 test rows. The calibrated boundaries, compared with the SAP 2012 standard:
| Boundary | SAP standard | Calibrated |
|----------|-------------|------------|
| G/F | 21.0 | 22.3 |
| F/E | 39.0 | 38.3 |
| E/D | 55.0 | 53.7 |
| D/C | 69.0 | 68.0 |
| C/B | 81.0 | 80.1 |
| B/A | 92.0 | 91.1 |
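A calibration step of this kind might look like the sketch below. It assumes the objective is exact grade accuracy (minimising `1 - accuracy`) and that the search starts from the standard SAP boundaries; the objective actually used in the repository is not stated in this card. The function names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

# Standard SAP 2012 lower bounds for bands F, E, D, C, B, A
STANDARD = np.array([21.0, 39.0, 55.0, 69.0, 81.0, 92.0])

def to_band(scores, boundaries):
    """Return band indices 0 (G) to 6 (A) for an array of SAP scores."""
    return np.searchsorted(np.sort(boundaries), scores, side="right")

def grade_accuracy(pred_scores, true_scores, boundaries):
    """Share of rows whose predicted band equals the true (standard) band."""
    true_bands = to_band(true_scores, STANDARD)
    pred_bands = to_band(pred_scores, boundaries)
    return float(np.mean(pred_bands == true_bands))

def calibrate_boundaries(pred_scores, true_scores):
    """Search for boundaries that maximise exact grade accuracy.

    Assumed objective: 1 - accuracy, minimised with Nelder-Mead
    starting from the standard boundaries.
    """
    result = minimize(
        lambda b: 1.0 - grade_accuracy(pred_scores, true_scores, b),
        x0=STANDARD,
        method="Nelder-Mead",
    )
    return np.sort(result.x)
```

Nelder-Mead is derivative-free, which suits an accuracy objective that is piecewise constant in the boundary values.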
Calibrated thresholds are stored in `feature_meta.json`.
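To show how a predicted score turns into a letter grade, the following sketch hard-codes the calibrated boundaries from the table above rather than reading them from `feature_meta.json`, since the key names in that file are not documented here. It also assumes that a score exactly on a boundary falls into the higher band.

```python
from bisect import bisect_right

GRADES = "GFEDCBA"  # lowest band first

# Calibrated lower bounds for F, E, D, C, B, A (copied from the table above)
CALIBRATED = [22.3, 38.3, 53.7, 68.0, 80.1, 91.1]
# Standard SAP 2012 lower bounds, for comparison
SAP_STANDARD = [21.0, 39.0, 55.0, 69.0, 81.0, 92.0]

def score_to_grade(score, boundaries=CALIBRATED):
    """Map a predicted SAP score to a letter grade A-G."""
    return GRADES[bisect_right(boundaries, score)]

grade_calibrated = score_to_grade(68.5)              # "C"
grade_standard = score_to_grade(68.5, SAP_STANDARD)  # "D"
```

A predicted score of 68.5 shows the effect of calibration: it maps to C under the calibrated D/C boundary of 68.0 but to D under the standard boundary of 69.0.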
## Accuracy by property type
| Type | MAE | Exact grade accuracy |
|------|-----|----------------------|
| House | 2.99 | 75.3% |
| Flat | 3.04 | 74.6% |
| Maisonette | 3.13 | 75.3% |
| Park home | 3.84 | 76.8% |
| Bungalow | 3.92 | 70.0% |
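Per-type figures like these can be computed as in the sketch below. The input layout (tuples of property type, true score, predicted score) is an assumption for illustration, and the card does not say whether the table uses standard or calibrated boundaries for the predicted grade, so the boundaries are a parameter.

```python
from bisect import bisect_right
from collections import defaultdict

# Standard SAP 2012 lower bounds for bands F, E, D, C, B, A
SAP_BOUNDS = [21.0, 39.0, 55.0, 69.0, 81.0, 92.0]

def metrics_by_type(rows, pred_bounds=SAP_BOUNDS):
    """Compute MAE, exact and within-1-band accuracy per property type.

    `rows` is an iterable of (property_type, true_score, predicted_score).
    True bands always use the standard boundaries.
    """
    groups = defaultdict(list)
    for ptype, true_score, pred_score in rows:
        band_gap = abs(
            bisect_right(pred_bounds, pred_score)
            - bisect_right(SAP_BOUNDS, true_score)
        )
        groups[ptype].append((abs(pred_score - true_score), band_gap))
    return {
        ptype: {
            "mae": sum(err for err, _ in vals) / len(vals),
            "exact": sum(gap == 0 for _, gap in vals) / len(vals),
            "within_1": sum(gap <= 1 for _, gap in vals) / len(vals),
        }
        for ptype, vals in groups.items()
    }

rows = [("House", 70, 68.5), ("House", 60, 62.0), ("Flat", 80, 81.5)]
report = metrics_by_type(rows)
```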
## Top features (by gain)
1. `walls_description` — wall construction and insulation type
2. `construction_age_band` — decade the property was built
3. `floor_description` — floor construction and insulation
4. `total_floor_area` — property size in m²
5. `roof_description` — roof type and insulation level
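A ranking like this can be read from a loaded booster using LightGBM's `feature_name()` and `feature_importance(importance_type="gain")` methods. The helper name below is illustrative, not part of the repository.

```python
def top_features_by_gain(booster, k=5):
    """Return the k features with the highest total split gain.

    `booster` is expected to be a lightgbm.Booster, for example
    lgb.Booster(model_file="lgbm_epc.txt").
    """
    names = booster.feature_name()
    gains = booster.feature_importance(importance_type="gain")
    ranked = sorted(zip(names, gains), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

Gain importance sums the loss reduction from every split on a feature, so it tends to favour features used in high-impact splits rather than features that are merely split on often.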
## Limitations
- Predictions are estimates only — not a substitute for an official EPC from an accredited assessor
- Predictions within about 3 SAP points of a grade boundary (roughly the test MAE) can easily land in the adjacent band
- Bungalows have lower exact grade accuracy (70.0%, MAE 3.92) due to higher variance in insulation setups
- Model trained on assessor-submitted data; self-reported inputs add a further layer of uncertainty
## Repository
Full source code, training pipeline, API, and web frontend:
https://github.com/kulbinderdio/uk-epc-model
## License
MIT