UK EPC Rating Predictor
A LightGBM gradient-boosted tree model that predicts residential Energy Performance Certificate (EPC) ratings for properties in England and Wales.
Given property characteristics a homeowner already knows (wall type, heating system, floor area, age band, etc.), the model predicts:
- A numeric SAP 2012 efficiency score (1–100)
- A letter grade (A–G)
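The letter grade follows from the SAP score through fixed band boundaries. Below is a minimal sketch using the standard SAP 2012 bands (G below 21, F 21–38, E 39–54, D 55–68, C 69–80, B 81–91, A 92 and above). The helper name `sap_to_grade` is hypothetical and does not come from the repository.

```python
from bisect import bisect_right

# Standard SAP 2012 band boundaries: reaching a boundary moves up one grade.
STANDARD_BOUNDARIES = [21.0, 39.0, 55.0, 69.0, 81.0, 92.0]
GRADES = "GFEDCBA"  # index 0 = worst (G), index 6 = best (A)


def sap_to_grade(score: float, boundaries=STANDARD_BOUNDARIES) -> str:
    """Map a (possibly fractional) SAP score to a letter grade A-G."""
    # bisect_right counts how many boundaries are <= score, so a score
    # exactly on a boundary falls into the higher band.
    return GRADES[bisect_right(boundaries, score)]


print(sap_to_grade(68.4))  # D
print(sap_to_grade(69.0))  # C
```

The same function also accepts the calibrated boundaries listed under Grade calibration through the `boundaries` argument.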
Model details
| Detail | Value |
|---|---|
| Algorithm | LightGBM (gradient-boosted trees) |
| Objective | MAE regression (regression_l1) |
| Trees | 5,000 |
| Leaves per tree | up to 857 |
| Features | 40 |
| Training rows | 19,279,916 |
| Test rows | 4,045,192 |
| MAE (test set) | 3.09 SAP points |
| Exact grade accuracy | 77.4% (calibrated) |
| Within-1-band accuracy | 98.7% |
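The three headline metrics can be computed from true and predicted SAP scores roughly as follows. This is a sketch under assumptions. True grades come from the standard SAP bands. Predicted grades come from whichever boundaries are supplied, presumably the calibrated ones for the 77.4% figure. "Within-1-band" is read as a predicted grade at most one letter away from the true grade. The repository's evaluation code may differ.

```python
import numpy as np

STANDARD_BOUNDARIES = np.array([21.0, 39.0, 55.0, 69.0, 81.0, 92.0])


def grade_index(scores, boundaries):
    """0 = G ... 6 = A; a score exactly on a boundary goes to the higher band."""
    return np.searchsorted(boundaries, scores, side="right")


def evaluate(y_true, y_pred, pred_boundaries=STANDARD_BOUNDARIES):
    """Return MAE, exact grade accuracy and within-1-band accuracy."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    true_band = grade_index(y_true, STANDARD_BOUNDARIES)
    pred_band = grade_index(y_pred, np.asarray(pred_boundaries, dtype=float))
    band_gap = np.abs(true_band - pred_band)  # distance in letter grades
    return {
        "mae": float(np.mean(np.abs(y_true - y_pred))),
        "exact_grade_acc": float(np.mean(band_gap == 0)),
        "within_1_band_acc": float(np.mean(band_gap <= 1)),
    }
```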
Training data
Trained on the public EPC register maintained by MHCLG, covering all domestic EPC assessments lodged in England and Wales from 2012 to 2023.
- Source: https://get-energy-performance-data.communities.gov.uk/
- Train split: 2012–2023 (19.3M records)
- Test split: 2024–2026 (4.0M records) — time-based split to simulate real deployment
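A time-based split partitions certificates by lodgement date instead of sampling at random. As a result, every test record was lodged after every training record. Below is a minimal sketch using plain Python records. The `lodgement_date` field name and the 1 January 2024 cut-off constant are assumptions for illustration.

```python
from datetime import date

# Assumed cut-off: anything lodged from 2024 onwards is held out for testing.
TEST_START = date(2024, 1, 1)


def time_split(records, date_key="lodgement_date", cutoff=TEST_START):
    """Split records into (train, test) by lodgement date, not at random."""
    train = [r for r in records if r[date_key] < cutoff]
    test = [r for r in records if r[date_key] >= cutoff]
    return train, test


records = [
    {"lodgement_date": date(2015, 6, 1), "sap": 62},
    {"lodgement_date": date(2023, 12, 31), "sap": 71},
    {"lodgement_date": date(2024, 1, 1), "sap": 58},
]
train, test = time_split(records)
print(len(train), len(test))  # 2 1
```

Unlike a random split, this exposes the evaluation to any drift in housing stock or assessment practice between periods, which a deployed model also faces.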
Features
The model uses 40 features across four categories:
- Component efficiency ratings (9): walls, roof, floor, windows, main heating, heating controls, hot water, lighting, secondary heating
- Binary flags (5): mains gas, solar water heating, solar PV, low energy lighting, flat top storey
- Numeric (10): floor area, room counts, floor level, storey count, glazing proportion, age band, etc.
- Categorical (16): property type, built form, fuel type, heating system description, wall/roof/floor descriptions, etc.
Assessor-only fields not available to homeowners (transaction type, floor height, lighting outlet counts) are excluded from the feature set.
Usage
```python
import json

import lightgbm as lgb

# Load the trained model and its feature metadata
booster = lgb.Booster(model_file="lgbm_epc.txt")
with open("feature_meta.json") as f:
    meta = json.load(f)

# See the full inference pipeline at:
# https://github.com/kulbinderdio/uk-epc-model/blob/main/src/model/predict.py
```
The full inference pipeline (including categorical encoding and grade threshold calibration) is in src/model/predict.py in the GitHub repository.
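LightGBM expects categorical features as integer codes and treats negative codes as missing. The sketch below shows one way to turn a homeowner's raw inputs into a single feature row. The metadata layout it reads is hypothetical: a `feature_order` list plus a per-feature `categories` vocabulary. The real feature_meta.json and the encoding in src/model/predict.py may be structured differently.

```python
import numpy as np


def encode_row(inputs: dict, meta: dict) -> np.ndarray:
    """Build a 1 x n_features float array in the order the model expects.

    Assumes a HYPOTHETICAL meta layout:
      meta["feature_order"]: list of feature names
      meta["categories"]: {feature_name: [category values seen in training]}
    """
    row = []
    for name in meta["feature_order"]:
        value = inputs.get(name)
        if name in meta["categories"]:
            vocab = meta["categories"][name]
            # Code = position in the training vocabulary; unseen or missing
            # values get -1, which LightGBM treats as missing.
            row.append(float(vocab.index(value)) if value in vocab else -1.0)
        else:
            # Missing numeric inputs become NaN, LightGBM's default missing marker.
            row.append(np.nan if value is None else float(value))
    return np.array([row], dtype=float)
```

Under that assumed layout, `booster.predict(encode_row(inputs, meta))` would return the predicted SAP score for one property.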
Grade calibration
After training, the grade boundaries applied to predicted scores are optimised with Nelder-Mead minimisation on the first 100K rows of the test set. Calibrated boundaries compared with the SAP 2012 standard:
| Boundary | SAP standard | Calibrated |
|---|---|---|
| G/F | 21.0 | 22.3 |
| F/E | 39.0 | 38.3 |
| E/D | 55.0 | 53.7 |
| D/C | 69.0 | 68.0 |
| C/B | 81.0 | 80.1 |
| B/A | 92.0 | 91.1 |
Calibrated thresholds are stored in feature_meta.json.
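The calibration step can be sketched as a search over the six boundaries that minimises grade errors on a sample of predictions. This is an illustration under assumptions, not the repository's code. The objective used here is the share of rows whose predicted grade differs from the true standard-band grade, and the search starts from the standard boundaries. Neither choice is confirmed by the source. SciPy's Nelder-Mead suits this because the error rate is piecewise constant and has no useful gradient, although the search can stop on a flat plateau.

```python
import numpy as np
from scipy.optimize import minimize

STANDARD = np.array([21.0, 39.0, 55.0, 69.0, 81.0, 92.0])


def grade_error_rate(boundaries, y_true, y_pred):
    """Fraction of rows whose predicted grade differs from the true grade."""
    # Nelder-Mead can propose unsorted boundaries, so sort before use.
    b = np.sort(np.asarray(boundaries, dtype=float))
    true_band = np.searchsorted(STANDARD, y_true, side="right")
    pred_band = np.searchsorted(b, y_pred, side="right")
    return float(np.mean(true_band != pred_band))


def calibrate(y_true, y_pred):
    """Search boundaries for predicted scores, starting from the SAP standard."""
    result = minimize(
        grade_error_rate,
        x0=STANDARD,
        args=(np.asarray(y_true, float), np.asarray(y_pred, float)),
        method="Nelder-Mead",
    )
    return np.sort(result.x)
```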
Accuracy by property type
| Type | MAE | Exact grade accuracy |
|---|---|---|
| House | 2.99 | 75.3% |
| Flat | 3.04 | 74.6% |
| Maisonette | 3.13 | 75.3% |
| Park home | 3.84 | 76.8% |
| Bungalow | 3.92 | 70.0% |
Top features (by gain)
- walls_description: wall construction and insulation type
- construction_age_band: decade the property was built
- floor_description: floor construction and insulation
- total_floor_area: property size in m²
- roof_description: roof type and insulation level
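A gain ranking like this can be read from a trained booster with LightGBM's `Booster.feature_importance(importance_type="gain")`, paired with `Booster.feature_name()`. Below is a small sketch. The helper name `top_features_by_gain` is hypothetical.

```python
def top_features_by_gain(booster, k=5):
    """Return the k (feature_name, total_gain) pairs with the highest gain.

    `booster` is a lightgbm.Booster, e.g. lgb.Booster(model_file="lgbm_epc.txt").
    """
    names = booster.feature_name()
    gains = booster.feature_importance(importance_type="gain")
    ranked = sorted(zip(names, gains), key=lambda pair: pair[1], reverse=True)
    return [(name, float(gain)) for name, gain in ranked[:k]]
```

Gain sums the loss reduction from every split on a feature. For that reason, it can rank features differently from the default `"split"` importance, which only counts how often a feature is used.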
Limitations
- Predictions are estimates only — not a substitute for an official EPC from an accredited assessor
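- Grade predictions are less reliable when the predicted score falls within about ±3 SAP points of a grade boundary, roughly the size of the test-set MAE
- Bungalows have lower accuracy (70.0% exact grade accuracy) due to higher variance in insulation setups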
- Model trained on assessor-submitted data; self-reported inputs add a further layer of uncertainty
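Typical errors are around 3 SAP points, so a predicted score close to a band edge could belong to either neighbouring grade. Below is a minimal sketch that flags such cases. It assumes the calibrated boundaries from the Grade calibration table and a ±3 point margin. `grade_with_flag` is a hypothetical helper.

```python
from bisect import bisect_right

# Calibrated boundaries from the Grade calibration table (G/F ... B/A).
CALIBRATED = [22.3, 38.3, 53.7, 68.0, 80.1, 91.1]
GRADES = "GFEDCBA"


def grade_with_flag(score, boundaries=CALIBRATED, margin=3.0):
    """Return (grade, uncertain); uncertain if any boundary lies within ±margin."""
    grade = GRADES[bisect_right(boundaries, score)]
    uncertain = any(abs(score - b) <= margin for b in boundaries)
    return grade, uncertain


print(grade_with_flag(66.5))  # ('D', True): 1.5 points below the D/C edge
print(grade_with_flag(60.0))  # ('D', False)
```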
Repository
Full source code, training pipeline, API, and web frontend:
https://github.com/kulbinderdio/uk-epc-model
License
MIT
Evaluation results
- MAE (SAP score, test set 2024–2026): 3.090 (self-reported)