UK EPC Rating Predictor

A LightGBM gradient-boosted tree model that predicts residential Energy Performance Certificate (EPC) ratings for properties in England and Wales.

Given property characteristics a homeowner already knows (wall type, heating system, floor area, age band, etc.), the model predicts:

A numeric SAP 2012 efficiency score (1–100)
A letter grade (A–G)

Model details

Detail	Value
Algorithm	LightGBM (gradient-boosted trees)
Objective	MAE regression (`regression_l1`)
Trees	5,000
Leaves per tree	up to 857
Features	40
Training rows	19,279,916
Test rows	4,045,192
MAE (test set)	3.09 SAP points
Exact grade accuracy	77.4% (calibrated)
Within-1-band accuracy	98.7%
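
The three headline metrics in the table can be computed from true and predicted values roughly as follows. This is a minimal sketch rather than the repository's evaluation code: the function name `evaluate` and its argument layout are hypothetical, and it assumes the letter grades have already been derived for both the true and the predicted scores.

```python
GRADES = "GFEDCBA"  # worst to best, so index distance = number of bands apart


def evaluate(sap_true, sap_pred, grade_true, grade_pred):
    """Return (MAE, exact grade accuracy, within-1-band accuracy)."""
    n = len(sap_true)
    # Mean absolute error in SAP points
    mae = sum(abs(t - p) for t, p in zip(sap_true, sap_pred)) / n
    # Number of bands between the true and predicted grade for each row
    gaps = [abs(GRADES.index(t) - GRADES.index(p))
            for t, p in zip(grade_true, grade_pred)]
    exact = sum(g == 0 for g in gaps) / n
    within_one = sum(g <= 1 for g in gaps) / n
    return mae, exact, within_one


mae, exact, within_one = evaluate(
    [70, 50, 30], [72, 56, 60],
    ["C", "E", "F"], ["C", "D", "D"],
)
```

Within-1-band accuracy counts a prediction as correct when it lands on the true grade or an adjacent one, which is why it sits well above the exact figure.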

Training data

Trained on the public EPC register maintained by MHCLG, covering all domestic EPC assessments lodged in England and Wales from 2012 to 2023.

Source: https://get-energy-performance-data.communities.gov.uk/
Train split: 2012–2023 (19.3M records)
Test split: 2024–2026 (4.0M records) — time-based split to simulate real deployment
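
A time-based split of this kind can be sketched as below. The record layout, the `lodgement_date` key, and the `time_split` helper are assumptions made for illustration; the actual pipeline in the repository may split differently.

```python
from datetime import date


def time_split(records, first_test_year=2024, date_key="lodgement_date"):
    """Split records into (train, test) by the year of their lodgement date."""
    train, test = [], []
    for rec in records:
        # Everything lodged before the cutoff year trains the model;
        # everything from the cutoff year onwards is held out.
        (test if rec[date_key].year >= first_test_year else train).append(rec)
    return train, test


# Toy records, not real register data
records = [
    {"lodgement_date": date(2015, 6, 1), "sap": 62},
    {"lodgement_date": date(2023, 12, 31), "sap": 71},
    {"lodgement_date": date(2024, 1, 1), "sap": 80},
]
train, test = time_split(records)
```

Splitting on time rather than at random means the test set contains only certificates lodged after every training example, which is closer to how the model is used once deployed.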

Features

The model uses 40 features across four categories:

Component efficiency ratings (9): walls, roof, floor, windows, main heating, heating controls, hot water, lighting, secondary heating
Binary flags (5): mains gas, solar water heating, solar PV, low energy lighting, flat top storey
Numeric (10): floor area, room counts, floor level, storey count, glazing proportion, age band, etc.
Categorical (16): property type, built form, fuel type, heating system description, wall/roof/floor descriptions, etc.

Assessor-only fields not available to homeowners (transaction type, floor height, lighting outlet counts) are excluded from the feature set.
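
As an illustration of how the nine component efficiency ratings could be turned into numeric inputs, the sketch below maps the five rating labels used in EPC data to an ordinal 1 to 5 scale. The mapping, the `encode_rating` helper, and the choice to return `None` for missing values are assumptions; the encoding actually used is defined in src/model/predict.py.

```python
# Assumed ordinal scale for the EPC component efficiency labels
RATING_ORDER = {
    "Very Poor": 1,
    "Poor": 2,
    "Average": 3,
    "Good": 4,
    "Very Good": 5,
}


def encode_rating(label):
    """Map a component efficiency label to 1-5, or None if missing or unrecognised."""
    if label is None:
        return None
    # Normalise whitespace and capitalisation before the lookup
    return RATING_ORDER.get(label.strip().title())
```

For example, `encode_rating(" poor ")` gives 2, while `encode_rating("N/A")` gives `None`. LightGBM handles missing values natively, so unrecognised labels could be passed through as NaN rather than imputed.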

Usage

import lightgbm as lgb
import json
import numpy as np

# Load the trained model and its feature metadata
booster = lgb.Booster(model_file="lgbm_epc.txt")
with open("feature_meta.json") as f:
    meta = json.load(f)

# See the full inference pipeline at:
# https://github.com/kulbinderdio/uk-epc-model/blob/main/src/model/predict.py

The full inference pipeline (including categorical encoding and grade threshold calibration) is in src/model/predict.py in the GitHub repository.

Grade calibration

After training, grade boundaries are optimised using Nelder-Mead minimisation on the first 100K test rows. Calibrated boundaries (vs SAP 2012 standard):

Boundary	SAP standard	Calibrated
G/F	21.0	22.3
F/E	39.0	38.3
E/D	55.0	53.7
D/C	69.0	68.0
C/B	81.0	80.1
B/A	92.0	91.1

Calibrated thresholds are stored in feature_meta.json.
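
To show what the calibration changes in practice, the sketch below maps a SAP score to a letter grade under either set of boundaries. The threshold values are copied from the table above instead of being read from feature_meta.json, because the key layout of that file is not documented here; `score_to_grade` is a hypothetical helper, and treating each boundary as the inclusive lower bound of the higher grade is an assumption.

```python
from bisect import bisect_right

GRADES = "GFEDCBA"  # worst to best
# Boundaries in order: G/F, F/E, E/D, D/C, C/B, B/A
SAP_STANDARD = [21.0, 39.0, 55.0, 69.0, 81.0, 92.0]
CALIBRATED = [22.3, 38.3, 53.7, 68.0, 80.1, 91.1]


def score_to_grade(score, boundaries=CALIBRATED):
    """Return the letter grade for a SAP score given six ascending boundaries."""
    # The number of boundaries at or below the score is the grade index
    return GRADES[bisect_right(boundaries, score)]


print(score_to_grade(68.5, SAP_STANDARD))  # D
print(score_to_grade(68.5))                # C
```

A predicted score of 68.5 is a D under the standard boundaries but a C under the calibrated ones, because the calibrated D/C boundary sits one point lower.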

Accuracy by property type

Property type	MAE (SAP points)	Exact grade accuracy
House	2.99	75.3%
Flat	3.04	74.6%
Maisonette	3.13	75.3%
Park home	3.84	76.8%
Bungalow	3.92	70.0%

Top features (by gain)

walls_description — wall construction and insulation type
construction_age_band — decade the property was built
floor_description — floor construction and insulation
total_floor_area — property size in m²
roof_description — roof type and insulation level

Limitations

Predictions are estimates only — not a substitute for an official EPC from an accredited assessor
Grade predictions are less certain when the predicted score falls within about ±3 SAP points of a grade boundary (comparable to the 3.09-point test MAE)
Bungalows have the lowest exact grade accuracy of any property type (70.0%), attributed to higher variance in insulation setups
Model trained on assessor-submitted data; self-reported inputs add a further layer of uncertainty
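
One way to act on the boundary limitation is to flag predictions that sit close to a grade boundary so they can be presented with a caveat. This is a sketch under assumptions: the 3-point margin is taken from the ±3 figure in the list above, the boundaries are the calibrated values from the grade calibration table, and `near_boundary` is a hypothetical helper, not part of the repository.

```python
# Calibrated boundaries: G/F, F/E, E/D, D/C, C/B, B/A
CALIBRATED = [22.3, 38.3, 53.7, 68.0, 80.1, 91.1]


def near_boundary(score, boundaries=CALIBRATED, margin=3.0):
    """True if the score lies within `margin` SAP points of any grade boundary."""
    return any(abs(score - b) <= margin for b in boundaries)
```

For instance, a predicted score of 66.0 is flagged because it is 2 points below the D/C boundary, while a score of 60.0 is more than 3 points from every boundary and is not.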

Repository

Full source code, training pipeline, API, and web frontend:
https://github.com/kulbinderdio/uk-epc-model

License

MIT


Evaluation results

MAE (SAP score, test set 2024–2026): 3.090 (self-reported)