alanoee committed · Commit 470deb6 · 0 parent(s)

Initial commit: Energy consumption prediction model

Files changed (6):
  1. .gitattributes +1 -0
  2. README.md +105 -0
  3. energy_model_latest.joblib +3 -0
  4. example.py +90 -0
  5. model.py +290 -0
  6. requirements.txt +4 -0
.gitattributes ADDED
@@ -0,0 +1 @@
*.joblib filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,105 @@
# Energy Consumption Prediction Model

A Random Forest model for predicting household energy consumption patterns and costs.

## Model Description

This model predicts monthly energy consumption in kWh and the associated cost in PLN (Polish złoty) from historical consumption patterns and seasonal features.

**Model Type:** Random Forest Regressor
**Framework:** scikit-learn
**Performance:** R² = 0.848

## Features

The model uses 17 engineered features, including:
- **Moving averages** (3-month and 6-month windows)
- **Lag features** (1, 2, and 3 months back)
- **Seasonal indicators** (winter, summer, and transition periods)
- **Temporal features** (month, year, day of year, quarter)
- **Cyclical encoding** (sin/cos transforms for monthly patterns)
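Of these, the cyclical encoding is the least self-explanatory: it maps the month onto the unit circle so that December and January end up numerically adjacent. A minimal sketch of the transform behind the `month_sin`/`month_cos` features in `model.py`:

```python
import math

def cyclical_month_encoding(month: int):
    """Map a month (1-12) onto the unit circle, mirroring the
    month_sin/month_cos features built in model.py."""
    angle = 2 * math.pi * month / 12
    return math.sin(angle), math.cos(angle)

# December (12) and January (1) differ by 11 as plain integers,
# but their (sin, cos) encodings are neighbours on the circle.
dec = cyclical_month_encoding(12)
jan = cyclical_month_encoding(1)
```

This is why the model can learn smooth seasonal patterns across the year boundary instead of treating December and January as far apart.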
## Usage

```python
import sys
sys.path.append('.')  # make model.py importable from the current directory

from model import EnergyConsumptionPredictor

# Load the pre-trained model
model = EnergyConsumptionPredictor.from_file('energy_model_latest.joblib')

# Make predictions for the next 6 months
predictions = model.predict_future(months=6)

# Display the results
print(predictions[['Date', 'Predicted_Consumption', 'Predicted_Cost']])
```
## Output Format

The model returns a pandas DataFrame with the following columns:
- `Date`: month start date
- `Predicted_Consumption`: predicted consumption in kWh
- `Predicted_Cost`: predicted cost in PLN
- `Month`: month number (1-12)
- `Year`: year
## Requirements

```
pandas>=2.0.0
scikit-learn>=1.3.0
numpy>=1.24.0
joblib>=1.3.0
```
## Model Training Data

The model was trained on residential energy consumption data with:
- **17 data points** spanning multiple months
- Features covering seasonal patterns, consumption history, and temporal indicators
- Target variable: monthly energy consumption in kWh

## Performance Metrics

- **R² Score:** 0.848
- **Model Type:** Random Forest (100 estimators)
- **Cross-validation:** 3-fold CV used for model selection
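The 3-fold selection step partitions the samples into three folds, training on two and scoring on the third in turn. The actual implementation uses scikit-learn's `cross_val_score` with `cv=3`; a minimal, library-free illustration of how such a contiguous split works:

```python
def three_fold_splits(n_samples: int):
    """Yield (train_indices, test_indices) for 3 contiguous folds,
    a simplified stand-in for scikit-learn's KFold with 3 splits."""
    indices = list(range(n_samples))
    fold = n_samples // 3
    for k in range(3):
        start = k * fold
        end = start + fold if k < 2 else n_samples  # last fold takes the remainder
        yield indices[:start] + indices[end:], indices[start:end]

# With 17 samples, the three test folds hold 5, 5, and 7 points.
```

Each candidate model's mean score across the three held-out folds is what drives the final model choice.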
## Feature Importance

Top 5 most important features:
1. `consumption_ma_3` (3-month moving average)
2. `consumption_ma_6` (6-month moving average)
3. `consumption_lag_1` (1-month lag)
4. `consumption_lag_3` (3-month lag)
5. `month_sin` (seasonal encoding)
## Cost Calculation

The model calculates costs using the Polish energy pricing structure:
- Energy rate per kWh
- Distribution fees
- VAT (Value Added Tax)
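These components are applied in sequence: a distribution fee proportional to the energy charge, then VAT on the subtotal. A sketch using the rates hard-coded in `model.py`:

```python
ENERGY_RATE = 0.6972               # PLN per kWh (constant from model.py)
DISTRIBUTION_MULTIPLIER = 0.5068   # distribution fee as a fraction of the energy charge
VAT_RATE = 0.23                    # Polish VAT

def monthly_cost(consumption_kwh: float) -> float:
    """Estimate the monthly bill in PLN for a given consumption."""
    energy_cost = consumption_kwh * ENERGY_RATE
    distribution_fee = energy_cost * DISTRIBUTION_MULTIPLIER
    subtotal = energy_cost + distribution_fee
    return subtotal * (1 + VAT_RATE)  # VAT applied to energy + distribution
```

Because the distribution fee and VAT are both multiplicative, the total cost stays proportional to the predicted consumption.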
## Limitations

- The model is trained on Polish residential data
- Cost calculations use Polish energy pricing
- Designed for monthly predictions
- Performance may vary for different consumption patterns

## Example Output

```
        Date  Predicted_Consumption  Predicted_Cost
0 2025-06-01                    191             216
1 2025-07-01                    135             153
2 2025-08-01                    199             224
```

## License

This model is provided as-is for demonstration and educational purposes.
energy_model_latest.joblib ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e2096ae0d78afc040c226d02ddc89be94ac7cc5212b1afd753c228c29eb9adf2
size 307968
example.py ADDED
@@ -0,0 +1,90 @@
"""
Example usage of the Energy Consumption Prediction Model.
Download this file along with model.py and energy_model_latest.joblib.
"""

import os


def main():
    # Check that the model file exists
    model_path = 'energy_model_latest.joblib'
    if not os.path.exists(model_path):
        print(f"Error: {model_path} not found!")
        print("Please download energy_model_latest.joblib from this repository")
        return None

    # Import and load the model
    from model import EnergyConsumptionPredictor

    print("Loading energy consumption prediction model...")
    model = EnergyConsumptionPredictor.from_file(model_path)

    print(f"Model loaded successfully: {model.best_model_name}")
    print(f"Features used: {len(model.feature_columns)}")

    # Make predictions
    months_to_predict = 6
    print(f"\nPredicting energy consumption for the next {months_to_predict} months...")

    predictions = model.predict_future(months=months_to_predict)

    # Display the results
    print("\n" + "=" * 60)
    print("ENERGY CONSUMPTION PREDICTIONS")
    print("=" * 60)

    total_consumption = predictions['Predicted_Consumption'].sum()
    total_cost = predictions['Predicted_Cost'].sum()
    avg_consumption = total_consumption / months_to_predict
    avg_cost = total_cost / months_to_predict

    print(f"Total predicted consumption: {total_consumption:.0f} kWh")
    print(f"Total predicted cost: {total_cost:.0f} PLN")
    print(f"Average monthly consumption: {avg_consumption:.0f} kWh")
    print(f"Average monthly cost: {avg_cost:.0f} PLN")

    print("\nMonthly breakdown:")
    print("-" * 55)
    print(f"{'Month':<15} {'Consumption':<15} {'Cost (PLN)'}")
    print("-" * 55)

    for _, row in predictions.iterrows():
        month_name = row['Date'].strftime('%B %Y')
        consumption = row['Predicted_Consumption']
        cost = row['Predicted_Cost']
        print(f"{month_name:<15} {consumption:>8.0f} kWh {cost:>12.0f}")

    print("-" * 55)

    # Show feature importance
    importance = model.get_feature_importance()
    if importance:
        print("\nTop 5 most important prediction features:")
        for i, (feature, score) in enumerate(list(importance.items())[:5], 1):
            print(f"  {i}. {feature}: {score:.3f}")

    # Save the predictions to CSV
    output_file = 'energy_predictions.csv'
    predictions.to_csv(output_file, index=False)
    print(f"\nPredictions saved to: {output_file}")

    return predictions


if __name__ == "__main__":
    print("Energy Consumption Prediction Model - Example Usage")
    print("=" * 55)
    print("Required files: model.py, energy_model_latest.joblib")
    print("=" * 55)

    try:
        predictions = main()
        if predictions is not None:
            print(f"\n✓ Success! Generated {len(predictions)} monthly predictions")
    except Exception as e:
        print(f"\n✗ Error: {e}")
        print("\nMake sure you have:")
        print("1. model.py")
        print("2. energy_model_latest.joblib")
        print("3. Required packages: pandas, numpy, scikit-learn, joblib")
model.py ADDED
@@ -0,0 +1,290 @@
import os
import pickle
import warnings
from datetime import datetime

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler

warnings.filterwarnings('ignore')


class EnergyConsumptionPredictor:
    def __init__(self):
        self.models = {
            'random_forest': RandomForestRegressor(n_estimators=100, random_state=42),
            'gradient_boosting': GradientBoostingRegressor(n_estimators=100, random_state=42),
            'linear_regression': LinearRegression()
        }

        self.best_model = None
        self.best_model_name = None
        self.scaler = StandardScaler()
        self.feature_columns = None
        self.data_stats = {}

    def _create_features(self, df):
        features_df = df.copy()

        # Moving averages
        for window in [3, 6]:
            if len(df) > window:
                features_df[f'consumption_ma_{window}'] = features_df['Consumption'].rolling(window=window).mean()
                features_df[f'consumption_std_{window}'] = features_df['Consumption'].rolling(window=window).std()

        # Lag features
        for lag in [1, 2, 3]:
            if len(df) > lag:
                features_df[f'consumption_lag_{lag}'] = features_df['Consumption'].shift(lag)

        # Seasonal indicators
        features_df['is_winter'] = features_df['Month'].isin([12, 1, 2]).astype(int)
        features_df['is_summer'] = features_df['Month'].isin([6, 7, 8]).astype(int)
        features_df['is_transition'] = features_df['Month'].isin([3, 4, 5, 9, 10, 11]).astype(int)

        return features_df

    def _prepare_training_data(self, df):
        features_df = self._create_features(df)
        features_df = features_df.dropna()

        exclude_columns = ['Date', 'Consumption', 'Reading', 'Cost']
        feature_columns = [col for col in features_df.columns if col not in exclude_columns]
        self.feature_columns = feature_columns

        X = features_df[feature_columns].values
        y = features_df['Consumption'].values

        return X, y

    def train(self, df):
        # Store data statistics for predictions
        self.data_stats = {
            'mean_consumption': df['Consumption'].mean(),
            'std_consumption': df['Consumption'].std(),
            'min_date': df['Date'].min(),
            'max_date': df['Date'].max(),
            'seasonal_patterns': df.groupby('Month')['Consumption'].mean().to_dict()
        }

        X, y = self._prepare_training_data(df)

        if len(X) < 5:
            return self._train_baseline_model(df)

        X_scaled = self.scaler.fit_transform(X)
        X_train, X_test, y_train, y_test = train_test_split(
            X_scaled, y, test_size=0.2, random_state=42, shuffle=False)

        model_scores = {}

        for model_name, model in self.models.items():
            model.fit(X_train, y_train)
            y_pred = model.predict(X_test)

            r2 = r2_score(y_test, y_pred)
            rmse = np.sqrt(mean_squared_error(y_test, y_pred))
            mae = mean_absolute_error(y_test, y_pred)
            cv_scores = cross_val_score(model, X_scaled, y, cv=3, scoring='r2')

            model_scores[model_name] = {
                'r2_score': r2,
                'rmse': rmse,
                'mae': mae,
                'cv_score': cv_scores.mean()
            }

        # Select the best model based on cross-validation
        self.best_model_name = max(model_scores.keys(), key=lambda k: model_scores[k]['cv_score'])
        self.best_model = self.models[self.best_model_name]
        self.best_model.fit(X_scaled, y)

        final_predictions = self.best_model.predict(X_scaled)
        return {
            'r2_score': r2_score(y, final_predictions),
            'rmse': np.sqrt(mean_squared_error(y, final_predictions)),
            'mae': mean_absolute_error(y, final_predictions),
            'model_name': self.best_model_name,
            'all_models': model_scores
        }

    def _train_baseline_model(self, df):
        monthly_avg = df.groupby('Month')['Consumption'].mean()
        overall_mean = df['Consumption'].mean()
        self.baseline_predictions = monthly_avg.fillna(overall_mean).to_dict()
        self.best_model_name = "baseline_seasonal"

        return {
            'r2_score': 0.0,
            'rmse': df['Consumption'].std(),
            'mae': df['Consumption'].std() * 0.8,
            'model_name': 'baseline_seasonal'
        }

    def predict_future(self, months=12):
        if self.best_model_name == "baseline_seasonal":
            return self._predict_baseline(months)

        last_date = self.data_stats['max_date']
        future_dates = pd.date_range(start=last_date + pd.DateOffset(months=1), periods=months, freq='MS')

        predictions = []

        for date in future_dates:
            features = {
                'Month': date.month,
                'Year': date.year,
                'DayOfYear': date.timetuple().tm_yday,
                'Quarter': date.quarter,
                'days_since_start': (date - self.data_stats['min_date']).days,
                'month_sin': np.sin(2 * np.pi * date.month / 12),
                'month_cos': np.cos(2 * np.pi * date.month / 12),
                'is_winter': int(date.month in [12, 1, 2]),
                'is_summer': int(date.month in [6, 7, 8]),
                'is_transition': int(date.month in [3, 4, 5, 9, 10, 11])
            }

            # Use seasonal patterns for the lag/moving-average features
            seasonal_consumption = self.data_stats['seasonal_patterns'].get(
                date.month, self.data_stats['mean_consumption'])

            for window in [3, 6]:
                features[f'consumption_ma_{window}'] = seasonal_consumption
                features[f'consumption_std_{window}'] = self.data_stats['std_consumption']

            for lag in [1, 2, 3]:
                features[f'consumption_lag_{lag}'] = seasonal_consumption

            feature_vector = np.array([[features[col] for col in self.feature_columns]])
            feature_vector_scaled = self.scaler.transform(feature_vector)

            prediction = self.best_model.predict(feature_vector_scaled)[0]
            # Add some noise to make predictions more realistic
            prediction = max(0, prediction + np.random.normal(0, self.data_stats['std_consumption'] * 0.1))

            predictions.append(prediction)

        # Calculate costs - rates are hardcoded so the model is standalone
        ENERGY_RATE = 0.6972
        DISTRIBUTION_MULTIPLIER = 0.5068
        VAT_RATE = 0.23

        results_df = pd.DataFrame({
            'Date': future_dates,
            'Predicted_Consumption': predictions,
            'Month': future_dates.month,
            'Year': future_dates.year
        })

        energy_cost = results_df['Predicted_Consumption'] * ENERGY_RATE
        distribution_fee = energy_cost * DISTRIBUTION_MULTIPLIER
        subtotal = energy_cost + distribution_fee
        vat = subtotal * VAT_RATE
        results_df['Predicted_Cost'] = subtotal + vat

        return results_df

    def _predict_baseline(self, months):
        last_date = self.data_stats['max_date']
        future_dates = pd.date_range(start=last_date + pd.DateOffset(months=1), periods=months, freq='MS')

        predictions = []
        for date in future_dates:
            seasonal_pred = self.baseline_predictions.get(date.month, self.data_stats['mean_consumption'])
            predictions.append(max(0, seasonal_pred * (1 + np.random.normal(0, 0.1))))

        ENERGY_RATE = 0.6972
        DISTRIBUTION_MULTIPLIER = 0.5068
        VAT_RATE = 0.23

        results_df = pd.DataFrame({
            'Date': future_dates,
            'Predicted_Consumption': predictions,
            'Month': future_dates.month,
            'Year': future_dates.year
        })

        energy_cost = results_df['Predicted_Consumption'] * ENERGY_RATE
        distribution_fee = energy_cost * DISTRIBUTION_MULTIPLIER
        subtotal = energy_cost + distribution_fee
        vat = subtotal * VAT_RATE
        results_df['Predicted_Cost'] = subtotal + vat

        return results_df

    def get_feature_importance(self):
        if hasattr(self.best_model, 'feature_importances_'):
            importance_dict = dict(zip(self.feature_columns, self.best_model.feature_importances_))
            return dict(sorted(importance_dict.items(), key=lambda x: x[1], reverse=True))
        return {}

    def save_model(self, filepath=None, format='joblib'):
        if self.best_model is None:
            raise ValueError("Model must be trained first. Use train() method.")

        if filepath is None:
            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            extension = 'joblib' if format == 'joblib' else 'pkl'
            filepath = f"energy_model_{self.best_model_name}_{timestamp}.{extension}"

        os.makedirs(os.path.dirname(filepath) if os.path.dirname(filepath) else '.', exist_ok=True)

        model_data = {
            'best_model': self.best_model,
            'best_model_name': self.best_model_name,
            'scaler': self.scaler,
            'feature_columns': self.feature_columns,
            'data_stats': self.data_stats,
            'models': self.models,
            'baseline_predictions': getattr(self, 'baseline_predictions', None),
            'metadata': {
                'saved_at': datetime.now().isoformat(),
                'model_type': self.best_model_name,
                'feature_count': len(self.feature_columns) if self.feature_columns else 0
            }
        }

        if format == 'joblib':
            joblib.dump(model_data, filepath)
        else:
            with open(filepath, 'wb') as f:
                pickle.dump(model_data, f)

        return filepath

    def load_model(self, filepath, format='auto'):
        if not os.path.exists(filepath):
            raise FileNotFoundError(f"File {filepath} does not exist.")

        if format == 'auto':
            if filepath.endswith('.joblib'):
                format = 'joblib'
            elif filepath.endswith('.pkl'):
                format = 'pickle'
            else:
                format = 'joblib'

        try:
            if format == 'joblib':
                model_data = joblib.load(filepath)
            else:
                with open(filepath, 'rb') as f:
                    model_data = pickle.load(f)

            self.best_model = model_data['best_model']
            self.best_model_name = model_data['best_model_name']
            self.scaler = model_data['scaler']
            self.feature_columns = model_data['feature_columns']
            self.data_stats = model_data['data_stats']
            self.models = model_data['models']
            self.baseline_predictions = model_data.get('baseline_predictions')

        except Exception as e:
            raise ValueError(f"Error loading model: {str(e)}")

    @classmethod
    def from_file(cls, filepath, format='auto'):
        model = cls()
        model.load_model(filepath, format)
        return model
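Besides loading the shipped `.joblib`, the class supports training from scratch via `train()`. The expected input schema is not documented; inferring from `_prepare_training_data` and the features rebuilt in `predict_future`, the training frame needs `Date`, `Month`, `Year`, and `Consumption`, plus the temporal columns `DayOfYear`, `Quarter`, `days_since_start`, `month_sin`, and `month_cos` (with the rolling, lag, and seasonal-indicator columns added automatically, this yields the 17 features the README mentions). A sketch of constructing such a frame, with invented consumption figures:

```python
import numpy as np
import pandas as pd

# Hypothetical monthly history; the values are illustrative only.
history = pd.DataFrame({
    'Date': pd.date_range('2024-01-01', periods=12, freq='MS'),
    'Consumption': [230, 210, 180, 150, 130, 120, 115, 125, 150, 175, 200, 225],
})
# Temporal columns expected alongside Date/Consumption (inferred schema).
history['Month'] = history['Date'].dt.month
history['Year'] = history['Date'].dt.year
history['DayOfYear'] = history['Date'].dt.dayofyear
history['Quarter'] = history['Date'].dt.quarter
history['days_since_start'] = (history['Date'] - history['Date'].min()).dt.days
history['month_sin'] = np.sin(2 * np.pi * history['Month'] / 12)
history['month_cos'] = np.cos(2 * np.pi * history['Month'] / 12)

# With model.py on the path, training would then be:
#   predictor = EnergyConsumptionPredictor()
#   metrics = predictor.train(history)
#   predictor.save_model('energy_model_latest.joblib')
```

Note that `train()` falls back to a seasonal baseline when fewer than 5 usable rows survive the rolling-window `dropna()`, so a history of at least a year is advisable.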
requirements.txt ADDED
@@ -0,0 +1,4 @@
pandas>=2.0.0
scikit-learn>=1.3.0
numpy>=1.24.0
joblib>=1.3.0