Commit 470deb6
Parent(s): none (initial commit)
Initial commit: Energy consumption prediction model

Files changed:
- .gitattributes +1 -0
- README.md +105 -0
- energy_model_latest.joblib +3 -0
- example.py +90 -0
- model.py +290 -0
- requirements.txt +4 -0
.gitattributes
ADDED

```
*.joblib filter=lfs diff=lfs merge=lfs -text
```
README.md
ADDED

# Energy Consumption Prediction Model

A Random Forest model for predicting household energy consumption patterns and costs.

## Model Description

This model predicts monthly energy consumption in kWh and the associated cost in PLN (Polish złoty), based on historical consumption patterns and seasonal features.

**Model Type:** Random Forest Regressor
**Framework:** scikit-learn
**Performance:** R² = 0.848

## Features

The model uses 17 engineered features, including:

- **Moving averages** (3-month and 6-month windows)
- **Lag features** (1, 2, and 3 months back)
- **Seasonal indicators** (winter, summer, transition periods)
- **Temporal features** (month, year, day of year, quarter)
- **Cyclical encoding** (sin/cos transforms for monthly patterns)
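The cyclical encoding in the last bullet maps month numbers onto the unit circle so that December and January end up adjacent rather than 11 units apart. A minimal sketch of the transform (the same sin/cos formula `model.py` uses):

```python
import numpy as np

# Encode months 1-12 as points on the unit circle.
months = np.arange(1, 13)
month_sin = np.sin(2 * np.pi * months / 12)
month_cos = np.cos(2 * np.pi * months / 12)

# December (index 11) and January (index 0) are now close in
# (sin, cos) space, unlike in the raw 1-12 numbering.
dec = np.array([month_sin[11], month_cos[11]])
jan = np.array([month_sin[0], month_cos[0]])
print(np.linalg.norm(dec - jan))  # ≈ 0.518, the same chord length as any adjacent pair
```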
## Usage

```python
import sys
sys.path.append('.')  # Add current directory to path so model.py can be imported

from model import EnergyConsumptionPredictor

# Load the pre-trained model
model = EnergyConsumptionPredictor.from_file('energy_model_latest.joblib')

# Make predictions for the next 6 months
predictions = model.predict_future(months=6)

# Display results
print(predictions[['Date', 'Predicted_Consumption', 'Predicted_Cost']])
```

## Output Format

The model returns a pandas DataFrame with the columns:

- `Date`: month start date
- `Predicted_Consumption`: predicted consumption in kWh
- `Predicted_Cost`: predicted cost in PLN
- `Month`: month number (1-12)
- `Year`: year

## Requirements

```
pandas>=2.0.0
scikit-learn>=1.3.0
numpy>=1.24.0
joblib>=1.3.0
```

## Model Training Data

The model was trained on residential energy consumption data with:

- **17 data points** spanning multiple months
- Features covering seasonal patterns, consumption history, and temporal indicators
- Target variable: monthly energy consumption in kWh

## Performance Metrics

- **R² Score:** 0.848
- **Model Type:** Random Forest (100 estimators)
- **Cross-validation:** 3-fold CV used for model selection
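The 3-fold CV selection step can be reproduced in isolation. A minimal sketch of the mechanism `train()` uses — score each candidate model with `cross_val_score`, keep the best mean R² (the data below is synthetic, not the model's real training set):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression problem with a linear ground truth.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.1, size=60)

# Same three candidates model.py registers in __init__.
models = {
    'random_forest': RandomForestRegressor(n_estimators=100, random_state=42),
    'gradient_boosting': GradientBoostingRegressor(n_estimators=100, random_state=42),
    'linear_regression': LinearRegression(),
}

# Score each candidate with 3-fold CV on R², then keep the best.
cv_means = {name: cross_val_score(m, X, y, cv=3, scoring='r2').mean()
            for name, m in models.items()}
best_name = max(cv_means, key=cv_means.get)
print(best_name)
```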
## Feature Importance

Top 5 most important features:

1. `consumption_ma_3` (3-month moving average)
2. `consumption_ma_6` (6-month moving average)
3. `consumption_lag_1` (1-month lag)
4. `consumption_lag_3` (3-month lag)
5. `month_sin` (seasonal encoding)

## Cost Calculation

The model calculates costs using the Polish energy pricing structure:

- Energy rate per kWh
- Distribution fees
- VAT (Value Added Tax)
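A rough sketch of how these components combine, using the constants hardcoded in `model.py` in this commit (illustrative values, not current tariffs):

```python
# Cost components as hardcoded in model.py (PLN; illustrative only).
ENERGY_RATE = 0.6972              # energy price per kWh
DISTRIBUTION_MULTIPLIER = 0.5068  # distribution fee as a fraction of energy cost
VAT_RATE = 0.23                   # Polish VAT

def predicted_cost(kwh):
    energy_cost = kwh * ENERGY_RATE
    distribution_fee = energy_cost * DISTRIBUTION_MULTIPLIER
    subtotal = energy_cost + distribution_fee
    return subtotal * (1 + VAT_RATE)

print(round(predicted_cost(100), 2))  # ≈ 129.22 PLN for 100 kWh
```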
## Limitations

- Model is trained on Polish residential data
- Cost calculations use Polish energy pricing
- Designed for monthly predictions
- Performance may vary for different consumption patterns

## Example Output

```
        Date  Predicted_Consumption  Predicted_Cost
0 2025-06-01                    191             216
1 2025-07-01                    135             153
2 2025-08-01                    199             224
```

## License

This model is provided as-is for demonstration and educational purposes.
energy_model_latest.joblib
ADDED

```
version https://git-lfs.github.com/spec/v1
oid sha256:e2096ae0d78afc040c226d02ddc89be94ac7cc5212b1afd753c228c29eb9adf2
size 307968
```
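Note that the file above is a Git LFS pointer, not the model itself; to obtain the actual ~308 KB binary, Git LFS must be set up. A typical workflow sketch (the repository URL is omitted here):

```shell
# One-time setup: install the Git LFS hooks
git lfs install

# Cloning then fetches LFS files automatically. In an existing
# checkout that only contains pointer files, fetch the binaries with:
git lfs pull
```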
example.py
ADDED

```python
"""
Example usage of Energy Consumption Prediction Model
Download this file along with model.py and energy_model_latest.joblib
"""

import os

def main():
    # Check if model file exists
    model_path = 'energy_model_latest.joblib'
    if not os.path.exists(model_path):
        print(f"Error: {model_path} not found!")
        print("Please download energy_model_latest.joblib from this repository")
        return None

    # Import and load model
    from model import EnergyConsumptionPredictor

    print("Loading energy consumption prediction model...")
    model = EnergyConsumptionPredictor.from_file(model_path)

    print(f"Model loaded successfully: {model.best_model_name}")
    print(f"Features used: {len(model.feature_columns)}")

    # Make predictions
    months_to_predict = 6
    print(f"\nPredicting energy consumption for next {months_to_predict} months...")

    predictions = model.predict_future(months=months_to_predict)

    # Display results
    print("\n" + "=" * 60)
    print("ENERGY CONSUMPTION PREDICTIONS")
    print("=" * 60)

    total_consumption = predictions['Predicted_Consumption'].sum()
    total_cost = predictions['Predicted_Cost'].sum()
    avg_consumption = total_consumption / months_to_predict
    avg_cost = total_cost / months_to_predict

    print(f"Total predicted consumption: {total_consumption:.0f} kWh")
    print(f"Total predicted cost: {total_cost:.0f} PLN")
    print(f"Average monthly consumption: {avg_consumption:.0f} kWh")
    print(f"Average monthly cost: {avg_cost:.0f} PLN")

    print("\nMonthly breakdown:")
    print("-" * 55)
    print(f"{'Month':<15} {'Consumption':<15} {'Cost (PLN)'}")
    print("-" * 55)

    for _, row in predictions.iterrows():
        month_name = row['Date'].strftime('%B %Y')
        consumption = row['Predicted_Consumption']
        cost = row['Predicted_Cost']
        print(f"{month_name:<15} {consumption:>8.0f} kWh {cost:>12.0f}")

    print("-" * 55)

    # Show feature importance
    importance = model.get_feature_importance()
    if importance:
        print("\nTop 5 most important prediction features:")
        for i, (feature, score) in enumerate(list(importance.items())[:5], 1):
            print(f"  {i}. {feature}: {score:.3f}")

    # Save predictions to CSV
    output_file = 'energy_predictions.csv'
    predictions.to_csv(output_file, index=False)
    print(f"\nPredictions saved to: {output_file}")

    return predictions

if __name__ == "__main__":
    print("Energy Consumption Prediction Model - Example Usage")
    print("=" * 55)
    print("Required files: model.py, energy_model_latest.joblib")
    print("=" * 55)

    try:
        predictions = main()
        if predictions is not None:
            print(f"\n✓ Success! Generated {len(predictions)} monthly predictions")
    except Exception as e:
        print(f"\n✗ Error: {str(e)}")
        print("\nMake sure you have:")
        print("1. model.py")
        print("2. energy_model_latest.joblib")
        print("3. Required packages: pandas, numpy, scikit-learn, joblib")
```
model.py
ADDED

```python
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
from sklearn.linear_model import LinearRegression
import pickle
import joblib
import os
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

class EnergyConsumptionPredictor:
    def __init__(self):
        self.models = {
            'random_forest': RandomForestRegressor(n_estimators=100, random_state=42),
            'gradient_boosting': GradientBoostingRegressor(n_estimators=100, random_state=42),
            'linear_regression': LinearRegression()
        }

        self.best_model = None
        self.best_model_name = None
        self.scaler = StandardScaler()
        self.feature_columns = None
        self.data_stats = {}

    def _create_features(self, df):
        features_df = df.copy()

        # Moving averages
        for window in [3, 6]:
            if len(df) > window:
                features_df[f'consumption_ma_{window}'] = features_df['Consumption'].rolling(window=window).mean()
                features_df[f'consumption_std_{window}'] = features_df['Consumption'].rolling(window=window).std()

        # Lag features
        for lag in [1, 2, 3]:
            if len(df) > lag:
                features_df[f'consumption_lag_{lag}'] = features_df['Consumption'].shift(lag)

        # Seasonal indicators
        features_df['is_winter'] = features_df['Month'].isin([12, 1, 2]).astype(int)
        features_df['is_summer'] = features_df['Month'].isin([6, 7, 8]).astype(int)
        features_df['is_transition'] = features_df['Month'].isin([3, 4, 5, 9, 10, 11]).astype(int)

        return features_df

    def _prepare_training_data(self, df):
        features_df = self._create_features(df)
        features_df = features_df.dropna()

        exclude_columns = ['Date', 'Consumption', 'Reading', 'Cost']
        feature_columns = [col for col in features_df.columns if col not in exclude_columns]
        self.feature_columns = feature_columns

        X = features_df[feature_columns].values
        y = features_df['Consumption'].values

        return X, y

    def train(self, df):
        # Store data statistics for predictions
        self.data_stats = {
            'mean_consumption': df['Consumption'].mean(),
            'std_consumption': df['Consumption'].std(),
            'min_date': df['Date'].min(),
            'max_date': df['Date'].max(),
            'seasonal_patterns': df.groupby('Month')['Consumption'].mean().to_dict()
        }

        X, y = self._prepare_training_data(df)

        if len(X) < 5:
            return self._train_baseline_model(df)

        X_scaled = self.scaler.fit_transform(X)
        X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42, shuffle=False)

        model_scores = {}

        for model_name, model in self.models.items():
            model.fit(X_train, y_train)
            y_pred = model.predict(X_test)

            r2 = r2_score(y_test, y_pred)
            rmse = np.sqrt(mean_squared_error(y_test, y_pred))
            mae = mean_absolute_error(y_test, y_pred)
            cv_scores = cross_val_score(model, X_scaled, y, cv=3, scoring='r2')

            model_scores[model_name] = {
                'r2_score': r2,
                'rmse': rmse,
                'mae': mae,
                'cv_score': cv_scores.mean()
            }

        # Select best model based on cross-validation
        self.best_model_name = max(model_scores.keys(), key=lambda k: model_scores[k]['cv_score'])
        self.best_model = self.models[self.best_model_name]
        self.best_model.fit(X_scaled, y)

        final_predictions = self.best_model.predict(X_scaled)
        return {
            'r2_score': r2_score(y, final_predictions),
            'rmse': np.sqrt(mean_squared_error(y, final_predictions)),
            'mae': mean_absolute_error(y, final_predictions),
            'model_name': self.best_model_name,
            'all_models': model_scores
        }

    def _train_baseline_model(self, df):
        monthly_avg = df.groupby('Month')['Consumption'].mean()
        overall_mean = df['Consumption'].mean()
        self.baseline_predictions = monthly_avg.fillna(overall_mean).to_dict()
        self.best_model_name = "baseline_seasonal"

        return {
            'r2_score': 0.0,
            'rmse': df['Consumption'].std(),
            'mae': df['Consumption'].std() * 0.8,
            'model_name': 'baseline_seasonal'
        }

    def predict_future(self, months=12):
        if self.best_model_name == "baseline_seasonal":
            return self._predict_baseline(months)

        last_date = self.data_stats['max_date']
        future_dates = pd.date_range(start=last_date + pd.DateOffset(months=1), periods=months, freq='MS')

        predictions = []

        for date in future_dates:
            features = {
                'Month': date.month,
                'Year': date.year,
                'DayOfYear': date.timetuple().tm_yday,
                'Quarter': date.quarter,
                'days_since_start': (date - self.data_stats['min_date']).days,
                'month_sin': np.sin(2 * np.pi * date.month / 12),
                'month_cos': np.cos(2 * np.pi * date.month / 12),
                'is_winter': int(date.month in [12, 1, 2]),
                'is_summer': int(date.month in [6, 7, 8]),
                'is_transition': int(date.month in [3, 4, 5, 9, 10, 11])
            }

            # Use seasonal patterns for lag/moving average features
            seasonal_consumption = self.data_stats['seasonal_patterns'].get(date.month, self.data_stats['mean_consumption'])

            for window in [3, 6]:
                features[f'consumption_ma_{window}'] = seasonal_consumption
                features[f'consumption_std_{window}'] = self.data_stats['std_consumption']

            for lag in [1, 2, 3]:
                features[f'consumption_lag_{lag}'] = seasonal_consumption

            feature_vector = np.array([[features[col] for col in self.feature_columns]])
            feature_vector_scaled = self.scaler.transform(feature_vector)

            prediction = self.best_model.predict(feature_vector_scaled)[0]
            # Add some noise to make predictions more realistic
            prediction = max(0, prediction + np.random.normal(0, self.data_stats['std_consumption'] * 0.1))

            predictions.append(prediction)

        # Calculate costs - using hardcoded values for standalone model
        ENERGY_RATE = 0.6972
        DISTRIBUTION_MULTIPLIER = 0.5068
        VAT_RATE = 0.23

        results_df = pd.DataFrame({
            'Date': future_dates,
            'Predicted_Consumption': predictions,
            'Month': future_dates.month,
            'Year': future_dates.year
        })

        energy_cost = results_df['Predicted_Consumption'] * ENERGY_RATE
        distribution_fee = energy_cost * DISTRIBUTION_MULTIPLIER
        subtotal = energy_cost + distribution_fee
        vat = subtotal * VAT_RATE
        results_df['Predicted_Cost'] = subtotal + vat

        return results_df

    def _predict_baseline(self, months):
        last_date = self.data_stats['max_date']
        future_dates = pd.date_range(start=last_date + pd.DateOffset(months=1), periods=months, freq='MS')

        predictions = []
        for date in future_dates:
            seasonal_pred = self.baseline_predictions.get(date.month, self.data_stats['mean_consumption'])
            predictions.append(max(0, seasonal_pred * (1 + np.random.normal(0, 0.1))))

        ENERGY_RATE = 0.6972
        DISTRIBUTION_MULTIPLIER = 0.5068
        VAT_RATE = 0.23

        results_df = pd.DataFrame({
            'Date': future_dates,
            'Predicted_Consumption': predictions,
            'Month': future_dates.month,
            'Year': future_dates.year
        })

        energy_cost = results_df['Predicted_Consumption'] * ENERGY_RATE
        distribution_fee = energy_cost * DISTRIBUTION_MULTIPLIER
        subtotal = energy_cost + distribution_fee
        vat = subtotal * VAT_RATE
        results_df['Predicted_Cost'] = subtotal + vat

        return results_df

    def get_feature_importance(self):
        if hasattr(self.best_model, 'feature_importances_'):
            importance_dict = dict(zip(self.feature_columns, self.best_model.feature_importances_))
            return dict(sorted(importance_dict.items(), key=lambda x: x[1], reverse=True))
        return {}

    def save_model(self, filepath=None, format='joblib'):
        if self.best_model is None:
            raise ValueError("Model must be trained first. Use train() method.")

        if filepath is None:
            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            extension = 'joblib' if format == 'joblib' else 'pkl'
            filepath = f"energy_model_{self.best_model_name}_{timestamp}.{extension}"

        os.makedirs(os.path.dirname(filepath) if os.path.dirname(filepath) else '.', exist_ok=True)

        model_data = {
            'best_model': self.best_model,
            'best_model_name': self.best_model_name,
            'scaler': self.scaler,
            'feature_columns': self.feature_columns,
            'data_stats': self.data_stats,
            'models': self.models,
            'baseline_predictions': getattr(self, 'baseline_predictions', None),
            'metadata': {
                'saved_at': datetime.now().isoformat(),
                'model_type': self.best_model_name,
                'feature_count': len(self.feature_columns) if self.feature_columns else 0
            }
        }

        if format == 'joblib':
            joblib.dump(model_data, filepath)
        else:
            with open(filepath, 'wb') as f:
                pickle.dump(model_data, f)

        return filepath

    def load_model(self, filepath, format='auto'):
        if not os.path.exists(filepath):
            raise FileNotFoundError(f"File {filepath} does not exist.")

        if format == 'auto':
            if filepath.endswith('.joblib'):
                format = 'joblib'
            elif filepath.endswith('.pkl'):
                format = 'pickle'
            else:
                format = 'joblib'

        try:
            if format == 'joblib':
                model_data = joblib.load(filepath)
            else:
                with open(filepath, 'rb') as f:
                    model_data = pickle.load(f)

            self.best_model = model_data['best_model']
            self.best_model_name = model_data['best_model_name']
            self.scaler = model_data['scaler']
            self.feature_columns = model_data['feature_columns']
            self.data_stats = model_data['data_stats']
            self.models = model_data['models']
            self.baseline_predictions = model_data.get('baseline_predictions')

        except Exception as e:
            raise ValueError(f"Error loading model: {str(e)}")

    @classmethod
    def from_file(cls, filepath, format='auto'):
        model = cls()
        model.load_model(filepath, format)
        return model
```
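Note that `train()` expects a DataFrame that already carries the base temporal columns; the lag, rolling, and seasonal-indicator features are derived internally by `_create_features()`. A sketch of the assumed input schema, inferred from `predict_future()` in the file above — the column names come from this commit's code, while the synthetic consumption values are made up for illustration:

```python
import numpy as np
import pandas as pd

# 24 months of synthetic data, month-start frequency.
dates = pd.date_range('2023-01-01', periods=24, freq='MS')
rng = np.random.default_rng(42)

df = pd.DataFrame({
    'Date': dates,
    # Fake seasonal consumption: higher in winter, plus noise.
    'Consumption': 160 + 40 * np.cos(2 * np.pi * dates.month / 12)
                   + rng.normal(0, 10, len(dates)),
    'Month': dates.month,
    'Year': dates.year,
    'DayOfYear': dates.dayofyear,
    'Quarter': dates.quarter,
    'days_since_start': (dates - dates[0]).days,
    'month_sin': np.sin(2 * np.pi * dates.month / 12),
    'month_cos': np.cos(2 * np.pi * dates.month / 12),
})

# With model.py on the path, this frame could then be used directly:
# from model import EnergyConsumptionPredictor
# metrics = EnergyConsumptionPredictor().train(df)
```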
requirements.txt
ADDED

```
pandas>=2.0.0
scikit-learn>=1.3.0
numpy>=1.24.0
joblib>=1.3.0
```