---
library_name: sklearn
tags:
- energy-consumption
- regression
- random-forest
- xgboost
- building-energy
- sustainability
- carbon-footprint
pipeline_tag: tabular-regression
---

# Ecologia Gas Consumption Model

## Model Description

This model predicts **gas_consumption (m³)** for buildings using tree-based ensemble methods.

- **Model Architecture**: Random Forest Regressor (Best Model)
- **Task**: Regression (Energy Consumption Prediction)
- **Target Variable**: gas_consumption (m³)
- **Input Features**: 22 features
- **Training Dataset**: Building Data Genome Project 2
- **Training Samples**: ~15 million

## Model Performance

### Random Forest Model

- **RMSE**: 459.7374
- **MAE**: 131.9079
- **R² Score**: 0.9090

### XGBoost Model

- **RMSE**: 499.6148
- **MAE**: 156.0127
- **R² Score**: 0.8925

### Best Model

The best-performing model (based on validation RMSE) is saved as `gas_model.joblib`.

## Training Details

### Dataset

- **Source**: [Building Data Genome Project 2](https://www.kaggle.com/datasets/claytonmiller/buildingdatagenomeproject2)
- **Training Samples**: ~15 million
- **Data Preprocessing**:
  - Outlier removal (99th percentile)
  - Feature engineering (temporal, building, weather features)
  - Missing value imputation
  - Normalization

### Training Method

- **Algorithms**: Random Forest and XGBoost, trained as separate candidate models
- **Best Model Selection**: Based on validation RMSE
- **Validation Strategy**: Hold-out train/validation/test split (60/20/20)
- **Hyperparameters**: Optimized for large-scale datasets

### Feature Engineering

The model uses 22 engineered features, including:

- **Building Features**: Type, area, age, location
- **Temporal Features**: Hour, day, month, season, day of week
- **Weather Features**: Temperature, humidity, dew point
- **Interaction Features**: Building-weather interactions
- **Lag Features**: Previous consumption patterns

## Usage

### Installation

```bash
pip install scikit-learn xgboost joblib huggingface_hub
```

### Load Model

```python
from huggingface_hub import hf_hub_download
import joblib

# Download model and features
model_path = hf_hub_download(
    repo_id="codealchemist01/ecologia-gas-model",
    filename="gas_model.joblib",
    token="YOUR_HF_TOKEN"  # Optional if public
)
features_path = hf_hub_download(
    repo_id="codealchemist01/ecologia-gas-model",
    filename="gas_features.joblib",
    token="YOUR_HF_TOKEN"  # Optional if public
)

# Load model and features
model = joblib.load(model_path)
feature_columns = joblib.load(features_path)
```

### Prediction Example

```python
import pandas as pd

# Prepare input data (example values; encode them the same way as the training data)
input_data = pd.DataFrame({
    'building_type': ['Office'],
    'area_sqm': [1000],
    'year_built': [2020],
    'temperature': [20.5],
    'humidity': [65],
    'hour': [14],
    'day_of_week': [1],
    'month': [6],
    # ... other required features
})

# Ensure all features are present (missing columns are filled with 0 as a
# placeholder; supply real values for meaningful predictions)
for col in feature_columns:
    if col not in input_data.columns:
        input_data[col] = 0

# Select features in correct order
input_data = input_data[feature_columns]

# Make prediction
prediction = model.predict(input_data)
print(f"Predicted gas_consumption (m³): {prediction[0]:.2f}")
```

## Model Limitations

- Model performance may vary with building characteristics and regional differences
- Training data is primarily from North American buildings
- Predictions are estimates and should be validated against actual consumption data
- Model requires all input features to be provided

## Ethical Considerations

- Model is designed to help reduce energy consumption and carbon footprint
- No personal or sensitive data is used in training
- Model predictions should be used responsibly for sustainability purposes

## Citation

If you use this model, please cite:

```bibtex
@software{ecologia_energy_model,
  title = {Ecologia Gas Consumption Model},
  author = {Ecologia Energy Team},
  year = {2024},
  url = {https://huggingface.co/codealchemist01/ecologia-gas-model},
  note = {Trained on Building Data Genome Project 2 dataset}
}
```

## License

This model is released under the MIT License.

## Contact

For questions or issues, please open an issue on the repository or contact the Ecologia Energy team.

## Acknowledgments

- Building Data Genome Project 2 dataset creators
- scikit-learn and XGBoost communities
- HuggingFace for model hosting

---

*This model is part of the Ecologia sustainability platform for energy consumption prediction and carbon footprint calculation.*
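
The RMSE, MAE, and R² figures under Model Performance, and the rule of keeping the candidate with the lowest validation RMSE, can be illustrated with a minimal, dependency-free sketch. This is not the project's evaluation code: the function names (`rmse`, `mae`, `r2`, `select_best`) and the toy numbers are hypothetical and chosen only for illustration. scikit-learn's `mean_squared_error`, `mean_absolute_error`, and `r2_score` compute the same quantities.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error: average size of the error in target units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def select_best(y_val, candidates):
    """Return the name of the candidate with the lowest validation RMSE."""
    return min(candidates, key=lambda name: rmse(y_val, candidates[name]))


# Toy validation targets and predictions (made-up numbers, not real results)
y_val = [100.0, 200.0, 300.0, 400.0]
val_predictions = {
    "random_forest": [110.0, 190.0, 310.0, 390.0],
    "xgboost": [120.0, 180.0, 320.0, 380.0],
}

for name, y_pred in val_predictions.items():
    print(
        f"{name}: RMSE={rmse(y_val, y_pred):.2f} "
        f"MAE={mae(y_val, y_pred):.2f} R2={r2(y_val, y_pred):.4f}"
    )

best = select_best(y_val, val_predictions)
print(f"Best model by validation RMSE: {best}")
```

Selecting on the validation split keeps the test split untouched, so the test metrics remain an unbiased estimate for the chosen model.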