	---
	library_name: sklearn
	tags:
	- energy-consumption
	- regression
	- random-forest
	- xgboost
	- building-energy
	- sustainability
	- carbon-footprint
	pipeline_tag: tabular-regression
	---

	# Ecologia Gas Consumption Model

	## Model Description

	This model predicts building gas consumption (`gas_consumption`, in m³) using tree-based ensemble methods.

	- Model Architecture: Random Forest Regressor (Best Model)
	- Task: Regression (Energy Consumption Prediction)
	- Target Variable: gas_consumption (m³)
	- Input Features: 22 features
	- Training Dataset: Building Data Genome Project 2
	- Training Samples: ~15 million

	## Model Performance

	### Random Forest Model
	- RMSE: 459.7374
	- MAE: 131.9079
	- R² Score: 0.9090

	### XGBoost Model
	- RMSE: 499.6148
	- MAE: 156.0127
	- R² Score: 0.8925

	### Best Model
	The best-performing model by validation RMSE, the Random Forest, is saved as `gas_model.joblib`.
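
For reference, the three metrics reported above can be computed with scikit-learn as in the minimal sketch below. The helper name `regression_report` and the toy numbers are illustrative assumptions and are not taken from this model's evaluation.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Return RMSE, MAE and R² for a set of predictions (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # RMSE is the square root of the mean squared error
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    mae = float(mean_absolute_error(y_true, y_pred))
    r2 = float(r2_score(y_true, y_pred))
    return {"rmse": rmse, "mae": mae, "r2": r2}


# Toy values for illustration only, not real consumption data
report = regression_report([100.0, 200.0, 300.0], [110.0, 190.0, 330.0])
print(report)
```

Lower RMSE and MAE are better, while R² closer to 1 is better, which is why the Random Forest ranks ahead of XGBoost on all three figures listed here.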

	## Training Details

	### Dataset
	- Source: [Building Data Genome Project 2](https://www.kaggle.com/datasets/claytonmiller/buildingdatagenomeproject2)
	- Training Samples: ~15 million
	- Data Preprocessing:
	  - Outlier removal (99th percentile)
	  - Feature engineering (temporal, building, weather features)
	  - Missing value imputation
	  - Normalization
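
The preprocessing steps listed above could look roughly like the following sketch. The exact procedure used for this model is not documented here, so everything in it is an assumption: outliers are dropped on the target column, gaps are filled with column medians, and normalization is a z-score. The function name `preprocess` and the toy columns are hypothetical.

```python
import numpy as np
import pandas as pd


def preprocess(df, target="gas_consumption", upper_q=0.99):
    """Drop target outliers, impute medians, z-score features (illustrative)."""
    out = df.copy()
    # 1. Outlier removal: drop rows whose target exceeds the 99th percentile
    cutoff = out[target].quantile(upper_q)
    out = out[out[target] <= cutoff].copy()
    # 2. Missing value imputation: fill numeric feature gaps with the column median
    feature_cols = [c for c in out.select_dtypes("number").columns if c != target]
    medians = out[feature_cols].median()
    out[feature_cols] = out[feature_cols].fillna(medians)
    # 3. Normalization: z-score each feature (constant columns keep std = 1)
    means = out[feature_cols].mean()
    stds = out[feature_cols].std(ddof=0).replace(0, 1.0)
    out[feature_cols] = (out[feature_cols] - means) / stds
    stats = {"cutoff": cutoff, "medians": medians, "means": means, "stds": stds}
    return out, stats


# Toy frame for illustration only
toy = pd.DataFrame({
    "gas_consumption": [10.0, 12.0, 11.0, 9.0, 5000.0],
    "temperature": [20.0, np.nan, 18.0, 22.0, 21.0],
    "area_sqm": [1000.0, 1500.0, 1200.0, 900.0, 1100.0],
})
clean, stats = preprocess(toy)
print(clean)
```

In a real pipeline the cutoff, medians, means, and standard deviations would normally be estimated on the training split only and then reused for validation, test, and inference data.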

	### Training Method
	- Algorithms: Random Forest and XGBoost, trained as separate candidate models
	- Best Model Selection: Based on validation RMSE
	- Validation: Hold-out train/validation/test split (60/20/20)
	- Hyperparameters: Optimized for large-scale datasets
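
A minimal sketch of the split-and-select procedure described above, under stated assumptions: the data is synthetic, the hyperparameters are arbitrary, and scikit-learn's `GradientBoostingRegressor` stands in for XGBoost so that the example needs only scikit-learn. It is not the training script used for this model.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 3.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=500)

# 60/20/20 split: hold out 40% first, then split that part in half
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Candidate models (GradientBoostingRegressor is a stand-in for XGBoost)
candidates = {
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}


def rmse(model, X_part, y_part):
    """Root mean squared error of a fitted model on one data split."""
    return float(np.sqrt(mean_squared_error(y_part, model.predict(X_part))))


# Fit every candidate on the training split and score it on the validation split
val_rmse = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    val_rmse[name] = rmse(model, X_val, y_val)

# Keep the candidate with the lowest validation RMSE; touch the test split once
best_name = min(val_rmse, key=val_rmse.get)
test_rmse = rmse(candidates[best_name], X_test, y_test)
print(best_name, val_rmse, test_rmse)
```

Selecting on the validation split and reporting on the untouched test split keeps the final error estimate from being biased by the model choice.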

	### Feature Engineering
	The model uses 22 engineered features including:
	- Building Features: Type, area, age, location
	- Temporal Features: Hour, day, month, season, day of week
	- Weather Features: Temperature, humidity, dew point
	- Interaction Features: Building-weather interactions
	- Lag Features: Previous consumption patterns
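
The feature groups above could be derived from raw meter readings along the lines of the sketch below. The column names (`building_id`, `timestamp`, `temp_x_area`, `gas_lag_1`) and the season encoding are assumptions made for illustration; the actual 22 feature names are stored in `gas_features.joblib`.

```python
import pandas as pd


def add_features(df):
    """Add temporal, interaction and lag features (hypothetical column names)."""
    out = df.sort_values(["building_id", "timestamp"]).copy()
    ts = out["timestamp"]
    # Temporal features
    out["hour"] = ts.dt.hour
    out["day_of_week"] = ts.dt.dayofweek  # Monday = 0
    out["month"] = ts.dt.month
    # Season index, northern hemisphere: 0 = Dec-Feb, 1 = Mar-May, 2 = Jun-Aug, 3 = Sep-Nov
    out["season"] = (out["month"] % 12) // 3
    # Interaction feature: building size times outdoor temperature
    out["temp_x_area"] = out["temperature"] * out["area_sqm"]
    # Lag feature: previous reading of the same building
    out["gas_lag_1"] = out.groupby("building_id")["gas_consumption"].shift(1)
    return out


# Toy readings for illustration only
toy = pd.DataFrame({
    "building_id": ["a", "a", "a", "b", "b"],
    "timestamp": pd.to_datetime([
        "2017-01-01 00:00", "2017-01-01 01:00", "2017-01-01 02:00",
        "2017-07-03 13:00", "2017-07-03 14:00",
    ]),
    "temperature": [2.0, 1.5, 1.0, 25.0, 26.0],
    "area_sqm": [1000.0, 1000.0, 1000.0, 500.0, 500.0],
    "gas_consumption": [30.0, 32.0, 31.0, 4.0, 5.0],
})
features = add_features(toy)
print(features)
```

Grouping by building before shifting keeps a lag value from leaking across buildings; the first reading of each building has no predecessor and is left as NaN.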

	## Usage

	### Installation
	```bash
	pip install scikit-learn xgboost joblib huggingface_hub
	```

	### Load Model
	```python
	from huggingface_hub import hf_hub_download
	import joblib

	# Download model and features
	model_path = hf_hub_download(
	    repo_id="codealchemist01/ecologia-gas-model",
	    filename="gas_model.joblib",
	    token="YOUR_HF_TOKEN",  # Optional if public
	)

	features_path = hf_hub_download(
	    repo_id="codealchemist01/ecologia-gas-model",
	    filename="gas_features.joblib",
	    token="YOUR_HF_TOKEN",  # Optional if public
	)

	# Load model and features
	model = joblib.load(model_path)
	feature_columns = joblib.load(features_path)
	```

	### Prediction Example
	```python
	import pandas as pd
	import numpy as np

	# Prepare input data (example)
	input_data = pd.DataFrame({
	    'building_type': ['Office'],
	    'area_sqm': [1000],
	    'year_built': [2020],
	    'temperature': [20.5],
	    'humidity': [65],
	    'hour': [14],
	    'day_of_week': [1],
	    'month': [6],
	    # ... other required features
	})

	# Ensure all features are present.
	# Zero-filling is a placeholder so the example runs; real inputs should supply every feature.
	for col in feature_columns:
	    if col not in input_data.columns:
	        input_data[col] = 0

	# Select features in correct order
	input_data = input_data[feature_columns]

	# Make prediction
	prediction = model.predict(input_data)
	print(f"Predicted gas_consumption (m³): {prediction[0]:.2f}")
	```

	## Model Limitations

	- Model performance may vary based on building characteristics and regional differences
	- Training data is primarily from North American buildings
	- Predictions are estimates and should be validated with actual consumption data
	- Model requires all input features to be provided
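
Because the model requires every input feature, it can be safer to fail loudly on missing columns than to fill them with a default. The sketch below shows one way to do that; `align_features` is a hypothetical helper, not part of this repository, and `feature_columns` is assumed to be the list loaded from `gas_features.joblib`.

```python
import pandas as pd


def align_features(input_data, feature_columns):
    """Return the input columns in training order, or raise if any are missing."""
    missing = [c for c in feature_columns if c not in input_data.columns]
    if missing:
        raise ValueError(f"Missing required features: {missing}")
    # Reorder (and drop extras) so columns match the order used at training time
    return input_data[list(feature_columns)]


# Toy check with made-up feature names
aligned = align_features(
    pd.DataFrame({"humidity": [65], "temperature": [20.5], "extra": [1]}),
    ["temperature", "humidity"],
)
print(list(aligned.columns))
```

Calling `model.predict(align_features(input_data, feature_columns))` would then only run when the input is complete.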

	## Ethical Considerations

	- Model is designed to help reduce energy consumption and carbon footprint
	- No personal or sensitive data is used in training
	- Model predictions should be used responsibly for sustainability purposes

	## Citation

	If you use this model, please cite:

	```bibtex
	@software{ecologia_energy_model,
	title = {Ecologia Gas Consumption Model},
	author = {Ecologia Energy Team},
	year = {2024},
	url = {https://huggingface.co/codealchemist01/ecologia-gas-model},
	note = {Trained on Building Data Genome Project 2 dataset}
	}
	```

	## License

	This model is released under the MIT License.

	## Contact

	For questions or issues, please open an issue on the repository or contact the Ecologia Energy team.

	## Acknowledgments

	- Building Data Genome Project 2 dataset creators
	- scikit-learn and XGBoost communities
	- HuggingFace for model hosting

	---
	This model is part of the Ecologia sustainability platform for energy consumption prediction and carbon footprint calculation.