	---
	# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
	# Doc / guide: https://huggingface.co/docs/hub/model-cards
	{{ card_data }}
	---

	# Total Return Prediction

	<!-- Provide a quick summary of what the model is/does. -->

	The Climate Index AI LSTM model is designed to predict the total returns of commercial real estate (CRE) investments by incorporating climate-related risks, such as extreme temperatures, alongside financial indicators like interest rates and inflation. The model forecasts property values in 138 Core-Based Statistical Areas (CBSAs) over a 12-quarter (3-year) forecast horizon.

	## Model Details

	### Model Description

	<!-- Provide a longer summary of what this model is. -->

	This model uses machine learning (ML), specifically a Long Short-Term Memory (LSTM) neural network, to capture the complex relationships between climate variables and financial conditions that affect property valuations and returns. It gives investors insight into how extreme weather events, shifting temperature patterns, and long-term environmental changes affect the commercial real estate market.

	- Developed by: Climate Index AI
	- Model type: LSTM neural network
	- License: Private Proprietary License (contact Climate Index AI Inc. for access)

	### Model Sources

	<!-- Provide the basic links for the model. -->

	- Repository: https://huggingface.co/climateindexai/total_return_prediction
	- Paper: Observing the Effect of Climate Change on Total Returns in Commercial Real Estate with Machine Learning (2024)

	## Uses

	<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

	### Direct Use

	<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

	The model can be used to assess the impact of climate and economic factors on commercial real estate (CRE) returns, supporting investors in portfolio management and regional investment planning.

	### Downstream Use

	<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

	The model can be integrated with other forecasting tools for broader financial analysis.

	### Out-of-Scope Use

	<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

	This model is not intended for high-frequency trading or precise short-term valuations.

	## Bias, Risks, and Limitations

	<!-- This section is meant to convey both technical and sociotechnical limitations. -->

	The model is limited by the scope of its training data, which primarily covers U.S. CBSA regions, so it may not generalize well to international markets. Additionally, extreme or unprecedented climate events that fall outside historical patterns may reduce its accuracy.

	### Recommendations

	<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

	While the Climate Index AI LSTM model provides valuable insights into the potential impacts of climate and economic factors on commercial real estate returns, users should exercise caution due to certain inherent limitations:

	#### Local vs. Global Climate Changes

	Since climate change effects can vary significantly by region, the model may have varying accuracy across different CBSAs. Users are encouraged to consider local climate conditions and risks that may not be well-represented in historical data when interpreting model results.

	#### Sensitivity to Macroeconomic Shocks

	The model includes financial indicators such as interest rates and inflation but may not account for sudden, significant economic shocks (e.g., rapid inflation spikes or economic downturns). Users should pair model insights with ongoing economic assessments to put the model’s predictions in context.

	#### Complementary Use with Expert Judgment

	The model’s predictions should be considered part of a broader decision-making process. It’s advisable to incorporate expert judgment and complementary climate or economic analyses, especially for high-stakes investments or in regions with volatile climate trends.

	## How to Get Started with the Model

	Use the code below to get started with the model.

	```python
	from tensorflow.keras.models import load_model

	model = load_model('total_return_model.keras')
	```
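
Once the model is loaded, a forecast can be requested from one input sequence. The sketch below is a minimal illustration, not the documented inference API: the helper name `predict_from_window` is hypothetical, the number and order of input features are not specified in this card, and the input is assumed to be scaled the same way as the training data. It works with any object that exposes a Keras-style `predict` method.

```python
import numpy as np

LOOKBACK = 48  # quarters per input sequence, as stated in this card


def predict_from_window(model, window):
    """Run one forecast from a single (LOOKBACK, n_features) window.

    `model` is any object with a Keras-style `predict` method
    (hypothetical helper; the feature layout is an assumption).
    """
    window = np.asarray(window, dtype="float32")
    if window.ndim != 2 or window.shape[0] != LOOKBACK:
        raise ValueError(
            f"expected shape ({LOOKBACK}, n_features), got {window.shape}"
        )
    # Add the batch dimension -> (1, LOOKBACK, n_features)
    batch = window[np.newaxis, ...]
    # Drop the batch dimension from the prediction again
    return np.asarray(model.predict(batch))[0]
```

With the model loaded as shown above, a call would look like `predict_from_window(model, scaled_window)`, where `scaled_window` holds the most recent 48 quarters of features.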

	## Try it live

	[Live Demo](https://huggingface.co/spaces/climateindexai/total_return_prediction)

	## Training Details

	The Climate Index AI LSTM model leverages a comprehensive dataset that integrates historical climate, economic, and real estate data. This structure allows the model to capture the complex relationships between climate variables and financial conditions influencing property returns.

	### Training Data

	<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

	The dataset used for this model comprises three main types of information, organized in quarterly time steps spanning from 1981 to 2023 across 138 Core-Based Statistical Areas (CBSAs). Each data point includes:

	- Climate Data: We use temperature and precipitation data from the PRISM dataset, which provides high-resolution historical weather data, including metrics on extreme temperatures and long-term climate trends. Extreme-temperature features were compiled by counting high and low temperatures over the trailing 24 quarters (6 years), counted back from each specified quarter.
	- Financial Indicators: Interest rate and inflation data are sourced from the Federal Reserve Economic Data (FRED) repository, a comprehensive resource for historical U.S. economic metrics.
	- Commercial Real Estate Data: Total return data for commercial properties is sourced from the NCREIF NPI dataset, which provides historical financial performance metrics for commercial real estate in various U.S. regions.

	The combined dataset enables the model to make regional-level predictions based on these influencing factors. By focusing on these core features, the model can provide localized forecasts that account for region-specific climate risks and economic conditions.
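
The trailing 24-quarter count described for the climate data can be read in more than one way. The sketch below shows one possible reading, assuming a single temperature reading per quarter and fixed `low` and `high` thresholds. The function name, the thresholds, and the one-reading-per-quarter layout are all assumptions for illustration and are not taken from the actual pipeline.

```python
import numpy as np


def trailing_extreme_counts(temps, low, high, window=24):
    """Count hot and cold quarters in each trailing `window`-quarter span.

    temps: one temperature reading per quarter (assumed layout).
    low, high: hypothetical thresholds that define an "extreme" reading.
    Returns two arrays; element i covers quarters i .. i + window - 1,
    so it belongs to the quarter at index i + window - 1.
    """
    temps = np.asarray(temps, dtype=float)
    if temps.ndim != 1 or temps.size < window:
        raise ValueError("need a 1-D series with at least `window` quarters")
    is_hot = (temps > high).astype(int)
    is_cold = (temps < low).astype(int)
    kernel = np.ones(window, dtype=int)
    # A "valid" convolution with a vector of ones is a rolling sum.
    hot = np.convolve(is_hot, kernel, mode="valid")
    cold = np.convolve(is_cold, kernel, mode="valid")
    return hot, cold
```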

	### Training Procedure

	<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

	#### Preprocessing

	Time-series data were processed with a lookback window of 48 quarters (12 years) to capture long-term dependencies. Data normalization was applied using MinMax scaling.

	- Sequence Creation: To prepare the data for time-series forecasting, each input sequence to the model spans a lookback window of 48 quarters (12 years). This approach allows the model to consider both short-term fluctuations and long-term trends in the data, enhancing its ability to capture dependencies between climate, economic, and real estate variables over time.
	- Normalization: MinMax scaling is applied to each feature, transforming values to a range between 0 and 1. This scaling is important for neural networks because it puts all inputs on a common range, which helps the model converge more efficiently and keeps features with large raw magnitudes from dominating the predictions.
	- Model Architecture: The model uses an LSTM neural network architecture designed for sequential data. It includes two stacked LSTM layers with 70 and 35 units, respectively, and uses dropout and L2 regularization to reduce the risk of overfitting in high-dimensional data.
	- Training and Testing Split: The dataset is split into training and testing sets using an 80-20 split. The training set fits the model, while the testing set evaluates its performance on unseen data. Early stopping is applied to prevent overfitting, halting training if the validation loss does not improve after a set number of epochs.
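
The preprocessing steps above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the actual pipeline: the card does not say whether the 80-20 split is chronological or whether the scaler is fitted on training rows only, so both are assumed here. The four-feature random series and all function names are hypothetical.

```python
import numpy as np

LOOKBACK = 48  # quarters of history per input sequence (from this card)
HORIZON = 12   # quarters to forecast (from this card)


def fit_minmax(rows):
    """Per-feature minimum and range, computed from the given rows."""
    lo = rows.min(axis=0)
    span = rows.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant features
    return lo, span


def make_windows(data, target_col, lookback=LOOKBACK, horizon=HORIZON):
    """Slice a (time, features) array into input windows and target paths."""
    X, y = [], []
    for start in range(len(data) - lookback - horizon + 1):
        end = start + lookback
        X.append(data[start:end])
        y.append(data[end:end + horizon, target_col])
    return np.array(X), np.array(y)


# Hypothetical data: 172 quarters (1981-2023) and 4 features for one CBSA.
rng = np.random.default_rng(0)
series = rng.normal(size=(172, 4))

n_windows = len(series) - LOOKBACK - HORIZON + 1
n_train = int(n_windows * 0.8)               # 80-20 split over windows
train_rows = n_train + LOOKBACK + HORIZON - 1  # rows seen by training windows

lo, span = fit_minmax(series[:train_rows])   # assumed: fit on training rows only
scaled = (series - lo) / span

X, y = make_windows(scaled, target_col=0)    # column 0 as target is an assumption
X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:], y[n_train:]
```

Because the scaler is fitted on training rows only in this sketch, test values can fall slightly outside the [0, 1] range.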

	#### Training Hyperparameters

	- Training regime: fp32
	- Batch processing: Enabled, with early stopping and learning rate scheduling.
	- Optimizer: Adam with gradient clipping (clip value = 1.0).
	- Loss function: Mean Squared Error (MSE).
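
Combining these hyperparameters with the two stacked LSTM layers (70 and 35 units) described in this card, a Keras definition could look like the sketch below. Values marked as assumed (dropout rate, L2 strength, patience, number of features, and a dense output layer that emits all 12 quarters at once) are not stated in this card, and the names `TrainingConfig` and `build_model_and_callbacks` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    lookback: int = 48            # from this card
    horizon: int = 12             # from this card; direct multi-step head is assumed
    n_features: int = 4           # assumed
    lstm_units: tuple = (70, 35)  # from this card
    dropout: float = 0.2          # assumed
    l2: float = 1e-4              # assumed
    clip_value: float = 1.0       # from this card
    patience: int = 10            # assumed


def build_model_and_callbacks(cfg: TrainingConfig):
    """Build a compiled two-layer LSTM and its training callbacks."""
    # Imported here so the configuration can be inspected without TensorFlow.
    from tensorflow import keras

    reg = keras.regularizers.l2(cfg.l2)
    model = keras.Sequential([
        keras.Input(shape=(cfg.lookback, cfg.n_features)),
        keras.layers.LSTM(cfg.lstm_units[0], return_sequences=True,
                          kernel_regularizer=reg),
        keras.layers.Dropout(cfg.dropout),
        keras.layers.LSTM(cfg.lstm_units[1], kernel_regularizer=reg),
        keras.layers.Dropout(cfg.dropout),
        keras.layers.Dense(cfg.horizon),
    ])
    # Adam with value-based gradient clipping and MSE loss, as listed above.
    model.compile(
        optimizer=keras.optimizers.Adam(clipvalue=cfg.clip_value),
        loss="mse",
    )
    callbacks = [
        keras.callbacks.EarlyStopping(monitor="val_loss",
                                      patience=cfg.patience,
                                      restore_best_weights=True),
        keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                          patience=cfg.patience // 2),
    ]
    return model, callbacks
```

The returned callbacks would be passed to `model.fit(..., validation_split=..., callbacks=callbacks)` so that early stopping and learning rate scheduling both monitor the validation loss.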

	## Evaluation

	<!-- This section describes the evaluation protocols and provides the results. -->

	### Testing Data, Factors & Metrics

	#### Testing Data

	Testing was conducted on the held-out 20% of an 80-20 train-test split, with early stopping applied during training to prevent overfitting.

	#### Metrics and Results

	<!-- These are the evaluation metrics being used, ideally with a description of why. -->

	Root Mean Squared Error (RMSE) was used as the evaluation metric because it penalizes large errors more heavily than small ones. The model achieved an RMSE of 0.001 on the test set.
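
For reference, RMSE can be computed as in the short sketch below. The function name is illustrative and the code is not taken from the project's evaluation script.

```python
import math


def rmse(actual, predicted):
    """Square root of the mean squared difference between two sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Note that an RMSE value is expressed in the units of the target, so a score computed on MinMax-scaled returns is not directly comparable to one computed on unscaled returns.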

	## Environmental Impact

	<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

	Experiments were conducted using Google Cloud Platform in the us-central1 region, which has a carbon efficiency of 0.57 kgCO₂eq/kWh. A total of 40 hours of computation was performed on an RTX 4090 GPU (TDP of 300W), resulting in estimated emissions of 6.84 kgCO₂eq. These emissions were fully offset by Google Cloud Platform, meaning there was no net carbon impact from these experiments.

	Estimations were conducted using the [MachineLearning Impact calculator](https://mlco2.github.io/impact#compute) presented in Lacoste et al. (2019).
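
The emissions figure follows from multiplying runtime, power draw, and regional carbon intensity. The sketch below reproduces that arithmetic, assuming the GPU draws its full TDP for the entire run. The function name is illustrative.

```python
def estimated_emissions_kg(hours, tdp_watts, kg_co2eq_per_kwh):
    """Energy used (kWh) times the carbon intensity of the grid region."""
    energy_kwh = hours * tdp_watts / 1000.0
    return energy_kwh * kg_co2eq_per_kwh


emissions = estimated_emissions_kg(hours=40, tdp_watts=300, kg_co2eq_per_kwh=0.57)
print(f"{emissions:.2f} kgCO2eq")  # prints "6.84 kgCO2eq"
```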


	```
	\usepackage{hyperref}

	\subsection{CO2 Emission Related to Experiments}

	Experiments were conducted using Google Cloud Platform in region us-central1, which has a carbon efficiency of 0.57 kgCO$_2$eq/kWh. A cumulative of 40 hours of computation was performed on hardware of type RTX 4090 (TDP of 300W).

	Total emissions are estimated to be 6.84 kgCO$_2$eq of which 100 percent was directly offset by the cloud provider.

	Estimations were conducted using the \href{https://mlco2.github.io/impact#compute}{MachineLearning Impact calculator} presented in \cite{lacoste2019quantifying}.

	@article{lacoste2019quantifying,
	title={Quantifying the Carbon Emissions of Machine Learning},
	author={Lacoste, Alexandre and Luccioni, Alexandra and Schmidt, Victor and Dandres, Thomas},
	journal={arXiv preprint arXiv:1910.09700},
	year={2019}
	}

	```

	## Glossary
	- LSTM (Long Short-Term Memory): A type of recurrent neural network effective for time-series forecasting.
	- CBSA (Core-Based Statistical Area): A U.S. geographic area defined by the Office of Management and Budget (OMB), consisting of one or more counties anchored by an urban population center. CBSAs are widely used in economic and demographic analysis because they cover both metropolitan and micropolitan areas.
	- CRE (Commercial Real Estate): Property primarily used for business purposes, including office spaces, retail, industrial buildings, and multifamily housing.
	- MinMax Scaling: A preprocessing technique that transforms data by scaling each feature to a specified range, typically [0, 1]. This approach is common in ML pipelines to enhance model performance.
	- NCREIF (National Council of Real Estate Investment Fiduciaries): A nonprofit organization providing performance data for commercial real estate in the U.S., often through its NCREIF Property Index (NPI).
	- NPI (NCREIF Property Index): An index from NCREIF that tracks the financial performance of a large sample of commercial real estate properties across the U.S. It is a widely used benchmark for CRE investment performance.
	- PRISM (Parameter-elevation Regressions on Independent Slopes Model): A climate dataset that offers high-resolution historical weather data, widely used in environmental research.
	- RMSE (Root Mean Squared Error): A metric used to measure the accuracy of predictions in regression tasks. RMSE represents the square root of the average squared difference between predicted and actual values, with lower values indicating better performance.

	## Model Card Contact

	- Vitor Barros - vitor@climateindex.ai