DataSynthis_ML_JobTask / README.md

AnnNaserNabil

metadata

license: mit
language:
  - en

Stock Price Forecasting for Google (2004–2022)

This repository contains the implementation and results of stock price forecasting on Google's historical stock data (2004–2022) using two models: ARIMA and a Temporal Convolutional Network (TCN). The goal was to forecast future stock prices and compare the two models on standard error metrics.

Experiment Overview

The experiment aimed to forecast Google's stock prices using two distinct approaches:

ARIMA: A statistical time-series model optimized using the pmdarima library.
Temporal Convolutional Network (TCN): A deep learning model designed for sequence modeling, tuned with a grid search over hyperparameters.

Both models were trained and evaluated on Google's stock price data from 2004 to 2022. ARIMA was scored with Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), each on both log-transformed and raw prices, plus Direction Accuracy. The TCN was scored with MAE, RMSE, Mean Absolute Percentage Error (MAPE), and R².
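As a reference for the metric names above, here is a minimal sketch of MAE, RMSE, MAPE, and R² in plain Python. The function names are illustrative and are not taken from the repository's code; the MAPE version assumes no true value is zero.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent (assumes no true value is 0)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy values for illustration only (not the Google data).
y_true = [100.0, 110.0, 120.0]
y_pred = [102.0, 108.0, 123.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred), r2(y_true, y_pred))
```

Because MAE and RMSE are expressed in the units of the target, values computed on log prices and on raw prices are not directly comparable with each other.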

Methodology

1. ARIMA Model

Library: Used pmdarima for automatic ARIMA order selection based on the Akaike Information Criterion (AIC).
Approach: Employed a sliding window technique to evaluate different window sizes: [30, 60, 90, 120, 180, 200, 250] days.
Training: For each window size, the model was trained on the historical data in the window and tested on the next time step; the window then slid forward.
Evaluation Metrics:
- MAE (Log): Mean Absolute Error on log-transformed prices.
- RMSE (Log): Root Mean Squared Error on log-transformed prices.
- MAE (Price): Mean Absolute Error on raw stock prices.
- RMSE (Price): Root Mean Squared Error on raw stock prices.
- Direction Accuracy: Share of predictions with the correct price movement direction, reported as a fraction (for example, 0.22 means 22%).
Best Window: Determined by the lowest RMSE (Price).
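The sliding-window procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: it assumes the model is fit on log prices, it defines a direction "hit" as the predicted and actual moves from the previous value sharing a sign, and it uses a placeholder `naive_forecast` in place of ARIMA so that it runs without extra dependencies. With pmdarima installed, a forecaster along the lines of `pmdarima.auto_arima(window).predict(n_periods=1)` could be passed as `forecast_fn` instead.

```python
import math

def naive_forecast(log_window):
    """Placeholder model (assumption): next log price equals the last one."""
    return log_window[-1]

def walk_forward(prices, window_size, forecast_fn=naive_forecast):
    """One-step-ahead sliding-window evaluation on log prices."""
    log_prices = [math.log(p) for p in prices]
    actual, predicted, previous = [], [], []
    for t in range(window_size, len(log_prices)):
        window = log_prices[t - window_size:t]  # most recent window_size points
        predicted.append(forecast_fn(window))   # forecast for time t
        actual.append(log_prices[t])
        previous.append(log_prices[t - 1])

    n = len(actual)
    log_err = [a - p for a, p in zip(actual, predicted)]
    price_err = [math.exp(a) - math.exp(p) for a, p in zip(actual, predicted)]
    # A hit: predicted and actual moves from the previous value share a sign.
    hits = sum((a - prev) * (p - prev) > 0
               for a, p, prev in zip(actual, predicted, previous))
    return {
        "mae_log": sum(abs(e) for e in log_err) / n,
        "rmse_log": math.sqrt(sum(e * e for e in log_err) / n),
        "mae_price": sum(abs(e) for e in price_err) / n,
        "rmse_price": math.sqrt(sum(e * e for e in price_err) / n),
        "direction_accuracy": hits / n,
    }

# Toy series for illustration only (not the Google data).
toy_prices = [100 + 0.5 * i + 3 * math.sin(i / 5) for i in range(120)]
results = {w: walk_forward(toy_prices, w) for w in (30, 60, 90)}
best_window = min(results, key=lambda w: results[w]["rmse_price"])
print(best_window, results[best_window])
```

Note that under this definition of a hit, a forecaster that always predicts "no change" scores a direction accuracy of 0, since it never commits to a direction.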

ARIMA Results

The best-performing window size was 90 days. The table below lists the metrics for every window size, sorted by RMSE (Price):

Window Size	MAE (Log)	RMSE (Log)	MAE (Price)	RMSE (Price)	Direction Accuracy
90	0.013572	0.019500	589.623929	767.501125	0.220339
250	0.013374	0.019387	591.532659	794.248966	0.325424
200	0.013437	0.019336	608.357138	805.855776	0.277966
30	0.013885	0.019906	595.957410	813.015790	0.170621
180	0.013556	0.019484	618.712818	820.254764	0.240678
60	0.013644	0.019658	662.900353	875.255252	0.198870
120	0.013619	0.019585	718.594515	919.414804	0.190960

Best Window for RMSE (Price): 90 days (RMSE: 767.501125)
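The choice of "best" window depends on the selection metric. The short sketch below copies the values from the table above and picks the winner per metric; the dictionary keys and the `best_by` helper are illustrative names, not part of the repository.

```python
# Rows copied from the ARIMA results table: window size -> metrics.
rows = {
    90:  {"mae_log": 0.013572, "rmse_log": 0.019500, "mae_price": 589.623929,
          "rmse_price": 767.501125, "direction_accuracy": 0.220339},
    250: {"mae_log": 0.013374, "rmse_log": 0.019387, "mae_price": 591.532659,
          "rmse_price": 794.248966, "direction_accuracy": 0.325424},
    200: {"mae_log": 0.013437, "rmse_log": 0.019336, "mae_price": 608.357138,
          "rmse_price": 805.855776, "direction_accuracy": 0.277966},
    30:  {"mae_log": 0.013885, "rmse_log": 0.019906, "mae_price": 595.957410,
          "rmse_price": 813.015790, "direction_accuracy": 0.170621},
    180: {"mae_log": 0.013556, "rmse_log": 0.019484, "mae_price": 618.712818,
          "rmse_price": 820.254764, "direction_accuracy": 0.240678},
    60:  {"mae_log": 0.013644, "rmse_log": 0.019658, "mae_price": 662.900353,
          "rmse_price": 875.255252, "direction_accuracy": 0.198870},
    120: {"mae_log": 0.013619, "rmse_log": 0.019585, "mae_price": 718.594515,
          "rmse_price": 919.414804, "direction_accuracy": 0.190960},
}

def best_by(metric, higher_is_better=False):
    """Return the window size that is best on the given metric."""
    pick = max if higher_is_better else min
    return pick(rows, key=lambda w: rows[w][metric])

for metric in ("mae_log", "rmse_log", "mae_price", "rmse_price"):
    print(metric, best_by(metric))
print("direction_accuracy", best_by("direction_accuracy", higher_is_better=True))
```

On these numbers, the 90-day window wins on both price-scale metrics, while the 250-day window has the lowest MAE (Log) and the highest direction accuracy, and the 200-day window has the lowest RMSE (Log).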

2. Temporal Convolutional Network (TCN)

Approach: A TCN model was implemented with a grid search over multiple hyperparameters to identify the best configuration.
Hyperparameters:
- Sequence Lengths: [20, 50]
- Batch Sizes: [16, 32]
- Learning Rates: [0.001, 0.0005]
- Kernel Sizes: [3, 5]
- Number of Channels: [[32, 64, 128], [64, 128, 256]]
- Dropout Rates: [0.1, 0.2]
Training: A model was trained for each of the 64 unique combinations in the hyperparameter grid (2 options for each of 6 hyperparameters), and each was evaluated on a test set.
Evaluation Metrics:
- MAE: Mean Absolute Error on stock prices.
- RMSE: Root Mean Squared Error on stock prices.
- MAPE: Mean Absolute Percentage Error.
- R²: Coefficient of determination.
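The grid listed above can be enumerated with `itertools.product`, as in the sketch below. The dictionary keys are hypothetical names chosen for illustration, and the training step is left as a comment because the repository's training function is not shown here.

```python
from itertools import product

# Hyperparameter grid as listed above; key names are illustrative.
grid = {
    "seq_len": [20, 50],
    "batch_size": [16, 32],
    "learning_rate": [0.001, 0.0005],
    "kernel_size": [3, 5],
    "num_channels": [[32, 64, 128], [64, 128, 256]],
    "dropout": [0.1, 0.2],
}

def iter_configs(grid):
    """Yield one dict per unique combination of grid values."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(iter_configs(grid))
print(len(configs))  # 2 ** 6 combinations

for config in configs:
    # A hypothetical train-and-evaluate call would go here, returning
    # MAE, RMSE, MAPE, and R² on the test set for this configuration.
    pass
```

Each configuration is an independent training run, so the cost of the search grows multiplicatively with every option added to the grid.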

TCN Results

The best-performing TCN configuration was:

Sequence Length: 50
Batch Size: 16
Learning Rate: 0.0005
Kernel Size: 3
Number of Channels: [32, 64, 128]
Dropout: 0.1
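To relate kernel size and channel depth to sequence length, it can help to compute the receptive field. The sketch below assumes the commonly used TCN layout, which this README does not confirm for this repository: one residual block per entry in the channel list, two causal convolutions per block, and a dilation that doubles at each level (1, 2, 4, ...).

```python
def tcn_receptive_field(kernel_size, num_levels, convs_per_block=2):
    """Receptive field in time steps, assuming dilation 2**i at level i.

    Each convolution at level i extends the field by (kernel_size - 1) * 2**i,
    and the dilations 1 + 2 + ... + 2**(num_levels - 1) sum to 2**num_levels - 1.
    """
    return 1 + convs_per_block * (kernel_size - 1) * (2 ** num_levels - 1)

# Three channel entries, e.g. [32, 64, 128], means three levels.
for k in (3, 5):
    print(k, tcn_receptive_field(kernel_size=k, num_levels=3))
```

Under these assumptions, kernel size 3 with three levels covers 29 time steps and kernel size 5 covers 57, so only the larger kernel would span a full 50-step input sequence.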

Metrics for Best TCN Model:

MAE: 9.25931
RMSE: 14.984981
MAPE: 1.977077%
R²: 0.999459953983791

The full results for all hyperparameter combinations are available in the tcn_results.csv file in the repository.

Model Comparison

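On the two metrics reported for both models, the TCN's errors were far lower than ARIMA's: MAE of 9.26 versus 589.62 and RMSE of 14.98 versus 767.50 on prices. MAPE and R² were computed only for the TCN. The models were also evaluated differently (sliding-window one-step forecasts for ARIMA, a held-out test set for the TCN), so the comparison is indicative rather than exact. Below is a comparison of the best configurations for each model: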

Model	MAE (Price)	RMSE (Price)	MAPE (%)	R²
ARIMA (90 days)	589.623929	767.501125	N/A	N/A
TCN (Best)	9.25931	14.984981	1.977077	0.99946