---
language: en
license: mit
tags:
- time-series
- forecasting
- sales
- lstm
- arima
datasets:
- custom
metrics:
- rmse
- mae
---

# 📈 Retail Sales Forecasting with ARIMA and LSTM

## Model Details
This project compares two approaches to forecasting retail sales:
- **ARIMA (AutoRegressive Integrated Moving Average)**
- **LSTM (Long Short-Term Memory neural network)**

Both models were trained and evaluated with a **rolling window evaluation** across 40 windows. Each was scored on RMSE and MAE, and stability was judged by the spread of those errors across windows (standard deviation, minimum, and maximum).

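
The rolling window evaluation mentioned above can be sketched as follows. This is a minimal illustration, not the project's actual evaluation code: the window length (`train_size`), the step between windows, the use of fixed-length sliding windows, and the placeholder forecaster `naive_last_value` are all assumptions, since the card only states that 40 windows were used and that RMSE and MAE were aggregated across them.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))


def naive_last_value(train, horizon):
    """Hypothetical placeholder forecaster: repeat the last observed value."""
    return np.full(horizon, train[-1])


def rolling_window_eval(series, fit_predict, n_windows=40,
                        train_size=365, horizon=30, step=30):
    """Score `fit_predict(train, horizon)` on consecutive sliding windows.

    train_size and step are assumed values; only n_windows=40 and the
    30-day horizon come from the model card.
    """
    series = np.asarray(series, dtype=float)
    needed = train_size + horizon + (n_windows - 1) * step
    if len(series) < needed:
        raise ValueError(f"need at least {needed} observations, got {len(series)}")

    rmses, maes = [], []
    for w in range(n_windows):
        start = w * step
        split = start + train_size
        train, test = series[start:split], series[split:split + horizon]
        pred = fit_predict(train, horizon)
        rmses.append(rmse(test, pred))
        maes.append(mae(test, pred))

    # Population standard deviation (ddof=0) is an assumption.
    return {
        "avg_rmse": float(np.mean(rmses)), "std_rmse": float(np.std(rmses)),
        "avg_mae": float(np.mean(maes)), "std_mae": float(np.std(maes)),
        "min_rmse": float(np.min(rmses)), "max_rmse": float(np.max(rmses)),
    }


# Synthetic weekly-seasonal series, used only to exercise the function.
rng = np.random.default_rng(0)
t = np.arange(1600)
demo = 1000 + 100 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 20, t.size)
summary = rolling_window_eval(demo, naive_last_value)
print(summary)
```

Any model can be plugged in through `fit_predict`, as long as it takes a training array and a horizon and returns that many forecasts. That keeps the ARIMA and LSTM comparisons on identical windows.
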
**Frameworks used:**
- ARIMA: `statsmodels`, `pmdarima`
- LSTM: `TensorFlow / Keras`

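
For orientation, the LSTM described under Training Procedure (three stacked layers of 25 units each, dropout 0.2) could be declared in Keras roughly as below. Only the layer count, unit count, and dropout rate come from this card. The lookback length, the placement of dropout as separate layers, the direct 30-step `Dense` output, the optimizer, and the loss are assumptions made for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_stacked_lstm(lookback=60, n_features=1, horizon=30,
                       units=25, dropout=0.2):
    """Sketch of a 3-layer stacked LSTM; lookback and output head are assumed."""
    model = keras.Sequential([
        keras.Input(shape=(lookback, n_features)),
        # The first two LSTM layers return full sequences so the next
        # LSTM layer receives one vector per time step.
        layers.LSTM(units, return_sequences=True),
        layers.Dropout(dropout),
        layers.LSTM(units, return_sequences=True),
        layers.Dropout(dropout),
        # The last LSTM layer returns only its final state.
        layers.LSTM(units),
        layers.Dropout(dropout),
        # One output per forecast day (direct multi-step forecasting, assumed).
        layers.Dense(horizon),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


model = build_stacked_lstm()
model.summary()
```
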

---

## Evaluation Summary

| Model | Avg RMSE | Std RMSE | Avg MAE | Std MAE | Min RMSE | Max RMSE |
|-------|----------|----------|---------|---------|----------|----------|
| **ARIMA** | 3,175,011.32 | 763,036.27 | 2,448,829.54 | 950,375.79 | 2,195,796.46 | 6,171,402.14 |
| **LSTM** | 3,169,389.04 | 729,371.84 | 2,435,767.99 | 912,025.28 | 2,252,916.59 | 6,016,384.05 |


🏆 **Winner: LSTM** – slightly lower average error and a smaller spread of errors across windows.

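
The size of the gap between the two rows can be checked directly from the table. The snippet below is a small worked calculation using only the reported figures; the helper name `relative_reduction` is introduced here for illustration.

```python
# Figures copied from the Evaluation Summary table.
arima = {"avg_rmse": 3_175_011.32, "std_rmse": 763_036.27,
         "avg_mae": 2_448_829.54, "std_mae": 950_375.79}
lstm = {"avg_rmse": 3_169_389.04, "std_rmse": 729_371.84,
        "avg_mae": 2_435_767.99, "std_mae": 912_025.28}


def relative_reduction(baseline, candidate):
    """Fraction by which `candidate` is lower than `baseline`."""
    return (baseline - candidate) / baseline


reductions = {key: relative_reduction(arima[key], lstm[key]) for key in arima}
for key, value in reductions.items():
    print(f"{key}: LSTM lower by {value:.2%}")
```

The average errors differ by well under 1% (about 0.18% for RMSE and 0.53% for MAE), while the standard deviations differ by roughly 4%. The stability gap is therefore much larger than the accuracy gap.
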

---

## Short Report: Model Generalization Discussion

Both ARIMA and LSTM were evaluated with a rolling window approach over 40 windows. The results indicate that **LSTM slightly outperforms ARIMA**, with an average RMSE of **3,169,389** compared to ARIMA’s **3,175,011**. The absolute difference is small (≈0.2%), but LSTM’s predictions are also more consistent, as reflected in its lower RMSE standard deviation (**729,372 vs. 763,036**, about 4% lower).

A plausible reason for LSTM’s better generalization is that it can **capture complex nonlinear temporal dependencies** in the sales data, which ARIMA, a linear statistical model, cannot fully represent. This matters for retail sales, where seasonality, promotions, and external factors often introduce nonlinear fluctuations.

In terms of stability, LSTM’s RMSE spans a narrower range across windows (2,252,917 to 6,016,384, against 2,195,796 to 6,171,402 for ARIMA), which suggests that it adapts more consistently from one rolling window to the next. ARIMA remains a strong baseline for time series forecasting, and it achieved the lowest single-window RMSE, but LSTM shows better **generalization capability**, likely because it can learn patterns beyond simple trend and seasonality.

**Conclusion:**
The **LSTM model generalizes better** than ARIMA on this dataset: it can represent more complex patterns and its performance is more stable across windows, which makes it the preferred choice for future forecasting.


---

## Intended Use
- **Task:** Time series forecasting (daily retail sales prediction)
- **Domain:** Retail / e-commerce demand forecasting
- **Forecast Horizon:** 30 days

The models can be used to **predict daily sales** and to support **inventory management**, **staff scheduling**, and **strategic planning**.

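
To make the 30-day horizon concrete, the sketch below shows one way a daily series can be cut into supervised samples, each pairing a lookback window with the 30 values that follow it. The lookback of 60 days and the direct multi-step layout are assumptions, since the card specifies only the horizon. The helper `make_windows` is hypothetical.

```python
import numpy as np


def make_windows(values, lookback=60, horizon=30):
    """Slice a 1-D series into (X, y) pairs for sequence models.

    X has shape (n_samples, lookback, 1) and y has shape (n_samples, horizon).
    lookback=60 is an assumed value; horizon=30 matches the model card.
    """
    values = np.asarray(values, dtype=float)
    n_samples = len(values) - lookback - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the requested windows")

    X = np.stack([values[i:i + lookback] for i in range(n_samples)])
    y = np.stack([values[i + lookback:i + lookback + horizon]
                  for i in range(n_samples)])
    # Add a trailing feature axis, as recurrent layers expect 3-D input.
    return X[..., np.newaxis], y


X, y = make_windows(np.arange(100.0))
print(X.shape, y.shape)
```
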

---

## Training Procedure
- **Dataset:** Rossmann-like retail sales dataset (merged `train.csv` + `store.csv`)
- **Preprocessing:**
  - Removed closed days and zero-sales days
  - Aggregated to daily level
  - Scaled features for LSTM
- **Models:**
  - ARIMA (`pmdarima.auto_arima`)
  - LSTM with 3 stacked layers (25 units each, dropout 0.2)

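
The preprocessing steps listed above might look roughly like this in pandas. The column names (`Store`, `Date`, `Sales`, `Open`) follow the public Rossmann layout and are assumptions here, as is summing sales across stores for the daily aggregate and using min-max scaling. The card does not say which aggregation or scaler was used.

```python
import pandas as pd


def prepare_daily_sales(train: pd.DataFrame, store: pd.DataFrame) -> pd.Series:
    """Merge, filter, and aggregate to one total-sales value per day."""
    merged = train.merge(store, on="Store", how="left")
    merged["Date"] = pd.to_datetime(merged["Date"])
    # Drop closed days and days with zero sales.
    merged = merged[(merged["Open"] == 1) & (merged["Sales"] > 0)]
    # Sum across stores (assumed aggregation) and order chronologically.
    return merged.groupby("Date")["Sales"].sum().sort_index()


def min_max_scale(reference: pd.Series, values: pd.Series) -> pd.Series:
    """Scale `values` to [0, 1] using the min and max of `reference`.

    Passing the training split as `reference` avoids leaking test statistics.
    """
    lo, hi = reference.min(), reference.max()
    return (values - lo) / (hi - lo)


# Tiny made-up frames standing in for train.csv and store.csv.
train = pd.DataFrame({
    "Store": [1, 2, 1, 2, 1, 2],
    "Date": ["2015-01-01", "2015-01-01", "2015-01-02",
             "2015-01-02", "2015-01-03", "2015-01-03"],
    "Sales": [0, 0, 100, 200, 150, 0],
    "Open": [0, 0, 1, 1, 1, 1],
})
store = pd.DataFrame({"Store": [1, 2], "StoreType": ["a", "b"]})

daily = prepare_daily_sales(train, store)
scaled = min_max_scale(daily, daily)
print(daily)
print(scaled)
```
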

---

## Limitations
- Requires sufficient historical data (at least 1–2 years of daily sales).
- The LSTM model may overfit on small datasets.
- External events (holidays, promotions, economic shocks) are not explicitly modeled.


---

## Future Work
- Add exogenous features (holidays, weather, promotions).
- Compare with hybrid models (Prophet + LSTM, ARIMA + XGBoost).
- Deploy as an API for real-time forecasting.


---

## Citation
If you use this model, please cite: