| title: Retail Demand Forecaster | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: green | |
| sdk: gradio | |
| sdk_version: "5.9.1" | |
| app_file: app/gradio_app.py | |
| pinned: false | |
| python_version: "3.11" | |
| # Retail Demand Forecasting | |
|  | |
| End-to-end retail demand forecasting pipeline. Compares **5 approaches** from naive baseline to Amazon Chronos-2 (2025 SOTA foundation model), with probabilistic prediction intervals, MLflow tracking, and a live Gradio demo. | |
| **[Live Demo →](https://huggingface.co/spaces/fikri0o0/demand-forecasting)** | **[GitHub →](https://github.com/Fikri645/demand-forecasting)** | |
| --- | |
| ## Highlights | |
| | What | Detail | | |
| |---|---| | |
| | **Dataset** | Store Sales (Corporación Favorita): 54 stores, 33 families, 4.5 years + oil price + holidays | | |
| | **Models** | Seasonal Naive → AutoARIMA → LightGBM → Amazon Chronos-2 → Ensemble | | |
| | **Best model** | Ensemble (LightGBM-Optuna + Chronos-ft): RMSLE **0.1610**, MASE **0.835** | | |
| | **Fine-tuning** | Chronos-2: zero-shot 0.2040 → 1000-step fine-tune **0.1690** → ensemble **0.1610** | | |
| | **Key insight** | Ensemble wins only when both components are strong; fine-tuning Chronos was the unlock | | |
| | **Prediction intervals** | 80% + 90% bands via conformal prediction | | |
| | **Metric** | RMSLE, which penalises under-forecasting (stockout > overstock in cost) | | |
| | **Experiment tracking** | MLflow, all model runs logged | | |
| | **API** | FastAPI `/forecast` endpoint | | |
| | **UI** | Gradio, interactive 28-day forecast chart | | |
| | **Deployment** | HuggingFace Spaces | | |
| --- | |
| ## Architecture | |
| ``` | |
| Store Sales CSV (Kaggle) / M5 fallback (datasetsforecast) | |
| ├─► data_loader.py (load, fill date gaps, train/test split) | |
| ├─► features.py (lag, rolling, calendar, oil price, holiday features) | |
| ├─► train_lgbm.py (LightGBM via mlforecast + MLflow) | |
| ├─► train_chronos.py (Chronos-2 zero-shot: no training, requires GPU) | |
| ├─► experiments.py (5-model comparison -> model_meta.json) | |
| ├─► evaluate.py (forecast plots, metrics comparison) | |
| ├─► api/main.py (FastAPI /forecast) | |
| └─► app/gradio_app.py (HF Spaces UI) | |
| ``` | |
| --- | |
| ## Quickstart | |
| ```bash | |
| # 1. Clone & install | |
| git clone https://github.com/Fikri645/demand-forecasting | |
| cd demand-forecasting | |
| pip install -r requirements-dev.txt | |
| # 2a. (Option A) Download Store Sales from Kaggle, put the zip in data/raw/, then: | |
| python scripts/download_data.py | |
| # 2b. (Option B) Auto-download M5 via datasetsforecast (no Kaggle needed) | |
| # Just run the script; it falls back to M5 automatically | |
| python scripts/download_data.py | |
| # 3. Run full experiment (5 models + MLflow logging) | |
| python -m src.experiments | |
| # 4. Generate evaluation plots | |
| python -m src.evaluate | |
| # 5. Run API locally | |
| uvicorn api.main:app --reload | |
| # 6. Run Gradio UI | |
| python app/gradio_app.py | |
| ``` | |
| Or via `make`: | |
| ```bash | |
| make install && make data && make experiments && make evaluate | |
| ``` | |
| --- | |
| ## Project Structure | |
| ``` | |
| demand-forecasting/ | |
| ├── data/processed/ # train.parquet, test.parquet | |
| ├── src/ | |
| │   ├── config.py # paths, constants | |
| │   ├── data_loader.py # Store Sales (Favorita) loading + gap fill + M5 fallback | |
| │   ├── features.py # lag, rolling, calendar feature engineering | |
| │   ├── metrics.py # RMSE, MAE, RMSLE, MASE, coverage | |
| │   ├── train_lgbm.py # LightGBM via mlforecast | |
| │   ├── train_chronos.py # Amazon Chronos-2 (zero-shot) | |
| │   ├── experiments.py # 5-model comparison + MLflow | |
| │   └── evaluate.py # forecast + comparison plots | |
| ├── api/main.py # FastAPI /forecast endpoint | |
| ├── app/gradio_app.py # Gradio UI (HF Spaces) | |
| ├── notebooks/01_eda.ipynb # Exploratory Data Analysis | |
| ├── tests/ # pytest (metrics, features, API schemas) | |
| ├── Makefile | |
| └── requirements-dev.txt | |
| ``` | |
| --- | |
| ## Dataset: Store Sales (Corporación Favorita) | |
| The **Store Sales - Time Series Forecasting** competition (Kaggle) uses real data from Ecuador's largest grocery chain: | |
| - **54 stores**, 33 product families, daily unit sales | |
| - **4.5 years**: 2013-01-01 to 2017-08-15 (1,684 days) | |
| - External features: **oil price** (Ecuador is oil-dependent, so economic shocks affect spending), **national/regional holidays**, **promotions** | |
| - This project uses the top 300 series by total volume | |
| Source: [Kaggle Store Sales Competition](https://www.kaggle.com/competitions/store-sales-time-series-forecasting) | |
| > M5 (Walmart, via `datasetsforecast`) is available as an automatic fallback if the Store Sales CSV is not present. | |
| --- | |
| ## Model Details | |
| ### Seasonal Naive (baseline) | |
| Forecast = same weekday last week. Any real model must beat this. | |
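The baseline rule above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the repository's implementation: the function name `seasonal_naive` and the toy numbers are assumptions made for the example.

```python
def seasonal_naive(history, horizon, season_length=7):
    """Repeat the last full season of observations across the forecast horizon."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    # Day h of the forecast copies the value from the same weekday last week.
    return [last_season[h % season_length] for h in range(horizon)]


# Two weeks of toy daily sales (hypothetical numbers)
history = [10, 12, 11, 13, 20, 25, 18,
           11, 13, 12, 14, 22, 27, 19]
forecast = seasonal_naive(history, horizon=28)
print(forecast[:7])
```

For a 28-day horizon the last observed week is simply tiled four times, which is why any model with real signal should improve on it.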
| ### AutoARIMA | |
| `statsforecast` AutoARIMA with weekly seasonality. Automatic order selection via AIC. | |
| ### LightGBM + Feature Engineering | |
| `mlforecast` with automatic lag generation: | |
| - **Lags**: t-7, t-14, t-21, t-28, t-35, t-42, t-56, t-364 (same day last year) | |
| - **Rolling**: 7-day and 28-day mean, std, max per series | |
| - **Calendar**: day-of-week, month, quarter, is-weekend, month-start/end | |
| - **Price**: normalised sell price, price change % | |
| - **External**: oil price, promotion flag, holiday flag (Store Sales specific) | |
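To make the lag and rolling features concrete, here is a minimal plain-Python sketch for a single series. It does not use `mlforecast`'s API; the helper names `lag` and `rolling_mean` are hypothetical, and the one-step shift in the rolling window is an assumption chosen so that the feature for day t never includes the day-t target.

```python
from statistics import mean


def lag(series, k):
    """Value observed k steps earlier; None where history is too short."""
    return [series[t - k] if t >= k else None for t in range(len(series))]


def rolling_mean(series, window, shift=1):
    """Mean of the `window` values ending `shift` steps before t."""
    out = []
    for t in range(len(series)):
        end = t - shift + 1      # exclusive end index of the window
        start = end - window
        out.append(mean(series[start:end]) if start >= 0 else None)
    return out


sales = list(range(1, 15))       # toy daily sales: 1, 2, ..., 14
lag_7 = lag(sales, 7)            # same weekday last week
roll_7 = rolling_mean(sales, 7)  # mean of the previous 7 days
print(lag_7[7], roll_7[7])
```

The first rows of each feature are `None` because there is not enough history yet; a training pipeline would typically drop or impute those rows.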
| ### Amazon Chronos-2 (2025 SOTA) | |
| Zero-shot foundation model: no training data needed. Loads pre-trained weights from HuggingFace (`amazon/chronos-t5-small`, about 46M parameters, a checkpoint from the original Chronos T5 family). Generates 100 probabilistic samples -> P10/P50/P90 quantiles. | |
| > Chronos-2 (Oct 2025) natively supports cross-series dependencies, exogenous features, and multivariate forecasting. Zero-shot performance competitive with fully-supervised models. | |
| **Requirements:** Chronos needs PyTorch with CUDA and sufficient virtual memory (page file >= 8GB on Windows). Run `python -m src.train_chronos` after increasing virtual memory. Code is complete and ready. | |
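The step from sampled trajectories to P10/P50/P90 bands can be illustrated without the model itself. The sketch below uses random numbers as a stand-in for the 100 sample paths; `percentile` and `summarize` are hypothetical helper names, and linear interpolation between order statistics is an assumption about how the quantiles are computed.

```python
import random


def percentile(values, q):
    """Linearly interpolated percentile, with q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def summarize(sample_paths, qs=(10, 50, 90)):
    """Collapse sample trajectories into one quantile curve per q."""
    horizon = len(sample_paths[0])
    return {
        q: [percentile([path[h] for path in sample_paths], q) for h in range(horizon)]
        for q in qs
    }


random.seed(0)
# Stand-in for 100 sampled 28-day trajectories (not real model output)
paths = [[random.gauss(100, 10) for _ in range(28)] for _ in range(100)]
bands = summarize(paths)
print(len(bands[50]))
```

Each entry of `bands[10]`, `bands[50]` and `bands[90]` is computed per forecast day, so the band can widen or narrow along the horizon.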
| ### Ensemble | |
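| Weighted average of the LightGBM and Chronos forecasts. The weights given here are LightGBM × 0.6 + Chronos × 0.4, while the best result in the Results table uses a 50/50 blend of LightGBM-Optuna and fine-tuned Chronos. The blend combines domain-feature awareness with temporal pattern recognition. Run `python -m src.experiments` after Chronos is available. | |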
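A weighted-average ensemble of two point forecasts reduces to a few lines. This is a sketch under assumptions: `blend` is a hypothetical helper, the forecasts are toy numbers, and the weight is left as a parameter rather than fixed.

```python
def blend(forecast_a, forecast_b, weight_a=0.5):
    """Convex combination of two equally long point forecasts."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    if len(forecast_a) != len(forecast_b):
        raise ValueError("forecasts must cover the same horizon")
    weight_b = 1.0 - weight_a
    return [weight_a * a + weight_b * b for a, b in zip(forecast_a, forecast_b)]


lgbm_forecast = [10.0, 20.0, 30.0]     # toy values
chronos_forecast = [20.0, 40.0, 30.0]  # toy values
print(blend(lgbm_forecast, chronos_forecast, weight_a=0.5))
```

Keeping the weight as an argument makes it easy to compare different blends on a validation window before settling on one.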
| --- | |
| ## Results: 28-Day Forecast on Store Sales (300 series) | |
| | Model | RMSLE | MASE | Notes | | |
| |---|---|---|---| | |
| | Seasonal Naive | 0.2145 | 1.109 | Benchmark floor | | |
| | AutoARIMA | 0.2105 | 1.121 | Worse than naive on MASE, marginally better on RMSLE | | |
| | Chronos-2 (zero-shot) | 0.2040 | 1.038 | Beats AutoARIMA with zero training | | |
| | LightGBM (default) | 0.1672 | 0.877 | Strong baseline | | |
| | LightGBM (Optuna, 50 trials) | 0.1671 | 0.880 | Marginal RMSLE gain; default was already good | | |
| | Chronos-2 (fine-tuned, 1000 steps) | 0.1690 | 0.863 | 17.2% lower RMSLE than zero-shot | | |
| | Chronos-2 (extended, 3000 steps) | 0.1688 | 0.863 | Converged at ~1000 steps | | |
| | **Ensemble (LGB-Optuna × 0.5 + Chronos-ft × 0.5)** | **0.1610** | **0.835** | **Best: 25% lower RMSLE than naive** | | |
| **Key findings:** | |
| - **Ensemble wins, but only when both components are strong.** Zero-shot Chronos dragged the first ensemble down. Once Chronos was fine-tuned, a 50/50 ensemble cut RMSLE to 0.1610, which is 3.7% better than the stronger component (LightGBM-Optuna at 0.1671) and 4.7% better than fine-tuned Chronos (0.1690). | |
| - **LightGBM was already near-optimal.** 50 Optuna trials improved RMSLE by only 0.0001, so the default hyperparameters were already well-calibrated. Lesson: HPO has diminishing returns when the model class fits the data well. | |
| - **Chronos converges fast.** The drop from zero-shot (0.2040) to 1000 steps (0.1690) is large; going from 1000 to 3000 steps gains only 0.0002 more. Pre-training provides a warm start that requires relatively few gradient updates. | |
| - **Foundation models + feature engineering are complementary.** Chronos captures long-range temporal patterns; LightGBM captures domain features (oil price, promotions, day-of-week). Neither alone beats the combination. | |
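The percentages quoted in the table and findings can be reproduced from the RMSLE column alone. The short script below is a worked check; the helper name `relative_reduction` is introduced here only for illustration, and the numbers are copied from the Results table.

```python
def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100 * (baseline - improved) / baseline


# RMSLE values copied from the Results table
naive, chronos_zero_shot = 0.2145, 0.2040
lgbm_optuna, chronos_ft, ensemble = 0.1671, 0.1690, 0.1610

fine_tune_gain = relative_reduction(chronos_zero_shot, chronos_ft)  # about 17.2
ensemble_vs_naive = relative_reduction(naive, ensemble)             # about 24.9
ensemble_vs_lgbm = relative_reduction(lgbm_optuna, ensemble)        # about 3.7
ensemble_vs_chronos = relative_reduction(chronos_ft, ensemble)      # about 4.7

for name, value in [
    ("fine-tune vs zero-shot", fine_tune_gain),
    ("ensemble vs naive", ensemble_vs_naive),
    ("ensemble vs LightGBM-Optuna", ensemble_vs_lgbm),
    ("ensemble vs Chronos-ft", ensemble_vs_chronos),
]:
    print(f"{name}: {value:.1f}%")
```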
| --- | |
| ## Why RMSLE? | |
| In retail, **running out of stock costs more than overstock**. RMSLE operates in log-space, which: | |
| 1. Penalises under-forecasting more than over-forecasting | |
| 2. Gives equal relative weight to low-volume and high-volume SKUs | |
| 3. Aligns the metric with actual business cost structure | |
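The asymmetry in point 1 is easy to see numerically. The sketch below uses the common `log1p` form of RMSLE; whether `src/metrics.py` uses exactly this form is an assumption, and the demand numbers are made up.

```python
import math


def rmsle(actual, predicted):
    """Root mean squared error between log(1 + predicted) and log(1 + actual)."""
    squared = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared) / len(squared))


actual = [100.0]
under = rmsle(actual, [50.0])   # forecast 50 units too low
over = rmsle(actual, [150.0])   # forecast 50 units too high
print(f"under-forecast: {under:.3f}, over-forecast: {over:.3f}")
```

Both forecasts miss by the same 50 units, yet the under-forecast scores worse because the log compresses errors above the true value more than errors below it.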
| --- | |
| ## What I Learned | |
| - **Ensemble wins only when both components are competitive.** Zero-shot Chronos (RMSLE 0.2040) blended with LightGBM (0.1672) scored 0.1722, worse than LightGBM alone. Fine-tuned Chronos (0.1690) blended with LightGBM-Optuna (0.1671) scored **0.1610**, the new best. The lesson: fix the weaker model first, then ensemble. | |
| - **LightGBM is already near-optimal with default hyperparameters.** 50 Optuna trials improved RMSLE by only 0.0001. When the model class fits the data well, HPO has diminishing returns. | |
| - **Foundation models converge fast from pre-training.** Zero-shot → 1000 steps: RMSLE drops by 0.035. 1000 → 3000 steps: only 0.0002 more. Pre-training on diverse time series provides a warm start, so most of the adaptation happens within the first 1000 steps. | |
| - **Chronos + LightGBM are complementary.** Chronos captures long-range temporal structure and seasonal patterns; LightGBM captures domain features (oil price, promotions, day-of-week). Their errors are not strongly correlated, which is where the ensemble gain comes from. | |
| - **AutoARIMA struggles on complex retail data.** Its MASE of 1.121 is worse than Seasonal Naive (1.109). Lag features, calendar features and oil price give tree models context that ARIMA's linear structure cannot model. | |
| - **MASE < 1.0 is the real bar.** Only LightGBM, fine-tuned Chronos, and their ensemble clear it. AutoARIMA (1.121) and zero-shot Chronos (1.038) both stay above 1.0, although zero-shot Chronos does edge out Seasonal Naive (1.109). | |
| - **lag_364 (same day last year) is critical.** Annual cycles in retail (back-to-school, holidays, oil price cycles) are only captured by a 1-year lag; shorter lags miss them entirely. | |
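Since several of these points lean on MASE, a minimal sketch of the metric may help. It assumes the usual definition, in which the forecast MAE is divided by the in-sample MAE of a seasonal naive forecast; the exact scaling used in `src/metrics.py` is not stated here, so treat `season_length` and the function itself as illustrative.

```python
def mase(actual, forecast, train, season_length=7):
    """Forecast MAE scaled by the in-sample seasonal-naive MAE."""
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    naive_errors = [
        abs(train[t] - train[t - season_length])
        for t in range(season_length, len(train))
    ]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("training series is constant at the seasonal lag")
    return mae / scale


# Toy example with season_length=1 so the arithmetic is easy to follow
train = [1.0, 3.0, 5.0, 7.0]   # in-sample naive errors are all 2.0
actual = [9.0, 11.0]
forecast = [10.0, 10.0]        # absolute errors are 1.0 and 1.0
score = mase(actual, forecast, train, season_length=1)
print(score)
```

A value below 1.0 means the forecast's average error is smaller than the naive method's average in-sample error, which is the bar referred to above.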