Upload README.md
README.md
ADDED
@@ -0,0 +1,53 @@
---
license: mit
tags:
- cbc-reference-model
- mlops-100-day
- demand-forecasting
- time-series
---

# CBC Reference Model: Retail Demand Forecasting (NYC Taxi hourly)

> Pre-trained reference model for the **CBC [MLOps 100-Day Track](https://github.com/careerbytecode/cbc-learning-hub/tree/main/100-days/mlops)** (Capstone 3). Published twin of ML Development Capstone 3.

## Model details
- **Type:** XGBoost regressor (`n_estimators=300`, `max_depth=4`) trained on 12 past-only lag, rolling, and calendar features of an hourly demand series. Seed 42, CPU, 0.46 MB.
- **Framework:** xgboost 3.2.0 · **Serialization:** joblib (full `XGBRegressor`; `.predict(DataFrame)` returns predicted trips).
- C3 is deliberately **classical**: a univariate LSTM (ML Dev Day 74) loses to this GBM on a medium-size, strongly seasonal series.

## Intended use
Produces a next-hour demand estimate to support staffing and dispatch planning. It is meant as decision support, not as an automated control signal, and is published as a teaching/reference artifact.

## Training data
NYC Yellow Taxi trip records for January 2024, aggregated to an hourly trip-count series of 744 points (31 days × 24 hours) and used as a stand-in for retail demand. The source is public NYC.gov open data with no PII, governed by the NYC.gov Terms of Use (not CC0).
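
The aggregation described above can be sketched as follows: truncate each trip timestamp to the hour, count trips per hour, and zero-fill hours without trips so that a 31-day month yields 744 points. This is a minimal illustration, not the pipeline that produced the model. The function name `hourly_counts` is hypothetical, and both bucketing by pickup time and zero-filling empty hours are assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_counts(pickups, start, end):
    """Count trips per hour over [start, end), zero-filling empty hours.

    `pickups` is an iterable of pickup datetimes (assumed bucketing key).
    Returns a list of (hour_start, trip_count) in chronological order.
    """
    counts = Counter(
        ts.replace(minute=0, second=0, microsecond=0)
        for ts in pickups
        if start <= ts < end  # drop records stamped outside the window
    )
    series = []
    hour = start
    while hour < end:
        series.append((hour, counts.get(hour, 0)))
        hour += timedelta(hours=1)
    return series


# Tiny illustrative input (made-up timestamps, not real trip records).
trips = [
    datetime(2024, 1, 1, 0, 5),
    datetime(2024, 1, 1, 0, 40),
    datetime(2024, 1, 1, 2, 15),
    datetime(2023, 12, 31, 23, 59),  # outside January, ignored
]
series = hourly_counts(trips, datetime(2024, 1, 1), datetime(2024, 2, 1))
print(len(series))  # 744 hourly points for a 31-day month
print(series[:3])
```

Filtering to the window before counting matters in practice, since raw trip files can contain records stamped outside the nominal month.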

## Metrics (untouched holdout + walk-forward, evaluated once)
| Model | Holdout MAE (trips) | Holdout RMSE (trips) | Holdout MAPE | Holdout R² |
|---|---|---|---|---|
| previous-hour naive (lag_1) | 639.67 | 831.09 | 31.24% | - |
| same-hour-last-week naive (lag_168) | **270.05** | 418.94 | **7.82%** | - |
| XGBoost | 279.39 | **389.85** | 12.24% | 0.9716 |
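
For reference, the error columns in the table follow the usual definitions, and both naive baselines are plain shifts of the series (by 1 hour and by 168 hours). The sketch below shows them in pure Python on a made-up series with a period of 4 standing in for the weekly cycle. The helper names are hypothetical and the numbers are illustrative, not the model's data.

```python
import math


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    # Percentage error; assumes y_true contains no zero counts.
    return 100 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def naive_forecast(series, lag):
    """Predict y[t] with y[t - lag]; returns aligned (actuals, predictions)."""
    return series[lag:], series[:-lag]


# Made-up counts that repeat every 4 steps.
series = [100, 200, 400, 200] * 3
for lag in (1, 4):
    y_true, y_pred = naive_forecast(series, lag)
    print(lag, round(mae(y_true, y_pred), 2), round(rmse(y_true, y_pred), 2))
```

On a perfectly periodic series the seasonal shift has zero error, which is why a same-hour-last-week baseline is hard to beat when the weekly pattern is strong.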

On the single holdout, the model wins on RMSE (389.85 vs 418.94) but trails the strong same-hour-last-week naive baseline on MAE (279.39 vs 270.05) and MAPE (12.24% vs 7.82%). The more reliable estimate is the **walk-forward** evaluation: XGBoost MAE 384.20 vs 427.32 for that naive baseline, about 10% lower (per-fold XGBoost MAE: [600.3, 402.5, 399.1, 252.6, 266.5]). Top feature importances: lag_1 0.51, lag_168 0.25, hour 0.11, lag_24 0.04.
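
Walk-forward evaluation refits on an expanding window of past hours and scores the block that follows, so every fold predicts data that lies strictly after its training data. The sketch below shows one way to generate such folds and checks that the reported walk-forward MAE is the mean of the five per-fold values. The function `walk_forward_splits` and the fold size of 72 hours are assumptions for illustration; this card does not state how the folds were cut.

```python
def walk_forward_splits(n_samples, n_folds, test_size):
    """Yield (train_indices, test_indices) with an expanding training window.

    Each test block directly follows its training window, so no fold is
    evaluated on data older than what it was trained on.
    """
    first_test_start = n_samples - n_folds * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        test_start = first_test_start + k * test_size
        yield range(0, test_start), range(test_start, test_start + test_size)


# Hypothetical layout: 744 hours, 5 folds of 72 hours each.
for train_idx, test_idx in walk_forward_splits(744, n_folds=5, test_size=72):
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")

# Per-fold XGBoost MAE values quoted in this card.
fold_mae = [600.3, 402.5, 399.1, 252.6, 266.5]
print(round(sum(fold_mae) / len(fold_mae), 2))  # 384.2
```

The spread across folds (roughly 250 to 600) is also a reminder that any single split, including the holdout, can look noticeably better or worse than the average.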

## How to load and predict
```python
import json

import joblib
import pandas as pd
from huggingface_hub import hf_hub_download

repo = "careerbytecode/mlops-ref-retail-demand"
model = joblib.load(hf_hub_download(repo, "model/pipeline.joblib"))
with open(hf_hub_download(repo, "sample_input.json")) as f:
    sample = json.load(f)  # one row holding the 12 input features

trips = float(model.predict(pd.DataFrame([sample]))[0])
print(trips)  # predicted next-hour demand
```

Input schema: 12 past-only features (lags 1/2/3/24/168, rolling mean/std, hour, day_of_week, is_weekend, day_of_month), computed from observed history only, with no values from the hour being predicted or later.
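
To make "past-only" concrete, the sketch below builds one feature row for the hour being predicted from a list of earlier hourly counts. It is an illustration under stated assumptions: the function `build_features`, the 24-hour rolling window, and the key names other than `lag_1`, `lag_24`, `lag_168`, and `hour` are hypothetical. It produces only the 11 features that can be read directly off the list above; the authoritative names and the full set of 12 are whatever `sample_input.json` contains.

```python
import statistics
from datetime import datetime, timedelta


def build_features(history, target_hour, window=24):
    """Build one past-only feature row for `target_hour`.

    `history` holds hourly counts ending at the hour just before
    `target_hour`, so history[-1] is lag 1 and history[-168] is lag 168.
    The rolling window length is an assumed value.
    """
    if len(history) < 168:
        raise ValueError("need at least 168 hours of history for lag_168")
    recent = history[-window:]  # excludes the target hour by construction
    return {
        "lag_1": history[-1],
        "lag_2": history[-2],
        "lag_3": history[-3],
        "lag_24": history[-24],
        "lag_168": history[-168],
        "rolling_mean": statistics.fmean(recent),
        "rolling_std": statistics.stdev(recent),
        "hour": target_hour.hour,
        "day_of_week": target_hour.weekday(),  # Monday = 0
        "is_weekend": int(target_hour.weekday() >= 5),
        "day_of_month": target_hour.day,
    }


# Made-up history: 168 hours with a simple daily pattern.
history = [1000 + 100 * (h % 24) for h in range(168)]
target = datetime(2024, 1, 1) + timedelta(hours=168)  # 2024-01-08 00:00
row = build_features(history, target)
print(row["lag_1"], row["lag_24"], row["lag_168"], row["hour"], row["day_of_week"])
```

Because the longest lag is 168 hours, the first week of any series cannot produce a complete row, which shortens the usable part of a 744-hour month.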

## Limitations
- Hourly demand is strongly weekly-periodic, so the one-line same-hour-last-week naive baseline is a very strong bar; the model beats it only modestly in walk-forward evaluation (MAE 384.20 vs 427.32).
- Trained on a single month (744 hours); longer-horizon effects (holidays, weather, trend) are not represented, so accuracy should be expected to degrade as the data drifts.
- Features are past-only; serving must supply the same 12 values from real history. Reference/teaching artifact only.
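
Since serving has to supply the same feature set the model was trained on, a serving path may want to reject malformed rows before calling `.predict`. The following is a minimal sketch, assuming the keys of `sample_input.json` serve as the reference schema; `validate_row` is a hypothetical helper and not part of the published artifact.

```python
import math


def validate_row(row, reference):
    """Return a list of problems found in `row` relative to `reference`.

    `reference` is a known-good feature dict (for example, the parsed
    sample_input.json); only its keys are used.
    """
    problems = []
    missing = sorted(set(reference) - set(row))
    extra = sorted(set(row) - set(reference))
    if missing:
        problems.append(f"missing features: {missing}")
    if extra:
        problems.append(f"unexpected features: {extra}")
    for name in sorted(set(reference) & set(row)):
        value = row[name]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name}: expected a number, got {type(value).__name__}")
        elif math.isnan(value):
            problems.append(f"{name}: value is NaN (history too short?)")
    return problems


# Illustrative two-feature schema; the real reference has 12 keys.
reference = {"lag_1": 0, "hour": 0}
print(validate_row({"lag_1": 3300, "hour": 7}, reference))  # []
print(validate_row({"lag_1": float("nan")}, reference))
```

Treating NaN as an error is a design choice in this sketch: it surfaces rows built from too little history instead of letting them pass through silently.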

---
© 2015-2026 CareerByteCode. All rights reserved. | CC BY-NC-SA 4.0 (docs), MIT (code) | Authored by Raghavendra R, Platform Owner CareerByteCode, Solution Architect