	---
	license: mit
	tags:
	- cbc-reference-model
	- mlops-100-day
	- demand-forecasting
	- time-series
	---

	# CBC Reference Model: Retail Demand Forecasting (NYC Taxi hourly)

	> Pre-trained reference model for the CBC [MLOps 100-Day Track](https://github.com/careerbytecode/cbc-learning-hub/tree/main/100-days/mlops) (Capstone 3). Published twin of ML Development Capstone 3.

	## Model details
	- Type: XGBoost regressor (n_estimators=300, max_depth=4) on 12 past-only lag/rolling/calendar features of an hourly demand series. Seed 42, CPU, 0.46 MB.
	- Framework: xgboost 3.2.0 · Serialization: joblib (full XGBRegressor; `.predict(DataFrame)` -> predicted trips).
	- Capstone 3 is deliberately classical: on this medium-size, strongly seasonal series, a univariate LSTM (ML Dev Day 74) loses to this GBM.
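
As a rough illustration of the configuration listed above, the stated hyperparameters could be passed to the scikit-learn style estimator as follows. This is a sketch, not the author's training script: mapping "Seed 42" to `random_state=42` is an assumption, `CARD_PARAMS` and `build_model` are hypothetical names, and every other hyperparameter is left at the xgboost default, which may not match the published artifact.

```python
# Hyperparameters stated in the card; everything else is left at xgboost
# defaults here, which may not match the published artifact.
CARD_PARAMS = {"n_estimators": 300, "max_depth": 4, "random_state": 42}


def build_model():
    """Return an unfitted XGBRegressor configured with the card's settings."""
    # Imported lazily so the parameter dict can be inspected without xgboost.
    from xgboost import XGBRegressor

    return XGBRegressor(**CARD_PARAMS)
```

Calling `build_model().fit(X, y)` on a feature frame and target would then train a fresh model with the same tree count and depth.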

	## Intended use
	A next-hour demand estimate to support staffing/dispatch planning. Decision support, not an automated control signal. Teaching/reference artifact.

	## Training data
	NYC Yellow Taxi trip records for January 2024, aggregated to a 744-hour (31 days × 24 hours) count series that serves as a stand-in for retail demand. Source: public NYC.gov open data with no PII, governed by the NYC.gov Terms of Use (not CC0).
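
The aggregation step can be sketched in plain Python. The example below assumes the pickup timestamps are already parsed into `datetime` objects; `hourly_counts` is a hypothetical helper, and dropping out-of-range rows and zero-filling empty hours are assumptions rather than documented choices of the original pipeline.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_counts(pickups, start, hours):
    """Count pickups per hour over [start, start + hours), zero-filling gaps."""
    end = start + timedelta(hours=hours)
    # Truncate each timestamp to the top of its hour, dropping out-of-range rows.
    buckets = Counter(
        ts.replace(minute=0, second=0, microsecond=0)
        for ts in pickups
        if start <= ts < end
    )
    return [buckets.get(start + timedelta(hours=h), 0) for h in range(hours)]


# January has 31 days, so the full month yields 31 * 24 = 744 hourly buckets.
JAN_START = datetime(2024, 1, 1)
JAN_HOURS = 31 * 24

# Tiny synthetic example (not real trip data).
toy_pickups = [
    datetime(2024, 1, 1, 0, 5),
    datetime(2024, 1, 1, 0, 59),
    datetime(2024, 1, 1, 2, 30),
    datetime(2023, 12, 31, 23, 59),  # outside the month, ignored
]
series = hourly_counts(toy_pickups, JAN_START, JAN_HOURS)
print(len(series), series[:3])
```

With the toy input, the series has 744 entries and the first three hours hold 2, 0 and 1 trips.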

	## Metrics (untouched holdout + walk-forward, evaluated once)
	| Model | Holdout MAE | RMSE | MAPE | R² |
	|---|---|---|---|---|
	| previous-hour naive (lag_1) | 639.67 | 831.09 | 31.24% | - |
	| same-hour-last-week naive (lag_168) | 270.05 | 418.94 | 7.82% | - |
	| XGBoost | 279.39 | 389.85 | 12.24% | 0.9716 |
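
For readers who want to reproduce this kind of comparison, here is a minimal sketch of the two naive baselines and the error metrics, using textbook definitions. The helper names are hypothetical, the series is synthetic with a period of 4 instead of 168, and how the card's MAPE treats zero-demand hours is not stated, so this version simply assumes there are none.

```python
import math


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; assumes no zero actuals."""
    return 100 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def lag_naive(series, lag, start):
    """Predict series[t] as series[t - lag] for every t >= start."""
    return [series[t - lag] for t in range(start, len(series))]


# Synthetic series with an exact period of 4 plus one bump (illustration only).
series = [10, 20, 30, 20] * 3
series[9] = 28  # the actual value at t=9 deviates from the pattern
start = 8
actual = series[start:]
prev_step = lag_naive(series, 1, start)  # analogue of the lag_1 baseline
seasonal = lag_naive(series, 4, start)   # analogue of the lag_168 baseline
print(mae(actual, prev_step), mae(actual, seasonal))
```

On a strongly periodic series the seasonal lag wins by a wide margin (MAE 2.0 vs 10.0 here), which mirrors the gap between the two naive rows in the table.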

	On the single holdout, the model wins on RMSE (389.85 vs 418.94) but trails the strong same-hour-last-week naive on MAE (279.39 vs 270.05) and MAPE (12.24% vs 7.82%). The more reliable estimate is walk-forward: XGBoost MAE 384.20 vs 427.32 for that naive, about 10% lower (per-fold XGBoost MAE [600.3, 402.5, 399.1, 252.6, 266.5]). Top features by importance: lag_1 0.51, lag_168 0.25, hour 0.11, lag_24 0.04.
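
The walk-forward arithmetic can be checked directly from the quoted per-fold values, and the fold layout can be sketched as an expanding window. Only the five fold scores and the naive MAE come from the card; the 48-hour test block, the expanding (rather than sliding) window, and the `expanding_window_folds` helper are assumptions made for illustration.

```python
from statistics import mean

# Per-fold XGBoost MAE values quoted in the card.
fold_mae = [600.3, 402.5, 399.1, 252.6, 266.5]
xgb_walk_forward = mean(fold_mae)
naive_walk_forward = 427.32  # same-hour-last-week naive, from the card

relative_gain = 1 - xgb_walk_forward / naive_walk_forward
print(round(xgb_walk_forward, 2), f"{relative_gain:.1%}")


def expanding_window_folds(n_samples, n_folds, test_size):
    """Yield (train_indices, test_indices); each test block follows its train set."""
    first_test_start = n_samples - n_folds * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        test_start = first_test_start + k * test_size
        yield range(0, test_start), range(test_start, test_start + test_size)


# Hypothetical layout: 5 folds of 48 hours each over a 744-hour series.
folds = list(expanding_window_folds(744, 5, 48))
```

Every training range ends before its test range begins, so no fold is scored on data the model has already seen.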

	## How to load and predict
	```python
	import json

	import joblib  # unpickling the XGBRegressor also requires xgboost
	import pandas as pd
	from huggingface_hub import hf_hub_download

	REPO = "careerbytecode/mlops-ref-retail-demand"
	model = joblib.load(hf_hub_download(REPO, "model/pipeline.joblib"))
	with open(hf_hub_download(REPO, "sample_input.json")) as f:
	    sample = json.load(f)  # one feature row
	trips = float(model.predict(pd.DataFrame([sample]))[0])
	print(trips)  # predicted next-hour demand
	```

	Input schema: 12 past-only features (lags 1/2/3/24/168, rolling mean/std, hour, day_of_week, is_weekend, day_of_month) computed from real history.
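
To make "past-only" concrete, the sketch below builds one feature row from a list of earlier hourly counts. It covers only the feature families named above. The column names beyond `lag_1`, `lag_24`, `lag_168` and `hour`, the 24-hour rolling window, and the `build_features` helper are all assumptions; the authoritative column names are whatever `sample_input.json` contains.

```python
from datetime import datetime
from statistics import mean, stdev

LAGS = (1, 2, 3, 24, 168)
ROLL_WINDOW = 24  # assumed window length; the card does not state it


def build_features(history, target_time):
    """Build one feature row for target_time from strictly earlier hours.

    history[-1] is the demand for the hour just before target_time.
    """
    if len(history) < max(LAGS):
        raise ValueError(f"need at least {max(LAGS)} hours of history")
    row = {f"lag_{k}": history[-k] for k in LAGS}
    window = history[-ROLL_WINDOW:]
    row["rolling_mean"] = mean(window)
    row["rolling_std"] = stdev(window)
    row["hour"] = target_time.hour
    row["day_of_week"] = target_time.weekday()  # Monday = 0
    row["is_weekend"] = int(target_time.weekday() >= 5)
    row["day_of_month"] = target_time.day
    return row


# Synthetic history: 168 hours where each value equals its hour index.
history = list(range(168))
row = build_features(history, datetime(2024, 1, 8, 0))
```

Because every value is read from `history` or from the target timestamp itself, nothing from the hour being predicted leaks into the row.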

	## Limitations
	- Hourly demand is strongly weekly-periodic, so the one-line same-hour-last-week naive is a very strong baseline; the model beats it only modestly (walk-forward MAE 384.20 vs 427.32).
	- Trained on a single month (744 hours); longer-horizon effects (holidays, weather, trend) are not represented, so accuracy should be expected to drift on other periods.
	- Features are past-only; serving must supply the same 12 values from real history. Reference/teaching artifact only.
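
One way a serving layer might enforce the "same 12 values" requirement is to validate each payload against the expected column list before calling the model. The sketch below is illustrative: `check_payload` is a hypothetical helper, the three keys and their values are synthetic, and in practice the expected keys could be taken from `sample_input.json`, assuming its keys match the training columns.

```python
def check_payload(payload, expected_keys):
    """Raise ValueError unless payload has exactly the expected numeric features."""
    missing = sorted(set(expected_keys) - set(payload))
    extra = sorted(set(payload) - set(expected_keys))
    if missing or extra:
        raise ValueError(f"missing={missing} extra={extra}")
    # v != v is True only for NaN, which would silently pass a type check.
    bad = sorted(
        k for k, v in payload.items()
        if not isinstance(v, (int, float)) or v != v
    )
    if bad:
        raise ValueError(f"non-numeric or NaN values for {bad}")
    return [payload[k] for k in expected_keys]  # values in the expected order


# Synthetic example with a shortened key list.
expected = ["lag_1", "lag_168", "hour"]
values = check_payload({"hour": 9, "lag_1": 5200.0, "lag_168": 4900.0}, expected)
```

Failing loudly on a missing or NaN feature is usually safer than letting the model return a plausible-looking number from incomplete history.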

	---
	© 2015-2026 CareerByteCode. All rights reserved. | CC BY-NC-SA 4.0 (docs), MIT (code) | Authored by Raghavendra R, Platform Owner CareerByteCode, Solution Architect