Bike Sharing - Tabular Models (8 architectures + interpretability, Poisson)

Pre-trained models for the t22000t/bike-sharing-tabular dataset, covering all eight architectures in the tabular-data-modelling-pipeline, each trained with a Poisson loss to match the count target.

v3 release - adds the full interpretability stack: SHAP TreeExplainer for the GBMs, Captum Integrated Gradients for the DL models, per-layer attention for FT-Transformer, residual analysis for CANN / CANN-GBM, per-row coefficients for LocalGLMnet, and full distributional outputs (mean, variance, quantiles, VaR) for DRN. See INTERPRETABILITY.md and dashboard_dl_interpretability.html.

Results

All 8 architectures were trained with default hyperparameters (no Optuna tuning), a Poisson family with log link, and 3-seed ensembles for the DL models.

Rank	Model	Test Gini	Test MAE (bikes)	Test RMSE (bikes)	A/E ratio	Params / trees	Training time
1	XGBoost	0.4975	22.9	38.65	1.007	999 trees	1.2 s
2	CatBoost	0.4935	29.7	45.87	1.021	583 trees	3.3 s
3	CANN-GBM	0.4876	38.99	60.18	0.954	33,201	29.5 s
4	LocalGLMnet	0.3137	118.06	166.09	0.894	11,139	27.3 s
5	CANN	0.3085	120.68	168.60	0.879	33,201	29.6 s
6	DRN	0.2926	3,819.48†	7,117.12	0.047	33,396	30.6 s
7	TabM	0.1628	143.12	178.55	0.942	264,414	304.4 s
8	FT-Transformer	0.1236	142.74	178.45	0.950	464,067	186.6 s
-	Stacked ensemble (NNLS)	0.4975	22.9	38.65	1.007	(9 weights)	-

† DRN has a calibration issue on this dataset: rank-order discrimination (Gini=0.29) is reasonable but predicted magnitudes are ~20× the actuals (MAE 3819 vs target mean ~190). The model's distributional output scaling is off. Treat DRN's predictions as rank-only; do not interpret them on the count scale.

Test set: 3,470 hourly rows (~20% of 17,379)
Target: cnt (hourly bike rental count, 1-977)
Loss: Poisson NLL (count:poisson for XGBoost, Poisson for CatBoost)
Link: log
Random seed: 42
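All eight models are fitted under a Poisson family with a log link. The objective and the matching deviance take only a few lines of NumPy, which is useful when teaching Poisson regression with these models. This is a textbook sketch, not the pipeline's code. Averaging over rows and making the log(y!) constant optional are choices made here.

```python
import math

import numpy as np


def poisson_nll(y, log_mu, include_constant=False):
    """Mean Poisson negative log-likelihood under a log link.

    y      : observed counts (e.g. the hourly `cnt` target)
    log_mu : model output on the log (linear-predictor) scale
    The log(y!) term does not depend on the model, so it is optional.
    """
    y = np.asarray(y, dtype=float)
    log_mu = np.asarray(log_mu, dtype=float)
    nll = np.exp(log_mu) - y * log_mu
    if include_constant:
        nll = nll + np.array([math.lgamma(v + 1.0) for v in y])
    return float(nll.mean())


def poisson_deviance(y, mu):
    """Mean Poisson deviance; the y*log(y/mu) term is taken as 0 where y == 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    safe_y = np.where(y > 0, y, 1.0)
    log_term = np.where(y > 0, y * np.log(safe_y / mu), 0.0)
    return float(2.0 * np.mean(log_term - (y - mu)))


# With a constant prediction, the NLL is minimised at log_mu = log(mean(y)).
y = np.array([0.0, 3.0, 10.0, 25.0])
best = poisson_nll(y, np.full_like(y, np.log(y.mean())))
print(best < poisson_nll(y, np.full_like(y, np.log(y.mean()) + 0.1)))
```

The last lines show why log(mean(y)) is the natural constant starting point on the log scale. The gradient of the mean NLL with respect to a constant log_mu is exp(log_mu) - mean(y), which is zero there.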

MAE is in bikes per hour. The best test MAE (~23, XGBoost) is about 16% of the median hourly count of 142, which is genuinely useful accuracy for this target.
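The columns of the results table can be recomputed from actuals `y` and count-scale predictions `mu`. MAE and RMSE below are standard. The A/E ratio (sum of actuals over sum of predictions) and the Gini (from the concentration curve ordered by prediction) follow common actuarial conventions. Both are assumptions about how the pipeline defines them; for example, it may normalise its Gini.

The sketch also shows how DRN can rank reasonably yet be badly off in level. Multiplying every prediction by 20 leaves the Gini unchanged but pulls A/E down to 0.05, close to DRN's 0.047.

```python
import numpy as np


def mae(y, mu):
    return float(np.mean(np.abs(np.asarray(y, dtype=float) - np.asarray(mu, dtype=float))))


def rmse(y, mu):
    diff = np.asarray(y, dtype=float) - np.asarray(mu, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def ae_ratio(y, mu):
    """Actual / expected: total observed count over total predicted count."""
    return float(np.sum(y) / np.sum(mu))


def gini(y, mu):
    """Gini of the concentration curve of actuals, rows ordered by prediction.

    Rows are sorted by prediction (highest first); the curve plots the cumulative
    share of actual counts against the cumulative share of rows, and
    Gini = 2 * area_under_curve - 1. Only the ordering of `mu` matters.
    """
    y = np.asarray(y, dtype=float)
    order = np.argsort(-np.asarray(mu, dtype=float), kind="stable")
    cum_share = np.cumsum(y[order]) / y.sum()
    n = len(y)
    # Trapezoid rule over n steps of width 1/n, with the curve starting at (0, 0).
    area = (cum_share.sum() - cum_share[-1] / 2.0) / n
    return float(2.0 * area - 1.0)


# Scaling every prediction by 20 keeps the ranking (same Gini) but wrecks A/E,
# which is the DRN pattern in the results table.
y = np.array([1.0, 2.0, 3.0, 4.0])
print(gini(y, y), gini(y, 20 * y), ae_ratio(y, 20 * y))
```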

Interpretability (new in v3)

Several methods are applied side by side. With 17,379 hourly rows the data is rich enough for them to agree closely on the dominant drivers:

Method	Applies to	What it measures
SHAP TreeExplainer	CatBoost, XGBoost	Per-row Shapley contribution to the model's log-prediction.
Native importance (CatBoost / XGBoost)	CatBoost, XGBoost	Loss reduction (CatBoost) / gain (XGBoost).
Captum Integrated Gradients	All 6 DL architectures	Gradient-based attribution for continuous features.
FT-Transformer attention	FT-Transformer	Per-layer multi-head self-attention weights.
CANN / CANN-GBM residual analysis	CANN, CANN-GBM	Distribution of the NN's correction to the GBM/GLM base prediction. Here mean=0.043, std=0.286: the NN adds a meaningful but not dominant correction.
LocalGLMnet coefficients	LocalGLMnet	Per-row linear coefficients (500 rows × 7 continuous features).
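DRN distributional output	DRN	Mean shape=2.15, CoV=0.97, VaR95=35,446, VaR99=53,159 (units = bike count). These inherit DRN's ~20× scale error (the observed maximum is 977), so treat them as relative rather than absolute.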
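CANN and LocalGLMnet are interpretable because of where the network output enters the log-link predictor. The sketch below shows the two published formulations in NumPy, assuming the pipeline follows them:

- CANN adds an NN correction to the base model's log-prediction, so a zero correction returns the base model.
- LocalGLMnet multiplies each feature by a row-specific coefficient. These are the per-row coefficients stored in localglmnet_coefficients.csv.

Function names and array shapes are illustrative, not the pipeline's API.

```python
import numpy as np


def cann_predict(log_mu_base, nn_correction):
    """CANN-style prediction: NN correction added to the base log-prediction.

    log_mu_base   : (n,) log-scale prediction from the GLM / GBM base model
    nn_correction : (n,) NN output; all zeros reproduces the base model exactly
    """
    return np.exp(np.asarray(log_mu_base, dtype=float) + np.asarray(nn_correction, dtype=float))


def localglmnet_predict(beta0, beta, x):
    """LocalGLMnet-style prediction: mu_i = exp(beta0 + sum_j beta_ij * x_ij).

    beta : (n, p) per-row coefficients produced by the network
    x    : (n, p) features on the same scale the network was fed
    """
    eta = beta0 + np.sum(np.asarray(beta, dtype=float) * np.asarray(x, dtype=float), axis=1)
    return np.exp(eta)


# Assuming the residual statistics are on the log scale, a correction of 0.043
# multiplies the base prediction by exp(0.043), about 1.044.
uplift = float(np.exp(0.043))
```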

Cross-method consensus (the high-confidence finding)

Both CatBoost and XGBoost rank the same five features in their top 5:

Feature	CatBoost	XGBoost	Interpretation
`hr` (hour of day)	0.882	0.865	The dominant driver - by a wide margin. Commute peaks (~8am, ~5pm), midday lull, late-night near-zero.
`yr` (2011 vs 2012)	0.232	0.221	Capital Bikeshare expanded significantly between years. The system grew, so all-else-equal demand rose sharply (total rentals grew roughly 65%, from about 1.24M in 2011 to 2.05M in 2012).
`temp` (normalised)	0.164	0.196	Warmer weather -> more rentals (up to a heat-stress ceiling not captured here).
`workingday`	0.132	0.145	Working days have commute peaks; weekends/holidays have midday peaks. The interaction with `hr` is strong.
`season`	0.104	0.101	Adds signal beyond `temp`: it also carries daylight and cultural-seasonal effects.

Note that hr dominates so heavily (importance ~4x the next feature, `yr`) that any model failing to use it fully is essentially blind. This may help explain why TabM and FT-Transformer underperform on this dataset despite 17k rows: their attention/aggregation mechanisms appear to dilute the strong univariate signal in hr.
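The consensus check and the dominance ratio are easy to recompute, for instance against the full per-feature table in feature_importance.csv. The sketch uses plain dicts. That dict layout is a hypothetical stand-in for the CSV, whose columns are not described here. On the five rows above, hr comes out at about 3.8x (CatBoost) and 3.9x (XGBoost) the runner-up, yr.

```python
def top_k(importances, k=5):
    """Names of the k features with the largest importance, largest first."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def consensus(methods, k=5):
    """Features that appear in the top-k of every method."""
    return set.intersection(*(set(top_k(imp, k)) for imp in methods.values()))


def dominance_ratio(importances):
    """Largest importance divided by the second largest."""
    first, second = sorted(importances.values(), reverse=True)[:2]
    return first / second


# The five rows of the consensus table; a full run would load every feature.
methods = {
    "catboost": {"hr": 0.882, "yr": 0.232, "temp": 0.164, "workingday": 0.132, "season": 0.104},
    "xgboost": {"hr": 0.865, "yr": 0.221, "temp": 0.196, "workingday": 0.145, "season": 0.101},
}
for name, imp in methods.items():
    print(name, round(dominance_ratio(imp), 1))  # catboost 3.8, then xgboost 3.9
```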

Full breakdown - per-method top-10 tables, LocalGLMnet coefficient distributions, sign-stability analysis - is in INTERPRETABILITY.md. For interactive plots see dashboard_dl_interpretability.html.

Files

File	What it is
`catboost.cbm`	Trained CatBoost (Poisson loss)
`xgboost.json`	Trained XGBoost Booster (count:poisson, base_score=log(mean(y)))
`cann_member{0,1,2}.pt`	CANN 3-seed ensemble
`cann_gbm_member{0,1,2}.pt`	CANN-GBM 3-seed ensemble
`ft_transformer_member{0,1,2}.pt`	FT-Transformer 3-seed ensemble
`tabm_member{0,1,2}.pt`	TabM 3-seed ensemble
`localglmnet_member{0,1,2}.pt`	LocalGLMnet 3-seed ensemble
`drn_member{0,1,2}.pt`	DRN 3-seed ensemble
`evaluation_summary.csv`	Per-model train/test metrics
`ensemble_weights.json`	NNLS weights over the 8 base predictions
`dashboard_dl_models.html`	Performance dashboard (Lorenz, calibration, A/P scatter)
`dashboard_dl_interpretability.html`	Interpretability dashboard (SHAP, IG, attention, residuals)
`feature_importance.csv`	Consolidated importances across CatBoost + XGBoost
`localglmnet_coefficients.csv`	LocalGLMnet per-row coefficients (500 rows × 7 continuous features)
`drn_distributional_outputs.csv`	DRN per-row distributional moments
`INTERPRETABILITY.md`	Human-readable interpretability summary report
`figures/fig_dl_*.png`	Standalone publication figures (incl. attention heatmap)
`model_summary.json`	Structured run record

Loading and inference

CatBoost

from huggingface_hub import hf_hub_download
from catboost import CatBoostRegressor
import pandas as pd

path = hf_hub_download("t22000t/bike-sharing-tabular-models", "catboost.cbm")
model = CatBoostRegressor()
model.load_model(path)

df = pd.read_csv("hf://datasets/t22000t/bike-sharing-tabular/hour.csv")
features = [
    "temp", "atemp", "hum", "windspeed", "hr", "yr", "mnth",
    "season", "holiday", "weekday", "workingday", "weathersit",
]
preds = model.predict(df[features], prediction_type="Exponent")  # exp of the raw log-scale score = predicted hourly rental count

XGBoost (best overall)

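from huggingface_hub import hf_hub_download
import pandas as pd
import xgboost as xgb

path = hf_hub_download("t22000t/bike-sharing-tabular-models", "xgboost.json")
booster = xgb.Booster()
booster.load_model(path)

df = pd.read_csv("hf://datasets/t22000t/bike-sharing-tabular/hour.csv")
features = ["temp", "atemp", "hum", "windspeed", "hr", "yr", "mnth",
            "season", "holiday", "weekday", "workingday", "weathersit"]
preds = booster.predict(xgb.DMatrix(df[features]))  # assumes the CatBoost feature list/order; count:poisson returns counts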

Deep-learning architectures

Each ships as a 3-seed ensemble of PyTorch state-dicts. To reconstruct, install the pipeline package and load via the matching architecture class:

git clone https://github.com/timothy22000/tabular_data_modelling_pipeline
cd tabular_data_modelling_pipeline
pip install -e ".[all]"
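Each member file is a state-dict, so `torch.load(path, map_location="cpu")` returns the parameter tensors to pass to the matching class's `load_state_dict`. How the three seeds are then combined is not stated on this card. This sketch assumes a plain average of the members' count-scale predictions. The alternative, averaging on the log scale (a geometric mean), never gives larger values. The function below is hypothetical, not part of the pipeline.

```python
import numpy as np


def ensemble_mean(member_preds, scale="count"):
    """Combine per-seed predictions, each an (n,) array on the count scale.

    scale="count": arithmetic mean of the counts (the assumption made here).
    scale="log"  : mean of the log-counts, i.e. a geometric mean.
    """
    preds = np.stack([np.asarray(p, dtype=float) for p in member_preds])
    if scale == "count":
        return preds.mean(axis=0)
    if scale == "log":
        return np.exp(np.log(preds).mean(axis=0))
    raise ValueError(f"unknown scale: {scale!r}")
```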

Training configuration

Setting	Value
Pipeline	tabular-data-modelling-pipeline v0.1.0
Architectures	All 8 (catboost, xgboost, cann, cann_gbm, ft_transformer, tabm, localglmnet, drn)
Hyperparameters	Defaults - no Optuna tuning
DL ensemble size	3 seeds per architecture
Family / link	Poisson / log
XGBoost objective	`count:poisson` (base_score = log(mean(y)))
CatBoost loss	`Poisson`
Train/test split	Random 80/20, seed 42
Cap percentile	99.9
Hardware	Apple M-series, MPS device for DL
Total wall-clock	~10.7 min
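The card does not say which columns the 99.9 cap applies to. Assuming it clips the training target at its 99.9th percentile, the step looks like the sketch below. The aim of such a cap is to stop a few extreme counts from dominating the fit. Computing the threshold on training rows only keeps test information out. The function name is illustrative.

```python
import numpy as np


def cap_target(y_train, percentile=99.9):
    """Clip training targets at the given percentile; return (clipped, cap)."""
    y_train = np.asarray(y_train, dtype=float)
    cap = float(np.percentile(y_train, percentile))
    return np.minimum(y_train, cap), cap
```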

To reproduce:

git clone https://github.com/timothy22000/tabular_data_modelling_pipeline
cd tabular_data_modelling_pipeline
pip install -e ".[all]"
python scripts/download_data.py --dataset bike_sharing

OMP_NUM_THREADS=1 python train.py \
    --config configs/example_bike_sharing.py \
    --input data/bike_sharing.csv \
    --skip-tuning --skip-interpretability \
    --architectures catboost xgboost cann cann_gbm ft_transformer tabm localglmnet drn

(OMP_NUM_THREADS=1 is only needed on macOS arm64.)

Limitations

No Optuna tuning. Defaults only.
DRN calibration off. Rank-only on this dataset (see results note).
FT-Transformer & TabM underperform the GBMs even with 17k rows. These architectures often need more data (roughly 50k+ rows) and tuning to be competitive on tabular tasks. Reported here for completeness, not as a recommendation.
Interpretability skipped in the reproduction command. It passes --skip-interpretability to save wall-clock; drop that flag to regenerate the Captum attributions and the other interpretability artefacts.
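Random split, not chronological. Bike rental data has strong temporal structure, and neighbouring hours from the same day can land in both train and test, so these test scores are likely optimistic. A date-based split (train on 2011, test on 2012) would be more realistic.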
casual and registered excluded as features (they sum to cnt, i.e. label leakage).
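A date-based split on the `yr` flag (0 = 2011, 1 = 2012 in hour.csv) is a few lines of pandas. The function is hypothetical, not part of the pipeline's CLI. One consequence is worth noting: `yr` is constant within each part, so it carries no signal and is dropped. A model trained on 2011 alone is then likely to under-predict 2012, given the system's growth between the years.

```python
import pandas as pd


def split_by_year(df, train_year=0, test_year=1, year_col="yr"):
    """Train on one year and test on the next, dropping the now-constant year column."""
    train = df[df[year_col] == train_year].drop(columns=[year_col])
    test = df[df[year_col] == test_year].drop(columns=[year_col])
    return train, test
```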

Intended use

Demonstrating the pipeline on count/Poisson data alongside the gamma-family House Prices model collection.
Baseline for tabular DL research on count regression.
Teaching Poisson regression with a realistic mid-sized dataset.

Citation

@article{fanaee2014event,
  title   = {Event labeling combining ensemble detectors and background knowledge},
  author  = {Fanaee-T, Hadi and Gama, Jo{\~a}o},
  journal = {Progress in Artificial Intelligence},
  year    = {2014}
}

@software{tabular_data_modelling_pipeline,
  author = {Mun, Timothy},
  title  = {tabular-data-modelling-pipeline},
  url    = {https://github.com/timothy22000/tabular_data_modelling_pipeline},
  year   = {2026}
}

Please also cite the individual architecture papers - see the main repo README.

License

MIT for the model code and pipeline. Underlying dataset under CC BY 4.0.

📂 Dataset: t22000t/bike-sharing-tabular
🤖 Companion: t22000t/house-prices-tabular-models - gamma family, 1.5k rows
📦 Pipeline: tabular-data-modelling-pipeline
