---
license: apache-2.0
tags:
- cgm
- time-series
- glucose-forecasting
- lightgbm
- metabonet
library_name: transformers
pipeline_tag: time-series-forecasting
---

# LightGBM multi-horizon CGM forecaster (MetaboNet)

A `MultiOutputRegressor(LGBMRegressor)` trained on the MetaboNet tabular split and re-packaged as a `transformers`-compatible Hub model. One repo holds four feature ablations, each with 12 boosters (one per 5-minute horizon up to 60 min):

- `cgm`: 24 CGM lags + `hour_sin`/`hour_cos` (26 features).
- `insulin`: `cgm` features + 24 Insulin lags (50 features).
- `carbs`: `cgm` features + 24 Carbs lags (50 features).
- `all`: `cgm` features + 24 Insulin lags + 24 Carbs lags (74 features).

## Files

- `config.json`: `auto_map` wiring + per-ablation feature lists.
- `model.py`: `LightGBMMultiHorizonConfig` / `LightGBMMultiHorizonModel` (`trust_remote_code=True`).
- `boosters//horizon_.txt`: `Booster.save_model` text dumps (4 ablations x 12 horizons = 48 files).

## Usage

```python
from transformers import AutoConfig, AutoModel

cfg = AutoConfig.from_pretrained(
    "anonymous-4FAD/LightGBM", trust_remote_code=True, ablation="cgm"
)
model = AutoModel.from_pretrained(
    "anonymous-4FAD/LightGBM", trust_remote_code=True, config=cfg
)

# Inputs match the MetaboNet benchmark.py contract:
#   timestamps: int64 ns, shape (B, T_in)
#   cgm/insulin/carbs: float, shape (B, T_in); only the last 24 steps are used
preds = model.predict(timestamps, cgm, insulin, carbs)  # -> (B, 12)
```

The thin local wrapper in [`models/lightgbm.py`](https://github.com/njeffrie/MetaboNet-Bench/blob/main/models/lightgbm.py) exposes the same API used by `benchmark.py`.

`lightgbm>=4.0` must be installed locally (boosters are loaded via `lightgbm.Booster(model_file=...)`); inference is CPU-only.

## Feature convention

`CGM_t` denotes the i-th sample within the last `history_length=24` steps, ordered oldest -> newest. Same for `Insulin_t` / `Carbs_t`.
`hour_sin` and `hour_cos` come from the most recent input timestamp. The original boosters were trained on NumPy arrays, so the feature names embedded in the boosters are anonymized (`Column_0..`); the explicit names listed in `config.json` come from the matched Ridge artifacts (same preprocessing schema, same column order).

## Provenance

Trained via [`other_models/results/train_lightgbm.py`](https://github.com/njeffrie/MetaboNet-Bench/blob/main/other_models/results/train_lightgbm.py) on the public MetaboNet train split. The Hub repo is staged by [`scripts/build_other_models_hub.py`](https://github.com/njeffrie/MetaboNet-Bench/blob/main/scripts/build_other_models_hub.py), which copies the booster text files verbatim and writes `config.json`.
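
## Example: assembling the feature matrix

To make the feature convention above concrete, here is a minimal NumPy sketch of how the per-ablation feature counts (26 / 50 / 74) could arise from the `predict` inputs. It is not the code in `model.py`. The helper name `build_features` is hypothetical, and three details are assumptions: the column order (CGM lags, then `hour_sin`/`hour_cos`, then Insulin lags, then Carbs lags), the use of a fractional hour with a 24-hour period, and the interpretation of timestamps as UTC epoch nanoseconds. The authoritative column order is the one listed in `config.json`.

```python
import numpy as np

HISTORY_LENGTH = 24  # from the model card: only the last 24 steps are used
NS_PER_HOUR = 3_600 * 10**9


def build_features(timestamps, cgm, insulin=None, carbs=None):
    """Assemble a (B, n_features) matrix. Layout is assumed, not confirmed."""
    timestamps = np.asarray(timestamps, dtype=np.int64)
    # Fractional hour of day from the most recent timestamp of each row
    # (assumption: UTC epoch nanoseconds, 24-hour period).
    hour = (timestamps[:, -1] % (24 * NS_PER_HOUR)) / NS_PER_HOUR
    angle = 2.0 * np.pi * hour / 24.0
    blocks = [
        # Last 24 CGM samples, kept in oldest -> newest order.
        np.asarray(cgm, dtype=np.float64)[:, -HISTORY_LENGTH:],
        np.sin(angle)[:, None],
        np.cos(angle)[:, None],
    ]
    # Optional lag blocks for the insulin / carbs / all ablations.
    for extra in (insulin, carbs):
        if extra is not None:
            blocks.append(np.asarray(extra, dtype=np.float64)[:, -HISTORY_LENGTH:])
    return np.concatenate(blocks, axis=1)


# Synthetic inputs: 2 windows of 36 samples at 5-minute spacing.
B, T_in = 2, 36
step_ns = 5 * 60 * 10**9
start = np.array([0, 6 * NS_PER_HOUR], dtype=np.int64)
timestamps = start[:, None] + np.arange(T_in, dtype=np.int64) * step_ns
cgm = np.full((B, T_in), 120.0)
insulin = np.zeros((B, T_in))
carbs = np.zeros((B, T_in))

X_cgm = build_features(timestamps, cgm)                  # `cgm` ablation
X_all = build_features(timestamps, cgm, insulin, carbs)  # `all` ablation
print(X_cgm.shape, X_all.shape)  # (2, 26) (2, 74)
```

Passing only `insulin` or only `carbs` gives the 50-column layouts of the `insulin` and `carbs` ablations. Since the boosters only know anonymized column names, any such matrix has to follow the order in `config.json` exactly for predictions to be meaningful.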