---
license: apache-2.0
tags:
- cgm
- time-series
- glucose-forecasting
- lightgbm
- metabonet
library_name: transformers
pipeline_tag: time-series-forecasting
---
# LightGBM multi-horizon CGM forecaster (MetaboNet)
A `MultiOutputRegressor(LGBMRegressor)` trained on the MetaboNet tabular split
and repackaged as a `transformers`-compatible Hub model. One repo holds four
feature ablations, each with 12 boosters, one per forecast horizon from 5 to
60 min in 5-minute steps:
- `cgm`: 24 CGM lags + `hour_sin`/`hour_cos` (26 features).
- `insulin`: `cgm` features + 24 Insulin lags (50 features).
- `carbs`: `cgm` features + 24 Carbs lags (50 features).
- `all`: `cgm` features + 24 Insulin lags + 24 Carbs lags (74 features).
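The feature counts above follow from simple concatenation. The sketch below reconstructs plausible per-ablation feature lists to make the arithmetic explicit. It assumes 0-indexed lag names (`CGM_t0` ... `CGM_t23`) and the column order CGM lags, `hour_sin`, `hour_cos`, Insulin lags, Carbs lags; both are assumptions, and the authoritative names and order are the lists stored in `config.json`. `feature_names` is a hypothetical helper, not part of `model.py`.

```python
# Hypothetical reconstruction of the per-ablation feature lists.
# Assumptions (not confirmed by this card): lag names are 0-indexed
# (CGM_t0 .. CGM_t23, oldest -> newest) and columns are ordered as
# CGM lags, hour_sin, hour_cos, Insulin lags, Carbs lags.
# The authoritative names and order are the lists in config.json.

HISTORY_LENGTH = 24


def lag_names(prefix: str, n: int = HISTORY_LENGTH) -> list:
    """Return [f"{prefix}_t0", ..., f"{prefix}_t{n-1}"]."""
    return [f"{prefix}_t{i}" for i in range(n)]


def feature_names(ablation: str) -> list:
    """Base CGM + time-of-day features, plus the ablation's extra lag blocks."""
    base = lag_names("CGM") + ["hour_sin", "hour_cos"]
    extra = {
        "cgm": [],
        "insulin": lag_names("Insulin"),
        "carbs": lag_names("Carbs"),
        "all": lag_names("Insulin") + lag_names("Carbs"),
    }
    if ablation not in extra:
        raise ValueError(f"unknown ablation: {ablation!r}")
    return base + extra[ablation]


for name in ("cgm", "insulin", "carbs", "all"):
    print(name, len(feature_names(name)))
```

Running it prints 26, 50, 50 and 74 features for the four ablations, matching the list above.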
## Files
- `config.json`: `auto_map` wiring + per-ablation feature lists.
- `model.py`: `LightGBMMultiHorizonConfig` / `LightGBMMultiHorizonModel`
  (`trust_remote_code=True`).
- `boosters/<ablation>/horizon_<NN>.txt`: `Booster.save_model` text dumps
  (4 ablations x 12 horizons = 48 files).
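A minimal sketch for checking a local copy of the repo against this layout. It assumes `<NN>` is a zero-padded two-digit index, so that sorting file names also sorts by horizon; the card does not spell out the numbering, and `booster_files` is a hypothetical helper. The demo builds a throwaway directory with hypothetical `horizon_01` ... `horizon_12` names.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

ABLATIONS = ("cgm", "insulin", "carbs", "all")
N_HORIZONS = 12


def booster_files(root: str | Path) -> dict:
    """Map each ablation to its boosters/<ablation>/horizon_*.txt files.

    Files are sorted by name, which matches horizon order only if <NN> is
    zero-padded to two digits (an assumption about the naming scheme).
    """
    root = Path(root)
    layout = {}
    for ablation in ABLATIONS:
        files = sorted((root / "boosters" / ablation).glob("horizon_*.txt"))
        if len(files) != N_HORIZONS:
            raise FileNotFoundError(
                f"{ablation}: expected {N_HORIZONS} booster files, found {len(files)}"
            )
        layout[ablation] = files
    return layout


# Demo on a throwaway directory mimicking the layout (hypothetical 01..12 numbering).
with tempfile.TemporaryDirectory() as tmp:
    for ablation in ABLATIONS:
        d = Path(tmp) / "boosters" / ablation
        d.mkdir(parents=True)
        for k in range(1, N_HORIZONS + 1):
            (d / f"horizon_{k:02d}.txt").touch()
    demo_layout = booster_files(tmp)
```

Against the real repo, `root` would be a local snapshot, for example the directory returned by `huggingface_hub.snapshot_download("anonymous-4FAD/LightGBM")`.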
## Usage
```python
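import numpy as np
from transformers import AutoConfig, AutoModel

cfg = AutoConfig.from_pretrained(
    "anonymous-4FAD/LightGBM", trust_remote_code=True, ablation="cgm"
)
model = AutoModel.from_pretrained(
    "anonymous-4FAD/LightGBM", trust_remote_code=True, config=cfg
)

# Inputs match the MetaboNet benchmark.py contract:
#   timestamps: int64 ns, shape (B, T_in)
#   cgm/insulin/carbs: float, shape (B, T_in); only the last 24 steps are used
# Dummy values for illustration, assuming numpy arrays: B=1, T_in=24, 5-min spacing.
B, T_in = 1, 24
timestamps = np.arange(T_in, dtype=np.int64)[None, :] * 300_000_000_000
cgm = np.full((B, T_in), 120.0)
insulin = np.zeros((B, T_in))
carbs = np.zeros((B, T_in))
preds = model.predict(timestamps, cgm, insulin, carbs)  # -> (B, 12)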
```
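Each of the 12 output columns corresponds to one booster. A small sketch for labelling them by horizon, assuming the columns are in ascending horizon order (column `k` is `5 * (k + 1)` minutes ahead); `preds_by_horizon` is a hypothetical helper, and the demo uses a stand-in array rather than real model output.

```python
import numpy as np

HORIZON_STEP_MIN = 5
N_HORIZONS = 12
# Assumed column order: 5, 10, ..., 60 minutes ahead.
HORIZONS_MIN = HORIZON_STEP_MIN * np.arange(1, N_HORIZONS + 1)


def preds_by_horizon(preds) -> dict:
    """Split a (B, 12) prediction array into {horizon_minutes: (B,) array}."""
    preds = np.asarray(preds)
    if preds.ndim != 2 or preds.shape[1] != N_HORIZONS:
        raise ValueError(f"expected shape (B, {N_HORIZONS}), got {preds.shape}")
    return {int(h): preds[:, k] for k, h in enumerate(HORIZONS_MIN)}


demo = np.arange(24, dtype=float).reshape(2, 12)  # stand-in for model.predict output
by_h = preds_by_horizon(demo)
```

For example, `by_h[30]` would hold the 30-minute-ahead forecast for every row of the batch.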
The thin local wrapper in
[`models/lightgbm.py`](https://github.com/njeffrie/MetaboNet-Bench/blob/main/models/lightgbm.py)
exposes the same API used by `benchmark.py`.
`lightgbm>=4.0` must be installed locally (boosters are loaded via
`lightgbm.Booster(model_file=...)`); inference is CPU-only.
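Because each horizon is an independent booster, the 12 text dumps can also be used without `transformers`. A minimal sketch, assuming you already have the 12 file paths for one ablation in horizon order and a feature matrix whose columns follow that ablation's list in `config.json`. `load_boosters` and `predict_all_horizons` are hypothetical helpers; stacking per-output predictions column-wise is what `MultiOutputRegressor.predict` does as well. The demo uses stand-in objects so it runs without `lightgbm` or downloaded files.

```python
from __future__ import annotations

from typing import Sequence

import numpy as np


def load_boosters(paths: Sequence[str]) -> list:
    """Load one lightgbm.Booster per horizon text dump, keeping the given order."""
    import lightgbm as lgb  # imported lazily; the card requires lightgbm>=4.0

    return [lgb.Booster(model_file=str(p)) for p in paths]


def predict_all_horizons(boosters: Sequence, X) -> np.ndarray:
    """Return a (B, n_horizons) array where column k is boosters[k].predict(X)."""
    X = np.asarray(X, dtype=np.float64)
    return np.column_stack([b.predict(X) for b in boosters])


# Offline check with stand-in objects (no lightgbm or downloaded files needed).
class _ConstantModel:
    """Demo-only stand-in exposing a Booster-like .predict(X)."""

    def __init__(self, value: float):
        self.value = value

    def predict(self, X):
        return np.full(len(X), self.value)


demo = predict_all_horizons(
    [_ConstantModel(float(k)) for k in range(12)], np.zeros((3, 26))
)
```

With real files, the call would be `predict_all_horizons(load_boosters(paths), X)`, where `X` has shape `(B, 26)` for `cgm`, `(B, 50)` for `insulin`/`carbs`, or `(B, 74)` for `all`.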
## Feature convention
`CGM_t<i>` denotes the i-th sample within the last `history_length=24` steps,
ordered oldest -> newest. The same convention applies to `Insulin_t<i>` and
`Carbs_t<i>`. `hour_sin` and `hour_cos` are computed from the most recent input
timestamp. The original boosters were trained on numpy arrays, so the feature
names embedded in the boosters are anonymized (`Column_0`, `Column_1`, ...);
the explicit names listed in `config.json` come from the matched Ridge
artifacts (same preprocessing schema, same column order).
## Provenance
Trained via
[`other_models/results/train_lightgbm.py`](https://github.com/njeffrie/MetaboNet-Bench/blob/main/other_models/results/train_lightgbm.py)
on the public MetaboNet train split. The Hub repo is staged by
[`scripts/build_other_models_hub.py`](https://github.com/njeffrie/MetaboNet-Bench/blob/main/scripts/build_other_models_hub.py),
which copies the booster text files verbatim and writes `config.json`.