---
license: apache-2.0
tags:
  - cgm
  - time-series
  - glucose-forecasting
  - lightgbm
  - metabonet
library_name: transformers
pipeline_tag: time-series-forecasting
---

# LightGBM multi-horizon CGM forecaster (MetaboNet)

A `MultiOutputRegressor(LGBMRegressor)` trained on the MetaboNet tabular split
and repackaged as a `transformers`-compatible Hub model. One repo holds four
feature ablations, each with 12 boosters (one per 5-minute horizon, from 5 to
60 min):

- `cgm` — 24 CGM lags + `hour_sin`/`hour_cos` (26 features).
- `insulin` — `cgm` features + 24 Insulin lags (50 features).
- `carbs` — `cgm` features + 24 Carbs lags (50 features).
- `all` — `cgm` features + 24 Insulin lags + 24 Carbs lags (74 features).
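
The feature counts above follow directly from the lag count. The sketch below rebuilds the name lists to make the arithmetic explicit. The helper names (`lag_names`, `feature_names`), the zero-based lag index, and the column order are assumptions for illustration; the lists in `config.json` remain authoritative.

```python
HISTORY_LENGTH = 24  # lags per signal, as stated in the ablation list
ABLATIONS = ("cgm", "insulin", "carbs", "all")


def lag_names(prefix, n=HISTORY_LENGTH):
    """Return names such as CGM_t0 ... CGM_t23 (zero-based index is assumed)."""
    return [f"{prefix}_t{i}" for i in range(n)]


def feature_names(ablation):
    """Hypothetical reconstruction of one ablation's feature list.

    The column order used here (CGM lags, hour features, extra lags) is an
    assumption; consult config.json for the real order.
    """
    if ablation not in ABLATIONS:
        raise ValueError(f"unknown ablation: {ablation!r}")
    names = lag_names("CGM") + ["hour_sin", "hour_cos"]
    if ablation in ("insulin", "all"):
        names += lag_names("Insulin")
    if ablation in ("carbs", "all"):
        names += lag_names("Carbs")
    return names


counts = {a: len(feature_names(a)) for a in ABLATIONS}
print(counts)  # {'cgm': 26, 'insulin': 50, 'carbs': 50, 'all': 74}
```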

## Files

- `config.json` — `auto_map` wiring + per-ablation feature lists.
- `model.py` — `LightGBMMultiHorizonConfig` / `LightGBMMultiHorizonModel`
  (loaded with `trust_remote_code=True`).
- `boosters/<ablation>/horizon_<NN>.txt` — `Booster.save_model` text dumps
  (4 ablations x 12 horizons = 48 files).
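
As a quick sanity check of the layout, the following sketch enumerates the expected booster paths. It assumes `<NN>` is a two-digit, zero-padded index; whether numbering starts at 0 or 1 is not stated here, so `first_index` is a hypothetical parameter.

```python
from pathlib import PurePosixPath

ABLATIONS = ("cgm", "insulin", "carbs", "all")
N_HORIZONS = 12


def booster_paths(first_index=1):
    """List the 48 expected booster files.

    first_index is an assumption: adjust it to match the actual file names.
    """
    return [
        PurePosixPath("boosters") / ablation / f"horizon_{first_index + h:02d}.txt"
        for ablation in ABLATIONS
        for h in range(N_HORIZONS)
    ]


paths = booster_paths()
print(len(paths))  # 48
```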

## Usage

```python
from transformers import AutoConfig, AutoModel

cfg = AutoConfig.from_pretrained(
    "anonymous-4FAD/LightGBM", trust_remote_code=True, ablation="cgm"
)
model = AutoModel.from_pretrained(
    "anonymous-4FAD/LightGBM", trust_remote_code=True, config=cfg
)

# Inputs match the MetaboNet benchmark.py contract:
#   timestamps: int64 ns, shape (B, T_in)
#   cgm/insulin/carbs: float, shape (B, T_in); only the last 24 steps are used
preds = model.predict(timestamps, cgm, insulin, carbs)  # -> (B, 12)
```
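
For a smoke test without the benchmark data, placeholder inputs with the shapes and dtypes described in the comments above could be built as follows. The batch size, input length, start date, 5-minute sampling interval, and glucose values are all illustrative assumptions, not part of the benchmark contract.

```python
import numpy as np

B, T_IN = 2, 36            # hypothetical batch size and input length (>= 24)
STEP_NS = 5 * 60 * 10**9   # assumed 5-minute sampling interval, in nanoseconds

# Evenly spaced int64 nanosecond timestamps, repeated for every batch row.
start = np.datetime64("2024-01-01T00:00:00", "ns").astype(np.int64)
timestamps = np.tile(start + STEP_NS * np.arange(T_IN, dtype=np.int64), (B, 1))

# Placeholder signals: random CGM values (mg/dL assumed), no insulin or carbs.
rng = np.random.default_rng(0)
cgm = (120.0 + 15.0 * rng.standard_normal((B, T_IN))).astype(np.float32)
insulin = np.zeros((B, T_IN), dtype=np.float32)
carbs = np.zeros((B, T_IN), dtype=np.float32)

print(timestamps.shape, timestamps.dtype)  # (2, 36) int64
```

These arrays can then be passed to `model.predict(timestamps, cgm, insulin, carbs)`.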

The thin local wrapper in
[`models/lightgbm.py`](https://github.com/njeffrie/MetaboNet-Bench/blob/main/models/lightgbm.py)
exposes the same API used by `benchmark.py`.

`lightgbm>=4.0` must be installed locally (boosters are loaded via
`lightgbm.Booster(model_file=...)`); inference is CPU-only.
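
To illustrate how 12 single-horizon boosters yield a `(B, 12)` output, here is a minimal sketch that stacks one prediction column per booster. It is not taken from `model.py`; `predict_horizons` and `ConstantBooster` are hypothetical names, and the stand-in class exists only so the example runs without model files. In practice each element would be a `lightgbm.Booster(model_file=...)`.

```python
import numpy as np


def predict_horizons(boosters, features):
    """Stack per-horizon predictions into an array of shape (B, n_horizons).

    `boosters` is any sequence of objects exposing predict(X) -> (B,),
    which is what lightgbm.Booster returns for a regression model.
    """
    features = np.asarray(features, dtype=np.float64)
    return np.column_stack([b.predict(features) for b in boosters])


class ConstantBooster:
    """Stand-in for lightgbm.Booster that always predicts a fixed value."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return np.full(len(X), self.value, dtype=np.float64)


boosters = [ConstantBooster(100.0 + h) for h in range(12)]
preds = predict_horizons(boosters, np.zeros((3, 26)))
print(preds.shape)  # (3, 12)
```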

## Feature convention

`CGM_t<i>` denotes the i-th sample within the last `history_length=24` steps,
ordered oldest -> newest; `Insulin_t<i>` and `Carbs_t<i>` follow the same
convention. `hour_sin` and `hour_cos` are computed from the most recent input
timestamp. The original boosters were trained on NumPy arrays, so the feature
names embedded in them are LightGBM's anonymous defaults (`Column_0`,
`Column_1`, ...); the explicit names listed in `config.json` come from the
matched Ridge artifacts (same preprocessing schema, same column order).
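
The convention can be sketched for the `cgm` ablation as follows. Several details are assumptions rather than facts from the training code: the hour is treated as a fractional hour of day in UTC, the cyclic encoding uses a 24-hour period, and the hour features are placed after the CGM lags. The function names are hypothetical.

```python
import math

NS_PER_HOUR = 3_600 * 10**9


def hour_features(timestamp_ns):
    """Cyclic hour-of-day encoding (assumes UTC and a fractional hour)."""
    hour = (timestamp_ns % (24 * NS_PER_HOUR)) / NS_PER_HOUR  # in [0, 24)
    angle = 2.0 * math.pi * hour / 24.0
    return math.sin(angle), math.cos(angle)


def cgm_feature_row(timestamps_ns, cgm, history_length=24):
    """Build one 26-value row: the last 24 CGM samples (oldest -> newest),
    then hour_sin and hour_cos taken from the most recent timestamp."""
    window = list(cgm[-history_length:])
    if len(window) < history_length:
        raise ValueError(f"need at least {history_length} samples")
    hour_sin, hour_cos = hour_features(timestamps_ns[-1])
    return window + [hour_sin, hour_cos]


# 30 samples at 5-minute spacing; only the last 24 CGM values are kept.
ts = [i * 5 * 60 * 10**9 for i in range(30)]
row = cgm_feature_row(ts, [100.0 + i for i in range(30)])
print(len(row), row[0], row[23])  # 26 106.0 129.0
```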

## Provenance

Trained via
[`other_models/results/train_lightgbm.py`](https://github.com/njeffrie/MetaboNet-Bench/blob/main/other_models/results/train_lightgbm.py)
on the public MetaboNet train split. The Hub repo is staged by
[`scripts/build_other_models_hub.py`](https://github.com/njeffrie/MetaboNet-Bench/blob/main/scripts/build_other_models_hub.py),
which copies the booster text files verbatim and writes `config.json`.