Toto 2.0 Family and Friends — GIFT-Eval artifact bundle

Pre-computed artifacts for replicating the Toto 2.0 Family and Friends (short form: Toto-2.0-FnF) submission to the GIFT-Eval benchmark. The ensemble is an FFORMA-style (Montero-Manso et al., International Journal of Forecasting, 2020) meta-learner that gates a pool of foundation models on a per-(frequency, term) bucket basis using XGBoost over time-series features.

The replication notebook lives in the GIFT-Eval repo at notebooks/toto_2_0_fnf.ipynb.


✨ Key Features

  • Per-bucket gating: Separate XGBoost head per (frequency, term) bucket — each bucket learns its own softmax over the model pool so the ensemble can specialize without one global gate trading off across regimes.
  • No retraining at inference: The bundle ships pre-computed base-model predictions and tsfeatures for the full GIFT-Eval test split, so replication needs neither GPUs nor the base-model libraries.
  • No leakage: tsfeatures are computed only on the lookback context preceding each forecast window; the bundle stores dataset metadata but not ground-truth labels.

🧩 Model pool

The meta-learner outputs softmax weights over 10 foundation models (column order matters — it is tied to the booster's class indices):

| # | Model | Family |
|--:|-------|--------|
| 0 | chronos-2 | Chronos |
| 1 | timesfm-2.5 | TimesFM |
| 2 | flowstate | FlowState |
| 3 | tirex | TiRex |
| 4 | patchtst-fm | PatchTST |
| 5 | toto-2.0-4m | Toto 2.0 |
| 6 | toto-2.0-22m | Toto 2.0 |
| 7 | toto-2.0-313m | Toto 2.0 |
| 8 | toto-2.0-1b | Toto 2.0 |
| 9 | toto-2.0-2.5b | Toto 2.0 |
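Because the booster's class indices are tied to this column order, a load-time sanity check is cheap insurance. A minimal sketch, assuming models.json contains exactly the list above (the bundle's models.json remains the source of truth):

```python
import json

# Pool order from the table above, hard-coded only for this check.
EXPECTED_POOL = [
    "chronos-2", "timesfm-2.5", "flowstate", "tirex", "patchtst-fm",
    "toto-2.0-4m", "toto-2.0-22m", "toto-2.0-313m", "toto-2.0-1b", "toto-2.0-2.5b",
]

def check_pool_order(models_json_path="models.json"):
    """Fail fast if booster class indices would not line up with columns."""
    with open(models_json_path) as f:
        models = json.load(f)
    if models != EXPECTED_POOL:
        raise ValueError(f"model column order mismatch: {models}")
    return models
```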

📦 Bundle layout

booster_manifest.json          ~4.8 GB — base64-encoded XGBoost boosters keyed by "<canonical_freq>|<term>"
feature_columns.json           train-time column order expected by the booster
feature_types.json             XGBoost feature_types ("c" = categorical, "q" = quantitative/numeric)
categories.json                {"freq": [...], "domain": [...]} train-time category vocabularies
models.json                    list of model names in column order (column index ↔ model)
test_features/<ds_dirname>/
  test_features.npz            (n_windows, n_tsfeatures) tsfeatures from the lookback context preceding each window
  test_metadata.npz            dataset-level scalars only (seasonality, prediction_length, num_variates, freq, domain)
test_predictions/<model>/<ds_dirname>/
  test_predictions.npz         (n_windows, 9, prediction_length) quantile forecasts at QUANTILE_LEVELS = [0.1, ..., 0.9]

ds_dirname follows GIFT-Eval's canonical naming: <pretty_name>_<freq>_<term> (e.g. m4_weekly_W_short).
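A minimal loader for one ds_dirname can be sketched from the layout above. The file paths follow the bundle layout; the array key inside each .npz archive ("arr_0" here) is an assumption, not confirmed by this doc:

```python
import json
import numpy as np
from pathlib import Path

def load_dataset_artifacts(root, ds_dirname, models):
    """Load pre-computed artifacts for one ds_dirname (sketch).

    Returns tsfeatures (n_windows, n_tsfeatures), the metadata scalars,
    and base-model forecasts stacked to
    (n_windows, n_models, 9, prediction_length), in models.json order.
    """
    root = Path(root)
    feats = np.load(root / "test_features" / ds_dirname / "test_features.npz")["arr_0"]
    meta_npz = np.load(root / "test_features" / ds_dirname / "test_metadata.npz",
                       allow_pickle=True)
    meta = {k: meta_npz[k] for k in meta_npz.files}
    preds = np.stack(
        [np.load(root / "test_predictions" / m / ds_dirname / "test_predictions.npz")["arr_0"]
         for m in models],
        axis=1,  # model axis, matching models.json column order
    )
    return feats, meta, preds
```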


⚡ How the booster is used

Per (dataset, term):

  1. Load test_features.npz and test_metadata.npz. Reindex the tsfeatures to feature_columns.json — columns missing in this dataset's tsfeatures (e.g. seasonal_strength on yearly data) become NaN, which XGBoost handles natively. Attach scalar features (seasonality, prediction_length, num_variates) and categorical features (freq, domain) using the train-time categorical vocabularies in categories.json. The tsfeatures are computed only on the lookback context that precedes each forecast window, so no information from the ground-truth labels is ever used at inference time.
  2. Look up the bucket booster for (canonical_freq, term), where canonical_freq strips pandas anchor suffixes (e.g. W-TUE → W, Q-DEC → Q).
  3. booster.predict(..., output_margin=True) returns raw class logits of shape (n_windows, 10); softmax over the model axis gives the per-window weights.
  4. Stack the 10 per-model test_predictions.npz arrays into a (n_windows, 10, 9, prediction_length) tensor; weight-sum across the model axis → final quantile forecast.
  5. Score with gluonts.evaluate_model using the same call shape every other GIFT-Eval submission uses (see evaluate_dataset in the notebook).
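Steps 3 and 4 can be sketched with synthetic arrays (no booster or bundle files required). Shapes follow the description above; the random data is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
n_windows, n_models, n_q, h = 4, 10, 9, 12

# Stand-in for booster.predict(..., output_margin=True): raw class logits.
logits = rng.normal(size=(n_windows, n_models))

# Numerically stable softmax over the model axis -> per-window weights.
w = np.exp(logits - logits.max(axis=1, keepdims=True))
w /= w.sum(axis=1, keepdims=True)

# Stand-in for the stacked per-model test_predictions.npz arrays.
stacked = rng.normal(size=(n_windows, n_models, n_q, h))

# Weighted sum across the model axis -> final quantile forecast.
final = np.einsum("wm,wmqh->wqh", w, stacked)
```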

🔁 Reproducing from scratch

Each base model's predictions were generated by running its standard GIFT-Eval notebook (notebooks/chronos-2.ipynb, etc.) with a wrapper that saves the per-window quantile forecasts to test_predictions.npz instead of going straight into evaluate_model. The notebook's "Optional B" section shows the wrapper for every pool member. Time-series features come from the tsfeatures library; "Optional A" in the notebook shows the per-window extraction call. The meta-learner boosters were trained on the corresponding train-window predictions, which are not included in this bundle.
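The prediction-saving wrapper can be sketched as below. This is a hypothetical stand-in for the notebook's "Optional B" wrapper, assuming each base model yields one (9, prediction_length) quantile array per test window:

```python
import numpy as np

QUANTILE_LEVELS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

def save_window_forecasts(forecasts, out_path):
    """Stack per-window quantile forecasts into the bundle's
    (n_windows, 9, prediction_length) layout and save them as .npz,
    instead of feeding them straight into evaluate_model.

    `forecasts`: iterable of (9, prediction_length) arrays, one per window
    (an assumption about the base-model wrapper, not confirmed by this doc).
    """
    arr = np.stack([np.asarray(f) for f in forecasts], axis=0)
    if arr.shape[1] != len(QUANTILE_LEVELS):
        raise ValueError(f"expected {len(QUANTILE_LEVELS)} quantiles, got {arr.shape[1]}")
    np.savez(out_path, arr)
```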


📖 Citation

(citation coming soon)

📝 License

Apache 2.0. Each base model retains its original license — see the linked HF repos in the model pool table.
