Datadog/Toto-2.0-4m

Time Series Forecasting, PyTorch, Safetensors, pytorch_model_hub_mixin, Eval Results (legacy)

Emaad

nielsr (HF Staff) committed 3 days ago

Commit 80cc3ff

1 parent: a6f4dbd

Improve model card: add links, dataset metadata, and absolute image paths (#2)

- Improve model card: add links, dataset metadata, and absolute image paths (0b07d913db48ee0d275b1c0523a3a8186650ce4f)

Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
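The "absolute image paths" part of this commit turns repo-relative `src` values such as `assets/pareto.png` into `https://huggingface.co/Datadog/Toto-2.0-4m/resolve/main/...` URLs. The sketch below reproduces that rewrite under stated assumptions: `absolutize_img_src` is a hypothetical helper, it only handles HTML `<img src="...">` attributes (not Markdown `![]()` images), and it assumes the `resolve/<revision>/<path>` URL pattern visible in the diff. It is not the tool that produced this commit.

```python
import re

# Matches the src attribute of an <img> tag and captures the quoted path.
_IMG_SRC = re.compile(r'(<img\b[^>]*?\ssrc=")([^"]+)(")')
# A URL scheme such as "https:" or "data:" marks a path as already absolute.
_HAS_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def absolutize_img_src(markdown: str, repo_id: str, revision: str = "main") -> str:
    """Rewrite repo-relative <img src="..."> paths to Hub resolve URLs.

    Hypothetical helper: assumes the
    https://huggingface.co/<repo_id>/resolve/<revision>/<path> pattern seen
    in this diff. Paths with a URL scheme, paths starting with "/" (which
    includes "//...") and fragment links ("#...") are left untouched.
    """
    base = f"https://huggingface.co/{repo_id}/resolve/{revision}/"

    def _rewrite(match: re.Match) -> str:
        prefix, path, suffix = match.groups()
        if _HAS_SCHEME.match(path) or path.startswith(("/", "#")):
            return match.group(0)  # already absolute or not a repo file
        return prefix + base + path.removeprefix("./") + suffix

    return _IMG_SRC.sub(_rewrite, markdown)


line = '<img src="assets/pareto.png" alt="Pareto frontier on BOOM and GIFT-Eval">'
print(absolutize_img_src(line, "Datadog/Toto-2.0-4m"))
# <img src="https://huggingface.co/Datadog/Toto-2.0-4m/resolve/main/assets/pareto.png" alt="Pareto frontier on BOOM and GIFT-Eval">
```

Because paths that already carry an `https:` scheme are skipped, running the helper twice leaves the card unchanged. Passing a commit hash as `revision` instead of `main` would pin each image to that commit; the commit shown here uses `main`.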

Files changed (1)

README.md +58 -52

README.md CHANGED

@@ -1,4 +1,10 @@
 ---
+license: apache-2.0
+pipeline_tag: time-series-forecasting
+datasets:
+- Datadog/BOOM
+- Salesforce/GiftEvalPretrain
+- autogluon/chronos_datasets
 tags:
 - time-series-forecasting
 - foundation-models
@@ -9,61 +15,61 @@ tags:
 - observability
 - safetensors
 - pytorch_model_hub_mixin
-license: apache-2.0
-pipeline_tag: time-series-forecasting
 thumbnail: https://web-assets.dd-static.net/42588/1778691695-toto-2-hero.png
 model-index:
 - name: Toto-2.0-4m
   results:
-    - task:
-        type: time-series-forecasting
-      dataset:
-        name: BOOM
-        type: BOOM
-      metrics:
-        - name: CRPS
-          type: CRPS
-          value: 0.377
-        - name: MASE
-          type: MASE
-          value: 0.624
-      source:
-        name: BOOM 💥 Observability Time-Series Forecasting Leaderboard
-        url: https://huggingface.co/spaces/Datadog/BOOM
-    - task:
-        type: time-series-forecasting
-      dataset:
-        name: GIFT-Eval
-        type: GIFT-Eval
-      metrics:
-        - name: CRPS
-          type: CRPS
-          value: 0.524
-        - name: MASE
-          type: MASE
-          value: 0.757
-      source:
-        name: GIFT-Eval Time Series Forecasting Leaderboard
-        url: https://huggingface.co/spaces/Salesforce/GIFT-Eval
-    - task:
-        type: time-series-forecasting
-      dataset:
-        name: TIME
-        type: TIME
-      metrics:
-        - name: CRPS
-          type: CRPS
-          value: 0.574
-        - name: MASE
-          type: MASE
-          value: 0.689
-      source:
-        name: TIME Benchmark Leaderboard
-        url: https://huggingface.co/spaces/Real-TSF/TIME-leaderboard
+  - task:
+      type: time-series-forecasting
+    dataset:
+      name: BOOM
+      type: BOOM
+    metrics:
+    - type: CRPS
+      value: 0.377
+      name: CRPS
+    - type: MASE
+      value: 0.624
+      name: MASE
+    source:
+      url: https://huggingface.co/spaces/Datadog/BOOM
+      name: BOOM 💥 Observability Time-Series Forecasting Leaderboard
+  - task:
+      type: time-series-forecasting
+    dataset:
+      name: GIFT-Eval
+      type: GIFT-Eval
+    metrics:
+    - type: CRPS
+      value: 0.524
+      name: CRPS
+    - type: MASE
+      value: 0.757
+      name: MASE
+    source:
+      url: https://huggingface.co/spaces/Salesforce/GIFT-Eval
+      name: GIFT-Eval Time Series Forecasting Leaderboard
+  - task:
+      type: time-series-forecasting
+    dataset:
+      name: TIME
+      type: TIME
+    metrics:
+    - type: CRPS
+      value: 0.574
+      name: CRPS
+    - type: MASE
+      value: 0.689
+      name: MASE
+    source:
+      url: https://huggingface.co/spaces/Real-TSF/TIME-leaderboard
+      name: TIME Benchmark Leaderboard
 ---
 # Toto-2.0-4m
+[[Technical Report](https://huggingface.co/papers/2605.20119)] [[GitHub](https://github.com/DataDog/toto)] [[Blog](https://www.datadoghq.com/blog/ai/toto-2/)]
 Toto (Time Series Optimized Transformer for [Observability](https://www.datadoghq.com/knowledge-center/observability/)) is a family of time series foundation models for multivariate forecasting developed by [Datadog](https://www.datadoghq.com/). Toto 2.0 is the current generation, featuring u-μP-scaled transformers ranging from 4m to 2.5B parameters, all trained from a single recipe. Forecast quality improves reliably with parameter count across the family.
 The family sets a new state of the art on three forecasting benchmarks: [BOOM](https://huggingface.co/spaces/Datadog/BOOM), our observability benchmark; [GIFT-Eval](https://huggingface.co/spaces/Salesforce/GIFT-Eval), the standard general-purpose benchmark; and the recent contamination-resistant [TIME](https://arxiv.org/abs/2602.12147) benchmark.
@@ -71,7 +77,7 @@ The family sets a new state of the art on three forecasting benchmarks: [BOOM](h
 ## 📊 Performance
 <figure>
-<img src="assets/pareto.png" alt="Pareto frontier on BOOM and GIFT-Eval">
+<img src="https://huggingface.co/Datadog/Toto-2.0-4m/resolve/main/assets/pareto.png" alt="Pareto frontier on BOOM and GIFT-Eval">
 <figcaption>Every Toto 2.0 size sits on or near the Pareto frontier on both BOOM and GIFT-Eval. The three largest sizes rank first, second, and third among foundation models on GIFT-Eval CRPS rank. On TIME, Toto 2.0 sizes take the top three spots on every metric, ahead of every other external foundation model evaluated.</figcaption>
 </figure>
@@ -135,13 +141,13 @@ All five Toto 2.0 sizes share the same training recipe; pick a size based on you
 ## 🏗️ Architecture
 <figure>
-<img src="assets/architecture.png" alt="Overview of the Toto 2.0 architecture.">
-<figcaption>A decoder-only patched transformer whose attention layers alternate between time-axis (causal) and variate-axis (full) views of the input. Toto 2.0 adds <b>contiguous patch masking (CPM)</b> for single-pass parallel decoding, a <b>quantile output head</b> trained with pinball loss, a robust arcsinh input scaler, residual MLP patch projections, and is trained with NorMuon. See the <a href="#-additional-resources">technical report</a> for details.</figcaption>
+<img src="https://huggingface.co/Datadog/Toto-2.0-4m/resolve/main/assets/architecture.png" alt="Overview of the Toto 2.0 architecture.">
+<figcaption>A decoder-only patched transformer whose attention layers alternate between time-axis (causal) and variate-axis (full) views of the input. Toto 2.0 adds <b>contiguous patch masking (CPM)</b> for single-pass parallel decoding, a <b>quantile output head</b> trained with pinball loss, a robust arcsinh input scaler, residual MLP patch projections, and is trained with NorMuon. See the <a href="https://huggingface.co/papers/2605.20119">technical report</a> for details.</figcaption>
 </figure>
 ## 🔗 Additional Resources
-- [Technical Report](https://arxiv.org/abs/2605.20119)
+- [Technical Report](https://huggingface.co/papers/2605.20119)
 - [Blog Post](https://www.datadoghq.com/blog/ai/toto-2/)
 - [GitHub Repository](https://github.com/DataDog/toto)
 - [Toto 2.0 Collection](https://huggingface.co/collections/Datadog/toto-20) — all five base checkpoints
@@ -160,4 +166,4 @@ All five Toto 2.0 sizes share the same training recipe; pick a size based on you
       primaryClass={cs.LG},
       url={https://arxiv.org/abs/2605.20119},
 }
-```
+```
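The `model-index` block in this README reports CRPS and MASE for BOOM, GIFT-Eval, and TIME. The removed and added YAML carry the same values; the commit reorders the metric keys (`type`, `value`, `name`) and re-indents the lists, and since YAML mappings are unordered this does not change the parsed metadata. As a hedged sketch, the entry below is transcribed by hand into a Python structure (no YAML parser is used, and the `source` blocks are omitted), and a hypothetical `flatten_model_index` helper turns it into one row per reported metric.

```python
from typing import Any

# model-index entry from the README front matter, transcribed by hand.
# Values are copied from the diff; `source` blocks are omitted for brevity.
MODEL_INDEX: list[dict[str, Any]] = [
    {
        "name": "Toto-2.0-4m",
        "results": [
            {
                "task": {"type": "time-series-forecasting"},
                "dataset": {"name": "BOOM", "type": "BOOM"},
                "metrics": [
                    {"type": "CRPS", "value": 0.377, "name": "CRPS"},
                    {"type": "MASE", "value": 0.624, "name": "MASE"},
                ],
            },
            {
                "task": {"type": "time-series-forecasting"},
                "dataset": {"name": "GIFT-Eval", "type": "GIFT-Eval"},
                "metrics": [
                    {"type": "CRPS", "value": 0.524, "name": "CRPS"},
                    {"type": "MASE", "value": 0.757, "name": "MASE"},
                ],
            },
            {
                "task": {"type": "time-series-forecasting"},
                "dataset": {"name": "TIME", "type": "TIME"},
                "metrics": [
                    {"type": "CRPS", "value": 0.574, "name": "CRPS"},
                    {"type": "MASE", "value": 0.689, "name": "MASE"},
                ],
            },
        ],
    }
]


def flatten_model_index(
    model_index: list[dict[str, Any]],
) -> list[tuple[str, str, str, float]]:
    """Return one (model, dataset, metric, value) row per reported metric.

    Hypothetical helper for illustration; it reads only the keys shown above.
    """
    rows = []
    for model in model_index:
        for result in model["results"]:
            dataset = result["dataset"]["name"]
            for metric in result["metrics"]:
                rows.append((model["name"], dataset, metric["type"], metric["value"]))
    return rows


for row in flatten_model_index(MODEL_INDEX):
    print(row)
```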