Emaad, nielsr (HF Staff) committed on
Commit 80cc3ff
1 Parent(s): a6f4dbd

Improve model card: add links, dataset metadata, and absolute image paths (#2)


- Improve model card: add links, dataset metadata, and absolute image paths (0b07d913db48ee0d275b1c0523a3a8186650ce4f)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
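This commit rewrites the two relative `<img src>` paths in the card (`assets/pareto.png`, `assets/architecture.png`) into absolute `https://huggingface.co/Datadog/Toto-2.0-4m/resolve/main/...` URLs by hand. The sketch below shows one way to do the same rewrite mechanically. `absolutize_img_paths` is a hypothetical helper, not part of `huggingface_hub` or the Toto repository, and the regex assumes double-quoted `src` attributes.

```python
import re

HUB_URL = "https://huggingface.co"

# Matches the opening of an <img> tag up to a double-quoted src attribute.
_IMG_SRC = re.compile(r'(<img\s[^>]*?src=")([^"]+)(")')
# A src that starts with a URL scheme (https:, data:, ...) is already absolute.
_HAS_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def absolutize_img_paths(markdown: str, repo_id: str, revision: str = "main") -> str:
    """Hypothetical helper: turn relative <img src> paths into Hub resolve URLs."""

    def fix(match: "re.Match[str]") -> str:
        prefix, src, suffix = match.groups()
        if _HAS_SCHEME.match(src):
            return match.group(0)  # leave absolute URLs untouched
        # Drop a leading "./" or "/" so the path is relative to the repo root.
        path = src[2:] if src.startswith("./") else src.lstrip("/")
        return f"{prefix}{HUB_URL}/{repo_id}/resolve/{revision}/{path}{suffix}"

    return _IMG_SRC.sub(fix, markdown)


old_line = '<img src="assets/pareto.png" alt="Pareto frontier on BOOM and GIFT-Eval">'
print(absolutize_img_paths(old_line, "Datadog/Toto-2.0-4m"))
```

Applied to the removed `pareto.png` line, it produces the same `src` as the corresponding added line in the diff; `src` values that already carry a scheme pass through unchanged.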

Files changed (1)
  1. README.md +58 -52
README.md CHANGED
@@ -1,4 +1,10 @@
  ---
  tags:
  - time-series-forecasting
  - foundation-models
@@ -9,61 +15,61 @@ tags:
  - observability
  - safetensors
  - pytorch_model_hub_mixin
- license: apache-2.0
- pipeline_tag: time-series-forecasting
  thumbnail: https://web-assets.dd-static.net/42588/1778691695-toto-2-hero.png
  model-index:
  - name: Toto-2.0-4m
  results:
- - task:
- type: time-series-forecasting
- dataset:
- name: BOOM
- type: BOOM
- metrics:
- - name: CRPS
- type: CRPS
- value: 0.377
- - name: MASE
- type: MASE
- value: 0.624
- source:
- name: BOOM 💥 Observability Time-Series Forecasting Leaderboard
- url: https://huggingface.co/spaces/Datadog/BOOM
- - task:
- type: time-series-forecasting
- dataset:
- name: GIFT-Eval
- type: GIFT-Eval
- metrics:
- - name: CRPS
- type: CRPS
- value: 0.524
- - name: MASE
- type: MASE
- value: 0.757
- source:
- name: GIFT-Eval Time Series Forecasting Leaderboard
- url: https://huggingface.co/spaces/Salesforce/GIFT-Eval
- - task:
- type: time-series-forecasting
- dataset:
- name: TIME
- type: TIME
- metrics:
- - name: CRPS
- type: CRPS
- value: 0.574
- - name: MASE
- type: MASE
- value: 0.689
- source:
- name: TIME Benchmark Leaderboard
- url: https://huggingface.co/spaces/Real-TSF/TIME-leaderboard
  ---

  # Toto-2.0-4m

  Toto (Time Series Optimized Transformer for [Observability](https://www.datadoghq.com/knowledge-center/observability/)) is a family of time series foundation models for multivariate forecasting developed by [Datadog](https://www.datadoghq.com/). Toto 2.0 is the current generation, featuring u-μP-scaled transformers ranging from 4m to 2.5B parameters, all trained from a single recipe. Forecast quality improves reliably with parameter count across the family.

  The family sets a new state of the art on three forecasting benchmarks: [BOOM](https://huggingface.co/spaces/Datadog/BOOM), our observability benchmark; [GIFT-Eval](https://huggingface.co/spaces/Salesforce/GIFT-Eval), the standard general-purpose benchmark; and the recent contamination-resistant [TIME](https://arxiv.org/abs/2602.12147) benchmark.
@@ -71,7 +77,7 @@ The family sets a new state of the art on three forecasting benchmarks: [BOOM](h
  ## 📊 Performance

  <figure>
- <img src="assets/pareto.png" alt="Pareto frontier on BOOM and GIFT-Eval">
  <figcaption>Every Toto 2.0 size sits on or near the Pareto frontier on both BOOM and GIFT-Eval. The three largest sizes rank first, second, and third among foundation models on GIFT-Eval CRPS rank. On TIME, Toto 2.0 sizes take the top three spots on every metric, ahead of every other external foundation model evaluated.</figcaption>
  </figure>
 
@@ -135,13 +141,13 @@ All five Toto 2.0 sizes share the same training recipe; pick a size based on you
  ## 🏗️ Architecture

  <figure>
- <img src="assets/architecture.png" alt="Overview of the Toto 2.0 architecture.">
- <figcaption>A decoder-only patched transformer whose attention layers alternate between time-axis (causal) and variate-axis (full) views of the input. Toto 2.0 adds <b>contiguous patch masking (CPM)</b> for single-pass parallel decoding, a <b>quantile output head</b> trained with pinball loss, a robust arcsinh input scaler, residual MLP patch projections, and is trained with NorMuon. See the <a href="#-additional-resources">technical report</a> for details.</figcaption>
  </figure>

  ## 🔗 Additional Resources

- - [Technical Report](https://arxiv.org/abs/2605.20119)
  - [Blog Post](https://www.datadoghq.com/blog/ai/toto-2/)
  - [GitHub Repository](https://github.com/DataDog/toto)
  - [Toto 2.0 Collection](https://huggingface.co/collections/Datadog/toto-20) — all five base checkpoints
@@ -160,4 +166,4 @@ All five Toto 2.0 sizes share the same training recipe; pick a size based on you
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2605.20119},
  }
- ```
 
  ---
+ license: apache-2.0
+ pipeline_tag: time-series-forecasting
+ datasets:
+ - Datadog/BOOM
+ - Salesforce/GiftEvalPretrain
+ - autogluon/chronos_datasets
  tags:
  - time-series-forecasting
  - foundation-models
 
  - observability
  - safetensors
  - pytorch_model_hub_mixin
  thumbnail: https://web-assets.dd-static.net/42588/1778691695-toto-2-hero.png
  model-index:
  - name: Toto-2.0-4m
  results:
+ - task:
+ type: time-series-forecasting
+ dataset:
+ name: BOOM
+ type: BOOM
+ metrics:
+ - type: CRPS
+ value: 0.377
+ name: CRPS
+ - type: MASE
+ value: 0.624
+ name: MASE
+ source:
+ url: https://huggingface.co/spaces/Datadog/BOOM
+ name: BOOM 💥 Observability Time-Series Forecasting Leaderboard
+ - task:
+ type: time-series-forecasting
+ dataset:
+ name: GIFT-Eval
+ type: GIFT-Eval
+ metrics:
+ - type: CRPS
+ value: 0.524
+ name: CRPS
+ - type: MASE
+ value: 0.757
+ name: MASE
+ source:
+ url: https://huggingface.co/spaces/Salesforce/GIFT-Eval
+ name: GIFT-Eval Time Series Forecasting Leaderboard
+ - task:
+ type: time-series-forecasting
+ dataset:
+ name: TIME
+ type: TIME
+ metrics:
+ - type: CRPS
+ value: 0.574
+ name: CRPS
+ - type: MASE
+ value: 0.689
+ name: MASE
+ source:
+ url: https://huggingface.co/spaces/Real-TSF/TIME-leaderboard
+ name: TIME Benchmark Leaderboard
  ---

  # Toto-2.0-4m

+ [[Technical Report](https://huggingface.co/papers/2605.20119)] [[GitHub](https://github.com/DataDog/toto)] [[Blog](https://www.datadoghq.com/blog/ai/toto-2/)]
+
  Toto (Time Series Optimized Transformer for [Observability](https://www.datadoghq.com/knowledge-center/observability/)) is a family of time series foundation models for multivariate forecasting developed by [Datadog](https://www.datadoghq.com/). Toto 2.0 is the current generation, featuring u-μP-scaled transformers ranging from 4m to 2.5B parameters, all trained from a single recipe. Forecast quality improves reliably with parameter count across the family.

  The family sets a new state of the art on three forecasting benchmarks: [BOOM](https://huggingface.co/spaces/Datadog/BOOM), our observability benchmark; [GIFT-Eval](https://huggingface.co/spaces/Salesforce/GIFT-Eval), the standard general-purpose benchmark; and the recent contamination-resistant [TIME](https://arxiv.org/abs/2602.12147) benchmark.
 
  ## 📊 Performance

  <figure>
+ <img src="https://huggingface.co/Datadog/Toto-2.0-4m/resolve/main/assets/pareto.png" alt="Pareto frontier on BOOM and GIFT-Eval">
  <figcaption>Every Toto 2.0 size sits on or near the Pareto frontier on both BOOM and GIFT-Eval. The three largest sizes rank first, second, and third among foundation models on GIFT-Eval CRPS rank. On TIME, Toto 2.0 sizes take the top three spots on every metric, ahead of every other external foundation model evaluated.</figcaption>
  </figure>
 
 
  ## 🏗️ Architecture

  <figure>
+ <img src="https://huggingface.co/Datadog/Toto-2.0-4m/resolve/main/assets/architecture.png" alt="Overview of the Toto 2.0 architecture.">
+ <figcaption>A decoder-only patched transformer whose attention layers alternate between time-axis (causal) and variate-axis (full) views of the input. Toto 2.0 adds <b>contiguous patch masking (CPM)</b> for single-pass parallel decoding, a <b>quantile output head</b> trained with pinball loss, a robust arcsinh input scaler, residual MLP patch projections, and is trained with NorMuon. See the <a href="https://huggingface.co/papers/2605.20119">technical report</a> for details.</figcaption>
  </figure>

  ## 🔗 Additional Resources

+ - [Technical Report](https://huggingface.co/papers/2605.20119)
  - [Blog Post](https://www.datadoghq.com/blog/ai/toto-2/)
  - [GitHub Repository](https://github.com/DataDog/toto)
  - [Toto 2.0 Collection](https://huggingface.co/collections/Datadog/toto-20) — all five base checkpoints
 
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2605.20119},
  }
+ ```
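The architecture caption states that the quantile output head is trained with pinball loss. For readers unfamiliar with that objective, here is a minimal pure-Python sketch of the standard pinball (quantile) loss, averaged over time steps and quantile levels. It is not taken from the Toto code base: the function name, the list-of-lists layout for `q_pred`, and the plain averaging are assumptions made for illustration.

```python
from typing import Sequence


def pinball_loss(y: Sequence[float], q_pred: Sequence[Sequence[float]], taus: Sequence[float]) -> float:
    """Mean pinball (quantile) loss over time steps and quantile levels.

    y[t]          observed value at step t
    q_pred[t][k]  predicted tau_k-quantile at step t (hypothetical layout)
    taus[k]       quantile level in (0, 1)
    """
    if not y or not taus:
        raise ValueError("y and taus must be non-empty")
    if len(y) != len(q_pred):
        raise ValueError("y and q_pred must cover the same time steps")
    total = 0.0
    for y_t, preds_t in zip(y, q_pred):
        if len(preds_t) != len(taus):
            raise ValueError("each q_pred row needs one value per quantile level")
        for tau, q in zip(taus, preds_t):
            diff = y_t - q
            # Under-forecasts (diff > 0) cost tau per unit; over-forecasts cost (1 - tau).
            total += max(tau * diff, (tau - 1.0) * diff)
    return total / (len(y) * len(taus))
```

With a single level τ = 0.5 the loss is half the absolute error, so `pinball_loss([3.0], [[1.0]], [0.5])` returns `1.0`. At other levels the asymmetric weights are what pull each output toward its own quantile.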