---
license: apache-2.0
metrics:
- mse
- mae
- mase
- wql
- crps
pipeline_tag: time-series-forecasting
datasets:
- thuml/UTSD
- Salesforce/lotsa_data
- Salesforce/GiftEvalPretrain
- autogluon/chronos_datasets
tags:
- time series
- time-series
- forecasting
- foundation models
- pretrained models
- time series foundation models
library_name: transformers
---
# Timer-S1
Timer-S1 is a time series foundation model with **8.3B** total parameters, **0.75B** activated parameters per token, and a context length of **11,520**.
The model supports **zero-shot forecasting** (predicting without dataset-specific training) at different quantile levels.
For more details, please refer to our [technical report](https://arxiv.org/pdf/2603.04791).

**Architecture**: Timer-S1 is a decoder-only Mixture-of-Experts (MoE) Transformer. For time series forecasting (a sequential problem where each step depends on previous ones), we propose **TimeSTP**, enabling multi-step prediction with cost-effective **serial computations**.
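
As an intuition for the serial computation, here is a minimal sketch (not the actual TimeSTP implementation; the `serial_forecast` helper and the `next_patch_fn` interface are assumptions for illustration only): each new patch is predicted from all previously observed and previously predicted values, then fed back into the model.

```python
import torch

# Illustrative sketch only: serial multi-step prediction.
# `next_patch_fn` is a hypothetical callable mapping a (batch, length) series
# to the next (batch, patch_len) patch; it is not part of the Timer-S1 API.
def serial_forecast(next_patch_fn, context: torch.Tensor, num_patches: int) -> torch.Tensor:
    series = context
    for _ in range(num_patches):
        next_patch = next_patch_fn(series)                 # one serial step
        series = torch.cat([series, next_patch], dim=-1)   # feed the prediction back
    return series[:, context.shape[-1]:]                   # keep only the forecast part
```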

**Performance**: Timer-S1 achieves state-of-the-art results on [GIFT-Eval](https://huggingface.co/spaces/Salesforce/GIFT-Eval). The model excels particularly at **medium-term** and **long-term** forecasting tasks.


**Post Training**: Timer-S1 undergoes post-training, including continued pre-training (**CPT**) and long-context extension (**LCE**), which improves short-term and long-context performance.

## Quickstart
```
pip install torch accelerate transformers~=4.57.1
```
```python
import torch
from transformers import AutoModelForCausalLM

# load the pretrained model
# supports different lookback/forecast lengths
model = AutoModelForCausalLM.from_pretrained(
    'thuml/Timer-S1',
    trust_remote_code=True,
    device_map="auto"
)

# or load from a local copy of the model
# model = AutoModelForCausalLM.from_pretrained(
#     'path_to_timer_s1',
#     trust_remote_code=True,
#     device_map="auto"
# )

# prepare input
batch_size, lookback_length = 64, 11520
seqs = torch.randn(batch_size, lookback_length).to(model.device)

# note that Timer-S1 generates predictions at fixed quantile levels
forecast_length = 256
output = model.generate(seqs, max_new_tokens=forecast_length, revin=True)

# quantile forecasts at levels [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(output.shape)  # batch_size x quantile_num(9) x forecast_length

# median (0.5 quantile) forecast of the first sample
print(output[0][4])
```
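The quantile dimension can be sliced directly to obtain a point forecast and a prediction interval; the indexing below assumes the quantile ordering `[0.1, ..., 0.9]` stated above.

```python
# split the quantile forecasts into a median (point) forecast and an 80% interval
median = output[:, 4, :]  # 0.5 quantile, shape: batch_size x forecast_length
lower = output[:, 0, :]   # 0.1 quantile (lower bound of the 80% interval)
upper = output[:, 8, :]   # 0.9 quantile (upper bound of the 80% interval)
print(median.shape, lower.shape, upper.shape)
```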
This model supports inference on either CPU or GPU. To load this model on a GPU, we recommend **at least 40GB of VRAM** (e.g., A100 40GB/80GB, or H100).
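If no suitable GPU is available, the model can also be kept on CPU simply by loading it without a `device_map` (slower, but without any VRAM requirement); this is a standard `transformers` loading pattern rather than anything specific to Timer-S1.

```python
# load on CPU only (slower, but no GPU memory required);
# omitting device_map keeps the model on CPU by default
model = AutoModelForCausalLM.from_pretrained(
    'thuml/Timer-S1',
    trust_remote_code=True
)
```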
> **Encountering out-of-memory errors at runtime?** Try the following options:
> ```python
> # Option 1: reduce batch size or context length
> batch_size, lookback_length = 1, 2880
>
> # Option 2: disable KV cache at runtime (or edit it in config.json for a permanent change)
> model.config.use_cache = False  # no efficiency impact when the prediction horizon does not exceed 256
> ```
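As a complement to the first option, generation can also be run over smaller mini-batches and concatenated afterwards; the `generate_in_chunks` helper below is a hypothetical convenience wrapper, not part of the model API.

```python
# sketch: run generation in mini-batches to reduce peak memory usage
def generate_in_chunks(model, seqs, forecast_length, chunk_size=8):
    outputs = []
    for start in range(0, seqs.shape[0], chunk_size):
        chunk = seqs[start:start + chunk_size]
        outputs.append(model.generate(chunk, max_new_tokens=forecast_length, revin=True))
    return torch.cat(outputs, dim=0)

output = generate_in_chunks(model, seqs, forecast_length, chunk_size=8)
```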
## Specification
* **Architecture**: decoder-only Transformer with MoE
* **Context Length**: up to 11,520
* **ReNorm**: default=True
* **KV Cache**: default=True
* **Patch Length**: 16
* **Total Parameters**: 8.3B
* **Activated Parameters**: 0.75B
* **Number of Layers**: 40
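These numbers relate as follows, assuming the input series is split into non-overlapping patches of length 16 (the non-overlapping patching is an assumption for illustration).

```python
# assumed relation between raw series length and token count at patch length 16
PATCH_LENGTH = 16
MAX_CONTEXT = 11_520
print(MAX_CONTEXT // PATCH_LENGTH)  # 720 patches (tokens) at the maximum context length
```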
## License Agreement
This model is licensed under the Apache-2.0 License.
## Citation
If you find Timer-S1 helpful for your research, please cite our paper:
```
@article{liu2026timer,
title={Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling},
author={Liu, Yong and Su, Xingjian and Wang, Shiyu and Zhang, Haoran and Liu, Haixuan and Wang, Yuxuan and Ye, Zhou and Xiang, Yang and Wang, Jianmin and Long, Mingsheng},
journal={arXiv preprint arXiv:2603.04791},
year={2026}
}
```