---
license: apache-2.0
metrics:
- mse
- mae
- mase
- wql
- crps
pipeline_tag: time-series-forecasting
datasets:
- thuml/UTSD
- Salesforce/lotsa_data
- Salesforce/GiftEvalPretrain
- autogluon/chronos_datasets
tags:
- time series
- time-series
- forecasting
- foundation models
- pretrained models
- time series foundation models
library_name: transformers
---

# Timer-S1

Timer-S1 is a time series foundation model with **8.3B** total parameters, **0.75B** activated parameters per token, and a context length of **11,520**.

The model supports **zero-shot forecasting** (predicting without dataset-specific training) and produces forecasts at multiple quantile levels.

For more details, please refer to our [technical report](https://arxiv.org/pdf/2603.04791).

**Architecture**: Timer-S1 is a decoder-only Mixture-of-Experts (MoE) Transformer. For time series forecasting (a sequential problem where each step depends on previous ones), we propose **TimeSTP**, enabling multi-step prediction with cost-effective **serial computations**.
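
TimeSTP itself is specified in the technical report; as intuition for the serial multi-step rollout, here is a minimal sketch (our own illustration, not the released implementation) of a decoder-only forecaster that generates one 16-step patch at a time and feeds each prediction back into the context, so every new step conditions on all previously generated steps. The `point_forecaster` callable and tensor shapes are assumptions for the example.

```python
import torch

PATCH_LEN = 16  # one token covers 16 time steps (see Specification below)

def serial_rollout(point_forecaster, context: torch.Tensor, horizon: int) -> torch.Tensor:
    """Illustrative serial rollout: extend `context` by `horizon` steps,
    one patch per serial step. `point_forecaster` is an assumed callable
    mapping a (batch, context_len) series to the next (batch, PATCH_LEN) patch."""
    patches = []
    for _ in range(-(-horizon // PATCH_LEN)):  # ceil(horizon / PATCH_LEN) steps
        next_patch = point_forecaster(context)
        patches.append(next_patch)
        # feed the prediction back in: the serial computation that lets
        # each step depend on everything generated so far
        context = torch.cat([context, next_patch], dim=-1)
    return torch.cat(patches, dim=-1)[..., :horizon]
```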

**Performance**: Timer-S1 achieves state-of-the-art results on [GIFT-Eval](https://huggingface.co/spaces/Salesforce/GIFT-Eval). The model particularly excels at **medium-term** and **long-term** forecasting tasks.

**Post-Training**: Timer-S1 undergoes post-training, including continued pre-training (**CPT**) and long-context extension (**LCE**), which improve short-term and long-context performance.

## Quickstart

```
pip install torch accelerate transformers~=4.57.1
```

```python
import torch
from transformers import AutoModelForCausalLM

# load the pretrained model
# (it supports different lookback/forecast lengths)
model = AutoModelForCausalLM.from_pretrained(
    'thuml/Timer-S1',
    trust_remote_code=True,
    device_map="auto"
)

# or load from a local checkpoint
# model = AutoModelForCausalLM.from_pretrained(
#     'path_to_timer_s1',
#     trust_remote_code=True,
#     device_map="auto"
# )

# prepare input
batch_size, lookback_length = 64, 11520
seqs = torch.randn(batch_size, lookback_length).to(model.device)

# note that Timer-S1 generates predictions at fixed quantile levels
forecast_length = 256

output = model.generate(seqs, max_new_tokens=forecast_length, revin=True)

# quantile forecasts at levels [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(output.shape)  # batch_size x quantile_num(9) x forecast_length

# the median forecast (0.5 quantile) of the first sample
print(output[0, 4])
```
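
Since the output is a grid of quantile forecasts, a natural way to score it is the quantile (pinball) loss underlying the WQL metric listed above. The helper below is our own illustration rather than part of the Timer-S1 API; it assumes the `(batch, 9, forecast_length)` layout and quantile levels from the Quickstart, plus a hypothetical `targets` tensor for ground truth.

```python
import torch

# Illustrative evaluation helper (not part of the Timer-S1 API).
# Assumes the Quickstart layout: output (batch, 9, horizon) at quantile
# levels 0.1..0.9, and targets (batch, horizon).
QUANTILES = torch.tensor([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

def quantile_loss(output: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    q = QUANTILES.view(1, -1, 1).to(output.device)
    diff = targets.unsqueeze(1) - output  # (batch, 9, horizon)
    # pinball loss: q * diff when under-predicting, (q - 1) * diff otherwise
    return torch.maximum(q * diff, (q - 1) * diff).mean()

# hypothetical ground truth, to show the call shape
targets = torch.randn(batch_size, forecast_length).to(model.device)
print(quantile_loss(output, targets))
```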

This model supports inference on either CPU or GPU. To load it on a GPU, we recommend one with **at least 40GB of VRAM** (e.g., A100 40GB/80GB, or H100).

> **Encountering out-of-memory errors at runtime?** Try the following options:
> ```python
> # Option 1: reduce the batch size or the context length
> batch_size, lookback_length = 1, 2880
>
> # Option 2: disable the KV cache at runtime (or set it in config.json for a permanent change)
> model.config.use_cache = False  # no efficiency impact when the prediction horizon does not exceed 256
> ```
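
If a single call still does not fit, one further option (a generic pattern, not a Timer-S1 feature) is to run inference in micro-batches so that peak memory stays bounded. This sketch reuses `seqs`, `forecast_length`, and `model` from the Quickstart:

```python
import torch

# Option 3 (illustrative): micro-batched inference to bound peak memory
chunks = []
with torch.no_grad():
    for chunk in seqs.split(8, dim=0):  # micro-batches of 8 series each
        chunks.append(model.generate(chunk, max_new_tokens=forecast_length, revin=True))
output = torch.cat(chunks, dim=0)  # batch_size x quantile_num(9) x forecast_length
```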

## Specification

* **Architecture**: decoder-only Transformer with MoE
* **Context Length**: up to 11,520
* **ReNorm** (`revin` in `generate`): default=True
* **KV Cache**: default=True
* **Patch Length**: 16 (see the note below)
* **Total Parameters**: 8.3B
* **Activated Parameters**: 0.75B
* **Number of Layers**: 40
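
Assuming patches serve as the model's tokens (patch length 16, per the list above), the context and horizon translate directly into token counts; a quick sanity check:

```python
# with a patch length of 16, the 11,520-step maximum context
# corresponds to 11,520 / 16 = 720 input tokens, and the 256-step
# Quickstart horizon is generated as 256 / 16 = 16 patches
PATCH_LEN = 16
assert 11_520 // PATCH_LEN == 720
assert 256 // PATCH_LEN == 16
```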

## License Agreement

This model is licensed under the Apache-2.0 License.

## Citation

If you find Timer-S1 helpful for your research, please cite our paper:
```
@article{liu2026timer,
  title={Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling},
  author={Liu, Yong and Su, Xingjian and Wang, Shiyu and Zhang, Haoran and Liu, Haixuan and Wang, Yuxuan and Ye, Zhou and Xiang, Yang and Wang, Jianmin and Long, Mingsheng},
  journal={arXiv preprint arXiv:2603.04791},
  year={2026}
}
```