---
license: apache-2.0
metrics:
- mse
- mae
- mase
- wql
- crps
pipeline_tag: time-series-forecasting
datasets:
- thuml/UTSD
- Salesforce/lotsa_data
- Salesforce/GiftEvalPretrain
- autogluon/chronos_datasets
tags:
- time series
- time-series
- forecasting
- foundation models
- pretrained models
- time series foundation models
- quantized
- 4-bit
- bitsandbytes
- unofficial
library_name: transformers
base_model:
- bytedance-research/Timer-S1
---
# Timer-S1 Quantized 4-bit
This repository contains an **unofficial 4-bit BitsAndBytes quantized checkpoint** derived from [`bytedance-research/Timer-S1`](https://huggingface.co/bytedance-research/Timer-S1).
Timer-S1 is a time series foundation model for zero-shot forecasting. The original model card describes Timer-S1 as a decoder-only Mixture-of-Experts Transformer with **8.3B** total parameters, **0.75B** activated parameters per token, and a context length of **11,520**. For details about the original model, architecture, training data, benchmark results, and intended use, refer to the upstream model card and the [Timer-S1 technical report](https://arxiv.org/pdf/2603.04791).
This upload keeps the upstream Timer-S1 remote-code implementation files and Apache-2.0 license metadata, but stores the model weights as a pre-quantized 4-bit checkpoint for lower-memory inference.
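For a rough sense of the memory savings, the sketch below estimates weight storage for 8.3B parameters in bfloat16 versus 4-bit. It is a back-of-envelope illustration, not a measurement. It assumes every parameter is quantized and that BitsAndBytes stores one float32 absmax scale per 64-weight block (its usual 4-bit block size, with double quantization off as in this checkpoint). In practice some modules typically stay unquantized, and activations and runtime overhead come on top, so actual checkpoint size and GPU usage will differ.

```python
# Back-of-envelope weight-memory estimate (an illustration, not a measurement).
# Assumptions: all 8.3B parameters are stored at the given bit width, and 4-bit
# weights carry one float32 absmax scale per 64-weight block (no double quant).

TOTAL_PARAMS = 8_300_000_000  # total parameters reported by the upstream card


def weight_bytes(num_params: int, bits_per_param: float) -> float:
    """Bytes needed to store num_params weights at bits_per_param bits each."""
    return num_params * bits_per_param / 8


def bnb_4bit_bits_per_param(block_size: int = 64, scale_bits: int = 32) -> float:
    """4 bits per weight plus the per-block absmax scale amortized over the block."""
    return 4 + scale_bits / block_size


bf16_bytes = weight_bytes(TOTAL_PARAMS, 16)
fp4_bytes = weight_bytes(TOTAL_PARAMS, bnb_4bit_bits_per_param())

print(f"bfloat16: {bf16_bytes / 1e9:.2f} GB")
print(f"4-bit:    {fp4_bytes / 1e9:.2f} GB")
print(f"ratio:    {bf16_bytes / fp4_bytes:.2f}x")
```

Under these assumptions the 4-bit weights take about 4.67 GB versus 16.60 GB in bfloat16, roughly a 3.6x reduction.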
## Source and Provenance
- **Base model**: `bytedance-research/Timer-S1`
- **Quantization**: BitsAndBytes 4-bit quantization
- **Status**: unofficial derivative checkpoint
No new training or benchmark claims are made for this quantized checkpoint. Numerical outputs may differ slightly from the original bfloat16 checkpoint because the weights are quantized.
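To quantify how far the quantized forecasts drift from the original checkpoint on your own data, one option is to run both models on the same inputs and compare their outputs. The sketch below assumes both forecasts have already been collected as NumPy arrays of shape `(batch, 9, horizon)`, matching the output layout shown in the Quickstart (for example via `output.float().cpu().numpy()`). The helper name `quantile_drift` is hypothetical, and synthetic arrays stand in for real model outputs.

```python
import numpy as np

QUANTILES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]


def quantile_drift(reference: np.ndarray, quantized: np.ndarray) -> dict:
    """Mean and max absolute difference per quantile level.

    Both arrays are assumed to have shape (batch, 9, horizon).
    """
    if reference.shape != quantized.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {quantized.shape}")
    diff = np.abs(reference - quantized)
    # Reduce over batch and horizon, keeping the quantile axis.
    mae = diff.mean(axis=(0, 2))
    max_abs = diff.max(axis=(0, 2))
    return {q: (float(m), float(x)) for q, m, x in zip(QUANTILES, mae, max_abs)}


# Synthetic stand-ins for the two checkpoints' outputs.
rng = np.random.default_rng(0)
ref = rng.normal(size=(2, 9, 256))
quant = ref + 0.01 * rng.normal(size=ref.shape)
for q, (mae, max_abs) in quantile_drift(ref, quant).items():
    print(f"q={q:.1f}  MAE={mae:.4f}  max|diff|={max_abs:.4f}")
```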
## Quantization Details
The checkpoint configuration records the following quantization settings:
```json
{
  "load_in_4bit": true,
  "load_in_8bit": false,
  "quant_method": "bitsandbytes",
  "bnb_4bit_quant_type": "fp4",
  "bnb_4bit_quant_storage": "uint8",
  "bnb_4bit_compute_dtype": "bfloat16",
  "bnb_4bit_use_double_quant": false
}
```
The saved model config sets `use_cache` to `true`. To reduce memory usage during generation, set `model.config.use_cache = False` after loading the model.
## Quickstart
Install the expected runtime dependencies:
```bash
pip install torch accelerate bitsandbytes "transformers~=4.57.1"
```
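`transformers~=4.57.1` is a compatible-release pin: it accepts 4.57.1 and later 4.57.x patch releases, but not 4.58 or newer. A small pre-flight check might look like the sketch below. It is deliberately simplified: it compares only the leading numeric components of the installed version string and ignores pre-release, dev, and local-version suffixes (so `4.57.1.dev0` is treated as `4.57.1`).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def numeric_release(v: str) -> tuple:
    """Leading numeric release components, e.g. '4.57.1.dev0' -> (4, 57, 1)."""
    parts = []
    for piece in v.split("."):
        m = re.match(r"\d+", piece)
        if not m:
            break
        parts.append(int(m.group()))
        if m.end() != len(piece):
            break  # stop at suffixes such as '1rc1' or '1+cu118'
    return tuple(parts)


def satisfies_compatible_release(installed: str, pin: str = "4.57.1") -> bool:
    """Approximate pip's '~=4.57.1', i.e. '>=4.57.1, ==4.57.*'."""
    have = numeric_release(installed)
    want = numeric_release(pin)
    prefix = want[:-1]  # (4, 57) must match exactly
    return have[: len(prefix)] == prefix and have >= want


try:
    installed = version("transformers")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("transformers is not installed")
elif satisfies_compatible_release(installed):
    print(f"transformers {installed} matches ~=4.57.1")
else:
    print(f"transformers {installed} is outside ~=4.57.1; the remote code may not load")
```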
Load the model with Hugging Face Transformers:
```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "geetu040/Timer-S1-quantized-4bit",
    trust_remote_code=True,
    device_map="auto",
)

# Optional: reduce generation memory usage by disabling the KV cache.
# This can be useful on smaller GPUs or for longer lookback windows.
model.config.use_cache = False

batch_size, lookback_length = 1, 2880
seqs = torch.randn(batch_size, lookback_length).to(model.device)

forecast_length = 256
output = model.generate(seqs, max_new_tokens=forecast_length, revin=True)

# Timer-S1 generates forecasts at quantile levels:
# [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(output.shape)  # batch_size x quantile_num(9) x forecast_length
print(output[0][4])  # median (0.5 quantile) forecast for the first sample
```
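Because the output stacks the nine quantile levels along dimension 1, individual quantiles and central prediction intervals can be read off by index. The sketch below works on a NumPy array (for example `output.float().cpu().numpy()`); the helper names are hypothetical, and a synthetic array stands in for real model output. The 0.1/0.9 pair gives a nominal 80% interval only to the extent that the model's quantiles are well calibrated.

```python
import numpy as np

QUANTILES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]


def quantile_index(q: float) -> int:
    """Position of quantile level q along dimension 1 of the forecast."""
    for i, level in enumerate(QUANTILES):
        if abs(level - q) < 1e-9:
            return i
    raise ValueError(f"{q} is not one of {QUANTILES}")


def prediction_interval(forecast: np.ndarray, lower: float = 0.1, upper: float = 0.9):
    """Return (lower, median, upper) series, each of shape (batch, horizon)."""
    return (
        forecast[:, quantile_index(lower)],
        forecast[:, quantile_index(0.5)],
        forecast[:, quantile_index(upper)],
    )


# Synthetic forecast with the documented layout: batch x 9 x forecast_length.
forecast = np.sort(np.random.default_rng(0).normal(size=(1, 9, 256)), axis=1)
lo, median, hi = prediction_interval(forecast)
print(median.shape)            # (1, 256)
print(bool(np.all(lo <= hi)))  # True here, because the synthetic data was sorted
```

With real model output, `np.all(np.diff(forecast, axis=1) >= 0)` is a quick way to check for quantile crossing.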
## Specification
- **Architecture**: decoder-only Transformer with MoE
- **Context length**: up to 11,520
- **Patch length**: 16
- **Quantiles**: 0.1 through 0.9 in steps of 0.1 (9 levels)
- **Hidden size**: 1024
- **Attention heads**: 16
- **Experts**: 32 total, 2 selected per token
- **Hidden layers**: 24
- **Weight format**: `model.safetensors`
- **Quantization**: BitsAndBytes 4-bit FP4
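A few derived quantities follow from the specification above. The sketch below assumes that each token corresponds to one non-overlapping patch of 16 time steps and that the attention heads split the hidden size evenly. Neither assumption is stated explicitly in this card, so check the upstream remote code before relying on these numbers.

```python
import math

CONTEXT_LENGTH = 11_520  # time steps
PATCH_LENGTH = 16        # time steps per patch (assumed: one patch = one token)
HIDDEN_SIZE = 1024
NUM_HEADS = 16


def num_patches(length: int, patch_length: int = PATCH_LENGTH) -> int:
    """Patches needed to cover `length` time steps.

    Rounded up; how the remote code handles a remainder is not documented here.
    """
    return math.ceil(length / patch_length)


print("max context tokens:", num_patches(CONTEXT_LENGTH))  # 720
print("Quickstart lookback tokens:", num_patches(2880))    # 180
print("256-step forecast patches:", num_patches(256))      # 16
print("head dimension:", HIDDEN_SIZE // NUM_HEADS)         # 64
```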
## License
The upstream Timer-S1 model card lists the model under the Apache-2.0 License. This repository preserves that license metadata.
## Citation
If you use this quantized checkpoint, cite the original Timer-S1 paper:
```bibtex
@article{liu2026timer,
title={Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling},
author={Liu, Yong and Su, Xingjian and Wang, Shiyu and Zhang, Haoran and Liu, Haixuan and Wang, Yuxuan and Ye, Zhou and Xiang, Yang and Wang, Jianmin and Long, Mingsheng},
journal={arXiv preprint arXiv:2603.04791},
year={2026}
}
```