---
license: apache-2.0
metrics:
- mse
- mae
- mase
- wql
- crps
pipeline_tag: time-series-forecasting
datasets:
- thuml/UTSD
- Salesforce/lotsa_data
- Salesforce/GiftEvalPretrain
- autogluon/chronos_datasets
tags:
- time series
- time-series
- forecasting
- foundation models
- pretrained models
- time series foundation models
- quantized
- 4-bit
- bitsandbytes
- unofficial
library_name: transformers
base_model:
- bytedance-research/Timer-S1
---

# Timer-S1 Quantized 4-bit

This repository contains an **unofficial 4-bit BitsAndBytes quantized checkpoint** derived from [`bytedance-research/Timer-S1`](https://huggingface.co/bytedance-research/Timer-S1).

Timer-S1 is a time series foundation model for zero-shot forecasting. The original model card describes Timer-S1 as a decoder-only Mixture-of-Experts Transformer with **8.3B** total parameters, **0.75B** activated parameters per token, and a context length of **11,520**. For details about the original model, architecture, training data, benchmark results, and intended use, refer to the upstream model card and the [Timer-S1 technical report](https://arxiv.org/pdf/2603.04791).

This upload preserves the upstream Timer-S1 remote-code implementation files and Apache-2.0 license metadata, but stores the model weights as a pre-quantized 4-bit checkpoint for lower-memory inference.
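To give a rough sense of what "lower-memory" means here, the sketch below estimates weight storage from the 8.3B parameter count quoted above. It is back-of-the-envelope arithmetic, not a measurement. The 64-weight block size and float32 per-block scale are assumptions about the BitsAndBytes defaults, and the helper name `weight_memory_gib` is made up for this example.

```python
from typing import Optional

GIB = 1024 ** 3  # bytes per GiB


def weight_memory_gib(
    num_params: float,
    bits_per_weight: float,
    block_size: Optional[int] = None,
    scale_bits: int = 32,
) -> float:
    """Approximate weight storage in GiB.

    If block_size is given, one scale of `scale_bits` bits is added per block,
    which models blockwise quantization without double quantization.
    """
    bits = num_params * bits_per_weight
    if block_size is not None:
        bits += (num_params / block_size) * scale_bits
    return bits / 8 / GIB


TOTAL_PARAMS = 8.3e9  # total parameter count quoted from the upstream model card

bf16_gib = weight_memory_gib(TOTAL_PARAMS, 16)
# Assumption: 64-weight blocks with one float32 scale each.
fp4_gib = weight_memory_gib(TOTAL_PARAMS, 4, block_size=64, scale_bits=32)

print(f"bf16 weights: ~{bf16_gib:.1f} GiB")
print(f"4-bit weights (optimistic): ~{fp4_gib:.1f} GiB")
```

The 4-bit figure is optimistic because it treats every parameter as quantized. Layers that stay in higher precision, activations, and the KV cache all add to the real footprint.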

## Source and Provenance

- **Base model**: `bytedance-research/Timer-S1`
- **Quantization**: BitsAndBytes 4-bit quantization
- **Status**: unofficial derivative checkpoint

No new training or benchmark claims are made for this quantized checkpoint. Because the weights are quantized, numerical outputs can differ from those of the original bfloat16 checkpoint; the size of that difference has not been benchmarked here.
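Since no benchmark is provided, you may want to measure the gap on your own data. The following is a minimal sketch that compares two forecasts of equal length. The function name `forecast_gap` and the sample values are hypothetical, and plain Python lists stand in for model outputs.

```python
from typing import Dict, Sequence


def forecast_gap(reference: Sequence[float], candidate: Sequence[float]) -> Dict[str, float]:
    """Return MAE, MSE, and max absolute difference between two forecasts."""
    if len(reference) != len(candidate):
        raise ValueError("forecasts must have the same length")
    if len(reference) == 0:
        raise ValueError("forecasts must not be empty")
    diffs = [c - r for r, c in zip(reference, candidate)]
    n = len(diffs)
    return {
        "mae": sum(abs(d) for d in diffs) / n,
        "mse": sum(d * d for d in diffs) / n,
        "max_abs": max(abs(d) for d in diffs),
    }


# Hypothetical values standing in for the median forecasts of the two checkpoints.
reference_median = [1.0, 2.0, 3.0, 4.0]
quantized_median = [1.5, 2.0, 2.5, 4.0]

gap = forecast_gap(reference_median, quantized_median)
print(gap)
```

With real models, the two inputs could be the median forecasts of each checkpoint for the same lookback window, converted with `.tolist()`.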

## Quantization Details

The checkpoint configuration records the following quantization settings:

```json
{
  "load_in_4bit": true,
  "load_in_8bit": false,
  "quant_method": "bitsandbytes",
  "bnb_4bit_quant_type": "fp4",
  "bnb_4bit_quant_storage": "uint8",
  "bnb_4bit_compute_dtype": "bfloat16",
  "bnb_4bit_use_double_quant": false
}
```

The model config shipped with this checkpoint also sets `use_cache` to `true`. To lower memory usage during generation, set `model.config.use_cache = False` after loading the model.
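If you want to confirm that a downloaded copy carries the quantization settings listed above, a small check like the one below may help. It is a sketch that assumes the settings are stored under a `quantization_config` key in `config.json`. The helper name `quantization_mismatches` is made up for this example, and an inline JSON string stands in for the real file.

```python
import json

# Settings copied from the quantization block of this model card.
EXPECTED = {
    "load_in_4bit": True,
    "load_in_8bit": False,
    "quant_method": "bitsandbytes",
    "bnb_4bit_quant_type": "fp4",
    "bnb_4bit_quant_storage": "uint8",
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_use_double_quant": False,
}


def quantization_mismatches(config: dict) -> dict:
    """Map each differing key to (expected, found); an empty dict means all match."""
    # Assumption: the settings live under the "quantization_config" key.
    found = config.get("quantization_config", {})
    return {
        key: (value, found.get(key))
        for key, value in EXPECTED.items()
        if found.get(key) != value
    }


# Inline stand-in for the parsed contents of config.json.
sample_config = json.loads("""
{
  "use_cache": true,
  "quantization_config": {
    "load_in_4bit": true,
    "load_in_8bit": false,
    "quant_method": "bitsandbytes",
    "bnb_4bit_quant_type": "fp4",
    "bnb_4bit_quant_storage": "uint8",
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_use_double_quant": false
  }
}
""")

print(quantization_mismatches(sample_config))
```

An empty result means every listed setting matches; any other result names the keys that differ along with the expected and found values.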

## Quickstart

Install the expected runtime dependencies:

```bash
pip install torch accelerate bitsandbytes "transformers~=4.57.1"
```

Load the model with Hugging Face Transformers:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "geetu040/Timer-S1-quantized-4bit",
    trust_remote_code=True,
    device_map="auto",
)

# Optional: reduce generation memory usage by disabling the KV cache.
# This can be useful on smaller GPUs or for longer lookback windows.
model.config.use_cache = False

batch_size, lookback_length = 1, 2880
seqs = torch.randn(batch_size, lookback_length).to(model.device)

forecast_length = 256
output = model.generate(seqs, max_new_tokens=forecast_length, revin=True)

# Timer-S1 generates forecasts at quantile levels:
# [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(output.shape)  # batch_size x quantile_num(9) x forecast_length
print(output[0][4])  # median forecast for the first sample
```
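The quickstart indexes the median with a hard-coded `4`. A small lookup can make that indexing less error-prone when you also want prediction intervals. This sketch relies only on the nine quantile levels and the `batch_size x 9 x forecast_length` output layout stated above; the helper names `quantile_index` and `interval_indices` are made up for this example.

```python
from typing import Tuple

# Quantile levels in the order they appear along the second output dimension.
QUANTILE_LEVELS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]


def quantile_index(level: float) -> int:
    """Return the position of a quantile level, tolerating float rounding."""
    for index, known_level in enumerate(QUANTILE_LEVELS):
        if abs(known_level - level) < 1e-9:
            return index
    raise ValueError(f"quantile level {level} is not one of {QUANTILE_LEVELS}")


def interval_indices(coverage: float) -> Tuple[int, int]:
    """Return (lower, upper) indices of the central interval with this coverage."""
    lower_level = (1.0 - coverage) / 2.0
    upper_level = 1.0 - lower_level
    return quantile_index(lower_level), quantile_index(upper_level)


median = quantile_index(0.5)
low, high = interval_indices(0.8)
print(median, low, high)
```

With the quickstart output, `output[0][median]` is the median forecast, and `output[0][low]` and `output[0][high]` bound the central 80% interval for the first sample.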

## Specification

- **Architecture**: decoder-only Transformer with MoE
- **Context length**: up to 11,520
- **Patch length**: 16
- **Quantiles**: 0.1 through 0.9
- **Hidden size**: 1024
- **Attention heads**: 16
- **Experts**: 32 total, 2 selected per token
- **Hidden layers**: 24
- **Weight format**: `model.safetensors`
- **Quantization**: BitsAndBytes 4-bit FP4
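The context length and patch length above together bound how much history the model can see. The sketch below picks a lookback length that fits the context window and is a whole number of patches. Trimming to a multiple of the patch length is a conservative assumption made for this example, not a documented requirement of Timer-S1, and the helper names are hypothetical.

```python
CONTEXT_LENGTH = 11_520  # maximum context length from the specification
PATCH_LENGTH = 16        # patch length from the specification


def usable_lookback(series_length: int) -> int:
    """Largest multiple of PATCH_LENGTH that fits both the series and the context."""
    capped = min(series_length, CONTEXT_LENGTH)
    return (capped // PATCH_LENGTH) * PATCH_LENGTH


def num_patches(lookback_length: int) -> int:
    """Number of whole patches in a lookback window."""
    return lookback_length // PATCH_LENGTH


for length in (2880, 3000, 20_000):
    lookback = usable_lookback(length)
    print(length, lookback, num_patches(lookback))
```

For a history tensor shaped `batch_size x length`, keeping the most recent `usable_lookback(length)` points along the last dimension applies the same rule. Series shorter than one patch yield `0` and need separate handling.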

## License

The upstream Timer-S1 model card lists the model under the Apache-2.0 License. This repository preserves that license metadata.

## Citation

If you use this quantized checkpoint, cite the original Timer-S1 paper:

```bibtex
@article{liu2026timer,
  title={Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling},
  author={Liu, Yong and Su, Xingjian and Wang, Shiyu and Zhang, Haoran and Liu, Haixuan and Wang, Yuxuan and Ye, Zhou and Xiang, Yang and Wang, Jianmin and Long, Mingsheng},
  journal={arXiv preprint arXiv:2603.04791},
  year={2026}
}
```