---
license: apache-2.0
metrics:
- mse
- mae
- mase
- wql
- crps
pipeline_tag: time-series-forecasting
datasets:
- thuml/UTSD
- Salesforce/lotsa_data
- Salesforce/GiftEvalPretrain
- autogluon/chronos_datasets
tags:
- time series
- time-series
- forecasting
- foundation models
- pretrained models
- time series foundation models
- quantized
- 4-bit
- bitsandbytes
- unofficial
library_name: transformers
base_model:
- bytedance-research/Timer-S1
---

# Timer-S1 Quantized 4-bit

This repository contains an **unofficial 4-bit BitsAndBytes quantized checkpoint** derived from [`bytedance-research/Timer-S1`](https://huggingface.co/bytedance-research/Timer-S1).

Timer-S1 is a time series foundation model for zero-shot forecasting. The original model card describes it as a decoder-only Mixture-of-Experts Transformer with **8.3B** total parameters, **0.75B** activated parameters per token, and a context length of **11,520**. For details about the original model, architecture, training data, benchmark results, and intended use, refer to the upstream model card and the [Timer-S1 technical report](https://arxiv.org/pdf/2603.04791).

This upload preserves the upstream Timer-S1 remote-code implementation files and Apache-2.0 license metadata, but stores the model weights as a 4-bit quantized checkpoint for lower-memory inference.

## Source and Provenance

- **Base model**: `bytedance-research/Timer-S1`
- **Quantization**: BitsAndBytes 4-bit quantization
- **Status**: unofficial derivative checkpoint

No new training or benchmark claims are made for this quantized checkpoint. Numerical outputs may differ slightly from the original bfloat16 checkpoint because the weights are quantized.
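
One way to see how far the quantized forecasts drift from the original is to compare the two outputs directly. The sketch below is a minimal, framework-free illustration: `forecast_gap` is a hypothetical helper, not part of Timer-S1 or Transformers, and it assumes each forecast has already been converted to a plain list of floats (for a PyTorch tensor, something like `output[0][4].float().tolist()`).

```python
def forecast_gap(reference, quantized):
    """Return (mean absolute difference, max absolute difference)."""
    if len(reference) != len(quantized):
        raise ValueError("forecasts must have the same length")
    if not reference:
        raise ValueError("forecasts must be non-empty")
    diffs = [abs(r - q) for r, q in zip(reference, quantized)]
    return sum(diffs) / len(diffs), max(diffs)


# Made-up numbers for illustration only; not outputs of either checkpoint.
reference_median = [1.0, 2.0, 3.0, 4.0]
quantized_median = [1.5, 2.0, 2.5, 4.0]

mean_gap, max_gap = forecast_gap(reference_median, quantized_median)
print(mean_gap, max_gap)
```

Running both checkpoints on the same lookback window and passing their median forecasts to this helper gives a quick sanity check. It is not a substitute for a proper benchmark.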

## Quantization Details

The checkpoint configuration records the following quantization settings:

```json
{
  "load_in_4bit": true,
  "load_in_8bit": false,
  "quant_method": "bitsandbytes",
  "bnb_4bit_quant_type": "fp4",
  "bnb_4bit_quant_storage": "uint8",
  "bnb_4bit_compute_dtype": "bfloat16",
  "bnb_4bit_use_double_quant": false
}
```

The model config also sets `use_cache` to `true`, matching the local quantized checkpoint. For lower memory usage during generation, set `model.config.use_cache = False` after loading the model.
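
To get a feel for what these settings mean for weight storage, the following back-of-the-envelope sketch compares bfloat16 with 4-bit storage for the 8.3B total parameters quoted from the upstream card. It rests on assumptions that this card does not state: that every parameter is quantized, and that blockwise 4-bit storage keeps one float32 scale per block of 64 weights. The results are rough estimates, not measurements of this checkpoint.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 8.3e9  # total parameters, as quoted from the upstream model card

# Assumption: one float32 scale per block of 64 weights, which adds
# 32 / 64 = 0.5 bits per weight when double quantization is disabled.
BLOCK_SIZE = 64
SCALE_BITS = 32
bits_4bit = 4 + SCALE_BITS / BLOCK_SIZE

bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)
fp4_gb = weight_memory_gb(TOTAL_PARAMS, bits_4bit)
print(f"bfloat16: ~{bf16_gb:.1f} GB, 4-bit: ~{fp4_gb:.1f} GB")
```

In practice the saving is likely to be smaller than this estimate, because some layers usually stay in higher precision and because activations and the KV cache are not counted here.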

## Quickstart

Install the expected runtime dependencies:

```bash
pip install torch accelerate bitsandbytes "transformers~=4.57.1"
```

Load the model with Hugging Face Transformers:

```python
import torch
from transformers import AutoModelForCausalLM

# trust_remote_code=True is required because the repository ships custom model code.
model = AutoModelForCausalLM.from_pretrained(
    "geetu040/Timer-S1-quantized-4bit",
    trust_remote_code=True,
    device_map="auto",
)

# Optional: reduce generation memory usage by disabling the KV cache.
# This can be useful on smaller GPUs or for longer lookback windows.
model.config.use_cache = False

# Random values stand in for a real lookback window.
batch_size, lookback_length = 1, 2880
seqs = torch.randn(batch_size, lookback_length).to(model.device)

forecast_length = 256
output = model.generate(seqs, max_new_tokens=forecast_length, revin=True)

# Timer-S1 generates forecasts at quantile levels:
# [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(output.shape)  # batch_size x quantile_num(9) x forecast_length
print(output[0][4])  # median forecast for the first sample
```
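
Because the output is indexed by quantile, it can help to look up rows by quantile level rather than by hard-coded position. The helpers below are a hypothetical sketch, not part of the Timer-S1 code: they assume the nine levels listed above, in that order, and work on anything indexed as `[quantile][time]`. A small nested list stands in for one sample of model output.

```python
QUANTILE_LEVELS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]


def quantile_index(level, levels=QUANTILE_LEVELS, tol=1e-9):
    """Return the row index of a quantile level, tolerating float rounding."""
    for i, q in enumerate(levels):
        if abs(q - level) < tol:
            return i
    raise ValueError(f"quantile level {level} is not available")


def central_interval(sample, coverage):
    """Return the (lower, upper) rows of a central prediction interval."""
    tail = (1 - coverage) / 2
    return sample[quantile_index(tail)], sample[quantile_index(1 - tail)]


# Stand-in for one sample: 9 quantile rows x 2 time steps, made-up values.
sample = [[q * 10 + t for t in range(2)] for q in range(9)]

median = sample[quantile_index(0.5)]
lower, upper = central_interval(sample, 0.8)
print(median, lower, upper)
```

With the tensor from the Quickstart, `central_interval(output[0], 0.8)` would return the 0.1 and 0.9 quantile rows for the first sample, which together bound an 80% central interval.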

## Specification

- **Architecture**: decoder-only Transformer with MoE
- **Context length**: up to 11,520
- **Patch length**: 16
- **Quantiles**: 0.1 through 0.9
- **Hidden size**: 1024
- **Attention heads**: 16
- **Experts**: 32 total, 2 selected per token
- **Hidden layers**: 24
- **Weight format**: `model.safetensors`
- **Quantization**: BitsAndBytes 4-bit FP4
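
The context length and patch length above can be related with a little arithmetic. The sketch below assumes non-overlapping patches, which the specification does not state explicitly, and uses two hypothetical helpers: `num_patches` counts the patches needed to cover a window, and `clip_lookback` keeps only the most recent points that fit in the context.

```python
PATCH_LENGTH = 16
MAX_CONTEXT = 11_520


def num_patches(length, patch_length=PATCH_LENGTH):
    """Number of patches needed to cover `length` points (ceiling division)."""
    return -(-length // patch_length)


def clip_lookback(series, max_context=MAX_CONTEXT):
    """Keep only the most recent `max_context` points of a 1-D series."""
    return series[-max_context:]


print(num_patches(MAX_CONTEXT))  # 720 patches at full context
print(num_patches(2880))         # 180 patches for the Quickstart lookback

history = list(range(12_000))    # made-up series longer than the context
clipped = clip_lookback(history)
print(len(clipped), clipped[0])  # 11520 480
```

For a batched tensor of shape `batch_size x length`, the equivalent slice would be `seqs[:, -MAX_CONTEXT:]`. How the model itself handles inputs longer than its context length is not covered here.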

## License

The upstream Timer-S1 model card lists the model under the Apache-2.0 License. This repository preserves that license metadata.

## Citation

If you use this quantized checkpoint, cite the original Timer-S1 paper:

```bibtex
@article{liu2026timer,
  title={Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling},
  author={Liu, Yong and Su, Xingjian and Wang, Shiyu and Zhang, Haoran and Liu, Haixuan and Wang, Yuxuan and Ye, Zhou and Xiang, Yang and Wang, Jianmin and Long, Mingsheng},
  journal={arXiv preprint arXiv:2603.04791},
  year={2026}
}
```