---
license: apache-2.0
metrics:
- mse
- mae
- mase
- wql
- crps
pipeline_tag: time-series-forecasting
datasets:
- thuml/UTSD
- Salesforce/lotsa_data
- Salesforce/GiftEvalPretrain
- autogluon/chronos_datasets
tags:
- time series
- time-series
- forecasting
- foundation models
- pretrained models
- time series foundation models
- quantized
- 4-bit
- bitsandbytes
- unofficial
library_name: transformers
base_model:
- bytedance-research/Timer-S1
---

# Timer-S1 Quantized 4-bit

This repository contains an **unofficial 4-bit BitsAndBytes quantized checkpoint** derived from [`bytedance-research/Timer-S1`](https://huggingface.co/bytedance-research/Timer-S1).

Timer-S1 is a time series foundation model for zero-shot forecasting. The original model card describes Timer-S1 as a decoder-only Mixture-of-Experts Transformer with **8.3B** total parameters, **0.75B** activated parameters per token, and a context length of **11,520**. For details about the original model, architecture, training data, benchmark results, and intended use, refer to the upstream model card and the [Timer-S1 technical report](https://arxiv.org/pdf/2603.04791).

This upload preserves the upstream Timer-S1 remote-code implementation files and Apache-2.0 license metadata, but stores the model weights as a local 4-bit quantized checkpoint for lower-memory inference.

## Source and Provenance

- **Base model**: `bytedance-research/Timer-S1`
- **Quantization**: BitsAndBytes 4-bit quantization
- **Status**: unofficial derivative checkpoint

No new training or benchmark claims are made for this quantized checkpoint. Numerical outputs may differ slightly from the original bfloat16 checkpoint because the weights are quantized.

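For a rough sense of what "lower-memory" means for the weights, the sketch below compares storage for the 8.3B total parameters at 16 bits and at 4 bits per weight. It is a back-of-the-envelope calculation, not a measurement of this checkpoint: it assumes every parameter is quantized, and it ignores quantization metadata (per-block scaling constants), any layers kept in higher precision, and activation or KV-cache memory. The total parameter count is used rather than the activated count on the assumption that all expert weights stay resident in memory. `weight_gib` is a hypothetical helper introduced for illustration.

```python
# Rough weight-storage estimate; a sketch under the assumptions stated above.
TOTAL_PARAMS = 8.3e9  # total parameters reported by the upstream model card


def weight_gib(num_params: float, bits_per_param: float) -> float:
    """Return the storage needed for num_params weights, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


bf16_gib = weight_gib(TOTAL_PARAMS, 16)  # original bfloat16 weights
fp4_gib = weight_gib(TOTAL_PARAMS, 4)    # idealized 4-bit weights

print(f"bfloat16 weights: {bf16_gib:.1f} GiB")
print(f"4-bit weights:    {fp4_gib:.1f} GiB")
```

Under these assumptions the 4-bit weights take one quarter of the bfloat16 storage. The real checkpoint will be somewhat larger than this idealized figure.
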
## Quantization Details

The checkpoint configuration records the following quantization settings:

```json
{
  "load_in_4bit": true,
  "load_in_8bit": false,
  "quant_method": "bitsandbytes",
  "bnb_4bit_quant_type": "fp4",
  "bnb_4bit_quant_storage": "uint8",
  "bnb_4bit_compute_dtype": "bfloat16",
  "bnb_4bit_use_double_quant": false
}
```

The model config also sets `use_cache` to `true`, matching the local quantized checkpoint. For lower memory usage during generation, set `model.config.use_cache = False` after loading the model.

## Quickstart

Install the expected runtime dependencies:

```bash
pip install torch accelerate bitsandbytes "transformers~=4.57.1"
```

Load the model with Hugging Face Transformers:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "geetu040/Timer-S1-quantized-4bit",
    trust_remote_code=True,
    device_map="auto",
)

# Optional: reduce generation memory usage by disabling the KV cache.
# This can be useful on smaller GPUs or for longer lookback windows.
model.config.use_cache = False

batch_size, lookback_length = 1, 2880
seqs = torch.randn(batch_size, lookback_length).to(model.device)

forecast_length = 256
output = model.generate(seqs, max_new_tokens=forecast_length, revin=True)

# Timer-S1 generates forecasts at quantile levels:
# [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(output.shape)  # batch_size x quantile_num(9) x forecast_length
print(output[0][4])  # median forecast for the first sample
```

## Specification

- **Architecture**: decoder-only Transformer with MoE
- **Context length**: up to 11,520
- **Patch length**: 16
- **Quantiles**: 0.1 through 0.9
- **Hidden size**: 1024
- **Attention heads**: 16
- **Experts**: 32 total, 2 selected per token
- **Hidden layers**: 24
- **Weight format**: `model.safetensors`
- **Quantization**: BitsAndBytes 4-bit FP4

## License

The upstream Timer-S1 model card lists the model under the Apache-2.0 License.
This repository preserves that license metadata.

## Citation

If you use this quantized checkpoint, cite the original Timer-S1 paper:

```bibtex
@article{liu2026timer,
  title={Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling},
  author={Liu, Yong and Su, Xingjian and Wang, Shiyu and Zhang, Haoran and Liu, Haixuan and Wang, Yuxuan and Ye, Zhou and Xiang, Yang and Wang, Jianmin and Long, Mingsheng},
  journal={arXiv preprint arXiv:2603.04791},
  year={2026}
}
```