# TTM-R3 Model Card
Building on TTM-R1 and TTM-R2, we introduce the next generation of TinyTimeMixer: TTM-R3. This release incorporates several compact neural architectural innovations designed to push the limits of accuracy in high-speed forecasting, a critical requirement for real-world production deployments.
## Supported Tasks & Capabilities
TTM-R3 is a family of pretrained models supporting multiple real-world forecasting scenarios:
- Zero-shot forecasting across unseen datasets
- Few-shot adaptation (effective with as few as ~1K samples)
- Full fine-tuning for domain-specific optimization
- Multivariate time-series forecasting
- Exogenous / control variable integration
- High-throughput batch inference for production systems
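The batch-inference scenario above can be sketched as a simple sliding-window batching step. This is a minimal illustration only; the function name and array layout are assumptions for the sketch, not the actual TTM-R3 API:

```python
import numpy as np

def make_context_batches(series: np.ndarray, context_length: int, stride: int) -> np.ndarray:
    """Slice a (time, channels) multivariate series into a batch of
    overlapping context windows of shape (batch, context_length, channels)."""
    n_steps = series.shape[0]
    starts = range(0, n_steps - context_length + 1, stride)
    return np.stack([series[s : s + context_length] for s in starts])

# Example: 100 time steps, 3 channels, 48-step context, stride 8
series = np.random.default_rng(0).normal(size=(100, 3))
batch = make_context_batches(series, context_length=48, stride=8)
print(batch.shape)  # (7, 48, 3)
```

A batch shaped this way can then be fed to the model in a single forward pass, which is where the throughput numbers below come from in practice.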
## Accuracy & Speed
TTM-R3 is engineered to achieve a strong balance between state-of-the-art accuracy and extreme inference efficiency, making it well-suited for real-world, high-throughput deployments.
### Accuracy (GIFT-Eval)
- Maintains top-tier performance on the GIFT-Eval leaderboard (PR under review)
- Fine-tuned FM:
  - MASE: 0.713 | CRPS: 0.512
  - (Lite) MASE: 0.721 | CRPS: 0.516
- Pre-trained FM:
  - MASE: 0.724 | CRPS: 0.520
  - (Lite) MASE: 0.734 | CRPS: 0.526
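For reference, MASE (reported above) normalizes the forecast's mean absolute error by the in-sample MAE of a seasonal-naive baseline, so values below 1.0 beat that baseline. A minimal sketch follows; the seasonality `m` and the toy numbers are illustrative, not drawn from the benchmark:

```python
import numpy as np

def mase(y_true, y_pred, y_train, m: int = 1) -> float:
    """Mean Absolute Scaled Error: forecast MAE divided by the in-sample
    MAE of the m-step seasonal-naive forecast on the training series."""
    mae = np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))
    naive_mae = np.mean(np.abs(np.asarray(y_train)[m:] - np.asarray(y_train)[:-m]))
    return float(mae / naive_mae)

y_train = [10.0, 12.0, 11.0, 13.0, 12.0]
y_true = [14.0, 13.0]
y_pred = [13.5, 13.5]
print(round(mase(y_true, y_pred, y_train), 3))  # 0.333
```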
### Inference Throughput
TTM-R3 delivers dramatically faster inference than existing popular SOTA models.
#### GPU Throughput
- Typical SOTA models: ~20–500 samples/sec
- TTM-R3: ~7,500 samples/sec
- TTM-R3 Lite: ~18,000 samples/sec
#### CPU Throughput
- Typical SOTA models: ~1–20 samples/sec
- TTM-R3: ~180 samples/sec
- TTM-R3 Lite: ~800 samples/sec
TTM-R3 achieves a ~15–50× speedup over many existing approaches without compromising accuracy, setting a new benchmark for fast and reliable time-series forecasting.
## Architecture Overview
TTM-R3 adopts a mixture-of-experts paradigm composed of models of varying complexity (1–35M parameters; Lite: 1–18M parameters), coupled with a lightweight routing mechanism that automatically selects or blends the most suitable expert based on the characteristics of the input data. This adaptive model selection improves both accuracy and efficiency across diverse time-series scenarios. The architecture is built on efficient mixer-based designs that avoid expensive self-attention, instead leveraging linear gating-based attention mechanisms to capture temporal dependencies with significantly lower computational overhead. Together, these choices allow TTM-R3 to deliver scalable, adaptive, and ultra-fast forecasting suitable for real-time and large-scale deployments.
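The routing idea described above can be illustrated with a minimal numpy sketch: a softmax gate computes per-input weights over experts of different capacities and blends their outputs. The expert and gate definitions here are toy assumptions for illustration, not the actual TTM-R3 internals:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route_and_blend(x, experts, gate_w):
    """x: (batch, features). Each expert is a callable mapping x to an
    output of the same shape; gate_w maps input features to one logit
    per expert. Outputs are blended by the per-input gate weights."""
    weights = softmax(x @ gate_w)                        # (batch, n_experts)
    outputs = np.stack([e(x) for e in experts], axis=1)  # (batch, n_experts, features)
    return (weights[..., None] * outputs).sum(axis=1)

# Two toy "experts" of different complexity: identity vs. a random linear map
d = 4
experts = [lambda x: x, lambda x, W=rng.normal(size=(d, d)): x @ W]
gate_w = rng.normal(size=(d, len(experts)))
x = rng.normal(size=(8, d))
y = route_and_blend(x, experts, gate_w)
print(y.shape)  # (8, 4)
```

Because the gate weights sum to one per input, the blended output stays on the same scale as the individual experts, which is what makes soft routing stable to train.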
## What's New in TTM-R3
- **Trend-Residual Decomposition:** Separately models long-term trends and high-frequency residuals for improved structural learning.
- **Three-Stage Pre-Training:** Sequential training of trend, residual, and joint components for better stability and convergence.
- **Student-Teacher Pretraining:** Student-teacher-based pretraining for stable learning on noisy datasets.
- **Enhanced Data Augmentation:** Structured perturbations improve robustness across domains.
- **Improved Normalization Strategy:** Stabilizes training across scale shifts and heterogeneous datasets.
- **GLU Gating:** Dynamic information flow control within mixer blocks.
- **Multi-Resolution Temporal Layer:** Captures dependencies across short-, medium-, and long-term horizons.
- **FFT-Based Embeddings:** Incorporates frequency-domain signals to model periodicity and seasonality.
- **Register Tokens:** Learnable global tokens that encode sequence-level semantics.
- **Multi-Quantile Forecasting Head:** Enables probabilistic forecasting with multiple quantiles.
- **Refined Loss Weighting:** Balances trend, residual, and quantile objectives for improved calibration.
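The trend-residual idea in the list above can be illustrated with a simple centered moving-average split. This is a sketch only; the decomposition TTM-R3 actually uses is not specified here:

```python
import numpy as np

def trend_residual_split(series: np.ndarray, window: int):
    """Split a 1-D series into a smooth trend (centered moving average
    with edge padding) and a high-frequency residual; by construction
    the two parts sum back to the original series exactly."""
    pad = window // 2
    padded = np.pad(series, pad, mode="edge")
    kernel = np.ones(window) / window
    trend = np.convolve(padded, kernel, mode="valid")[: len(series)]
    residual = series - trend
    return trend, residual

t = np.arange(200)
series = 0.05 * t + np.sin(2 * np.pi * t / 12)  # linear trend + seasonal component
trend, residual = trend_residual_split(series, window=12)
print(np.allclose(trend + residual, series))  # True
```

Modeling the two components separately lets the trend branch focus on slow dynamics while the residual branch captures the faster, noisier structure.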
## Why TTM-R3
TTM-R3 is purpose-built for production-grade time-series systems where:
- Low latency is critical (real-time inference)
- High scale is required (millions of forecasts)
- Data is heterogeneous and evolving
- Compute constraints exist (CPU-friendly deployment)
It provides a practical combination of compactness, speed, and accuracy, making it suitable for industrial deployment.
## Example Recipes and Notebooks
[To be released]
## Publication
- TTM-R3 Paper: [To be released]
- Base TTM Paper: Link (NeurIPS 2024)
## Training Data
- Datasets from GiftEvalPretrain and the Train split (non-leaking historical context).
- Custom synthesized data: Based on KernelSynth
## Model Card Authors
Vijay Ekambaram, Arindam Jati, Haoxiang Qiu, Takayuki Katsuki, Tomoya Sakai, Priyanshul Govil, Pankaj Dayama
## Citation
Please cite the following paper if you intend to use our model or its associated architectures/approaches in your work.
BibTeX:

```bibtex
@inproceedings{ekambaram2024tinytimemixersttms,
  title     = {Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series},
  author    = {Vijay Ekambaram and Arindam Jati and Pankaj Dayama and Sumanta Mukherjee and Nam H. Nguyen and Wesley M. Gifford and Chandra Reddy and Jayant Kalagnanam},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS 2024)},
  year      = {2024},
}
```
## IBM Public Repository Disclosure
All content in this repository, including code, has been provided by IBM under the associated open source software license, and IBM is under no obligation to provide enhancements, updates, or support. IBM developers produced this code as an open source project (not as an IBM product); IBM makes no assertions as to its level of quality or security and will not be maintaining this code going forward.