# TTM-R3 Model Card
Building on TTM-R1 and TTM-R2, we introduce the next generation of TinyTimeMixer: TTM-R3. This release incorporates several compact neural architectural innovations designed to push the limits of accuracy in high-speed forecasting, a critical requirement for real-world production deployments.
## Supported Tasks & Capabilities
TTM-R3 is a family of pretrained models supporting multiple real-world forecasting scenarios:
- Zero-shot forecasting across unseen datasets
- Few-shot adaptation (effective with as few as ~1K samples)
- Full fine-tuning for domain-specific optimization
- Multivariate time-series forecasting
- Exogenous / control variable integration
- High-throughput batch inference for production systems
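The batch-inference scenario above can be sketched as a simple sliding-window batching step. This is a minimal illustration only; the function name and array layout are assumptions for the sketch, not the actual TTM-R3 API:

```python
import numpy as np

def make_context_batches(series: np.ndarray, context_length: int, stride: int) -> np.ndarray:
    """Slice a (time, channels) multivariate series into a batch of
    overlapping context windows of shape (batch, context_length, channels)."""
    n_steps = series.shape[0]
    starts = range(0, n_steps - context_length + 1, stride)
    return np.stack([series[s : s + context_length] for s in starts])

# Example: 100 time steps, 3 channels, 48-step context, stride 8
series = np.random.default_rng(0).normal(size=(100, 3))
batch = make_context_batches(series, context_length=48, stride=8)
print(batch.shape)  # (7, 48, 3)
```

A batch shaped this way can then be fed to the model in a single forward pass, which is where the throughput numbers below come from in practice.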
## Accuracy & Speed
TTM-R3 is engineered to achieve a strong balance between state-of-the-art accuracy and extreme inference efficiency, making it well-suited for real-world, high-throughput deployments.
### Accuracy (GIFT-Eval)
- Maintains top-tier performance on the GIFT-Eval leaderboard (PR under review)
- Fine-tuned FM:
  - MASE: 0.713 | CRPS: 0.512
  - (Lite) MASE: 0.721 | CRPS: 0.516
- Pre-trained FM:
  - MASE: 0.724 | CRPS: 0.520
  - (Lite) MASE: 0.734 | CRPS: 0.526
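For reference, MASE (reported above) normalizes the forecast's mean absolute error by the in-sample MAE of a seasonal-naive baseline, so values below 1.0 beat that baseline. A minimal sketch follows; the seasonality `m` and the toy numbers are illustrative, not drawn from the benchmark:

```python
import numpy as np

def mase(y_true, y_pred, y_train, m: int = 1) -> float:
    """Mean Absolute Scaled Error: forecast MAE divided by the in-sample
    MAE of the m-step seasonal-naive forecast on the training series."""
    mae = np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))
    naive_mae = np.mean(np.abs(np.asarray(y_train)[m:] - np.asarray(y_train)[:-m]))
    return float(mae / naive_mae)

y_train = [10.0, 12.0, 11.0, 13.0, 12.0]
y_true = [14.0, 13.0]
y_pred = [13.5, 13.5]
print(round(mase(y_true, y_pred, y_train), 3))  # 0.333
```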
### Inference Throughput
TTM-R3 delivers dramatically faster inference than existing popular SOTA models.
#### GPU Throughput
- Typical SOTA models: ~20–500 samples/sec
- TTM-R3: ~7,500 samples/sec
- TTM-R3 Lite: ~18,000 samples/sec
#### CPU Throughput
- Typical SOTA models: ~1–20 samples/sec
- TTM-R3: ~180 samples/sec
- TTM-R3 Lite: ~800 samples/sec
TTM-R3 achieves a ~15–50× speedup over many existing approaches without compromising accuracy, setting a new benchmark for fast and reliable time-series forecasting.
## Architecture Overview
TTM-R3 adopts a mixture-of-experts paradigm composed of models of varying complexity (1–35M parameters; Lite: 1–18M parameters), coupled with a lightweight routing mechanism that automatically selects or blends the most suitable expert based on the characteristics of the input data. This adaptive model selection improves both accuracy and efficiency across diverse time-series scenarios. The architecture is built on efficient mixer-based designs that avoid expensive self-attention, instead leveraging linear gating-based attention mechanisms to capture temporal dependencies with significantly lower computational overhead. Together, these choices allow TTM-R3 to deliver scalable, adaptive, and ultra-fast forecasting suitable for real-time and large-scale deployments.
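The routing idea described above can be illustrated with a minimal numpy sketch: a softmax gate computes per-input weights over experts of different capacities and blends their outputs. The expert and gate definitions here are toy assumptions for illustration, not the actual TTM-R3 internals:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route_and_blend(x, experts, gate_w):
    """x: (batch, features). Each expert is a callable mapping x to an
    output of the same shape; gate_w maps input features to one logit
    per expert. Outputs are blended by the per-input gate weights."""
    weights = softmax(x @ gate_w)                        # (batch, n_experts)
    outputs = np.stack([e(x) for e in experts], axis=1)  # (batch, n_experts, features)
    return (weights[..., None] * outputs).sum(axis=1)

# Two toy "experts" of different complexity: identity vs. a random linear map
d = 4
experts = [lambda x: x, lambda x, W=rng.normal(size=(d, d)): x @ W]
gate_w = rng.normal(size=(d, len(experts)))
x = rng.normal(size=(8, d))
y = route_and_blend(x, experts, gate_w)
print(y.shape)  # (8, 4)
```

Because the gate weights sum to one per input, the blended output stays on the same scale as the individual experts, which is what makes soft routing stable to train.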
## What's New in TTM-R3
- **Trend-Residual Decomposition:** Separately models long-term trends and high-frequency residuals for improved structural learning.
- **Three-Stage Pre-Training:** Sequential training of trend, residual, and joint components for better stability and convergence.
- **Student-Teacher Pretraining:** Student-teacher-based pretraining for stable learning on noisy datasets.
- **Enhanced Data Augmentation:** Structured perturbations improve robustness across domains.
- **Improved Normalization Strategy:** Stabilizes training across scale shifts and heterogeneous datasets.
- **GLU Gating:** Dynamic information flow control within mixer blocks.
- **Multi-Resolution Temporal Layer:** Captures dependencies across short-, medium-, and long-term horizons.
- **FFT-Based Embeddings:** Incorporates frequency-domain signals to model periodicity and seasonality.
- **Register Tokens:** Learnable global tokens that encode sequence-level semantics.
- **Multi-Quantile Forecasting Head:** Enables probabilistic forecasting with multiple quantiles.
- **Refined Loss Weighting:** Balances trend, residual, and quantile objectives for improved calibration.
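The trend-residual idea in the list above can be illustrated with a simple centered moving-average split. This is a sketch only; the decomposition TTM-R3 actually uses is not specified here:

```python
import numpy as np

def trend_residual_split(series: np.ndarray, window: int):
    """Split a 1-D series into a smooth trend (centered moving average
    with edge padding) and a high-frequency residual; by construction
    the two parts sum back to the original series exactly."""
    pad = window // 2
    padded = np.pad(series, pad, mode="edge")
    kernel = np.ones(window) / window
    trend = np.convolve(padded, kernel, mode="valid")[: len(series)]
    residual = series - trend
    return trend, residual

t = np.arange(200)
series = 0.05 * t + np.sin(2 * np.pi * t / 12)  # linear trend + seasonal component
trend, residual = trend_residual_split(series, window=12)
print(np.allclose(trend + residual, series))  # True
```

Modeling the two components separately lets the trend branch focus on slow dynamics while the residual branch captures the faster, noisier structure.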
## Why TTM-R3
TTM-R3 is purpose-built for production-grade time-series systems where:
- Low latency is critical (real-time inference)
- High scale is required (millions of forecasts)
- Data is heterogeneous and evolving
- Compute constraints exist (CPU-friendly deployment)
It provides a practical combination of compactness, speed, and accuracy, making it suitable for industrial deployment.
## Example Recipes and Notebooks
[To be released]
## Publication
- TTM-R3 Paper: [To be released]
- Base TTM Paper: Link (NeurIPS 2024)
## Training Data
- Datasets from GiftEvalPretrain and the Train split (non-leaking historical context).
- Custom synthesized data: Based on KernelSynth
## Model Card Authors
Vijay Ekambaram, Arindam Jati, Haoxiang Qiu, Takayuki Katsuki, Tomoya Sakai, Priyanshul Govil, Pankaj Dayama
## Citation
Please cite the following paper if you intend to use our model or its associated architectures/approaches in your work.
BibTeX:

```bibtex
@inproceedings{ekambaram2024tinytimemixersttms,
  title     = {Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series},
  author    = {Vijay Ekambaram and Arindam Jati and Pankaj Dayama and Sumanta Mukherjee and Nam H. Nguyen and Wesley M. Gifford and Chandra Reddy and Jayant Kalagnanam},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS 2024)},
  year      = {2024},
}
```
## IBM Public Repository Disclosure
All content in this repository, including code, has been provided by IBM under the associated open source software license, and IBM is under no obligation to provide enhancements, updates, or support. IBM developers produced this code as an open source project (not as an IBM product); IBM makes no assertions as to its level of quality or security and will not be maintaining this code going forward.