---
license: mit
tags:
- time-series
- mixture-of-experts
- forecasting
- pytorch
- fft
model-index:
- name: SuperLinear
  results: []
---

# Super-Linear: A Mixture of Experts Time Series Forecasting Model

SuperLinear is a lightweight, pretrained time series forecasting model built on a sparse Mixture of Experts (MoE) architecture. An FFT-based gating network analyzes each input in the frequency domain and routes it to the experts best matched to its dominant temporal patterns, yielding strong performance across a range of forecasting tasks.

## Model Architecture

The SuperLinear model consists of:

- **Sparse Mixture of Experts (MoE)**: Routes inputs to the top-k most relevant experts
- **FFT-based Gating Network**: Uses frequency-domain analysis to determine expert routing (see the sketch after this list)
- **Frequency-specific Experts**: Pre-trained experts specialized for different temporal patterns

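The gating idea can be illustrated with a minimal sketch. The code below is illustrative only, not the released implementation: it assumes a single linear layer that scores experts from the input's magnitude spectrum, and the `num_experts` value in the usage line is hypothetical (`top_k=12` follows the config default listed below).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFTGate(nn.Module):
    """Illustrative FFT-based gate: scores experts from the input's magnitude spectrum."""

    def __init__(self, seq_len: int, num_experts: int, top_k: int):
        super().__init__()
        # rfft of a length-L real signal yields L // 2 + 1 frequency bins
        self.score = nn.Linear(seq_len // 2 + 1, num_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, seq_len] -> magnitude spectrum: [batch, seq_len // 2 + 1]
        spectrum = torch.fft.rfft(x, dim=-1).abs()
        logits = self.score(spectrum)                       # [batch, num_experts]
        top_vals, top_idx = logits.topk(self.top_k, dim=-1)
        # Sparse routing: softmax over the selected experts only
        weights = torch.zeros_like(logits)
        weights.scatter_(-1, top_idx, F.softmax(top_vals, dim=-1))
        return weights  # [batch, num_experts], zero outside the top-k

gate = FFTGate(seq_len=512, num_experts=20, top_k=12)  # num_experts is hypothetical
routing = gate(torch.randn(4, 512))                    # one routing distribution per series
```
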
## Key Features

- **Adaptive Expert Selection**: Dynamic routing based on input characteristics
- **Frequency-aware Processing**: Leverages FFT analysis for intelligent expert selection
- **Auto-regressive Capabilities**: Supports long-horizon forecasting (see the rollout sketch after the usage example below)
- **Multi-scale Processing**: Handles various sequence lengths through resampling (illustrated below)

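A hedged sketch of the resampling idea, assuming linear interpolation to the training length (the released model's exact resampling scheme may differ):

```python
import torch
import torch.nn.functional as F

def resample_series(x: torch.Tensor, target_len: int = 512) -> torch.Tensor:
    """Resample a [batch, channel, seq_len] series to target_len via linear interpolation."""
    return F.interpolate(x, size=target_len, mode="linear", align_corners=False)

history = torch.randn(1, 1, 700)       # history of arbitrary length
resampled = resample_series(history)   # [1, 1, 512], matching train_seq_len
```
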
## Usage

```python
import torch
from transformers import AutoModelForCausalLM

# Load the model (remote code is required for the custom architecture)
model = AutoModelForCausalLM.from_pretrained("SequentialLearning/SuperLinear", trust_remote_code=True)

# Prepare input time series data
# Shape: [batch_size, channel, sequence_length] or [batch_size, sequence_length]
input_data = torch.randn(1, 1, 512)

# Generate predictions
with torch.no_grad():
    outputs = model(inputs_embeds=input_data, pred_len=96, get_prob=True)
    preds = outputs.logits       # Predicted values
    probs = outputs.attentions   # Expert probabilities are stored here
```

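For horizons beyond `train_pred_len`, forecasts can be produced auto-regressively. A minimal sketch, assuming the call signature from the usage example above and a `[batch, channel, pred_len]` output layout (both assumptions; the model may also accept a longer `pred_len` directly):

```python
import torch

@torch.no_grad()
def rollout(model, history: torch.Tensor, horizon: int, step: int = 96) -> torch.Tensor:
    """Forecast `horizon` steps by repeatedly predicting `step` steps and re-feeding them."""
    preds = []
    context = history  # [batch, channel, seq_len]
    while sum(p.shape[-1] for p in preds) < horizon:
        out = model(inputs_embeds=context, pred_len=step)
        chunk = out.logits  # assumed shape: [batch, channel, step]
        preds.append(chunk)
        # Slide the window: drop the oldest points, append the new predictions
        context = torch.cat([context, chunk], dim=-1)[..., -history.shape[-1]:]
    return torch.cat(preds, dim=-1)[..., :horizon]

# Using the `model` loaded above: three 96-step chunks cover a 288-step horizon
forecast = rollout(model, torch.randn(1, 1, 512), horizon=288)
```
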
## Configuration

Key parameters:

- `train_seq_len`: Training sequence length (default: 512)
- `train_pred_len`: Training prediction length (default: 96)
- `top_k_experts`: Number of experts to use (default: 12)
- `use_fft`: Whether to use FFT-based gating (default: True)
- `freq_experts`: Frequency-specific expert configuration
- `moe_temp`: Temperature for expert selection during inference (default: 1)

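These parameters live in the model config. Following standard `transformers` behavior (whether each override takes effect in this remote-code model is an assumption), defaults can be inspected and overridden at load time:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Inspect the default configuration
config = AutoConfig.from_pretrained("SequentialLearning/SuperLinear", trust_remote_code=True)
print(config.top_k_experts, config.train_seq_len)

# Keyword arguments matching config fields override them at load time
model = AutoModelForCausalLM.from_pretrained(
    "SequentialLearning/SuperLinear",
    trust_remote_code=True,
    moe_temp=0.5,  # sharper expert selection than the default of 1
)
```
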
## Links

- **GitHub Repository**: [https://github.com/azencot-group/SuperLinear](https://github.com/azencot-group/SuperLinear)
- **Paper**: [https://arxiv.org/abs/2509.15105](https://arxiv.org/abs/2509.15105)

## Citation

If you use SuperLinear in your research, please cite:

```bibtex
@article{nochumsohn2025super,
  title={Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting},
  author={Nochumsohn, Liran and Marshanski, Raz and Zisling, Hedi and Azencot, Omri},
  journal={arXiv preprint arXiv:2509.15105},
  year={2025}
}
```

## License

This model is released under the MIT License.