---
license: other
license_name: 8f-ai-license-v1.0
license_link: https://huggingface.co/8Fai/license
tags:
- time-series-forecasting
- pytorch
- gqa
- rope
- swiglu
- revin
- patch-transformer
language:
- en
library_name: torch
---

# Chorous1

<p align="center">
  <img src="https://img.shields.io/badge/Parameters-100M%20%7C%2050M%20%7C%2027M-brightgreen?style=flat-square" />
  <img src="https://img.shields.io/badge/Architecture-Patch--Transformer-purple?style=flat-square" />
  <img src="https://img.shields.io/badge/License-8f--ai--license--v1.0-red?style=flat-square" />
</p>

> **Chorous1** is a suite of three patch-based transformer models for multivariate time-series forecasting. It combines RevIN, MAE-style patch masking, and a flatten head, and targets state-of-the-art accuracy on real-world benchmark data.

---

## Table of Contents

- [Model Variants](#model-variants)
- [Architecture](#architecture)
- [Quickstart](#quickstart)
- [Performance](#performance)
- [Limitations](#limitations)
- [License](#license)

---

## Model Variants

| Variant | Parameters | Hidden Size | Layers | Query Heads / KV Heads |
|---|---|---|---|---|
| `chorous1-100m` | ~100M | 768 | 12 | 12 / 4 |
| `chorous1-50m` | ~50M | 512 | 16 | 8 / 2 |
| `chorous1-27m` | ~27M | 384 | 16 | 6 / 2 |

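
The head counts in the table imply a per-head width and a grouped-query sharing factor. The sketch below derives both, assuming the usual convention that the hidden size equals query heads times head width; that convention and the helper name `attention_shape` are assumptions, not something this card states.

```python
# Variant table from this card: hidden size, layers, query heads, KV heads.
VARIANTS = {
    "chorous1-100m": {"hidden": 768, "layers": 12, "q_heads": 12, "kv_heads": 4},
    "chorous1-50m": {"hidden": 512, "layers": 16, "q_heads": 8, "kv_heads": 2},
    "chorous1-27m": {"hidden": 384, "layers": 16, "q_heads": 6, "kv_heads": 2},
}


def attention_shape(cfg):
    """Return (head_dim, group_size, kv_width), assuming hidden = q_heads * head_dim."""
    if cfg["hidden"] % cfg["q_heads"] or cfg["q_heads"] % cfg["kv_heads"]:
        raise ValueError("head counts must divide evenly")
    head_dim = cfg["hidden"] // cfg["q_heads"]
    group_size = cfg["q_heads"] // cfg["kv_heads"]  # query heads sharing one KV head
    kv_width = cfg["kv_heads"] * head_dim           # output width of the K (or V) projection
    return head_dim, group_size, kv_width


for name, cfg in VARIANTS.items():
    print(name, attention_shape(cfg))
```

Under that assumption every variant uses 64-wide heads, with three or four query heads sharing each KV head.
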
---

## Architecture

| Component | Specification |
|---|---|
| Context Length | 512 steps |
| Forecast Horizon | 96 steps |
| Patch Size | 16 (non-overlapping) |
| Number of Patches | 32 |
| FFN Multiplier | 2.667× |
| Activation | SwiGLU |
| Positional Encoding | RoPE (θ = 500,000) |
| Normalization | RMSNorm |
| Masking Ratio | 25% (training only) |
| Loss Function | Huber Loss + MAE |
| Precision | bfloat16 |

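
The patch arithmetic in the table (512 steps / 16 = 32 patches, of which 25% is 8 masked patches) can be sketched as follows. This is a minimal illustration, not the released training code: the helper names are hypothetical, and how masked patches are handled downstream (dropped or replaced by a learned token) is not specified in this card.

```python
import torch

CONTEXT_LEN, PATCH_SIZE, MASK_RATIO = 512, 16, 0.25


def patchify(x: torch.Tensor, patch_size: int = PATCH_SIZE) -> torch.Tensor:
    """Split [batch, channels, time] into [batch, channels, n_patches, patch_size]."""
    if x.shape[-1] % patch_size != 0:
        raise ValueError("time dimension must be a multiple of the patch size")
    # step == size gives non-overlapping windows along the time axis
    return x.unfold(2, patch_size, patch_size)


def random_patch_mask(n_patches: int, ratio: float = MASK_RATIO) -> torch.Tensor:
    """Boolean mask with round(n_patches * ratio) True entries (True = masked)."""
    n_masked = round(n_patches * ratio)
    mask = torch.zeros(n_patches, dtype=torch.bool)
    mask[torch.randperm(n_patches)[:n_masked]] = True
    return mask


x = torch.randn(1, 7, CONTEXT_LEN)
patches = patchify(x)
mask = random_patch_mask(patches.shape[2])
print(tuple(patches.shape), int(mask.sum()))
```
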
### How It Works

**Stage 1: Neural Encoding.** The transformer encoder processes patches of the time series, using rotary position embeddings (RoPE) and grouped-query attention (GQA) to capture long-range temporal dependencies and periodic structure.

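
A minimal sketch of grouped-query attention, assuming the `chorous1-100m` head layout (12 query heads, 4 KV heads) and a head width of 64 inferred from 768 / 12. RoPE, the learned projections, and any masking are omitted, and the function name is illustrative rather than taken from the Chorous1 code.

```python
import math

import torch


def grouped_query_attention(q, k, v):
    """q: [B, Hq, T, D]; k and v: [B, Hkv, T, D], with Hq a multiple of Hkv."""
    n_q, n_kv = q.shape[1], k.shape[1]
    if n_q % n_kv != 0:
        raise ValueError("query heads must be a multiple of KV heads")
    group = n_q // n_kv
    # Each KV head is reused by `group` consecutive query heads.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    return torch.softmax(scores, dim=-1) @ v


B, T, D = 2, 32, 64            # 32 patches per series, assumed head width 64
q = torch.randn(B, 12, T, D)   # 12 query heads
k = torch.randn(B, 4, T, D)    # 4 shared KV heads
v = torch.randn(B, 4, T, D)
out = grouped_query_attention(q, k, v)
print(tuple(out.shape))
```

Sharing KV heads shrinks the key and value projections (and any KV cache) by the group factor while keeping the full number of query heads.
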
**Stage 2: RevIN Normalization.** A reversible instance normalization (RevIN) layer removes each input's mean and variance before the encoder runs, then restores them on the output. This mitigates the distribution shift that is common in real-world deployments.

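
The normalize-then-restore step can be sketched as below. It follows the published RevIN formulation with a learnable per-channel affine; whether Chorous1 uses exactly this variant is an assumption, and the class shown is an illustration rather than the released layer.

```python
import torch
from torch import nn


class RevIN(nn.Module):
    """Reversible instance normalization over the time axis of [batch, channels, time]."""

    def __init__(self, num_channels: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        # Learnable per-channel scale and shift.
        self.weight = nn.Parameter(torch.ones(1, num_channels, 1))
        self.bias = nn.Parameter(torch.zeros(1, num_channels, 1))

    def normalize(self, x):
        # Statistics are per sample and per channel, cached for the inverse step.
        self.mean = x.mean(dim=-1, keepdim=True).detach()
        var = x.var(dim=-1, keepdim=True, unbiased=False)
        self.std = torch.sqrt(var + self.eps).detach()
        return (x - self.mean) / self.std * self.weight + self.bias

    def denormalize(self, y):
        # Undo the affine map, then restore the cached input statistics.
        y = (y - self.bias) / (self.weight + self.eps * self.eps)
        return y * self.std + self.mean


revin = RevIN(num_channels=7)
x = 50.0 + 10.0 * torch.randn(1, 7, 512)
z = revin.normalize(x)
# Stand-in for the model's 96-step output: reuse the last 96 normalized steps.
forecast = revin.denormalize(z[..., -96:])
```

With the default affine parameters, denormalizing a slice of the normalized input recovers the original values, which is the property that returns forecasts in the input's own scale.
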
---

## Quickstart

```python
import torch
from safetensors.torch import load_file

# NOTE: `model` must already be an instantiated Chorous1 network (a torch.nn.Module)
# for the chosen variant; its class and constructor are not shown in this card.

# Replace "100m" with "50m" or "27m" as needed
weights = load_file("./chorous_checkpoint/100m/model.safetensors")
model.load_state_dict(weights)
model.eval()

# Input shape: [Batch, Channels, Time], with Time = 512 (the context length)
x = torch.randn(1, 7, 512)

with torch.no_grad():
    forecast = model(x)  # Output shape: [1, 7, 96]
```

---

## Performance

| Metric | `chorous1-100m` | `chorous1-50m` | `chorous1-27m` |
|---|---|---|---|
| Weights Size | ~200 MB | ~110 MB | ~65 MB |
| VRAM (Inference) | ~12 GB | ~8 GB | ~6 GB |

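
As a rough cross-check on the weight sizes, bfloat16 stores each parameter in 2 bytes. The sketch below uses the nominal parameter counts, which are themselves approximate, so it gives only a ballpark. It says nothing about inference VRAM, which also depends on activations and batch size.

```python
BYTES_PER_PARAM = 2  # bfloat16 uses 16 bits per parameter


def weight_megabytes(n_params: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Raw parameter storage in MB (10**6 bytes), excluding file metadata."""
    return n_params * bytes_per_param / 1e6


NOMINAL_PARAMS = {
    "chorous1-100m": 100_000_000,
    "chorous1-50m": 50_000_000,
    "chorous1-27m": 27_000_000,
}

for name, n_params in NOMINAL_PARAMS.items():
    print(f"{name}: ~{weight_megabytes(n_params):.0f} MB")
```

The nominal counts give about 200 MB, 100 MB, and 54 MB, in the same range as the listed file sizes.
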
---

## Limitations

- **Fixed Forecast Horizon:** Optimized for 96-step forecasting. Modifying the output head for longer horizons may reduce accuracy.
- **Channel Count Constraint:** The RevIN layer is initialized using the maximum channel count from the training suite. Inputs with more channels than this limit are not supported out of the box.
- **Patch Alignment Requirement:** Input context length must be an exact multiple of the patch size (16).

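
A hypothetical pre-flight helper for the constraints above, assuming the model expects exactly the 512-step context listed under Architecture. The name `prepare_context` and the choice to keep the most recent steps are illustrative assumptions, not part of the released code.

```python
import torch

PATCH_SIZE, CONTEXT_LEN = 16, 512


def prepare_context(series: torch.Tensor) -> torch.Tensor:
    """Keep the most recent CONTEXT_LEN steps of a [batch, channels, time] tensor."""
    if series.ndim != 3:
        raise ValueError("expected shape [batch, channels, time]")
    if series.shape[-1] < CONTEXT_LEN:
        raise ValueError(f"need at least {CONTEXT_LEN} steps, got {series.shape[-1]}")
    window = series[..., -CONTEXT_LEN:]
    if window.shape[-1] % PATCH_SIZE != 0:
        raise ValueError("context length must be a multiple of the patch size")
    return window


history = torch.arange(600.0).reshape(1, 1, 600)
context = prepare_context(history)
print(tuple(context.shape))
```
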
---

## License

Chorous1 is released under the [8f-ai-license-v1.0](https://huggingface.co/8Fai/license). Please review the full terms before use in production or commercial applications.