+# Chorous1
+<p align="center">
+  <img src="https://img.shields.io/badge/Parameters-100M%20%7C%2050M%20%7C%2027M-brightgreen?style=flat-square" />
+  <img src="https://img.shields.io/badge/Architecture-Patch--Transformer-purple?style=flat-square" />
+  <img src="https://img.shields.io/badge/License-8f--ai--license--v1.0-red?style=flat-square" />
+</p>
+> **Chorous1** is a suite of three patch-based transformer models for multivariate time-series forecasting, at roughly 100M, 50M, and 27M parameters. The models combine reversible instance normalization (RevIN), MAE-style (masked autoencoder) patch masking, and a flatten head, and are designed for accurate forecasting on real-world benchmark data.
+---
+## Table of Contents
+- [Model Variants](#model-variants)
+- [Architecture](#architecture)
+- [Quickstart](#quickstart)
+- [Performance](#performance)
+- [Limitations](#limitations)
+- [License](#license)
+---
+## Model Variants
+| Variant | Parameters | Hidden Size | Layers | Query Heads / KV Heads |
+|---|---|---|---|---|
+| `chorous1-100m` | ~100M | 768 | 12 | 12 / 4 |
+| `chorous1-50m` | ~50M | 512 | 16 | 8 / 2 |
+| `chorous1-27m` | ~27M | 384 | 16 | 6 / 2 |
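
As a rough illustration of what the head counts in the table imply, the sketch below derives the per-head width and the number of query heads that share each key/value head. It assumes the common convention `head_dim = hidden_size / query_heads`, which this README does not state. `VARIANTS` and `head_geometry` are names made up for the example.

```python
# Head geometry implied by the variant table.
# Assumption: head_dim = hidden_size // query_heads (a common convention,
# not confirmed by this README).
VARIANTS = {
    "chorous1-100m": {"hidden": 768, "q_heads": 12, "kv_heads": 4},
    "chorous1-50m": {"hidden": 512, "q_heads": 8, "kv_heads": 2},
    "chorous1-27m": {"hidden": 384, "q_heads": 6, "kv_heads": 2},
}


def head_geometry(hidden: int, q_heads: int, kv_heads: int) -> dict:
    """Return per-head width and query-to-KV sharing for one variant."""
    if hidden % q_heads or q_heads % kv_heads:
        raise ValueError("hidden must divide by q_heads, and q_heads by kv_heads")
    head_dim = hidden // q_heads
    return {
        "head_dim": head_dim,
        "queries_per_kv_head": q_heads // kv_heads,
        "kv_width": kv_heads * head_dim,  # width of the K (or V) projection
    }


for name, cfg in VARIANTS.items():
    print(name, head_geometry(**cfg))
```

Under that assumption all three variants use 64-wide heads, and each key/value head is shared by 3 or 4 query heads.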
+---
+## Architecture
+| Component | Specification |
+|---|---|
+| Context Length | 512 steps |
+| Forecast Horizon | 96 steps |
+| Patch Size | 16 (non-overlapping) |
+| Number of Patches | 32 |
+| FFN Multiplier | 2.667× |
+| Activation | SwiGLU |
+| Positional Encoding | RoPE (θ = 500,000) |
+| Normalization | RMSNorm |
+| Masking Ratio | 25% (training only) |
+| Loss Function | Huber Loss + MAE |
+| Precision | bfloat16 |
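
The patch count and masking ratio in the table can be checked with a few lines of plain Python. This is only a sketch: `split_into_patches` and `sample_masked_patches` are hypothetical helpers, and uniform random selection of masked patches is an assumption, since the actual masking procedure is not described here.

```python
import random

CONTEXT_LENGTH = 512  # steps, from the table
PATCH_SIZE = 16       # non-overlapping, from the table
MASK_RATIO = 0.25     # training only, from the table


def split_into_patches(series, patch_size=PATCH_SIZE):
    """Split a 1-D sequence into consecutive non-overlapping patches."""
    if len(series) % patch_size != 0:
        raise ValueError("series length must be a multiple of patch_size")
    return [series[i:i + patch_size] for i in range(0, len(series), patch_size)]


def sample_masked_patches(num_patches, mask_ratio=MASK_RATIO, seed=None):
    """Pick patch indices to hide (assumed rule: uniform sampling without replacement)."""
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    return sorted(rng.sample(range(num_patches), num_masked))


series = list(range(CONTEXT_LENGTH))
patches = split_into_patches(series)
masked = sample_masked_patches(len(patches), seed=0)
print(len(patches), len(patches[0]), len(masked))  # 32 16 8
```

With a 512-step context and 16-step patches this gives 32 patches, of which 8 would be hidden at a 25% masking ratio.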
+### How It Works
+**Stage 1: Neural Encoding.** The transformer encoder processes the patched time series, using rotary position embeddings (RoPE) and grouped-query attention (GQA) to capture long-range temporal dependencies and periodic structure.
+**Stage 2: RevIN Normalization.** A reversible instance normalization (RevIN) layer wraps the encoder: it removes each input window's mean and variance before encoding and restores them on the forecast, which reduces the distribution mismatch between training and deployment data that is common in real-world use.
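
The normalize-then-restore round trip of RevIN can be sketched as follows. This is a dependency-free illustration on a single channel, not the model's actual layer: it omits the learnable affine parameters that RevIN implementations often include, `EPS` is an assumed stability constant, and the model call is replaced by a placeholder.

```python
import math
from statistics import fmean, pvariance

EPS = 1e-5  # assumed small constant for numerical stability


def revin_normalize(channel, eps=EPS):
    """Remove the window's mean and scale; return the stats needed to undo it."""
    mean = fmean(channel)
    std = math.sqrt(pvariance(channel, mean) + eps)
    return [(v - mean) / std for v in channel], mean, std


def revin_denormalize(values, mean, std):
    """Restore the original level and scale on the model's output."""
    return [v * std + mean for v in values]


# A 512-step context window with a large offset and a periodic component.
context = [100.0 + 5.0 * math.sin(t / 8.0) for t in range(512)]
normalized, mean, std = revin_normalize(context)

# Placeholder for the encoder and head: reuse the last 96 normalized steps.
normalized_forecast = normalized[-96:]
forecast = revin_denormalize(normalized_forecast, mean, std)
print(len(forecast), round(fmean(normalized), 6))
```

The statistics are computed per input window, so the forecast is mapped back to whatever level and scale that particular window had.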
+---
+## Quickstart
+```python
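+import torch
+from safetensors.torch import load_file
+# NOTE: `model` must already be an instance of the Chorous1 network that matches
+# the checkpoint variant. This README does not show the model class, so build it
+# from the project's model definition before running the lines below.
+# Replace "100m" with "50m" or "27m" as needed
+weights = load_file("./chorous_checkpoint/100m/model.safetensors")
+model.load_state_dict(weights)
+model.eval()
+# Input shape: [Batch, Channels, Time]; Time must equal the 512-step context length
+x = torch.randn(1, 7, 512)
+with torch.no_grad():
+    forecast = model(x)  # Output shape: [1, 7, 96]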
+```
+---
+## Performance
+| Metric | `chorous1-100m` | `chorous1-50m` | `chorous1-27m` |
+|---|---|---|---|
+| Weights Size | ~200 MB | ~110 MB | ~65 MB |
+| VRAM (Inference) | ~12 GB | ~8 GB | ~6 GB |
+---
+## Limitations
+- **Fixed Forecast Horizon** — Optimized for 96-step forecasting. Modifying the output head for longer horizons may reduce accuracy.
+- **Channel Count Constraint** — The RevIN layer is initialized using the maximum channel count from the training suite. Inputs exceeding this limit are not supported out of the box.
+- **Patch Alignment Requirement** — Input context length must be an exact multiple of the patch size (16).
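
One way to respect the patch alignment requirement is to validate and trim the history before calling the model. The helper below is a hypothetical sketch: `prepare_context` is not part of Chorous1, and keeping the most recent samples is an assumed preprocessing choice.

```python
PATCH_SIZE = 16       # from the Architecture table
CONTEXT_LENGTH = 512  # from the Architecture table


def prepare_context(series, context_length=CONTEXT_LENGTH, patch_size=PATCH_SIZE):
    """Return the most recent `context_length` samples, checking patch alignment."""
    if context_length % patch_size != 0:
        raise ValueError("context_length must be an exact multiple of patch_size")
    if len(series) < context_length:
        raise ValueError("series is shorter than the required context length")
    return series[-context_length:]


history = list(range(1000))  # stand-in for one channel of observed data
window = prepare_context(history)
print(len(window), window[0], window[-1])  # 512 488 999
```

A history shorter than the context length raises an error here rather than being padded, since this README does not describe a padding scheme.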
+---
+## License
+Chorous1 is released under the [8f-ai-license-v1.0](https://huggingface.co/8Fai/license). Please review the full terms before use in production or commercial applications.