Create README.md
README.md
ADDED
@@ -0,0 +1,97 @@
# Chorous1

<p align="center">
<img src="https://img.shields.io/badge/Parameters-100M%20%7C%2050M%20%7C%2027M-brightgreen?style=flat-square" />
<img src="https://img.shields.io/badge/Architecture-Patch--Transformer-purple?style=flat-square" />
<img src="https://img.shields.io/badge/License-8f--ai--license--v1.0-red?style=flat-square" />
</p>

> **Chorous1** is a suite of three patch-based transformer models for multivariate time-series forecasting. It combines RevIN, MAE-style patch masking, and a Flatten Head output layer, and targets state-of-the-art accuracy on real-world benchmark data.

---
## Table of Contents

- [Model Variants](#model-variants)
- [Architecture](#architecture)
- [Quickstart](#quickstart)
- [Performance](#performance)
- [Limitations](#limitations)
- [License](#license)

---

## Model Variants

| Variant | Parameters | Hidden Size | Layers | Query Heads / KV Heads |
|---|---|---|---|---|
| `chorous1-100m` | ~100M | 768 | 12 | 12 / 4 |
| `chorous1-50m` | ~50M | 512 | 16 | 8 / 2 |
| `chorous1-27m` | ~27M | 384 | 16 | 6 / 2 |
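The head counts in the table imply a per-head dimension and a query-to-KV sharing ratio for grouped-query attention. The sketch below derives both from the table values. The `Variant` dataclass and its field names are illustrative only and are not part of any released Chorous1 code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical container for one row of the variants table."""
    name: str
    hidden_size: int
    layers: int
    query_heads: int
    kv_heads: int

    @property
    def head_dim(self) -> int:
        # Each query head works on hidden_size / query_heads features.
        assert self.hidden_size % self.query_heads == 0
        return self.hidden_size // self.query_heads

    @property
    def queries_per_kv(self) -> int:
        # In grouped-query attention, this many query heads share one KV head.
        assert self.query_heads % self.kv_heads == 0
        return self.query_heads // self.kv_heads


VARIANTS = [
    Variant("chorous1-100m", 768, 12, 12, 4),
    Variant("chorous1-50m", 512, 16, 8, 2),
    Variant("chorous1-27m", 384, 16, 6, 2),
]

for v in VARIANTS:
    print(f"{v.name}: head_dim={v.head_dim}, queries_per_kv={v.queries_per_kv}")
```

Under these numbers all three variants work out to a head dimension of 64, with three or four query heads sharing each KV head.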

---

## Architecture

| Component | Specification |
|---|---|
| Context Length | 512 steps |
| Forecast Horizon | 96 steps |
| Patch Size | 16 steps (non-overlapping) |
| Number of Patches | 32 |
| FFN Multiplier | 2.667× |
| Activation | SwiGLU |
| Positional Encoding | RoPE (θ = 500,000) |
| Normalization | RMSNorm |
| Masking Ratio | 25% (training only) |
| Loss Function | Huber Loss + MAE |
| Precision | bfloat16 |
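Several values in this table follow from one another. The sketch below, in plain Python, splits a 512-step series into non-overlapping patches of 16 steps and derives the patch count, the number of patches hidden at a 25% masking ratio, and an approximate FFN width. The function and variable names are illustrative, and rounding the FFN width to the nearest integer is an assumption, since the README does not say how the 2.667× multiplier is rounded.

```python
CONTEXT_LENGTH = 512
PATCH_SIZE = 16
MASK_RATIO = 0.25
FFN_MULTIPLIER = 2.667


def patchify(series, patch_size=PATCH_SIZE):
    """Split one channel into consecutive, non-overlapping patches."""
    if len(series) % patch_size != 0:
        raise ValueError("series length must be a multiple of patch_size")
    return [series[i:i + patch_size] for i in range(0, len(series), patch_size)]


series = list(range(CONTEXT_LENGTH))  # stand-in for one channel of real data
patches = patchify(series)

num_patches = len(patches)                    # 512 / 16
num_masked = int(num_patches * MASK_RATIO)    # patches hidden per sample in training
ffn_width_768 = round(768 * FFN_MULTIPLIER)   # assumed rounding, hidden size 768

print(num_patches, num_masked, ffn_width_768)
```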

### How It Works

**Stage 1: Neural Encoding.** The transformer encoder processes patches of the time series, using RoPE and grouped-query attention (GQA) to capture long-range temporal dependencies and periodic structure.

**Stage 2: RevIN Normalization.** A reversible instance normalization (RevIN) layer wraps the encoder: it removes each input's mean and variance before Stage 1 runs, then restores them on the output. This reduces the distribution mismatch between training data and real-world inputs.
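The normalize-then-restore round trip of RevIN can be sketched for a single channel in plain Python. This is a minimal illustration, not the Chorous1 implementation: it omits the learnable affine parameters that RevIN layers usually carry, the `EPS` value is an assumption, and `fake_model` is a hypothetical stand-in for the encoder and output head.

```python
import math
from statistics import fmean, pvariance

EPS = 1e-5  # assumed small constant that keeps the division stable


def revin_normalize(series):
    """Return the normalized series plus the statistics needed to undo it."""
    mean = fmean(series)
    std = math.sqrt(pvariance(series, mu=mean) + EPS)
    return [(x - mean) / std for x in series], (mean, std)


def revin_denormalize(values, stats):
    """Map model outputs back to the scale of the original input."""
    mean, std = stats
    return [v * std + mean for v in values]


def fake_model(normalized):
    """Hypothetical stand-in for the network: repeat the last normalized value."""
    return [normalized[-1]] * 3


history = [100.0, 102.0, 104.0, 106.0]
normalized, stats = revin_normalize(history)
forecast = revin_denormalize(fake_model(normalized), stats)
print(forecast)  # values are back on the scale of `history`
```

Because the statistics are computed per input window, a series that drifts to a new level is still presented to the encoder with roughly zero mean and unit variance.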

---

## Quickstart

```python
import torch
from safetensors.torch import load_file

# `model` must already be an instantiated Chorous1 network of the matching
# variant; its constructor is not shown in this README.

# Replace "100m" with "50m" or "27m" as needed
weights = load_file("./chorous_checkpoint/100m/model.safetensors")
model.load_state_dict(weights)
model.eval()

# Input shape: [Batch, Channels, Time]
x = torch.randn(1, 7, 512)

with torch.no_grad():
    forecast = model(x)  # Output shape: [1, 7, 96]
```

---

## Performance

| Metric | `chorous1-100m` | `chorous1-50m` | `chorous1-27m` |
|---|---|---|---|
| Weights Size | ~200 MB | ~110 MB | ~65 MB |
| VRAM (Inference) | ~12 GB | ~8 GB | ~6 GB |
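The weights sizes are roughly what two bytes per parameter in bfloat16 would give. The estimate below assumes the nominal parameter counts from the Model Variants table and counts 1 MB as 10^6 bytes; the helper name is illustrative.

```python
BYTES_PER_BF16_PARAM = 2  # bfloat16 stores each value in 16 bits

NOMINAL_PARAMS = {  # approximate counts from the Model Variants table
    "chorous1-100m": 100_000_000,
    "chorous1-50m": 50_000_000,
    "chorous1-27m": 27_000_000,
}


def weights_megabytes(num_params, bytes_per_param=BYTES_PER_BF16_PARAM):
    """Estimate raw weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


for name, count in NOMINAL_PARAMS.items():
    print(f"{name}: ~{weights_megabytes(count):.0f} MB")
```

The estimate matches the listed figure for the largest variant and comes out somewhat below the listed figures for the two smaller ones. The README does not explain the gap; rounded parameter counts are one possible reason. Inference VRAM also covers activations and framework overhead, so it cannot be derived from the weights alone.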

---

## Limitations

- **Fixed Forecast Horizon**: Optimized for 96-step forecasting. Modifying the output head for longer horizons may reduce accuracy.
- **Channel Count Constraint**: The RevIN layer is initialized using the maximum channel count from the training suite. Inputs exceeding this limit are not supported out of the box.
- **Patch Alignment Requirement**: Input context length must be an exact multiple of the patch size (16).
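One way to satisfy the patch alignment requirement is to keep only the most recent steps of a series, trimmed to a multiple of the patch size. The helper below is a hypothetical preprocessing step, not part of Chorous1. It works on a plain Python list holding one channel, and dropping the oldest steps is an assumed policy.

```python
PATCH_SIZE = 16
CONTEXT_LENGTH = 512


def align_context(series, patch_size=PATCH_SIZE, max_length=CONTEXT_LENGTH):
    """Keep the newest steps so the length is a multiple of patch_size."""
    usable = min(len(series), max_length)
    usable -= usable % patch_size  # drop the remainder that would not fill a patch
    if usable == 0:
        raise ValueError(f"need at least {patch_size} steps")
    return series[-usable:]  # most recent `usable` steps


history = list(range(530))  # stand-in for one channel of data
context = align_context(history)
print(len(context), context[0], context[-1])
```

The README does not state whether the released checkpoints accept contexts shorter than 512 steps, so treat the shorter-series case as untested.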

---

## License

Chorous1 is released under the [8f-ai-license-v1.0](https://huggingface.co/8Fai/license). Please review the full terms before use in production or commercial applications.