	---
	license: other
	license_name: 8f-ai-license-v1.0
	license_link: https://huggingface.co/8Fai/license
	tags:
	- time-series-forecasting
	- pytorch
	- gqa
	- rope
	- swiglu
	- revin
	- patch-transformer
	language:
	- en
	library_name: torch
	---

	# Chorous1

	<p align="center">
	<img src="https://img.shields.io/badge/Parameters-100M%20%7C%2050M%20%7C%2027M-brightgreen?style=flat-square" />
	<img src="https://img.shields.io/badge/Architecture-Patch--Transformer-purple?style=flat-square" />
	<img src="https://img.shields.io/badge/License-8f--ai--license--v1.0-red?style=flat-square" />
	</p>

	> Chorous1 is a suite of three patch-based transformer models (~27M, ~50M, and ~100M parameters) for multivariate time-series forecasting. It combines RevIN, MAE-style patch masking, and a flatten head, and is designed for high accuracy on real-world benchmark data.

	---

	## Table of Contents

	- [Model Variants](#model-variants)
	- [Architecture](#architecture)
	- [Quickstart](#quickstart)
	- [Performance](#performance)
	- [Limitations](#limitations)
	- [License](#license)

	---

	## Model Variants

	| Variant | Parameters | Hidden Size | Layers | Query Heads / KV Heads |
	|---|---|---|---|---|
	| `chorous1-100m` | ~100M | 768 | 12 | 12 / 4 |
	| `chorous1-50m` | ~50M | 512 | 16 | 8 / 2 |
	| `chorous1-27m` | ~27M | 384 | 16 | 6 / 2 |
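
The table values determine the per-head width and the GQA group size for each variant. The sketch below works through that arithmetic; `VariantConfig` and its field names are hypothetical and are not part of the released code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantConfig:
    """Hypothetical container for one row of the Model Variants table."""
    name: str
    hidden_size: int
    num_layers: int
    num_query_heads: int
    num_kv_heads: int

    @property
    def head_dim(self) -> int:
        # Width of one attention head: hidden size split evenly across query heads.
        return self.hidden_size // self.num_query_heads

    @property
    def queries_per_kv_head(self) -> int:
        # GQA group size: number of query heads sharing one key/value head.
        return self.num_query_heads // self.num_kv_heads


VARIANTS = [
    VariantConfig("chorous1-100m", 768, 12, 12, 4),
    VariantConfig("chorous1-50m", 512, 16, 8, 2),
    VariantConfig("chorous1-27m", 384, 16, 6, 2),
]

for v in VARIANTS:
    print(f"{v.name}: head_dim={v.head_dim}, queries_per_kv_head={v.queries_per_kv_head}")
```

All three variants use a head width of 64; the 100M and 27M variants share each key/value head among 3 query heads, and the 50M variant among 4.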

	---

	## Architecture

	| Component | Specification |
	|---|---|
	| Context Length | 512 steps |
	| Forecast Horizon | 96 steps |
	| Patch Size | 16 (non-overlapping) |
	| Number of Patches | 32 |
	| FFN Multiplier | 2.667× |
	| Activation | SwiGLU |
	| Positional Encoding | RoPE (θ = 500,000) |
	| Normalization | RMSNorm |
	| Masking Ratio | 25% (training only) |
	| Loss Function | Huber Loss + MAE |
	| Precision | bfloat16 |
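
The patch and masking rows can be illustrated with a short sketch: a 512-step context split into non-overlapping patches of 16 gives 32 patches, and a 25% masking ratio hides 8 of them. The function names, and the choice to draw an independent mask for every series, are assumptions made for illustration; the card does not describe how the released models implement masking.

```python
import torch

CONTEXT_LENGTH = 512
PATCH_SIZE = 16
MASK_RATIO = 0.25


def patchify(x: torch.Tensor, patch_size: int = PATCH_SIZE) -> torch.Tensor:
    """Split [batch, channels, time] into [batch, channels, num_patches, patch_size]."""
    if x.shape[-1] % patch_size != 0:
        raise ValueError("time dimension must be a multiple of patch_size")
    # With step == size, unfold yields non-overlapping windows along the time axis.
    return x.unfold(dimension=-1, size=patch_size, step=patch_size)


def random_patch_mask(patches: torch.Tensor, ratio: float = MASK_RATIO) -> torch.Tensor:
    """Boolean mask of shape [batch, channels, num_patches]; True marks a masked patch."""
    batch, channels, num_patches, _ = patches.shape
    num_masked = int(num_patches * ratio)
    # Rank random scores within each series; the lowest-ranked patches are masked.
    scores = torch.rand(batch, channels, num_patches)
    ranks = scores.argsort(dim=-1).argsort(dim=-1)
    return ranks < num_masked


x = torch.randn(1, 7, CONTEXT_LENGTH)
patches = patchify(x)
mask = random_patch_mask(patches)
print(patches.shape)      # torch.Size([1, 7, 32, 16])
print(mask.sum(dim=-1))   # 8 masked patches in each of the 7 series
```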

	### How It Works

	Stage 1: Neural Encoding. The input series is split into non-overlapping patches, and the transformer encoder processes those patches with RoPE and GQA to capture long-range temporal dependencies and periodic structure.
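
To make the RoPE and GQA terms concrete, here is a minimal sketch of rotary position embeddings combined with grouped-query attention over patch tokens. It is not the released implementation: the function names, the rotate-half RoPE layout, and the use of non-causal attention are assumptions. The shapes follow the 27M variant (6 query heads, 2 KV heads, head width 384 / 6 = 64, 32 patches).

```python
import torch
import torch.nn.functional as F


def rope_angles(seq_len: int, head_dim: int, theta: float = 500_000.0):
    """Cosine and sine tables of shape [seq_len, head_dim] for rotary embeddings."""
    inv_freq = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)  # [seq_len, head_dim / 2]
    angles = torch.cat([angles, angles], dim=-1)
    return angles.cos(), angles.sin()


def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Rotate feature pairs of x ([batch, heads, seq, head_dim]) by position-dependent angles."""
    half = x.shape[-1] // 2
    rotated = torch.cat([-x[..., half:], x[..., :half]], dim=-1)
    return x * cos + rotated * sin


def gqa_attention(q, k, v, cos, sin):
    """q: [batch, q_heads, seq, dim]; k and v: [batch, kv_heads, seq, dim]."""
    group = q.shape[1] // k.shape[1]
    q, k = apply_rope(q, cos, sin), apply_rope(k, cos, sin)
    # Each key/value head is shared by `group` consecutive query heads.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    return F.scaled_dot_product_attention(q, k, v)


q = torch.randn(1, 6, 32, 64)
k = torch.randn(1, 2, 32, 64)
v = torch.randn(1, 2, 32, 64)
cos, sin = rope_angles(seq_len=32, head_dim=64)
out = gqa_attention(q, k, v, cos, sin)
print(out.shape)  # torch.Size([1, 6, 32, 64])
```

Only the key and value projections shrink under GQA, so the attention output keeps one slice per query head.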

	Stage 2: RevIN Normalization. A reversible instance normalization layer wraps the encoder: it removes each input series' mean and variance before encoding and restores them on the forecast. This reduces the distribution shift between training and deployment data that is common in real-world use.
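
The normalize-then-restore idea can be sketched as a small module. This is a minimal illustration under stated assumptions, not the layer shipped with the checkpoints: the class layout, the per-channel affine parameters, and the `eps` value are all assumed.

```python
import torch
import torch.nn as nn


class RevIN(nn.Module):
    """Minimal reversible instance normalization over the time axis (illustrative)."""

    def __init__(self, num_channels: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        # Learnable per-channel affine parameters (assumed), sized by channel count.
        self.weight = nn.Parameter(torch.ones(num_channels, 1))
        self.bias = nn.Parameter(torch.zeros(num_channels, 1))

    def normalize(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, channels, time]; statistics are computed per series and stored.
        self.mean = x.mean(dim=-1, keepdim=True).detach()
        self.std = torch.sqrt(x.var(dim=-1, keepdim=True, unbiased=False) + self.eps).detach()
        return (x - self.mean) / self.std * self.weight + self.bias

    def denormalize(self, y: torch.Tensor) -> torch.Tensor:
        # Undo the affine step, then restore the stored input statistics.
        y = (y - self.bias) / self.weight
        return y * self.std + self.mean


revin = RevIN(num_channels=7)
x = torch.randn(1, 7, 512) * 50 + 1000   # series with a large offset and scale
with torch.no_grad():
    z = revin.normalize(x)               # roughly zero mean, unit variance per series
    restored = revin.denormalize(z)      # back on the original scale
```

In a forecasting model, `normalize` would be applied to the 512-step context and `denormalize` to the 96-step output, reusing the statistics stored from the input.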

	---

	## Quickstart

	```python
	import torch
	from safetensors.torch import load_file

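	# `model` must already be an instance of the Chorous1 architecture.
	# This card does not show the model class, so construct it from the model code first.
	# Replace "100m" with "50m" or "27m" as needed.
	weights = load_file("./chorous_checkpoint/100m/model.safetensors")
	model.load_state_dict(weights)
	model.eval()  # disable training-only behavior such as patch masking

	# Input shape: [batch, channels, time]
	x = torch.randn(1, 7, 512)

	with torch.no_grad():
	    forecast = model(x)  # Output shape: [1, 7, 96]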
	```

	---

	## Performance

	| Metric | `chorous1-100m` | `chorous1-50m` | `chorous1-27m` |
	|---|---|---|---|
	| Weights Size | ~200 MB | ~110 MB | ~65 MB |
	| VRAM (Inference) | ~12 GB | ~8 GB | ~6 GB |
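
The weights sizes can be sanity-checked against the nominal parameter counts, assuming every parameter is stored in bfloat16 (2 bytes). The counts are approximate, so this is a rough estimate rather than an exact file size, and the helper name is illustrative.

```python
BYTES_PER_BF16 = 2


def bf16_weight_megabytes(num_parameters: int) -> float:
    """Approximate checkpoint size in MB (10**6 bytes) for bfloat16 weights."""
    return num_parameters * BYTES_PER_BF16 / 1e6


NOMINAL_PARAMETERS = {
    "chorous1-100m": 100_000_000,
    "chorous1-50m": 50_000_000,
    "chorous1-27m": 27_000_000,
}

for name, params in NOMINAL_PARAMETERS.items():
    print(f"{name}: ~{bf16_weight_megabytes(params):.0f} MB")
```

With the nominal counts this gives about 200 MB, 100 MB, and 54 MB, in the same range as the listed sizes.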

	---

	## Limitations

	- Fixed Forecast Horizon — Optimized for 96-step forecasting. Modifying the output head for longer horizons may reduce accuracy.
	- Channel Count Constraint — The RevIN layer is initialized using the maximum channel count from the training suite. Inputs exceeding this limit are not supported out of the box.
	- Patch Alignment Requirement — Input context length must be an exact multiple of the patch size (16).
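
One way to satisfy the patch alignment requirement is to trim a raw series to its most recent patch-aligned window before building the input tensor. The helpers below are a hypothetical sketch; dropping the oldest samples is an assumption, not documented preprocessing for these models.

```python
PATCH_SIZE = 16
CONTEXT_LENGTH = 512


def aligned_length(length: int, patch_size: int = PATCH_SIZE) -> int:
    """Largest multiple of patch_size that does not exceed length."""
    return (length // patch_size) * patch_size


def select_context(series: list, context_length: int = CONTEXT_LENGTH,
                   patch_size: int = PATCH_SIZE) -> list:
    """Keep the most recent patch-aligned window, capped at context_length steps."""
    usable = min(aligned_length(len(series), patch_size), context_length)
    if usable == 0:
        raise ValueError(f"need at least {patch_size} time steps, got {len(series)}")
    # Negative slicing drops the oldest samples and keeps the latest ones.
    return series[-usable:]


long_series = list(range(1000))
short_series = list(range(100))
print(len(select_context(long_series)))   # 512
print(len(select_context(short_series)))  # 96
```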

	---

	## License

	Chorous1 is released under the [8f-ai-license-v1.0](https://huggingface.co/8Fai/license). Please review the full terms before use in production or commercial applications.