Hallamonic 1B
TinyLlama-1.1B with all 22 LlamaAttention layers replaced by HarmonicBlock, a
hierarchical state-space module. This removes TinyLlama's RoPE positional limit
(max_position_embeddings=2048): the unmodified model degrades catastrophically past
2K tokens, while Hallamonic does not.
This is the final checkpoint (Phase 2, full fine-tune, step 5000) from the
Harmonic paper: arxiv.org/abs/2606.24650.
See Omibranch/Harmonic for code and the
small-scale (7M-112M param) Harmonic results this architecture is based on.
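To illustrate why a state-space layer has no built-in position limit, here is a minimal sketch of a generic diagonal linear recurrence. It is not the HarmonicBlock implementation (see the repo for that); the function name `ssm_scan` and the decay/gain coefficients are assumptions chosen purely for illustration. The point is only that the state has a fixed size and the update never references the absolute position, so the same code runs unchanged at any sequence length.

```python
from typing import List, Sequence


def ssm_scan(xs: Sequence[float], decay: Sequence[float], gain: Sequence[float]) -> List[float]:
    """Run a toy diagonal linear state-space recurrence over a scalar sequence.

    Per state channel i:  h[i] <- decay[i] * h[i] + gain[i] * x
    Readout per step:     y    <- sum(h)
    """
    state = [0.0] * len(decay)  # fixed-size state, independent of len(xs)
    outputs = []
    for x in xs:
        # The update uses only the previous state and the current input,
        # never the token's position index.
        state = [a * h + b * x for a, h, b in zip(decay, state, gain)]
        outputs.append(sum(state))
    return outputs


# Hypothetical coefficients: two channels with different memory horizons.
DECAY = [0.5, 0.9]
GAIN = [1.0, 0.1]

short = ssm_scan([1.0] * 8, DECAY, GAIN)
long = ssm_scan([1.0] * 8192, DECAY, GAIN)
print(len(short), len(long))
```

Memory use for the state is the same in both calls; only the number of loop iterations changes. A RoPE-based attention layer, by contrast, encodes position explicitly and was only trained on positions up to its configured maximum.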
Result
Evaluated on three independent held-out benchmarks, none overlapping the fineweb-edu training data:
| Dataset | Seq len | Hallamonic (bpt) | TinyLlama (bpt) | Δ |
|---|---|---|---|---|
| WikiText-103 | 1,024 | 0.43 | 2.84 | +2.41 |
| WikiText-103 | 8,192 | 0.48 | 10.56 | +10.08 |
| Lambada (clean) | 1,024 | 0.44 | 4.29 | +3.86 |
| Lambada (clean) | 8,192 | 0.45 | 9.89 | +9.44 |
| fineweb-edu held-out | 1,024 | 0.36 | 3.37 | +3.00 |
| fineweb-edu held-out | 8,192 | 0.36 | 10.82 | +10.45 |
TinyLlama's loss rises to roughly 10 to 11 bpt at 8K on every dataset, well past its 2K RoPE limit. Hallamonic's loss at 8K stays within about 0.05 bpt of its 1K value on all three (0.00 to 0.05 in the rounded figures above).
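The table reports loss in bpt, which is read here as bits per token (an assumption; the card does not expand the abbreviation). Under that reading, a mean cross-entropy measured in nats converts to bpt by dividing by ln 2, and the matching per-token perplexity is 2 raised to the bpt value. A minimal sketch, with hypothetical helper names:

```python
import math


def nats_to_bpt(mean_ce_nats: float) -> float:
    """Convert a mean cross-entropy in nats/token to bits/token."""
    return mean_ce_nats / math.log(2)


def bpt_to_perplexity(bpt: float) -> float:
    """Per-token perplexity implied by a bits-per-token value."""
    return 2.0 ** bpt


# The Δ column is TinyLlama bpt minus Hallamonic bpt, so a positive value
# means Hallamonic's loss is lower. First WikiText-103 row of the table:
delta = round(2.84 - 0.43, 2)
print(delta)
```

Because the published figures are rounded to two decimals, recomputing Δ from the rounded columns can differ from the listed value by 0.01 in some rows.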
Training
- Base: TinyLlama/TinyLlama-1.1B-Chat-v1.0 (892M params frozen: FFN + embeddings)
- New: 141M params (HarmonicBlock, d_state=128, compress ratio K=4)
- Phase 1 (SSM warmup): 10K steps, seq=512, batch=4, FFN frozen, lr 3e-4
- Phase 2 (full fine-tune): 5K steps, seq=1024, batch=8, grad-accum=4, lr 3e-5
- Data: fineweb-edu (sample-10BT)
- Cost: ~$15 on a single H100 (Modal), under 3 hours total
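For a rough sense of scale, the settings above imply the following token budget. This is a back-of-the-envelope sketch under stated assumptions: `batch` counts sequences per micro-batch, each step is one optimizer step covering `grad_accum` micro-batches, every sequence is packed to full length, and Phase 1 uses no gradient accumulation (the card does not list one). The `Phase` class is a hypothetical helper, not part of the Harmonic repo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    steps: int
    seq_len: int
    batch: int
    grad_accum: int = 1  # assumed 1 where the card gives no value

    @property
    def tokens_per_step(self) -> int:
        return self.seq_len * self.batch * self.grad_accum

    @property
    def total_tokens(self) -> int:
        return self.steps * self.tokens_per_step


PHASES = [
    Phase("phase1_ssm_warmup", steps=10_000, seq_len=512, batch=4),
    Phase("phase2_full_finetune", steps=5_000, seq_len=1024, batch=8, grad_accum=4),
]

for phase in PHASES:
    print(f"{phase.name}: {phase.tokens_per_step:,} tokens/step, "
          f"{phase.total_tokens:,} tokens total")

total_tokens = sum(phase.total_tokens for phase in PHASES)
total_params_m = 892 + 141  # frozen base params + new HarmonicBlock params, in millions
print(f"all phases: {total_tokens:,} tokens; model size: {total_params_m}M params")
```

Under these assumptions the whole run sees on the order of 0.2B tokens, a small slice of the sample-10BT subset.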
Usage
This is a custom architecture, not a stock transformers model class. Load it with
the code in the Harmonic repo:
```sh
git clone https://github.com/Omibranch/Harmonic
cd Harmonic/hallamonic
```

```python
from huggingface_hub import snapshot_download
from model import load_hallamonic

ckpt = snapshot_download("Omibranch/harmonic-checkpoints-phase2-final")
model, tokenizer = load_hallamonic(ckpt, device="cuda")
model.eval()

input_ids = tokenizer.encode("The theory of relativity states that", return_tensors="pt").to("cuda")
out = model.generate(input_ids, max_new_tokens=150, do_sample=True, temperature=0.8, top_p=0.9)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
Limitations
Single training run, no multi-seed replication. Evaluated on English text only. Not benchmarked for instruction-following or chat quality: this is a base language model demonstrating an architectural property (no positional limit at long context), not a general-purpose assistant.
License
Dual-licensed: free for noncommercial use (research, study, evaluation) under the PolyForm Noncommercial License 1.0.0. Commercial use requires a separate license; see LICENSE-COMMERCIAL.md.
Citation
@software{harmonic2026,
title = {Harmonic: Hierarchical State Space Models},
author = {Omibranch and {Harmonic Labs}},
year = {2026},
publisher = {Zenodo},
doi = {10.5281/zenodo.20381713},
url = {https://github.com/Omibranch/Harmonic}
}