# Aetheris β Hybrid Mamba-MoE Multilingual Model
Aetheris is a ~800M-parameter hybrid SSM/MoE language model distilled from CohereLabs/tiny-aya-global (3.35B parameters), built by Wayy Research.
## Architecture
- Type: Hybrid Mamba (SSM) + Mixture of Experts (MoE)
- Layers: 24, interleaved (even-indexed layers are SSM, odd-indexed are MoE)
- Hidden dim: 1024
- Experts: 4 per MoE layer, top-1 routing
- SSM state dim: 16
- Vocab size: 256,000 (shared with tiny-aya-global)
- Parameters: ~800M
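The layer interleaving and top-1 routing described above can be sketched in plain Python. This is a minimal illustration only; the function names (`layer_schedule`, `route_top1`) are hypothetical and not taken from the Aetheris codebase:

```python
import numpy as np

def layer_schedule(n_layers=24):
    """Alternate block types: even-indexed layers are SSM, odd are MoE."""
    return ["SSM" if i % 2 == 0 else "MoE" for i in range(n_layers)]

def route_top1(router_logits):
    """Top-1 routing: each token is sent to its single highest-scoring expert.

    router_logits: (tokens, n_experts) array of gate scores.
    Returns (chosen expert index, softmax gate weight) per token.
    """
    probs = np.exp(router_logits - router_logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    expert = probs.argmax(axis=-1)                  # chosen expert per token
    weight = probs[np.arange(len(expert)), expert]  # its gate weight
    return expert, weight

schedule = layer_schedule()
print(schedule[:4])  # ['SSM', 'MoE', 'SSM', 'MoE']
expert, weight = route_top1(np.array([[2.0, 0.1, -1.0, 0.5]]))
print(expert[0])     # 0
```

With top-1 routing, only one of the 4 experts runs per token, which keeps the active parameter count well below the full ~800M.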
## Training
3-stage MambaInLlama distillation pipeline:
| Stage | Method | Data | Steps |
|---|---|---|---|
| 1 | CKA-guided Layer Alignment | ClimbMix | 10,000 |
| 2 | KL Distillation (T=2.0, alpha=0.7) | ClimbMix | 20,000 |
| 3 | Supervised Fine-Tuning | aya_collection | 5,000 |
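Stage 2's objective can be sketched as standard temperature-scaled KL distillation blended with hard-label cross-entropy at the listed T=2.0 and alpha=0.7. This is an illustrative sketch, not the actual training code; `distillation_loss` is a hypothetical name:

```python
import numpy as np

def softmax(x, axis=-1):
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.7):
    """alpha-weighted mix of temperature-scaled KL(teacher || student)
    and cross-entropy on the hard labels."""
    s = softmax(student_logits / T)
    t = softmax(teacher_logits / T)
    # T^2 scaling keeps soft-target gradient magnitudes comparable across temperatures
    kl = (t * (np.log(t) - np.log(s))).sum(axis=-1).mean() * T * T
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels]).mean()
    return alpha * kl + (1 - alpha) * ce
```

With alpha=0.7, the soft teacher distribution dominates the gradient signal while the hard labels still anchor the student to the ground truth.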
Key research findings applied:
- 10× learning-rate boost for SSM parameters (compensates for a 27× gradient imbalance)
- SVD-based weight split for MoE expert initialization (expert diversity: CKA = 0.097)
- Per-language KL tracking to monitor multilingual equity
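The SVD-based expert split can be sketched with NumPy. This is an illustrative reconstruction under assumed semantics (not the actual Aetheris initializer): a dense FFN weight is factored via SVD and its singular components are partitioned across experts, so each expert starts with a distinct low-rank slice of the original transformation (hence low pairwise CKA):

```python
import numpy as np

def svd_split_experts(W, n_experts=4):
    """Split a dense weight matrix into n_experts low-rank expert weights
    by partitioning its singular-value spectrum (hypothetical sketch)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    chunks = np.array_split(np.arange(len(S)), n_experts)
    experts = []
    for idx in chunks:
        # Each expert reconstructs only its slice of singular components;
        # the slices are disjoint, so the experts sum back to W exactly.
        experts.append((U[:, idx] * S[idx]) @ Vt[idx, :])
    return experts

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
experts = svd_split_experts(W)
print(np.allclose(sum(experts), W))  # True
```

Because the singular slices are disjoint, the experts begin maximally dissimilar while jointly preserving the teacher layer's function.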
## Current Checkpoint
- Stage: 2 (KL distillation)
- Step: 18,000
- Loss: 3.4199
- Updated: 2026-03-13T01:45:14 UTC
## Languages
Supports 70+ languages inherited from tiny-aya-global. Core evaluation languages: English, Spanish, Hindi, Chinese, Arabic, Swahili, Turkish, Japanese, Indonesian, Telugu.
## Citation
```bibtex
@misc{aetheris2026,
  title={Aetheris: Hybrid Mamba-MoE Multilingual Model via Knowledge Distillation},
  author={Wayy Research},
  year={2026},
  url={https://huggingface.co/wayyresearch/aetheris}
}
```