Aetheris β€” Hybrid Mamba-MoE Multilingual Model

Aetheris is a ~800M-parameter hybrid SSM/MoE language model distilled from CohereLabs/tiny-aya-global (3.35B parameters).

Built by Wayy Research.
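
A minimal usage sketch. The card does not document the loading API, so the use of transformers with trust_remote_code=True below is an assumption, not a confirmed interface:

```python
# Hypothetical loading sketch; the exact API for this checkpoint is
# not stated on the card, so treat the calls below as an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wayyresearch/aetheris"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Habari ya asubuhi!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```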

Architecture

  • Type: Hybrid Mamba (SSM) + Mixture of Experts (MoE)
  • Layers: 24, interleaved (even-indexed = SSM, odd-indexed = MoE; see the sketch after this list)
  • Hidden dim: 1024
  • Experts: 4 per MoE layer, top-1 routing
  • SSM state dim: 16
  • Vocab size: 256,000 (shared with tiny-aya-global)
  • Parameters: ~800M
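
A minimal PyTorch sketch of the interleaving and top-1 routing described above. All class names here are hypothetical, and SimpleSSM is a toy diagonal recurrence standing in for a real Mamba block, not the released implementation:

```python
import torch
import torch.nn as nn


class Top1MoE(nn.Module):
    """Mixture-of-experts FFN with 4 experts and top-1 routing."""

    def __init__(self, dim=1024, num_experts=4, ffn_mult=4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(dim, dim * ffn_mult),
                nn.GELU(),
                nn.Linear(dim * ffn_mult, dim),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (batch, seq, dim)
        gate, idx = self.router(x).softmax(dim=-1).max(dim=-1)  # top-1 per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                out[mask] = gate[mask].unsqueeze(-1) * expert(x[mask])
        return out


class SimpleSSM(nn.Module):
    """Toy diagonal state-space recurrence (state dim 16); a stand-in
    for a real Mamba block, kept sequential for readability."""

    def __init__(self, dim=1024, state_dim=16):
        super().__init__()
        self.in_proj = nn.Linear(dim, dim * state_dim)
        self.decay = nn.Parameter(torch.rand(dim, state_dim))
        self.out_proj = nn.Linear(dim * state_dim, dim)

    def forward(self, x):  # x: (batch, seq, dim)
        B, S, D = x.shape
        u = self.in_proj(x).view(B, S, D, -1)
        a = torch.sigmoid(self.decay)  # per-channel decay in (0, 1)
        h = torch.zeros(B, D, u.shape[-1], device=x.device, dtype=x.dtype)
        outs = []
        for t in range(S):  # h_t = a * h_{t-1} + u_t
            h = a * h + u[:, t]
            outs.append(h.flatten(1))
        return self.out_proj(torch.stack(outs, dim=1))


class AetherisStack(nn.Module):
    """24 interleaved pre-norm residual layers: even = SSM, odd = MoE."""

    def __init__(self, num_layers=24, dim=1024):
        super().__init__()
        self.layers = nn.ModuleList(
            SimpleSSM(dim) if i % 2 == 0 else Top1MoE(dim)
            for i in range(num_layers)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(num_layers))

    def forward(self, x):
        for norm, layer in zip(self.norms, self.layers):
            x = x + layer(norm(x))
        return x
```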

Training

3-stage MambaInLlama distillation pipeline:

  • Stage 1: CKA-guided layer alignment on ClimbMix (10,000 steps)
  • Stage 2: KL distillation (T=2.0, alpha=0.7) on ClimbMix (20,000 steps); loss sketched below
  • Stage 3: Supervised fine-tuning on aya_collection (5,000 steps)
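
A minimal sketch of the Stage 2 objective, assuming the standard temperature-scaled formulation; only T=2.0 and alpha=0.7 are stated on the card, so the exact combination with the hard-label term is an assumption:

```python
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.7):
    """Soft-target KL plus hard-label cross-entropy (a common formulation).

    student_logits, teacher_logits: (num_tokens, vocab); labels: (num_tokens,).
    The T**2 factor keeps soft-target gradients on the same scale as T varies.
    """
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.log_softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
        log_target=True,
    ) * (T ** 2)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kl + (1 - alpha) * ce
```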

Key research findings applied:

  • 10x learning-rate boost for SSM parameters, compensating for a 27x gradient imbalance (see the optimizer sketch after this list)
  • SVD split for MoE expert initialization (expert diversity: CKA = 0.097; also sketched below)
  • Per-language KL tracking for multilingual equity (a tracking sketch appears under Languages)
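
Minimal sketches of the first two findings, under stated assumptions: the parameter grouping matches on an "ssm" substring in parameter names (adapt to the real module naming), and the SVD split assigns each expert a disjoint band of singular components of a dense FFN weight:

```python
import torch


def build_optimizer(model, base_lr=1e-4, ssm_lr_mult=10.0):
    """Give SSM parameters a 10x learning-rate boost to offset the
    reported 27x gradient imbalance. Matching on the substring "ssm"
    in parameter names is an assumption about the module naming."""
    ssm, rest = [], []
    for name, param in model.named_parameters():
        (ssm if "ssm" in name.lower() else rest).append(param)
    return torch.optim.AdamW([
        {"params": ssm, "lr": base_lr * ssm_lr_mult},
        {"params": rest, "lr": base_lr},
    ])


def svd_split_experts(ffn_weight, num_experts=4):
    """Initialize MoE experts from one dense FFN weight matrix: each
    expert reconstructs the weight from a disjoint band of singular
    components, giving related but diverse starts (low pairwise CKA)."""
    U, S, Vh = torch.linalg.svd(ffn_weight, full_matrices=False)
    bands = torch.chunk(torch.arange(S.numel()), num_experts)
    return [U[:, b] @ torch.diag(S[b]) @ Vh[b, :] for b in bands]
```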

Current Checkpoint

  • Stage: 2 (kl-distillation)
  • Step: 18,000
  • Loss: 3.4199
  • Updated: 2026-03-13T01:45:14.154527+00:00

Languages

Supports 70+ languages inherited from tiny-aya-global. Core evaluation languages: English, Spanish, Hindi, Chinese, Arabic, Swahili, Turkish, Japanese, Indonesian, Telugu.
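
A minimal sketch of the per-language KL tracking noted under the training findings, assuming each training example carries a language tag (the class and tag names below are illustrative, not the actual tooling):

```python
from collections import defaultdict

CORE_LANGS = ["en", "es", "hi", "zh", "ar", "sw", "tr", "ja", "id", "te"]


class PerLanguageKL:
    """Running mean of the distillation KL per language, so a loss of
    multilingual equity (one language diverging) is visible in training."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def update(self, lang, kl_value):
        self.totals[lang] += float(kl_value)
        self.counts[lang] += 1

    def report(self):
        return {lang: self.totals[lang] / self.counts[lang]
                for lang in self.totals}
```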

Citation

@misc{aetheris2026,
  title={Aetheris: Hybrid Mamba-MoE Multilingual Model via Knowledge Distillation},
  author={Wayy Research},
  year={2026},
  url={https://huggingface.co/wayyresearch/aetheris}
}