# Mirror-MoE-80M

A sparse Mixture-of-Experts language model optimized for edge devices.
| Metric | Value |
|---|---|
| Total Parameters | 81M |
| Active Parameters | 37M (2.2x sparse) |
| Experts | 16 Sparse + 1 Shared Anchor |
| Context | 512 tokens |
| Speed (Apple M4) | 111 tokens/sec |
## Key Features
- Extreme Efficiency: Only 37M parameters compute per token
- Mobile-Ready: Runs at 100+ tok/s on Apple Silicon
- Dual-Mode: Chat and RAG (context extraction) capable
## Model Variants

| File | Best For |
|---|---|
| `mirror_ai_hybrid.safetensors` | General Chat + Fact Retrieval |
| `mirror_ai_elite.safetensors` | Logic + Instruction Following |
## Benchmarks
| Benchmark | Mirror-MoE-80M | Pythia-70M | Random |
|---|---|---|---|
| PIQA | 53.6% | 56% | 50% |
| ARC-Easy | 32.2% | 37% | 25% |
| HellaSwag | 25.6% | 26% | 25% |
Mirror-MoE-80M approaches Pythia-70M on PIQA (53.6% vs 56%) with roughly half the compute per token (37M active parameters vs 70M).
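The sparsity figures quoted above can be sanity-checked with a couple of lines of arithmetic (values taken from the tables in this README):

```python
# Check the sparsity ratio implied by the parameter counts above.
total_params = 81e6   # total parameters (81M)
active_params = 37e6  # parameters active per token (37M)

sparsity_ratio = total_params / active_params
active_fraction = active_params / total_params

print(f"{sparsity_ratio:.1f}x sparse")                      # 2.2x sparse
print(f"{active_fraction:.0%} of weights active per token")  # 46% of weights active per token
```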
## Quick Start

### Apple Silicon (MLX)

```bash
pip install mlx tokenizers
python inference.py
```

### PyTorch (CPU/CUDA)

```bash
pip install torch safetensors tokenizers
python inference_pytorch.py
```
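Both inference scripts ultimately run a decoding loop over the model's next-token logits. A minimal greedy loop looks like the sketch below; `logits_fn` is a stand-in for the real model forward pass (an assumption for illustration, not the actual Mirror-MoE API):

```python
# Minimal greedy-decoding loop. `logits_fn` maps a list of token ids to
# next-token logits (one float per vocabulary entry); the real scripts
# wire this up to the MLX or PyTorch model and the BPE tokenizer.
from typing import Callable, List

def greedy_generate(logits_fn: Callable[[List[int]], List[float]],
                    prompt_ids: List[int],
                    max_new_tokens: int,
                    eos_id: int) -> List[int]:
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = logits_fn(ids)
        # Pick the highest-scoring token (argmax over the vocabulary).
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
        if next_id == eos_id:  # stop once the end-of-sequence token appears
            break
    return ids
```

Sampling strategies (temperature, top-p) replace only the `argmax` line; the loop structure stays the same.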
## Files

| File | Description |
|---|---|
| `mirror_ai_hybrid.safetensors` | Hybrid model weights (309 MB) |
| `mirror_ai_elite.safetensors` | Elite model weights (309 MB) |
| `custom_bpe_32k.json` | BPE tokenizer (32k vocab) |
| `model.py` | MLX architecture |
| `model_pytorch.py` | PyTorch architecture |
| `inference.py` | MLX inference script |
| `inference_pytorch.py` | PyTorch inference script |
## Architecture

```
MirrorTransformer (81M total)
├── Embedding (16M)
├── 8x TransformerBlock
│   ├── Attention (RoPE)
│   └── MoE Layer
│       ├── Shared Expert (512-dim, always active)
│       └── 16 Sparse Experts (256-dim, Top-2 routing)
└── Output Head (16M)
```
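The MoE layer above combines an always-active shared anchor expert with a Top-2 gated mixture of sparse experts. A simplified NumPy sketch of that forward pass is below; the square expert matrices and softmax-over-selected-experts gating are illustrative assumptions, not the exact Mirror-MoE implementation:

```python
import numpy as np

def moe_layer(x, shared_w, expert_ws, router_w, top_k=2):
    """One anchor-stabilized MoE layer (simplified sketch).

    x:         (d,) token activation
    shared_w:  (d, d) shared anchor expert, always applied
    expert_ws: list of (d, d) sparse expert weights
    router_w:  (d, n_experts) routing projection
    """
    scores = x @ router_w               # one router logit per expert
    top = np.argsort(scores)[-top_k:]   # indices of the top-k experts
    gates = np.exp(scores[top])
    gates /= gates.sum()                # softmax over the selected experts only
    out = x @ shared_w                  # anchor expert always contributes
    for g, i in zip(gates, top):
        out += g * (x @ expert_ws[i])   # add gated sparse expert outputs
    return out
```

Because only 2 of the 16 sparse experts run per token (plus the shared anchor), most expert weights sit idle on any given step, which is where the 37M-active-of-81M-total figure comes from.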
## Citation

Research paper:

```bibtex
@misc{mirror2026moe,
  title={Mirror-MoE-80M: Anchor-Stabilized Granular Mixture of Experts for Low-Resource Training},
  author={Dipesh Majithia},
  year={2026},
  publisher={Zenodo},
  doi={10.5281/zenodo.18473273},
  url={https://zenodo.org/records/18473273}
}
```
## Disclaimer
This is a research model. Outputs may be incorrect or biased. Not for production use without additional safety measures.
## License

CC BY 4.0. Free to use with attribution to MirrorAI / Dipesh Majithia.