
# Mirror-MoE-80M

A Sparse Mixture-of-Experts language model optimized for edge devices.

| Metric | Value |
|---|---|
| Total Parameters | 81M |
| Active Parameters | 37M (2.2Γ— sparsity) |
| Experts | 16 sparse + 1 shared anchor |
| Context | 512 tokens |
| Speed (Apple M4) | 111 tokens/sec |

## πŸ”₯ Key Features

- **Extreme Efficiency:** only 37M parameters are active per token
- **Mobile-Ready:** runs at 100+ tok/s on Apple Silicon
- **Dual-Mode:** capable of both chat and RAG-style context extraction

## πŸ“¦ Model Variants

| File | Best For |
|---|---|
| mirror_ai_hybrid.safetensors | General chat + fact retrieval |
| mirror_ai_elite.safetensors | Logic + instruction following |

## πŸ“Š Benchmarks

| Benchmark | Mirror-MoE-80M | Pythia-70M | Random |
|---|---|---|---|
| PIQA | 53.6% | 56% | 50% |
| ARC-Easy | 32.2% | 37% | 25% |
| HellaSwag | 25.6% | 26% | 25% |

Mirror-MoE approaches Pythia-70M on PIQA while activating roughly half as many parameters per token (37M vs 70M).

## πŸš€ Quick Start

### Apple Silicon (MLX)

```bash
pip install mlx tokenizers
python inference.py
```
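
The MLX inference script handles generation end to end. To just inspect the checkpoint, the weights load directly with `mlx.core` (a minimal sketch; only the filename comes from this repo):

```python
import mlx.core as mx

# Load the safetensors checkpoint into a dict of name -> mx.array.
weights = mx.load("mirror_ai_hybrid.safetensors")
for name, w in list(weights.items())[:5]:
    print(name, w.shape, w.dtype)  # peek at the first few tensors
```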

### PyTorch (CPU/CUDA)

```bash
pip install torch safetensors tokenizers
python inference_pytorch.py
```
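
As with MLX, the checkpoint is plain safetensors, so it can be inspected or loaded manually. A sketch, assuming `model_pytorch.py` exposes the `MirrorTransformer` class named in the architecture diagram (its constructor signature is a guess, so that part is left commented out):

```python
from safetensors.torch import load_file

# Read the checkpoint into a state dict of torch tensors.
state_dict = load_file("mirror_ai_hybrid.safetensors")
total = sum(t.numel() for t in state_dict.values())
print(f"{total / 1e6:.0f}M parameters")  # should print roughly 81M

# Hypothetical: load into the architecture from model_pytorch.py.
# from model_pytorch import MirrorTransformer
# model = MirrorTransformer()
# model.load_state_dict(state_dict)
# model.eval()
```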

πŸ“ Files

| File | Description |
|---|---|
| mirror_ai_hybrid.safetensors | Hybrid model weights (309 MB) |
| mirror_ai_elite.safetensors | Elite model weights (309 MB) |
| custom_bpe_32k.json | BPE tokenizer (32k vocab) |
| model.py | MLX architecture |
| model_pytorch.py | PyTorch architecture |
| inference.py | MLX inference script |
| inference_pytorch.py | PyTorch inference script |
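
`custom_bpe_32k.json` should load with the Hugging Face `tokenizers` library that both quick-start commands install (a sketch, assuming the file uses the standard `tokenizers` JSON format):

```python
from tokenizers import Tokenizer

# Load the 32k-vocab BPE tokenizer and round-trip a string.
tok = Tokenizer.from_file("custom_bpe_32k.json")
enc = tok.encode("Mixture-of-Experts on the edge")
print(enc.ids)              # token ids the model consumes
print(tok.decode(enc.ids))  # back to text
```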

πŸ—οΈ Architecture

```
MirrorTransformer (81M total)
β”œβ”€β”€ Embedding (16M)
β”œβ”€β”€ 8x TransformerBlock
β”‚   β”œβ”€β”€ Attention (RoPE)
β”‚   └── MoE Layer
β”‚       β”œβ”€β”€ Shared Expert (512-dim, always active)
β”‚       └── 16 Sparse Experts (256-dim, Top-2 routing)
└── Output Head (16M)
```
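
The "anchor" is the shared expert: it runs for every token, while the router sends each token to two of the 16 narrower experts. Below is a minimal PyTorch sketch of such a layer; the class names, SiLU MLP experts, softmax routing, and `d_model=512` are assumptions (a 16M embedding over a 32k vocab implies d_model β‰ˆ 512), not the repository's `model.py`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    """A small two-layer MLP expert (SiLU activation is an assumption)."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)
        self.down = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.up(x)))

class AnchorMoE(nn.Module):
    """Always-on shared anchor expert plus 16 sparse experts with top-2 routing."""
    def __init__(self, d_model: int = 512, n_experts: int = 16, top_k: int = 2):
        super().__init__()
        self.shared = Expert(d_model, 512)   # 512-dim anchor, sees every token
        self.experts = nn.ModuleList(
            Expert(d_model, 256) for _ in range(n_experts)  # 256-dim sparse experts
        )
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model); flatten batch/sequence dims before calling.
        probs = F.softmax(self.router(x), dim=-1)          # (n_tokens, n_experts)
        weights, idx = probs.topk(self.top_k, dim=-1)      # top-2 experts per token
        weights = weights / weights.sum(-1, keepdim=True)  # renormalize the pair
        sparse_out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = idx[:, k] == e                      # tokens routed to expert e
                if mask.any():
                    sparse_out[mask] += weights[mask, k, None] * expert(x[mask])
        # The anchor gives every token a dense path; sparse experts add capacity.
        return self.shared(x) + sparse_out
```

Only the two selected experts run per token, which is how 81M total parameters collapse to roughly 37M active.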

## πŸ“œ Citation

**Research Paper:** [Read the Full Paper on Zenodo](https://zenodo.org/records/18473273)

```bibtex
@misc{mirror2026moe,
  title={Mirror-MoE-80M: Anchor-Stabilized Granular Mixture of Experts for Low-Resource Training},
  author={Dipesh Majithia},
  year={2026},
  publisher={Zenodo},
  doi={10.5281/zenodo.18473273},
  url={https://zenodo.org/records/18473273}
}
```

## ⚠️ Disclaimer

This is a research model. Outputs may be incorrect or biased. Not for production use without additional safety measures.

## πŸ“„ License

CC BY 4.0 - Free to use with attribution to MirrorAI / Dipesh Majithia
