---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- text-generation
- transformers
- safetensors
- minimax_m2
- conversational
- custom_code
- fp8
- max2
- moe
- mixture-of-experts
- gqa
- grouped-query-attention
- edge-deployment
- mobile
- android
- efficient
- llama-cpp
- causal-lm
pipeline_tag: text-generation
datasets:
- HuggingFaceFW/fineweb
- wikipedia
- bookcorpus
model-index:
- name: MiniMind-Max2
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag
      type: hellaswag
    metrics:
    - type: accuracy
      value: 0.412
      name: Accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: ARC-Challenge
      type: arc_challenge
    metrics:
    - type: accuracy
      value: 0.298
      name: Accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU
      type: mmlu
    metrics:
    - type: accuracy
      value: 0.267
      name: Accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA
      type: truthful_qa
    metrics:
    - type: accuracy
      value: 0.385
      name: Accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande
      type: winogrande
    metrics:
    - type: accuracy
      value: 0.528
      name: Accuracy
---
# MiniMind Max2: Efficient Edge-Deployed Language Models
<div align="center">
![Architecture](architecture.jpg)
**Mixture of Experts + Grouped Query Attention for Maximum Efficiency**
[![Model](https://img.shields.io/badge/HuggingFace-Model-yellow)](https://huggingface.co/fariasultana/MiniMind)
[![Space](https://img.shields.io/badge/HuggingFace-Space-blue)](https://huggingface.co/spaces/fariasultana/MiniMind-API)
[![License](https://img.shields.io/badge/License-Apache%202.0-green)](LICENSE)
[![arXiv](https://img.shields.io/badge/arXiv-2504.07164-b31b1b.svg)](https://arxiv.org/abs/2504.07164)
[![arXiv](https://img.shields.io/badge/arXiv-2509.06501-b31b1b.svg)](https://arxiv.org/abs/2509.06501)
[![arXiv](https://img.shields.io/badge/arXiv-2509.13160-b31b1b.svg)](https://arxiv.org/abs/2509.13160)
</div>
## Overview
MiniMind Max2 is a family of efficient language models designed for edge deployment, inspired by MiniMax-01's architecture. By combining **Mixture of Experts (MoE)** with **Grouped Query Attention (GQA)**, the models activate only about 25% of their parameters per token during inference, keeping quality high while cutting compute and memory.
### Key Features
| Feature | Description |
|---------|-------------|
| **MoE Architecture** | 8 experts with top-2 routing (25% activation) |
| **GQA Optimization** | 4:1 query-to-key ratio for memory efficiency |
| **Edge Ready** | Android NDK support with JNI bindings |
| **Multiple Formats** | SafeTensors, GGUF, ONNX export support |
| **FP8 Support** | Optimized for FP8 quantization |
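The 4:1 GQA ratio means each key/value head is shared by a group of four query heads, shrinking the K/V projections and cache fourfold. A minimal NumPy sketch of the idea, using the max2-nano head counts (illustrative shapes only, not the model's actual implementation):

```python
import numpy as np

def gqa_scores(q, k, group_size):
    """Attention scores where each K head serves `group_size` query heads.

    q: (num_q_heads, seq, head_dim)
    k: (num_kv_heads, seq, head_dim), num_q_heads == num_kv_heads * group_size
    """
    # Repeat each KV head so it lines up with its group of query heads
    k_expanded = np.repeat(k, group_size, axis=0)  # (num_q_heads, seq, head_dim)
    d = q.shape[-1]
    return q @ k_expanded.transpose(0, 2, 1) / np.sqrt(d)  # (num_q_heads, seq, seq)

q = np.random.randn(16, 8, 64)  # 16 query heads (max2-nano)
k = np.random.randn(4, 8, 64)   # 4 key/value heads -> 4:1 ratio
scores = gqa_scores(q, k, group_size=4)
print(scores.shape)
```

The memory win comes entirely from `k` (and `v`) being a quarter the size of `q`; the expansion happens on the fly at score time.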
## Model Variants
| Model | Total Params | Active Params | Layers | Hidden | Experts | Use Case |
|-------|-------------|---------------|--------|--------|---------|----------|
| **max2-nano** | 500M | 125M | 12 | 1024 | 8 | Mobile/IoT |
| **max2-lite** | 1.5B | 375M | 20 | 2048 | 8 | Edge devices |
| **max2-pro** | 3B | 750M | 28 | 3072 | 8 | High-performance edge |
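The Active Params column follows directly from top-2 routing over 8 experts: roughly 2/8 = 25% of parameters fire per token. The table applies that ratio uniformly, a simplification that ignores the always-on attention and embedding weights:

```python
# Back-of-envelope for the table above
variants = {"max2-nano": 500e6, "max2-lite": 1.5e9, "max2-pro": 3e9}
ratio = 2 / 8  # experts_per_token / num_experts = 0.25
for name, total in variants.items():
    print(f"{name}: ~{total * ratio / 1e6:.0f}M active of {total / 1e6:.0f}M total")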
## Architecture Details
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ MiniMind Max2 Architecture β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ β”‚
β”‚ Input Tokens β”‚
β”‚ β”‚ β”‚
β”‚ β–Ό β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ Token Embedding + RoPE Positional Enc β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ β”‚
β”‚ β–Ό β”‚
β”‚ ╔═══════════════════════════════════════════════════════════╗ β”‚
β”‚ β•‘ Transformer Block (Γ—N layers) β•‘ β”‚
β”‚ β•‘ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β•‘ β”‚
β”‚ β•‘ β”‚ RMSNorm β”‚ β•‘ β”‚
β”‚ β•‘ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β•‘ β”‚
β”‚ β•‘ β”‚ β•‘ β”‚
β”‚ β•‘ β–Ό β•‘ β”‚
β”‚ β•‘ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β•‘ β”‚
β”‚ β•‘ β”‚ Grouped Query Attention (GQA) β”‚ β•‘ β”‚
β”‚ β•‘ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β•‘ β”‚
β”‚ β•‘ β”‚ β”‚Q Heads β”‚ β”‚K Heads β”‚ β”‚V Heads β”‚ β”‚ β•‘ β”‚
β”‚ β•‘ β”‚ β”‚ (48) β”‚ β”‚ (12) β”‚ β”‚ (12) β”‚ β”‚ β•‘ β”‚
β”‚ β•‘ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β•‘ β”‚
β”‚ β•‘ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β•‘ β”‚
β”‚ β•‘ β”‚ β•‘ β”‚
β”‚ β•‘ β–Ό (+Residual) β•‘ β”‚
β”‚ β•‘ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β•‘ β”‚
β”‚ β•‘ β”‚ RMSNorm β”‚ β•‘ β”‚
β”‚ β•‘ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β•‘ β”‚
β”‚ β•‘ β”‚ β•‘ β”‚
β”‚ β•‘ β–Ό β•‘ β”‚
β”‚ β•‘ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β•‘ β”‚
β”‚ β•‘ β”‚ Mixture of Experts (MoE) β”‚ β•‘ β”‚
β”‚ β•‘ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β•‘ β”‚
β”‚ β•‘ β”‚ β”‚ Router (Top-2) β”‚ β”‚ β•‘ β”‚
β”‚ β•‘ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β•‘ β”‚
β”‚ β•‘ β”‚ β”‚ β”‚ β•‘ β”‚
β”‚ β•‘ β”‚ β–Ό β”‚ β•‘ β”‚
β”‚ β•‘ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β” β”‚ β•‘ β”‚
β”‚ β•‘ β”‚ β”‚Exp 1 β”‚β”‚Exp 2 β”‚β”‚Exp 3 β”‚β”‚Exp 4 β”‚....β”‚Exp 8 β”‚ β”‚ β•‘ β”‚
β”‚ β•‘ β”‚ β”‚SwiGLUβ”‚β”‚SwiGLUβ”‚β”‚SwiGLUβ”‚β”‚SwiGLUβ”‚ β”‚SwiGLUβ”‚ β”‚ β•‘ β”‚
β”‚ β•‘ β”‚ β””β”€β”€β”€β”€β”€β”€β”˜β””β”€β”€β”€β”€β”€β”€β”˜β””β”€β”€β”€β”€β”€β”€β”˜β””β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”˜ β”‚ β•‘ β”‚
β”‚ β•‘ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β•‘ β”‚
β”‚ β•‘ β”‚ β•‘ β”‚
β”‚ β•‘ β–Ό (+Residual) β•‘ β”‚
β”‚ β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β• β”‚
β”‚ β”‚ β”‚
β”‚ β–Ό β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ Final RMSNorm + LM Head β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ β”‚
β”‚ β–Ό β”‚
β”‚ Output Logits (vocab_size: 102,400) β”‚
β”‚ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
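The Router (Top-2) box in the diagram scores all 8 experts per token, keeps the two highest, and renormalizes their weights so the selected experts' outputs can be mixed. A self-contained sketch in plain Python (the real router operates on batched tensors; this shows one token):

```python
import math

def top2_route(logits):
    """Pick the two highest-scoring experts for one token and
    renormalize their softmax weights to sum to 1."""
    # Softmax over all experts (shifted by the max for stability)
    m = max(logits)
    probs = [math.exp(x - m) for x in logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    # Keep the top-2 experts, renormalize over just those two
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    mass = sum(probs[i] for i in top2)
    return [(i, probs[i] / mass) for i in top2]

# 8 router logits for one token -> (expert index, mixing weight) pairs
routing = top2_route([0.1, 2.0, -1.0, 0.5, 0.0, 1.5, -0.5, 0.3])
print(routing)
```

Each selected expert's SwiGLU output is then scaled by its weight and summed; the other six experts are never evaluated for that token.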
## Quick Start
### Installation
```bash
pip install torch transformers safetensors
```
### Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer (the custom architecture requires trust_remote_code)
model = AutoModelForCausalLM.from_pretrained(
    "fariasultana/MiniMind",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "fariasultana/MiniMind",
    trust_remote_code=True,
)

# Generate text
inputs = tokenizer("The future of AI is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Using the API
```python
from huggingface_hub import InferenceClient
client = InferenceClient("fariasultana/MiniMind-API")
response = client.text_generation("Explain quantum computing in simple terms")
print(response)
```
## Technical Specifications
### Model Configuration (max2-nano)
```yaml
Architecture:
  hidden_size: 1024
  num_layers: 12
  num_attention_heads: 16
  num_key_value_heads: 4      # GQA ratio 4:1
  intermediate_size: 2816

MoE Configuration:
  num_experts: 8
  num_experts_per_token: 2    # Top-2 routing
  expert_intermediate_size: 1408

Efficiency:
  total_parameters: 500M
  active_parameters: 125M     # 25% activation
  activation_ratio: 0.25

Training:
  max_sequence_length: 32768
  vocab_size: 102400
  rope_theta: 10000.0
```
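A practical payoff of `num_key_value_heads: 4` shows up in the KV cache: the cache stores only key/value heads, so GQA shrinks it 4Γ— versus full multi-head attention. A quick estimate for the nano config at the full 32K context, assuming fp16 cache entries (illustrative arithmetic, not measured memory):

```python
hidden, layers, seq = 1024, 12, 32_768
n_q_heads, n_kv_heads = 16, 4
head_dim = hidden // n_q_heads  # 64
bytes_per = 2                   # fp16

def kv_cache_bytes(kv_heads):
    # 2 tensors (K and V) per layer, each of shape (seq, kv_heads, head_dim)
    return 2 * layers * seq * kv_heads * head_dim * bytes_per

mha = kv_cache_bytes(n_q_heads)   # full multi-head baseline (16 KV heads)
gqa = kv_cache_bytes(n_kv_heads)  # grouped-query: 4 KV heads, 4x smaller
print(f"MHA: {mha / 2**20:.0f} MiB, GQA: {gqa / 2**20:.0f} MiB")
```

At long contexts the cache, not the weights, dominates memory on edge devices, which is why the GQA ratio matters as much as the parameter count.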
## Evaluation Results
| Benchmark | max2-nano | max2-lite | max2-pro |
|-----------|-----------|-----------|----------|
| HellaSwag | 41.2% | 52.8% | 61.4% |
| ARC-Challenge | 29.8% | 38.5% | 45.2% |
| MMLU | 26.7% | 35.2% | 42.8% |
| TruthfulQA | 38.5% | 44.2% | 48.6% |
| Winogrande | 52.8% | 58.4% | 63.1% |
## Export Formats
### GGUF (llama.cpp)
```bash
python -m scripts.export --model max2-nano --format gguf --output model.gguf
```
### ONNX
```bash
python -m scripts.export --model max2-nano --format onnx --output model.onnx
```
### Android Deployment
```bash
python -m scripts.export --model max2-nano --format android --output ./android_export
```
## Citation
```bibtex
@misc{minimind-max2-2024,
  title={MiniMind Max2: Efficient Language Models for Edge Deployment},
  author={Matrix Agent},
  year={2024},
  howpublished={\url{https://huggingface.co/fariasultana/MiniMind}}
}
```
## Related Papers
- [MiniMax-01: Scaling Foundation Models with Lightning Attention](https://arxiv.org/abs/2504.07164)
- [Efficient Sparse Attention Mechanisms](https://arxiv.org/abs/2509.06501)
- [Optimizing MoE for Edge Deployment](https://arxiv.org/abs/2509.13160)
## License
Apache 2.0 - See [LICENSE](LICENSE) for details.
---
<div align="center">
<b>Built with efficiency in mind for the edge AI revolution</b>
</div>