# QuantMobileLLM: Lightweight GPT-Style Language Model
MobileLLM is a lightweight GPT-style language model designed for efficiency, fast inference, and small deployment environments.
It's trained on FineWeb-MINI and optimized with modern attention techniques.
## Model Highlights
- Architecture: Decoder-only GPT-style transformer
- Parameters: ~17M (6 layers, 8 heads, 256 embedding dim)
- Context Length: 512 tokens
- Vocabulary Size: 50,304 tokens
- Precision: Supports both fp16 and bf16 (see the detection sketch below)
- Optimized for: Small GPUs, mobile inference, and small deployment environments
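Picking between the two precisions follows the usual PyTorch pattern; this is a minimal sketch, assuming a CUDA device, not code from this repo:

```python
import torch

# Prefer bf16 on GPUs that support it (Ampere and newer); fall back to fp16.
if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    dtype = torch.bfloat16
else:
    dtype = torch.float16

# Run forward passes under autocast in the detected dtype.
with torch.autocast(device_type="cuda", dtype=dtype):
    pass  # model forward pass goes here
```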
## Architecture Details
| Component | Value |
|---|---|
| Layers | 6 |
| Attention Heads | 8 |
| KV Heads | 4 |
| Embedding Dim | 256 |
| Context Length | 512 |
| Vocab Size | 50,304 |
| Attention Type | Grouped-Query Attention (4 KV heads shared across 8 query heads) |
| Norm Type | RMSNorm |
| Position Encoding | Rotary Position Embeddings (RoPE) |
| FFN Activation | SwiGLU (SiLU) |
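For reference, the table maps onto a configuration object along these lines; the class and field names here are illustrative, not this repo's actual API:

```python
from dataclasses import dataclass

@dataclass
class MobileLLMConfig:            # hypothetical name, mirrors the table above
    n_layers: int = 6
    n_heads: int = 8              # query heads
    n_kv_heads: int = 4           # shared key/value heads (grouped-query attention)
    d_model: int = 256            # embedding dimension
    head_dim: int = 32            # d_model // n_heads
    context_length: int = 512
    vocab_size: int = 50_304
```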
## Key Optimizations
- RMSNorm – Improves training stability over LayerNorm (sketched below).
- Grouped-Query Attention – Reduces KV-cache size, lowering the memory footprint (see the cache calculation below).
- Rotary Embeddings (RoPE) – Better handling of long context windows.
- safetensors checkpoints – Faster and safer loading.
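Two of these are easy to make concrete. Below is a standard RMSNorm module and a back-of-the-envelope KV-cache calculation showing the saving from 4 shared KV heads; both are generic sketches, not this repo's exact code:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization: no mean subtraction, no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # Keys and values are both cached, hence the factor of 2.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# head_dim = 256 / 8 = 32, fp16/bf16 elements (2 bytes), full 512-token context:
full_kv = kv_cache_bytes(6, 8, 32, 512)  # 3,145,728 bytes (~3.0 MiB) per sequence
gqa_kv  = kv_cache_bytes(6, 4, 32, 512)  # 1,572,864 bytes (~1.5 MiB) per sequence
```

With half the KV heads, the cache footprint is halved, which is where the memory saving in the bullet above comes from.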
## Training Setup
| Property | Value |
|---|---|
| Dataset | FineWeb-MINI |
| Tokens Trained | ~100M |
| Optimizer | AdamW |
| Learning Rate | 6e-4 (cosine decay) |
| Warmup Steps | 100 |
| Batch Size | 64 × 2 grad accum |
| Effective Batch Size | 128 |
| Mixed Precision | fp16 / bf16 (auto-detect) |
| Distributed Training | DDP |
| Logging | Weights & Biases (wandb) |
| Checkpoint Format | .safetensors |
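A minimal sketch of the optimizer and schedule described above (linear warmup, then cosine decay). The total step count is not stated in this card, so the value below is a placeholder, and the model is a stand-in module:

```python
import math
import torch
import torch.nn as nn

model = nn.Linear(256, 256)   # stand-in for the actual MobileLLM module

max_lr = 6e-4
warmup_steps = 100
total_steps = 10_000          # placeholder; not stated in this card

optimizer = torch.optim.AdamW(model.parameters(), lr=max_lr)

def lr_lambda(step: int) -> float:
    # Linear warmup for the first 100 steps, then cosine decay to zero.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Per optimizer step: accumulate gradients over 2 micro-batches of 64
# (effective batch size 128), then step the optimizer and the schedule.
```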
## Model Checkpoints
| Step | Filename | Format |
|---|---|---|
| Final | `mobile_llm_final.safetensors` | safetensors |
| Intermediate | `checkpoints/mobile_llm_step_<step>.safetensors` | safetensors |
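Checkpoints can be inspected and loaded with the `safetensors` library. The snippet below only reads the file and lists a few tensors, since the model class itself lives in this repo's code:

```python
from safetensors.torch import load_file

# Load the final checkpoint into a plain state dict (no pickle involved).
state_dict = load_file("mobile_llm_final.safetensors")
print(f"{len(state_dict)} tensors")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape), tensor.dtype)

# To restore the weights, pass the dict to your instantiated model:
# model.load_state_dict(state_dict)
```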
## Roadmap
## License
This model is licensed under the MIT License.
## Links