# QuantMobileLLM – Lightweight GPT-Style Language Model

MobileLLM is a **lightweight GPT-style language model** designed for efficiency, fast inference, and small deployment environments.
It is trained on **FineWeb-MINI** and optimized with **modern attention techniques** such as grouped-query attention and rotary position embeddings.

---
## 🚀 Model Highlights

- **Architecture**: Decoder-only GPT-style transformer
- **Parameters**: ~17M (6 layers, 8 heads, 256 embedding dim)
- **Context Length**: 512 tokens
- **Vocabulary Size**: 50,304 tokens
- **Precision**: Supports both `fp16` and `bf16` (see the snippet below)
- **Optimized for**: Small GPUs and mobile inference
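A quick way to pick the inference dtype for the precision support above (a minimal sketch, assuming PyTorch on an NVIDIA GPU; `bf16` requires Ampere or newer):

```python
import torch

# Prefer bf16 where the hardware supports it (Ampere or newer);
# otherwise fall back to fp16.
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
print(f"Running inference in {dtype}")
```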
---
## 🧠 Architecture Details

| **Component**      | **Value** |
|--------------------|-----------|
| Layers             | 6 |
| Attention Heads    | 8 |
| KV Heads           | 4 |
| Embedding Dim      | 256 |
| Context Length     | 512 |
| Vocab Size         | 50,304 |
| Attention Type     | Grouped-Query Attention (4 KV heads shared by 8 query heads) |
| Norm Type          | RMSNorm |
| Position Encoding  | Rotary Position Embeddings (RoPE) |
| FFN Activation     | SwiGLU (`silu`) |
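For reference, the table above as a config object. This is an illustrative sketch; the class and field names are assumptions, not necessarily what the training repo uses:

```python
from dataclasses import dataclass

@dataclass
class MobileLLMConfig:           # hypothetical name, for illustration only
    n_layers: int = 6
    n_heads: int = 8
    n_kv_heads: int = 4          # two query heads share each KV head
    d_model: int = 256
    context_len: int = 512
    vocab_size: int = 50_304
    norm: str = "rmsnorm"
    ffn_activation: str = "silu" # SwiGLU feed-forward
```

These values also account for the ~17M parameter count: the embedding table alone is 50,304 × 256 ≈ 12.9M weights, and the six transformer blocks contribute roughly 4M more.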
### 🔹 Key Optimizations

- **RMSNorm** – improves training stability over LayerNorm (see the sketch after this list).
- **Grouped-Query Attention** – shares KV heads across query heads → smaller KV cache, lower memory footprint.
- **Rotary Embeddings (RoPE)** – better handling of long context windows.
- **`safetensors` checkpoints** – faster and safer loading.
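As a concrete reference for the first bullet, here is a minimal RMSNorm in PyTorch (a standard implementation, not necessarily this repo's exact code). The saving from the second bullet is also easy to quantify: with 4 KV heads of dimension 32 instead of 8, each cached token stores 2 × 4 × 32 = 256 values per layer rather than 512, halving the KV cache.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Rescale activations by their root-mean-square with a learned gain.
    Unlike LayerNorm there is no mean subtraction and no bias, which is
    cheaper and tends to train at least as stably."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```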
---
## 📊 Training Setup

| **Property**            | **Value** |
|-------------------------|-----------|
| Dataset                 | [FineWeb-MINI](https://huggingface.co/datasets/AryanNsc/FineWeb-Mini) |
| Tokens Trained          | ~100M |
| Optimizer               | AdamW |
| Learning Rate           | 6e-4 (cosine decay) |
| Warmup Steps            | 100 |
| Batch Size              | 64 × 2 grad accum |
| Effective Batch Size    | 128 |
| Mixed Precision         | `fp16` / `bf16` (auto-detect) |
| Distributed Training    | DDP |
| Logging                 | Weights & Biases (`wandb`) |
| Checkpoint Format       | `.safetensors` |
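The learning-rate schedule from the table (100 warmup steps, then cosine decay from 6e-4) looks roughly like the helper below; the function itself and the `min_lr` floor are illustrative assumptions, not the repo's code:

```python
import math

def lr_at(step: int, max_steps: int, max_lr: float = 6e-4,
          warmup: int = 100, min_lr: float = 6e-5) -> float:
    """Linear warmup for `warmup` steps, then cosine decay to `min_lr`.

    `min_lr` is an assumed floor; the table only specifies the peak rate.
    """
    if step < warmup:
        return max_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, max_steps - warmup)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```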
---
## 🧩 Model Checkpoints

| **Step**     | **Filename** | **Format** |
|--------------|--------------|------------|
| Final        | `mobile_llm_final.safetensors` | safetensors |
| Intermediate | `checkpoints/mobile_llm_step_<step>.safetensors` | safetensors |
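Loading a checkpoint is a one-liner with the `safetensors` library; the resulting dict then plugs straight into `model.load_state_dict(...)` for the model class defined in the training repo:

```python
from safetensors.torch import load_file

# safetensors stores a flat name -> tensor mapping and never executes
# pickled code, which is what makes loading both fast and safe.
state_dict = load_file("mobile_llm_final.safetensors", device="cpu")
total = sum(t.numel() for t in state_dict.values())
print(f"loaded {len(state_dict)} tensors, {total / 1e6:.1f}M parameters")
```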
---
## 🔮 Roadmap

- [x] Train **MobileLLM** on **FineWeb-MINI**
- [x] Add **grouped-query attention**
- [x] Export **safetensors** checkpoints
- [ ] Quantized **int8** & **int4** inference
- [ ] Expand training to **FineWeb-1B**

---
## 📜 License

This model is licensed under the [MIT License](LICENSE).

---

## 🔗 Links

- **GitHub** – [MobileLLM training code](https://github.com/Guney-olu/Quantgpt)