# QuantMobileLLM: Lightweight GPT-Style Language Model

MobileLLM is a **lightweight GPT-style language model** designed for efficiency, fast inference, and small deployment environments. It’s trained on **FineWeb-MINI** and uses **modern attention techniques**: multi-query attention, rotary position embeddings, and RMSNorm.

---

## 🚀 Model Highlights

- **Architecture**: Decoder-only GPT-style transformer
- **Parameters**: ~17M (6 layers, 8 heads, 256 embedding dim)
- **Context Length**: 512 tokens
- **Vocabulary Size**: 50,304 tokens
- **Precision**: Supports both `fp16` and `bf16`
- **Optimized for**: Small GPUs and mobile inference

---

## 🧠 Architecture Details

| **Component**      | **Value** |
|--------------------|-----------|
| Layers             | 6 |
| Attention Heads    | 8 |
| KV Heads           | 4 |
| Embedding Dim      | 256 |
| Context Length     | 512 |
| Vocab Size         | 50,304 |
| Attention Type     | Multi-Query Attention |
| Norm Type          | RMSNorm |
| Position Encoding  | Rotary Position Embeddings (RoPE) |
| FFN Activation     | SwiGLU (`silu`) |

### 🔹 Key Optimizations

- **RMSNorm** → Simpler and cheaper than LayerNorm (no mean-centering) while keeping training stable.
- **Multi-Query Attention** → Query heads share KV heads (4 KV heads for 8 query heads, the variant usually called grouped-query attention), which reduces KV-cache size and lowers the memory footprint.
- **Rotary Embeddings (RoPE)** → Encodes relative position inside the attention computation, which tends to handle longer context windows better than learned absolute embeddings.
- **`safetensors` checkpoints** → Faster and safer loading.

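To make the KV-cache saving concrete, here is a rough back-of-the-envelope sketch using the numbers from the architecture table. It assumes the head dimension is `embedding dim / attention heads` (32 here), a batch size of 1, and 2-byte (`fp16`/`bf16`) cache entries. The function name `kv_cache_bytes` is hypothetical and not part of the training code.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch=1):
    """Approximate KV-cache size: one K and one V tensor per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem * batch


# Values taken from the architecture table above.
N_LAYERS, N_HEADS, N_KV_HEADS = 6, 8, 4
EMBED_DIM, CONTEXT = 256, 512

# Assumption: each head gets an equal slice of the embedding dimension.
HEAD_DIM = EMBED_DIM // N_HEADS

# Baseline: every query head has its own KV head (standard multi-head attention).
full = kv_cache_bytes(N_LAYERS, N_HEADS, HEAD_DIM, CONTEXT)
# This model: 4 KV heads shared across 8 query heads.
shared = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT)

print(f"8 KV heads: {full / 2**20:.2f} MiB")    # 3.00 MiB
print(f"4 KV heads: {shared / 2**20:.2f} MiB")  # 1.50 MiB
```

Under these assumptions, halving the number of KV heads halves the cache at full context length. The saving scales linearly with batch size and sequence length.
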
---

## 📊 Training Setup

| **Property**           | **Value** |
|------------------------|-----------|
| Dataset                | [FineWeb-MINI](https://huggingface.co/datasets/AryanNsc/FineWeb-Mini) |
| Tokens Trained         | ~100M |
| Optimizer              | AdamW |
| Learning Rate          | 6e-4 (cosine decay) |
| Warmup Steps           | 100 |
| Batch Size             | 64 × 2 gradient-accumulation steps |
| Effective Batch Size   | 128 |
| Mixed Precision        | `fp16` / `bf16` (auto-detect) |
| Distributed Training   | DDP |
| Logging                | Weights & Biases (`wandb`) |
| Checkpoint Format      | `.safetensors` |

---

## 🧩 Model Checkpoints

| **Step**     | **Filename** | **Format** |
|--------------|--------------|------------|
| Final        | `mobile_llm_final.safetensors` | safetensors |
| Intermediate | `checkpoints/mobile_llm_step_.safetensors` | safetensors |

---

## 🔮 Roadmap

- [x] Train **MobileLLM** on **FineWeb-MINI**
- [x] Add **multi-query attention**
- [x] Export **safetensors** checkpoints
- [ ] Quantized **int8** & **int4** inference
- [ ] Expand training on **FineWeb-1B**

---

## 📜 License

This model is licensed under the [MIT License](LICENSE).

---

## 🌐 Links

- **GitHub** → [MobileLLM training code](https://github.com/Guney-olu/Quantgpt)
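
---

## 📎 Appendix: Learning-Rate Schedule Sketch

The learning-rate row in the Training Setup table (6e-4 peak, cosine decay, 100 warmup steps) can be illustrated with a minimal sketch. The README does not state the total number of steps, the warmup shape, or the final learning rate, so `total_steps=1500`, linear warmup, and `min_lr=0.0` below are assumptions chosen for illustration. The helper `lr_at` is hypothetical and is not taken from the training code.

```python
import math


def lr_at(step, max_lr=6e-4, warmup_steps=100, total_steps=1500, min_lr=0.0):
    """Linear warmup followed by cosine decay (illustrative values only)."""
    if step < warmup_steps:
        # Assumed linear ramp that reaches max_lr on the last warmup step.
        return max_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    # Cosine curve from max_lr (progress = 0) down to min_lr (progress = 1).
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for step in (0, 50, 100, 800, 1500):
    print(step, f"{lr_at(step):.2e}")
```

A function like this can be called once per optimizer step to set the learning rate of each AdamW parameter group.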