# QuantMobileLLM: Lightweight GPT-Style Language Model
MobileLLM is a lightweight GPT-style language model designed for efficiency, fast inference, and small deployment environments.
It's trained on FineWeb-MINI and optimized with modern attention techniques.
## Model Highlights
- Architecture: Decoder-only GPT-style transformer
- Parameters: ~17M (6 layers, 8 heads, 256 embedding dim)
- Context Length: 512 tokens
- Vocabulary Size: 50,304 tokens
- Precision: Supports both fp16 and bf16 (see the detection sketch below)
- Optimized for: Small GPUs, mobile inference, and small deployment environments
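Picking between the two precisions follows the usual PyTorch pattern; this is a minimal sketch, assuming a CUDA device, not code from this repo:

```python
import torch

# Prefer bf16 on GPUs that support it (Ampere and newer); fall back to fp16.
if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    dtype = torch.bfloat16
else:
    dtype = torch.float16

# Run forward passes under autocast in the detected dtype.
with torch.autocast(device_type="cuda", dtype=dtype):
    pass  # model forward pass goes here
```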
## Architecture Details
| Component | Value |
|---|---|
| Layers | 6 |
| Attention Heads | 8 |
| KV Heads | 4 |
| Embedding Dim | 256 |
| Context Length | 512 |
| Vocab Size | 50,304 |
| Attention Type | Grouped-Query Attention (4 KV heads shared across 8 query heads) |
| Norm Type | RMSNorm |
| Position Encoding | Rotary Position Embeddings (RoPE) |
| FFN Activation | SwiGLU (SiLU) |
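For reference, the table maps onto a configuration object along these lines; the class and field names here are illustrative, not this repo's actual API:

```python
from dataclasses import dataclass

@dataclass
class MobileLLMConfig:            # hypothetical name, mirrors the table above
    n_layers: int = 6
    n_heads: int = 8              # query heads
    n_kv_heads: int = 4           # shared key/value heads (grouped-query attention)
    d_model: int = 256            # embedding dimension
    head_dim: int = 32            # d_model // n_heads
    context_length: int = 512
    vocab_size: int = 50_304
```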
## Key Optimizations
- RMSNorm – Improves training stability over LayerNorm (sketched below).
- Grouped-Query Attention – Reduces KV-cache size, lowering the memory footprint (see the cache calculation below).
- Rotary Embeddings (RoPE) – Better handling of long context windows.
- safetensors checkpoints – Faster and safer loading.
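Two of these are easy to make concrete. Below is a standard RMSNorm module and a back-of-the-envelope KV-cache calculation showing the saving from 4 shared KV heads; both are generic sketches, not this repo's exact code:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization: no mean subtraction, no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # Keys and values are both cached, hence the factor of 2.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# head_dim = 256 / 8 = 32, fp16/bf16 elements (2 bytes), full 512-token context:
full_kv = kv_cache_bytes(6, 8, 32, 512)  # 3,145,728 bytes (~3.0 MiB) per sequence
gqa_kv  = kv_cache_bytes(6, 4, 32, 512)  # 1,572,864 bytes (~1.5 MiB) per sequence
```

With half the KV heads, the cache footprint is halved, which is where the memory saving in the bullet above comes from.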
## Training Setup
| Property | Value |
|---|---|
| Dataset | FineWeb-MINI |
| Tokens Trained | ~100M |
| Optimizer | AdamW |
| Learning Rate | 6e-4 (cosine decay) |
| Warmup Steps | 100 |
| Batch Size | 64 × 2 grad accum |
| Effective Batch Size | 128 |
| Mixed Precision | fp16 / bf16 (auto-detect) |
| Distributed Training | DDP |
| Logging | Weights & Biases (wandb) |
| Checkpoint Format | .safetensors |
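A minimal sketch of the optimizer and schedule described above (linear warmup, then cosine decay). The total step count is not stated in this card, so the value below is a placeholder, and the model is a stand-in module:

```python
import math
import torch
import torch.nn as nn

model = nn.Linear(256, 256)   # stand-in for the actual MobileLLM module

max_lr = 6e-4
warmup_steps = 100
total_steps = 10_000          # placeholder; not stated in this card

optimizer = torch.optim.AdamW(model.parameters(), lr=max_lr)

def lr_lambda(step: int) -> float:
    # Linear warmup for the first 100 steps, then cosine decay to zero.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Per optimizer step: accumulate gradients over 2 micro-batches of 64
# (effective batch size 128), then step the optimizer and the schedule.
```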
## Model Checkpoints
| Step | Filename | Format |
|---|---|---|
| Final | `mobile_llm_final.safetensors` | safetensors |
| Intermediate | `checkpoints/mobile_llm_step_<step>.safetensors` | safetensors |
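Checkpoints can be inspected and loaded with the `safetensors` library. The snippet below only reads the file and lists a few tensors, since the model class itself lives in this repo's code:

```python
from safetensors.torch import load_file

# Load the final checkpoint into a plain state dict (no pickle involved).
state_dict = load_file("mobile_llm_final.safetensors")
print(f"{len(state_dict)} tensors")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape), tensor.dtype)

# To restore the weights, pass the dict to your instantiated model:
# model.load_state_dict(state_dict)
```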
## Roadmap
## License
This model is licensed under the MIT License.
## Links