
QuantMobileLLM: Lightweight GPT-Style Language Model

MobileLLM is a lightweight GPT-style language model designed for efficient, fast inference in small deployment environments.
It is trained on FineWeb-MINI and optimized with modern attention techniques.


🚀 Model Highlights

  • Architecture: Decoder-only GPT-style transformer
  • Parameters: ~17M (6 layers, 8 heads, 256 embedding dim)
  • Context Length: 512 tokens
  • Vocabulary Size: 50,304 tokens
  • Precision: Supports both fp16 and bf16
  • Optimized for: Small GPUs and mobile inference

🧠 Architecture Details

| Component         | Value                              |
|-------------------|------------------------------------|
| Layers            | 6                                  |
| Attention Heads   | 8                                  |
| KV Heads          | 4                                  |
| Embedding Dim     | 256                                |
| Context Length    | 512                                |
| Vocab Size        | 50,304                             |
| Attention Type    | Multi-Query Attention              |
| Norm Type         | RMSNorm                            |
| Position Encoding | Rotary Position Embeddings (RoPE)  |
| FFN Activation    | SwiGLU (SiLU)                      |
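
These dimensions roughly account for the ~17M parameter figure. A back-of-the-envelope count, as a sketch: the SwiGLU hidden size (704) and tied input/output embeddings below are assumptions, not details stated in this card.

```python
# Rough parameter count from the architecture table above.
# Assumptions (not stated in the card): tied embeddings, and a
# Llama-style SwiGLU hidden dim ~ 2/3 * 4 * d, rounded up to 704.
d, n_layers, n_heads, n_kv_heads, vocab = 256, 6, 8, 4, 50304
head_dim = d // n_heads            # 32
ffn_hidden = 704                   # assumed SwiGLU hidden size

embed = vocab * d                                        # token embeddings (tied with lm_head)
attn = d * d + 2 * d * (n_kv_heads * head_dim) + d * d   # Wq, Wk, Wv, Wo
ffn = 3 * d * ffn_hidden                                 # gate, up, down projections
norms = 2 * d                                            # two RMSNorms per block
per_layer = attn + ffn + norms

total = embed + n_layers * per_layer + d                 # + final RMSNorm
print(f"{total / 1e6:.1f}M parameters")                  # -> 17.3M parameters
```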

🔹 Key Optimizations

  • RMSNorm → improves training stability over LayerNorm.
  • Multi-Query Attention → reduces KV-cache size, lowering the memory footprint.
  • Rotary Embeddings (RoPE) → better handling of long context windows.
  • safetensors checkpoints → faster and safer loading.
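
The KV-cache saving is easy to quantify from the numbers above. A sketch comparing the fp16 cache for a full 512-token context with this model's 4 KV heads versus a hypothetical full 8-head cache:

```python
# fp16 KV-cache size for a full 512-token context:
#   cache = 2 (K and V) * layers * kv_heads * head_dim * seq_len * 2 bytes
layers, head_dim, seq_len, bytes_fp16 = 6, 32, 512, 2

def kv_cache_bytes(kv_heads):
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_fp16

mqa = kv_cache_bytes(4)   # 4 KV heads, as in this model
mha = kv_cache_bytes(8)   # full multi-head attention, for comparison
print(mqa / 1024**2, "MiB vs", mha / 1024**2, "MiB")  # -> 1.5 MiB vs 3.0 MiB
```

Halving the KV heads halves the cache, which is where the lower memory footprint comes from.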

📊 Training Setup

| Property             | Value                        |
|----------------------|------------------------------|
| Dataset              | FineWeb-MINI                 |
| Tokens Trained       | ~100M                        |
| Optimizer            | AdamW                        |
| Learning Rate        | 6e-4 (cosine decay)          |
| Warmup Steps         | 100                          |
| Batch Size           | 64 × 2 grad accum            |
| Effective Batch Size | 128                          |
| Mixed Precision      | fp16 / bf16 (auto-detect)    |
| Distributed Training | DDP                          |
| Logging              | Weights & Biases (wandb)     |
| Checkpoint Format    | .safetensors                 |
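
The schedule in the table can be sketched as linear warmup followed by cosine decay. This is only a sketch: the card specifies the peak LR (6e-4) and warmup (100 steps), while `max_steps` and the zero floor here are assumptions.

```python
import math

def lr_at(step, max_steps, peak_lr=6e-4, warmup=100, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if step < warmup:
        return peak_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, max_steps - warmup)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# lr climbs to 6e-4 at step 100, then decays smoothly toward 0
```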

🧩 Model Checkpoints

| Step         | Filename                                        | Format      |
|--------------|-------------------------------------------------|-------------|
| Final        | mobile_llm_final.safetensors                    | safetensors |
| Intermediate | checkpoints/mobile_llm_step_<step>.safetensors  | safetensors |

🔮 Roadmap

  • Train MobileLLM on FineWeb-MINI
  • Add multi-query attention
  • Export safetensors checkpoints
  • Quantized int8 & int4 inference
  • Expand training on FineWeb-1B
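
For the quantized int8 inference item, per-tensor symmetric quantization is the usual starting point. A minimal sketch (an illustration of the general technique, not this project's actual quantization scheme):

```python
import numpy as np

def quantize_int8(w):
    """Per-tensor symmetric int8 quantization: w ~= q * scale."""
    scale = np.max(np.abs(w)) / 127.0
    if scale == 0.0:          # guard against an all-zero tensor
        scale = 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
err = np.max(np.abs(dequantize_int8(q, scale) - w))
# max round-off error is bounded by half a quantization step
assert err <= scale / 2 + 1e-6
```

int4 follows the same idea with a 15-level range, usually applied per group of weights rather than per tensor to limit the error.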

📜 License

This model is licensed under the MIT License.


๐ŸŒ Links

Downloads last month
4
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support