# QuantMobileLLM – Lightweight GPT-Style Language Model

MobileLLM is a **lightweight GPT-style language model** designed for efficiency, fast inference, and small deployment environments.
It is trained on **FineWeb-MINI** and optimized with **modern attention techniques** such as grouped-query attention and rotary position embeddings.

---
## 🚀 Model Highlights

- **Architecture**: Decoder-only GPT-style transformer
- **Parameters**: ~17M (6 layers, 8 heads, 256 embedding dim)
- **Context Length**: 512 tokens
- **Vocabulary Size**: 50,304 tokens
- **Precision**: Supports both `fp16` and `bf16` (see the snippet below)
- **Optimized for**: Small GPUs and mobile inference
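A quick way to pick the inference dtype for the precision support above (a minimal sketch, assuming PyTorch on an NVIDIA GPU; `bf16` requires Ampere or newer):

```python
import torch

# Prefer bf16 where the hardware supports it (Ampere or newer);
# otherwise fall back to fp16.
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
print(f"Running inference in {dtype}")
```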
---
## 🧠 Architecture Details

| **Component**      | **Value** |
|--------------------|-----------|
| Layers             | 6 |
| Attention Heads    | 8 |
| KV Heads           | 4 |
| Embedding Dim      | 256 |
| Context Length     | 512 |
| Vocab Size         | 50,304 |
| Attention Type     | Grouped-Query Attention (4 KV heads shared by 8 query heads) |
| Norm Type          | RMSNorm |
| Position Encoding  | Rotary Position Embeddings (RoPE) |
| FFN Activation     | SwiGLU (`silu`) |
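For reference, the table above as a config object. This is an illustrative sketch; the class and field names are assumptions, not necessarily what the training repo uses:

```python
from dataclasses import dataclass

@dataclass
class MobileLLMConfig:           # hypothetical name, for illustration only
    n_layers: int = 6
    n_heads: int = 8
    n_kv_heads: int = 4          # two query heads share each KV head
    d_model: int = 256
    context_len: int = 512
    vocab_size: int = 50_304
    norm: str = "rmsnorm"
    ffn_activation: str = "silu" # SwiGLU feed-forward
```

These values also account for the ~17M parameter count: the embedding table alone is 50,304 × 256 ≈ 12.9M weights, and the six transformer blocks contribute roughly 4M more.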
### 🔹 Key Optimizations

- **RMSNorm** – improves training stability over LayerNorm (see the sketch after this list).
- **Grouped-Query Attention** – shares KV heads across query heads → smaller KV cache, lower memory footprint.
- **Rotary Embeddings (RoPE)** – better handling of long context windows.
- **`safetensors` checkpoints** – faster and safer loading.
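As a concrete reference for the first bullet, here is a minimal RMSNorm in PyTorch (a standard implementation, not necessarily this repo's exact code). The saving from the second bullet is also easy to quantify: with 4 KV heads of dimension 32 instead of 8, each cached token stores 2 × 4 × 32 = 256 values per layer rather than 512, halving the KV cache.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Rescale activations by their root-mean-square with a learned gain.
    Unlike LayerNorm there is no mean subtraction and no bias, which is
    cheaper and tends to train at least as stably."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```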
---
## 📊 Training Setup

| **Property**            | **Value** |
|-------------------------|-----------|
| Dataset                 | [FineWeb-MINI](https://huggingface.co/datasets/AryanNsc/FineWeb-Mini) |
| Tokens Trained          | ~100M |
| Optimizer               | AdamW |
| Learning Rate           | 6e-4 (cosine decay) |
| Warmup Steps            | 100 |
| Batch Size              | 64 × 2 grad accum |
| Effective Batch Size    | 128 |
| Mixed Precision         | `fp16` / `bf16` (auto-detect) |
| Distributed Training    | DDP |
| Logging                 | Weights & Biases (`wandb`) |
| Checkpoint Format       | `.safetensors` |
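The learning-rate schedule from the table (100 warmup steps, then cosine decay from 6e-4) looks roughly like the helper below; the function itself and the `min_lr` floor are illustrative assumptions, not the repo's code:

```python
import math

def lr_at(step: int, max_steps: int, max_lr: float = 6e-4,
          warmup: int = 100, min_lr: float = 6e-5) -> float:
    """Linear warmup for `warmup` steps, then cosine decay to `min_lr`.

    `min_lr` is an assumed floor; the table only specifies the peak rate.
    """
    if step < warmup:
        return max_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, max_steps - warmup)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```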
---
## 🧩 Model Checkpoints

| **Step**     | **Filename** | **Format** |
|--------------|--------------|------------|
| Final        | `mobile_llm_final.safetensors` | safetensors |
| Intermediate | `checkpoints/mobile_llm_step_<step>.safetensors` | safetensors |
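Loading a checkpoint is a one-liner with the `safetensors` library; the resulting dict then plugs straight into `model.load_state_dict(...)` for the model class defined in the training repo:

```python
from safetensors.torch import load_file

# safetensors stores a flat name -> tensor mapping and never executes
# pickled code, which is what makes loading both fast and safe.
state_dict = load_file("mobile_llm_final.safetensors", device="cpu")
total = sum(t.numel() for t in state_dict.values())
print(f"loaded {len(state_dict)} tensors, {total / 1e6:.1f}M parameters")
```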
---
## 🔮 Roadmap

- [x] Train **MobileLLM** on **FineWeb-MINI**
- [x] Add **grouped-query attention**
- [x] Export **safetensors** checkpoints
- [ ] Quantized **int8** & **int4** inference
- [ ] Expand training to **FineWeb-1B**

---
## 📜 License

This model is licensed under the [MIT License](LICENSE).

---

## 🔗 Links

- **GitHub** – [MobileLLM training code](https://github.com/Guney-olu/Quantgpt)