# QuantMobileLLM: Lightweight GPT-Style Language Model

MobileLLM is a **lightweight GPT-style language model** designed for efficient, fast inference in small deployment environments.
It is trained on **FineWeb-MINI** and optimized with **modern attention techniques**.

---

## 🚀 Model Highlights
- **Architecture**: Decoder-only GPT-style transformer
- **Parameters**: ~17M (6 layers, 8 heads, 256 embedding dim)
- **Context Length**: 512 tokens
- **Vocabulary Size**: 50,304 tokens
- **Precision**: Supports both `fp16` and `bf16`
- **Optimized for**: Small GPUs and mobile inference

---

## 🧠 Architecture Details

| **Component** | **Value** |
|--------------------|-----------|
| Layers | 6 |
| Attention Heads | 8 |
| KV Heads | 4 |
| Embedding Dim | 256 |
| Context Length | 512 |
| Vocab Size | 50,304 |
| Attention Type | Grouped-query attention (4 KV heads shared by 8 query heads) |
| Norm Type | RMSNorm |
| Position Encoding | Rotary Position Embeddings (RoPE) |
| FFN Activation | SwiGLU (`silu`) |

### 🔹 Key Optimizations
- **RMSNorm** → improves training stability over LayerNorm.
- **Grouped-query attention** → fewer KV heads → smaller KV cache and lower memory footprint (see the sketch below).
- **Rotary Embeddings (RoPE)** → better handling of long context windows.
- **`safetensors` checkpoints** → faster & safer loading.
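
For a concrete picture of the KV-cache saving, here is a minimal grouped-query attention block in plain PyTorch. This is an illustrative sketch, not the repository's implementation: the class and parameter names are assumptions, and RoPE is omitted for brevity.

```python
# Minimal grouped-query attention sketch (illustrative, not the repo's code).
# Uses the dimensions from the table above: 8 query heads, 4 KV heads, dim 256.
# RoPE would normally be applied to q and k before attention; omitted here.
import torch
import torch.nn.functional as F
from torch import nn

class GroupedQueryAttention(nn.Module):
    def __init__(self, dim=256, n_heads=8, n_kv_heads=4):
        super().__init__()
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.wq = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        # K/V projections are smaller: only n_kv_heads heads are computed and
        # cached, which is what shrinks the KV cache at inference time.
        self.wk = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x):  # x: (batch, seq, dim)
        b, t, _ = x.shape
        q = self.wq(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.wk(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.wv(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Each KV head serves n_heads // n_kv_heads query heads.
        rep = self.n_heads // self.n_kv_heads
        k, v = k.repeat_interleave(rep, dim=1), v.repeat_interleave(rep, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(out.transpose(1, 2).reshape(b, t, -1))
```

With 4 KV heads instead of 8, the cached K/V tensors are half the size at inference time.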

---

## 📊 Training Setup

| **Property** | **Value** |
|------------------------|-----------|
| Dataset | [FineWeb-MINI](https://huggingface.co/datasets/AryanNsc/FineWeb-Mini) |
| Tokens Trained | ~100M |
| Optimizer | AdamW |
| Learning Rate | 6e-4 (cosine decay) |
| Warmup Steps | 100 |
| Batch Size | 64 per step × 2 gradient-accumulation steps |
| Effective Batch Size | 128 |
| Mixed Precision | `fp16` / `bf16` (auto-detected) |
| Distributed Training | DDP |
| Logging | Weights & Biases (`wandb`) |
| Checkpoint Format | `.safetensors` |
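
To make the schedule concrete, the warmup + cosine decay and the gradient accumulation from the table can be sketched as below. This is a schematic, not the repository's training script; `model`, `loader`, and `MAX_STEPS` are assumed placeholders.

```python
# Schematic optimizer/schedule matching the table (illustrative sketch only).
import math
import torch
import torch.nn.functional as F

PEAK_LR, WARMUP, GRAD_ACCUM = 6e-4, 100, 2
MAX_STEPS = 10_000  # assumed; not stated in the README

def lr_at(step: int) -> float:
    """Linear warmup for WARMUP steps, then cosine decay toward zero."""
    if step < WARMUP:
        return PEAK_LR * (step + 1) / WARMUP
    progress = (step - WARMUP) / max(1, MAX_STEPS - WARMUP)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))

optimizer = torch.optim.AdamW(model.parameters(), lr=PEAK_LR)  # `model` assumed

for step in range(MAX_STEPS):
    for group in optimizer.param_groups:
        group["lr"] = lr_at(step)
    # Two micro-batches of 64 accumulate into an effective batch of 128.
    for _ in range(GRAD_ACCUM):
        x, y = next(loader)  # token ids and next-token targets, shape (B, T)
        with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
            logits = model(x)
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
        (loss / GRAD_ACCUM).backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
```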

---

## 🧩 Model Checkpoints

| **Checkpoint** | **Filename** | **Format** |
|----------------|--------------|------------|
| Final | `mobile_llm_final.safetensors` | safetensors |
| Intermediate | `checkpoints/mobile_llm_step_<step>.safetensors` | safetensors |
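
Checkpoints can be loaded with the `safetensors` library. A minimal sketch, assuming a hypothetical `MobileLLM` model class from the training code:

```python
# Loading a checkpoint (sketch; `MobileLLM` is an assumed class name,
# not confirmed from the repo).
from safetensors.torch import load_file

state_dict = load_file("mobile_llm_final.safetensors")  # dict of tensors
model = MobileLLM()                # hypothetical model class from the training code
model.load_state_dict(state_dict)
model.eval()
```

Intermediate checkpoints load the same way via their `checkpoints/mobile_llm_step_<step>.safetensors` path.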

---

## 🔮 Roadmap
- [x] Train **MobileLLM** on **FineWeb-MINI**
- [x] Add **grouped-query attention**
- [x] Export **safetensors** checkpoints
- [ ] Quantized **int8** & **int4** inference (one possible path sketched below)
- [ ] Expand training to **FineWeb-1B**
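
For the unchecked int8 item, one possible (not confirmed) starting point is PyTorch's dynamic quantization, which converts the `nn.Linear` weights to int8 at load time:

```python
# One possible int8 path (a sketch, not the project's planned implementation).
# Dynamic quantization runs on CPU, which suits the mobile-inference target;
# int4 would require a different toolchain.
import torch
from torch import nn

quantized = torch.ao.quantization.quantize_dynamic(
    model,            # a loaded MobileLLM instance (see the loading sketch above)
    {nn.Linear},      # quantize only the linear layers
    dtype=torch.qint8,
)
```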

---

## 📜 License
This model is licensed under the [MIT License](LICENSE).

---

## 🌐 Links
- **GitHub** → [MobileLLM training code](https://github.com/Guney-olu/Quantgpt)