Update README.md
README.md
CHANGED
@@ -55,15 +55,15 @@ print(tok.decode(outputs[0], skip_special_tokens=True))
 * For more focused generations, reduce `max_new_tokens` and set `top_p` around 0.9.
 * Deterministic output: `do_sample=False`, set `top_k=None`, `temperature=1.0`.
 
 ## 🔧 Model details
 
 * **Training steps**: ≈14,850 (completed at ~epoch 2.00)
 * **Epochs**: 2
-* **Effective batch size**:
+* **Effective batch size**: 64
 * **Learning rate & schedule**: final logged LR ≈ 8.998e-10
 * **Optimizer**: AdamW (β1=0.9, β2=0.999)
 * **Weight decay**: 0.01
-* **Mixed precision**:
+* **Mixed precision**: bf16
 * **Hardware**: AWS `ml.g5.24xlarge` — **4× NVIDIA A10 (24 GB each)**, 96 vCPU, 384 GiB RAM; data-parallel across 4 GPUs
 * **Context length**: **1024 tokens**
 * **Tokenizer**: **GPT‑2 BPE (fast)** (no custom Somali tokenizer in this version)
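For readers skimming the diff, the two decoding setups described in the tips above look roughly like this in the `transformers` API. This is a minimal sketch: `your-org/somali-gpt2` is a placeholder repo id and the prompt is arbitrary; neither is taken from this repository.

```python
# Minimal sketch of the two decoding modes from the tips above.
# "your-org/somali-gpt2" is a placeholder, not the model's real repo id.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("your-org/somali-gpt2")
model = AutoModelForCausalLM.from_pretrained("your-org/somali-gpt2")

inputs = tok("Soomaaliya waa", return_tensors="pt")  # arbitrary Somali prompt

# Focused generations: fewer new tokens, nucleus sampling at top_p=0.9
focused = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)

# Deterministic output: greedy decoding with sampling disabled
greedy = model.generate(**inputs, max_new_tokens=40, do_sample=False,
                        top_k=None, temperature=1.0)

print(tok.decode(focused[0], skip_special_tokens=True))
print(tok.decode(greedy[0], skip_special_tokens=True))
```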
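The hyperparameters this commit fills in (effective batch size 64, bf16, weight decay 0.01, AdamW defaults) would correspond to roughly the following Hugging Face `TrainingArguments`. The per-device batch and accumulation split is an assumption (16 per GPU × 4 A10s × 1 accumulation step = 64); the actual training script may have divided it differently.

```python
# Hypothetical reconstruction of the training config from the README values;
# the per-device batch / accumulation split across the 4 GPUs is assumed.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    num_train_epochs=2,
    per_device_train_batch_size=16,  # 16 × 4 GPUs × 1 accum step = 64 effective
    gradient_accumulation_steps=1,
    weight_decay=0.01,
    bf16=True,                       # mixed precision, as listed above
    # AdamW with β1=0.9, β2=0.999 is the optimizer default
)
```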
@@ -79,10 +79,9 @@ print(tok.decode(outputs[0], skip_special_tokens=True))
 * **Perplexity (valid/test)**: **5.9658** *(final recorded value @ 2025‑09‑25 09:06:42)*
 * **Eval runtime**: **1652.22 s**, **72.272 samples/s**, **9.035 steps/s**
 * **Human eval notes**: TBD (fluency, coherence)
-
-* **
-* **
-* **Perplexity (valid/test)**: TBD
+* **Train loss**: **1.8449**
+* **Eval/validation loss**: **1.78604**
+* **Perplexity (valid/test)**: **5.9658**
 * **Human eval notes**: TBD (fluency, coherence)
 
 ## 📁 Repo layout
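One sanity check worth noting: the numbers this hunk adds are mutually consistent, since causal-LM perplexity is the exponential of the eval loss and exp(1.78604) ≈ 5.9658.

```python
import math

eval_loss = 1.78604          # eval/validation loss added in this hunk
print(math.exp(eval_loss))   # ≈ 5.9658, matching the reported perplexity
```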
@@ -114,9 +113,6 @@ outputs = model.generate(**inputs, max_new_tokens=80, do_sample=True, temperatur
 print(tok.decode(outputs[0], skip_special_tokens=True))
 ```
 
-## 🏷️ License
-
-MIT (placeholder) — replace with your chosen license.
 
 ## 📣 Citation
 