Update README.md
README.md
CHANGED
@@ -55,15 +55,15 @@ print(tok.decode(outputs[0], skip_special_tokens=True))
 * For more focused generations, reduce `max_new_tokens` and set `top_p` around 0.9.
 * Deterministic output: `do_sample=False`, set `top_k=None`, `temperature=1.0`.
 
 ## 🔧 Model details
 
 * **Training steps**: ≈14,850 (completed at ~epoch 2.00)
 * **Epochs**: 2
-* **Effective batch size**:
+* **Effective batch size**: 64
 * **Learning rate & schedule**: final logged LR ≈ 8.998e-10
 * **Optimizer**: AdamW (β1=0.9, β2=0.999)
 * **Weight decay**: 0.01
-* **Mixed precision**:
+* **Mixed precision**: bf16
 * **Hardware**: AWS `ml.g5.24xlarge` — **4× NVIDIA A10 (24 GB each)**, 96 vCPU, 384 GiB RAM; data-parallel across 4 GPUs
 * **Context length**: **1024 tokens**
 * **Tokenizer**: **GPT‑2 BPE (fast)** (no custom Somali tokenizer in this version)
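For readers skimming the diff, the two decoding setups described in the tips above look roughly like this in the `transformers` API. This is a minimal sketch: `your-org/somali-gpt2` is a placeholder repo id and the prompt is arbitrary; neither is taken from this repository.

```python
# Minimal sketch of the two decoding modes from the tips above.
# "your-org/somali-gpt2" is a placeholder, not the model's real repo id.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("your-org/somali-gpt2")
model = AutoModelForCausalLM.from_pretrained("your-org/somali-gpt2")

inputs = tok("Soomaaliya waa", return_tensors="pt")  # arbitrary Somali prompt

# Focused generations: fewer new tokens, nucleus sampling at top_p=0.9
focused = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)

# Deterministic output: greedy decoding with sampling disabled
greedy = model.generate(**inputs, max_new_tokens=40, do_sample=False,
                        top_k=None, temperature=1.0)

print(tok.decode(focused[0], skip_special_tokens=True))
print(tok.decode(greedy[0], skip_special_tokens=True))
```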
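The hyperparameters this commit fills in (effective batch size 64, bf16, weight decay 0.01, AdamW defaults) would correspond to roughly the following Hugging Face `TrainingArguments`. The per-device batch and accumulation split is an assumption (16 per GPU × 4 A10s × 1 accumulation step = 64); the actual training script may have divided it differently.

```python
# Hypothetical reconstruction of the training config from the README values;
# the per-device batch / accumulation split across the 4 GPUs is assumed.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    num_train_epochs=2,
    per_device_train_batch_size=16,  # 16 × 4 GPUs × 1 accum step = 64 effective
    gradient_accumulation_steps=1,
    weight_decay=0.01,
    bf16=True,                       # mixed precision, as listed above
    # AdamW with β1=0.9, β2=0.999 is the optimizer default
)
```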
@@ -79,10 +79,9 @@ print(tok.decode(outputs[0], skip_special_tokens=True))
 * **Perplexity (valid/test)**: **5.9658** *(final recorded value @ 2025‑09‑25 09:06:42)*
 * **Eval runtime**: **1652.22 s**, **72.272 samples/s**, **9.035 steps/s**
 * **Human eval notes**: TBD (fluency, coherence)
-
-* **
-* **
-* **Perplexity (valid/test)**: TBD
+* **Train loss**: **1.8449**
+* **Eval/validation loss**: **1.78604**
+* **Perplexity (valid/test)**: **5.9658**
 * **Human eval notes**: TBD (fluency, coherence)
 
 ## 📁 Repo layout
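One sanity check worth noting: the numbers this hunk adds are mutually consistent, since causal-LM perplexity is the exponential of the eval loss and exp(1.78604) ≈ 5.9658.

```python
import math

eval_loss = 1.78604          # eval/validation loss added in this hunk
print(math.exp(eval_loss))   # ≈ 5.9658, matching the reported perplexity
```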
@@ -114,9 +113,6 @@ outputs = model.generate(**inputs, max_new_tokens=80, do_sample=True, temperatur
 print(tok.decode(outputs[0], skip_special_tokens=True))
 ```
 
-## 🏷️ License
-
-MIT (placeholder) — replace with your chosen license.
 
 ## 📣 Citation
 