FatihJimale committed · Commit 26f2cca · verified · 1 parent: 8dd25a1

Update README.md

Files changed (1): README.md (+8 -12)
README.md CHANGED
````diff
@@ -55,15 +55,15 @@ print(tok.decode(outputs[0], skip_special_tokens=True))
 * For more focused generations, reduce `max_new_tokens` and set `top_p` around 0.9.
 * Deterministic output: `do_sample=False`, tune `top_k=None`, `temperature=1.0`.
 
-## 🔧 Model details (fill in your exact values)
+## 🔧 Model details
 
 * **Training steps**: ≈14,850 (completed at ~epoch 2.00)
 * **Epochs**: 2
-* **Effective batch size**: TBD
-* **Learning rate & schedule**: final logged LR ≈ 8.998e-10 (schedule specifics TBD)
+* **Effective batch size**: 64
+* **Learning rate & schedule**: final logged LR ≈ 8.998e-10
 * **Optimizer**: AdamW (β1=0.9, β2=0.999)
-* **Weight decay**: 0.01 *(if used)*
-* **Mixed precision**: fp16/bf16 *(during training if applicable)*
+* **Weight decay**: 0.01
+* **Mixed precision**: bf16
 * **Hardware**: AWS `ml.g5.24xlarge` — **4× NVIDIA A10 (24 GB each)**, 96 vCPU, 384 GiB RAM; data-parallel across 4 GPUs
 * **Context length**: **1024 tokens**
 * **Tokenizer**: **GPT‑2 BPE (fast)** (no custom Somali tokenizer in this version)
````
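
A note on the sampling bullets in the hunk above: they map one-to-one onto `generate()` keyword arguments. A minimal sketch, assuming the standard `transformers` auto classes; `MODEL_ID` and the Somali prompt are illustrative placeholders, not values from this repo:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-username/somali-gpt2"  # placeholder: substitute this repo's model id

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

inputs = tok("Soomaaliya waa", return_tensors="pt")  # example Somali prompt

# Focused generation: fewer new tokens, nucleus sampling with top_p around 0.9.
focused = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)

# Deterministic output, exactly as the bullet suggests:
# greedy decoding with the sampling knobs neutralized.
greedy = model.generate(
    **inputs, max_new_tokens=40, do_sample=False, top_k=None, temperature=1.0
)

print(tok.decode(focused[0], skip_special_tokens=True))
print(tok.decode(greedy[0], skip_special_tokens=True))
```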
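
The hyperparameters this commit fills in (effective batch size 64, 2 epochs, AdamW with β1=0.9/β2=0.999, weight decay 0.01, bf16) correspond, under a standard Hugging Face `Trainer` setup, to roughly the sketch below. The per-device/accumulation split across the 4 A10s is an assumption, and the initial learning rate is deliberately omitted because the card records only the final decayed value:

```python
from transformers import TrainingArguments

# Sketch of a config consistent with the card's reported totals.
# ASSUMPTION: effective batch 64 = 4 GPUs x 8 per device x 2 accumulation steps;
# only the totals (64, 2 epochs, 0.01, the betas, bf16) come from the card.
args = TrainingArguments(
    output_dir="somali-gpt2-ckpt",   # hypothetical output path
    num_train_epochs=2,
    per_device_train_batch_size=8,   # assumed split of the effective 64
    gradient_accumulation_steps=2,   # assumed split of the effective 64
    weight_decay=0.01,
    adam_beta1=0.9,
    adam_beta2=0.999,
    bf16=True,                       # mixed precision per the card
    # learning_rate and scheduler omitted: the card records only the final
    # logged LR (~8.998e-10), not the initial rate or schedule.
)
```

Data-parallel launch across the 4 GPUs would then be, e.g., `torchrun --nproc_per_node=4 train.py` (script name hypothetical).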
````diff
@@ -79,10 +79,9 @@ print(tok.decode(outputs[0], skip_special_tokens=True))
 * **Perplexity (valid/test)**: **5.9658** *(final recorded value @ 2025‑09‑25 09:06:42)*
 * **Eval runtime**: **1652.22 s**, **72.272 samples/s**, **9.035 steps/s**
 * **Human eval notes**: TBD (fluency, coherence)
-(please populate)
-* **Train loss**: TBD
-* **Eval/validation loss**: TBD
-* **Perplexity (valid/test)**: TBD
+* **Train loss**: **1.8449**
+* **Eval/validation loss**: **1.78604**
+* **Perplexity (valid/test)**: **5.9658**
 * **Human eval notes**: TBD (fluency, coherence)
 
 ## 📁 Repo layout
````
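
The metrics this commit fills in are mutually consistent: causal-LM perplexity is the exponential of the mean eval loss, and exp(1.78604) ≈ 5.9658, matching the value already recorded at line 79:

```python
import math

eval_loss = 1.78604         # eval/validation loss added by this commit
print(math.exp(eval_loss))  # ≈ 5.9658, the reported perplexity
```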
````diff
@@ -114,9 +113,6 @@ outputs = model.generate(**inputs, max_new_tokens=80, do_sample=True, temperatur
 print(tok.decode(outputs[0], skip_special_tokens=True))
 ```
 
-## 🏷️ License
-
-MIT (placeholder) — replace with your chosen license.
 
 ## 📣 Citation
 
````
 