Update README.md
README.md CHANGED
@@ -758,7 +758,7 @@ We did the pretraining on a single RTX 5060 Ti 16GB for 30,000 iterations for ~3
 Our final `val loss` value was **3.0450** and our final `train loss` was **3.0719**.

 ## 5.2 Finetuning results
-After pretraining, we finetuned our model for
+After pretraining, we finetuned our model for 1500 iterations for ~3 hours:
 1. Final `val loss`: **?**
 2. Final `train loss`: **?**

@@ -773,7 +773,7 @@ We tested our finetuned model a lot:
 1. Andrej Karpathy for his nanoGPT code and his YouTube videos in the makemore series
 2. HuggingFaceFW for the FineWeb-Edu sample-10BT training dataset
 3. Yahma for the alpaca-cleaned dataset for the finetuning
-4. My dad for his support
+4. My dad for his support <3
 5. My GPU for training and running my new model ;-)

 ---
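For reference, here is a minimal sketch of how the two datasets credited above can be pulled from the Hugging Face Hub with the `datasets` library; the `streaming` flag and the field peeks are illustrative assumptions, not this repo's actual training code:

```python
# Minimal sketch: load the pretraining and finetuning datasets named in the
# acknowledgements. Dataset IDs are the published Hub names; everything else
# (streaming, field access) is illustrative, not this repo's training code.
from datasets import load_dataset

# Pretraining corpus: the FineWeb-Edu 10BT sample, streamed to avoid a full download.
pretrain = load_dataset(
    "HuggingFaceFW/fineweb-edu", name="sample-10BT", split="train", streaming=True
)

# Finetuning data: alpaca-cleaned, with instruction/input/output fields.
finetune = load_dataset("yahma/alpaca-cleaned", split="train")

print(next(iter(pretrain))["text"][:200])  # peek at one pretraining document
print(finetune[0]["instruction"])          # peek at one instruction example
```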