Update README.md
README.md
CHANGED
@@ -763,23 +763,10 @@ if __name__ == "__main__":
 ```
 
 # 5. Our training results
-## 5.1 Pretraining results
 We did the pretraining on a single RTX 5060 Ti 16GB for 30,000 iterations over ~3 days.
 Our final `val loss` was **3.0450** and our final `train loss` was **3.0719**.
 
-#
-After pretraining, we finetuned our model for 1,500 iterations over ~3 hours:
-1. Final `val loss`: **?**
-2. Final `train loss`: **?**
-
-# 6. Example prompts and results
-We tested our finetuned model extensively:
-
-1. Question: What is Artificial Intelligence?
---> Answer:
-2. ...
-
-# 7. Thanks to...
+# 6. Thanks to...
 1. Andrej Karpathy for his nanoGPT code and his YouTube videos in the makemore series
 2. HuggingFaceFW for the FineWeb-Edu 10BT sample training dataset
 3. Yahma for the alpaca-cleaned dataset used for finetuning
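The reported losses are more intuitive as perplexities: for a cross-entropy loss measured in nats, perplexity is simply `exp(loss)`. A minimal sketch (the two loss values come from the results above; the helper function is illustrative, not part of the repo):

```python
import math

def perplexity(cross_entropy_loss: float) -> float:
    """Convert a cross-entropy loss (in nats) to perplexity."""
    return math.exp(cross_entropy_loss)

# Final losses reported after pretraining
val_ppl = perplexity(3.0450)    # ~21.0
train_ppl = perplexity(3.0719)  # ~21.6
print(f"val perplexity:   {val_ppl:.1f}")
print(f"train perplexity: {train_ppl:.1f}")
```

So the pretrained model is, on average, about as uncertain as a uniform choice over ~21 tokens at each step.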