Update README.md
README.md CHANGED
@@ -1,12 +1,6 @@
----
-library_name: transformers
-tags: []
----
-
-This is a GPT-2 model trained in llm.c for 330K steps (of 1M batch size) on FineWeb-EDU.
-
-A lot more detailed information is here: https://github.com/karpathy/llm.c/discussions/677 .
-
-This model has a bit of a complicated history. I wanted to train it for 400K steps, i.e. (`-x 400000`), but it became unstable later in training and exploded around step 330K. Because I was about to lose my compute quota, I rewound to the step-300K checkpoint and, instead of going all the way to 400K, annealed the learning rate linearly down to zero by step 330K. This went without incident and produced this model.
-
-This is the longest I've trained a GPT-2 model for, and it reaches 62.7 on HellaSwag by the end.
+---
+library_name: transformers
+tags: []
+---
+
+This is a GPT-2 model trained in llm.c for 330K steps (of 1M batch size) on FineWeb-EDU, i.e. around 300B tokens.
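As a quick sanity check of the new token count: at a batch size of roughly 1M tokens per step, 330K steps comes out to roughly 330-350B tokens, consistent with the "around 300B" figure. A minimal back-of-the-envelope sketch, with both readings of "1M" (decimal 10^6 and binary 2^20) spelled out as assumptions:

```python
# Back-of-the-envelope token count for the run described above.
# Assumption: "1M batch size" means ~1M tokens per optimizer step;
# both the decimal (10**6) and binary (2**20 = 1,048,576) readings are shown.
steps = 330_000

for label, tokens_per_step in [("10^6", 10**6), ("2^20", 2**20)]:
    total = steps * tokens_per_step
    print(f"batch = {label} tokens -> {total / 1e9:.0f}B tokens total")

# batch = 10^6 tokens -> 330B tokens total
# batch = 2^20 tokens -> 346B tokens total
```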
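The rewind-and-anneal recovery described in the removed paragraph amounts to restarting from the step-300K checkpoint and decaying the learning rate linearly to zero over the final 30K steps. A minimal sketch of such a schedule; the base learning rate below is a placeholder, not the value used in the actual run:

```python
def annealed_lr(step: int, base_lr: float = 6e-4,
                anneal_start: int = 300_000, anneal_end: int = 330_000) -> float:
    """Linear learning-rate anneal over [anneal_start, anneal_end].

    base_lr is a placeholder; the run's actual schedule is not given here.
    Before the anneal window the rate is returned unchanged; inside it,
    the rate falls linearly, reaching zero at anneal_end.
    """
    if step < anneal_start:
        return base_lr
    frac = (anneal_end - step) / (anneal_end - anneal_start)
    return base_lr * max(frac, 0.0)

# e.g. annealed_lr(300_000) == 6e-4, annealed_lr(315_000) == 3e-4,
#      annealed_lr(330_000) == 0.0
```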
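Since the front matter declares `library_name: transformers`, the exported checkpoint should load through the standard transformers API. A minimal sketch; the repo id below is a placeholder, not this model's actual path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "user/llmc-gpt2"  # placeholder; substitute the actual model repo

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short continuation to verify the checkpoint loads and runs.
inputs = tokenizer("The universe is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```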