AmeerH committed on
Commit 6c5511e · verified · 1 Parent(s): 2c309ac

Update README.md

Files changed (1):
  1. README.md +6 -12
README.md CHANGED
@@ -1,12 +1,6 @@
- ---
- library_name: transformers
- tags: []
- ---
-
- This is a GPT-2 model trained in llm.c for 330K steps (of 1M batch size) on FineWeb-EDU.
-
- A lot more detailed information is here: https://github.com/karpathy/llm.c/discussions/677 .
-
- This model has a bit of a complicated history. I wanted to train it for 400K steps, i.e. (`-x 400000`), but it became unstable later in training and exploded around step 330K. Because I was losing my computing quota shortly, I decided to just rewind back to checkpoint 300K, and then instead of going all the way to 400K I started annealing linearly down to 330K. This went without incident and produced this model.
-
- This is the longest I've trained a GPT-2 model for, and it reaches HellaSwag of 62.7 by the end.
 
+ ---
+ library_name: transformers
+ tags: []
+ ---
+
+ This is a GPT-2 model trained in llm.c for 330K steps (of 1M batch size) on FineWeb-EDU, i.e. around 300B tokens.