samkeet committed on · Commit 18e3ca6 · verified · 1 Parent(s): fa72749

Update README.md

Files changed (1): README.md (+4, −2)
README.md CHANGED

@@ -47,7 +47,9 @@ GPT-124M is a decoder-only transformer model based on OpenAI’s GPT-2 architect
 ## Model Sources
 
 - **Paper:** [Language Models are Unsupervised Multitask Learners](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
-- **Paper:** [Language Modeling with Transformers](https://arxiv.org/pdf/2005.14165)
+- **Paper:** [Language Models are Few-Shot Learners](https://arxiv.org/pdf/2005.14165)
+- **Paper:** [Training Compute-Optimal Large Language Models](https://arxiv.org/pdf/2203.15556)
+- **Video:** [Andrej Karpathy - Let's reproduce GPT-2 (124M)](https://youtu.be/l8pRSuU81PU?si=KAo1y9dHYQAGJmj5)
 - **Demo:** [More Information Needed]
 
 ## Model Details
@@ -66,7 +68,7 @@ GPT-124M is a lightweight generative language model fine-tuned on the `fineweb-e
 - **Dataset:** `fineweb-edu` (10 billion tokens)
 - **Training Date:** `January 2025`
 - **Validation Dataset:** 100 million tokens of HuggingFaceFW/fineweb-edu
--
+
 ## Usage
 
 ### Direct Use