Update README.md
README.md CHANGED
```diff
@@ -47,7 +47,9 @@ GPT-124M is a decoder-only transformer model based on OpenAI’s GPT-2 architecture
 ## Model Sources
 
 - **Paper:** [Language Models are Unsupervised Multitask Learners](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
-- **Paper:** [Language
+- **Paper:** [Language Models are Few-Shot Learners](https://arxiv.org/pdf/2005.14165)
+- **Paper:** [Training Compute-Optimal Large Language Models](https://arxiv.org/pdf/2203.15556)
+- **Video:** [Andrej Karpathy - Let's reproduce GPT-2 (124M)](https://youtu.be/l8pRSuU81PU?si=KAo1y9dHYQAGJmj5)
 - **Demo:** [More Information Needed]
 
 ## Model Details
@@ -66,7 +68,7 @@ GPT-124M is a lightweight generative language model fine-tuned on the `fineweb-edu`
 - **Dataset:** `fineweb-edu` (10 billion tokens)
 - **Training Date:** `January 2025`
 - **Validation Dataset:** 100 million tokens of HuggingFaceFW/fineweb-edu
-
+
 ## Usage
 
 ### Direct Use
```
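For context on the training-details hunk: the README points at `HuggingFaceFW/fineweb-edu`, and the 10-billion-token figure matches that dataset's `sample-10BT` subset, though the diff doesn't confirm which subset was actually used. A minimal sketch of streaming it with the `datasets` library, assuming the `sample-10BT` config:

```python
# Stream the fineweb-edu 10BT sample instead of downloading it in full.
# "sample-10BT" is an assumption; the README only says "10 billion tokens".
from datasets import load_dataset

ds = load_dataset(
    "HuggingFaceFW/fineweb-edu",
    name="sample-10BT",   # assumed subset matching the 10B-token figure
    split="train",
    streaming=True,
)

# Peek at the first document; fineweb-edu rows carry a "text" field.
for example in ds.take(1):
    print(example["text"][:200])
```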
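The diff ends where the README's Direct Use section begins. Since GPT-124M reproduces the GPT-2 (124M) architecture, direct use is presumably standard causal-LM generation via `transformers`; a minimal sketch under that assumption, with a placeholder repo id rather than the actual checkpoint name:

```python
# Minimal generation sketch for a GPT-2-style checkpoint.
# "your-username/gpt-124m" is a placeholder, not the real model id.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/gpt-124m"  # hypothetical; substitute the published checkpoint
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The theory of relativity states that", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```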