Fix dataset info: FineWebEdu -> TinyStories
README.md CHANGED

@@ -10,7 +10,7 @@ tags:
   - pre-ln
   - causal-lm
 datasets:
-  -
+  - roneneldan/TinyStories
 library_name: transformers
 pipeline_tag: text-generation
 metrics:
@@ -26,7 +26,7 @@ widget:
 
 # NanoGPT 53M - Pre-LN Transformer
 
-A 53-million parameter GPT model trained from scratch on
+A 53-million parameter GPT model trained from scratch on the TinyStories dataset. This model implements a **Pre-LayerNorm (Pre-LN) transformer architecture** and serves as a demonstration of efficient training on Apple Silicon using the MLX framework.
 
 > **Model Format:** PyTorch (cross-platform compatible)
 > **Training Framework:** Apple MLX (exported to PyTorch for universal compatibility)
@@ -47,7 +47,7 @@ A 53-million parameter GPT model trained from scratch on 10M tokens of FineWebEd
 
 ### Training
 - **Framework:** Apple MLX (training), PyTorch (export)
-- **Dataset:**
+- **Dataset:** TinyStories - Simple children's stories for language learning
 - **Training Hardware:** Apple M2 Pro (16GB unified memory)
 - **Checkpoint:** 20000 iterations
 - **Training Method:** Base pretraining from scratch
@@ -70,8 +70,8 @@ Pre-LN provides better training stability and is used in modern transformers (GP
 
 ## Training Details
 
-- **Dataset:**
-- **Training Tokens:** ~
+- **Dataset:** TinyStories (simple children's stories)
+- **Training Tokens:** ~2M training tokens
 - **Total Iterations:** 20,000
 - **Batch Size:** 12 sequences/batch
 - **Sequence Length:** 512 tokens
@@ -240,7 +240,7 @@ If you use this model, please cite:
 
 - **GitHub Repository:** [JackSuuu/nanoGPT-on-MLX](https://github.com/JackSuuu/nanoGPT-on-MLX)
 - **MLX Framework:** [ml-explore/mlx](https://github.com/ml-explore/mlx)
-- **Training Dataset:** [
+- **Training Dataset:** [roneneldan/TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)
 
 ## License
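The updated Training Details can be sanity-checked with a quick calculation (a minimal sketch: the iteration, batch-size, and sequence-length figures are taken from the diff above, and the epoch estimate assumes the stated ~2M-token dataset size):

```python
# Token throughput implied by the training configuration in this diff:
# 20,000 iterations, 12 sequences/batch, 512 tokens/sequence.
iterations = 20_000
batch_size = 12   # sequences per batch
seq_len = 512     # tokens per sequence

tokens_per_iteration = batch_size * seq_len              # 6,144 tokens
total_tokens_processed = tokens_per_iteration * iterations

print(f"tokens/iteration:       {tokens_per_iteration:,}")
print(f"total tokens processed: {total_tokens_processed:,}")

# Relative to the ~2M unique training tokens stated above, this is
# roughly total_tokens_processed / 2e6 ~= 61 passes over the data.
epochs = total_tokens_processed / 2_000_000
print(f"approx. epochs:         {epochs:.1f}")
```

This shows the 20,000-iteration run touches ~123M tokens in total, i.e. many passes over the small TinyStories subset rather than a single epoch.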