jacksuuuu committed
Commit 090a610 · verified · 1 Parent(s): 70a2326

Fix dataset info: FineWebEdu -> TinyStories

Files changed (1)
README.md +6 -6
README.md CHANGED
@@ -10,7 +10,7 @@ tags:
 - pre-ln
 - causal-lm
 datasets:
-- HuggingFaceFW/fineweb-edu
+- roneneldan/TinyStories
 library_name: transformers
 pipeline_tag: text-generation
 metrics:
@@ -26,7 +26,7 @@ widget:
 
 # NanoGPT 53M - Pre-LN Transformer
 
-A 53-million parameter GPT model trained from scratch on 10M tokens of FineWebEdu educational content. This model implements a **Pre-LayerNorm (Pre-LN) transformer architecture** and serves as a demonstration of efficient training on Apple Silicon using the MLX framework.
+A 53-million parameter GPT model trained from scratch on the TinyStories dataset. This model implements a **Pre-LayerNorm (Pre-LN) transformer architecture** and serves as a demonstration of efficient training on Apple Silicon using the MLX framework.
 
 > **Model Format:** PyTorch (cross-platform compatible)
 > **Training Framework:** Apple MLX (exported to PyTorch for universal compatibility)
@@ -47,7 +47,7 @@ A 53-million parameter GPT model trained from scratch on 10M tokens of FineWebEd
 
 ### Training
 - **Framework:** Apple MLX (training), PyTorch (export)
-- **Dataset:** FineWebEdu - 10M tokens of educational web content
+- **Dataset:** TinyStories - Simple children's stories for language learning
 - **Training Hardware:** Apple M2 Pro (16GB unified memory)
 - **Checkpoint:** 20000 iterations
 - **Training Method:** Base pretraining from scratch
@@ -70,8 +70,8 @@ Pre-LN provides better training stability and is used in modern transformers (GP
 
 ## Training Details
 
-- **Dataset:** FineWebEdu (diverse educational web content)
-- **Training Tokens:** ~10.2M tokens from educational web pages
+- **Dataset:** TinyStories (simple children's stories)
+- **Training Tokens:** ~2M training tokens
 - **Total Iterations:** 20,000
 - **Batch Size:** 12 sequences/batch
 - **Sequence Length:** 512 tokens
@@ -240,7 +240,7 @@ If you use this model, please cite:
 
 - **GitHub Repository:** [JackSuuu/nanoGPT-on-MLX](https://github.com/JackSuuu/nanoGPT-on-MLX)
 - **MLX Framework:** [ml-explore/mlx](https://github.com/ml-explore/mlx)
-- **Training Dataset:** [HuggingFaceFW/fineweb-edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu)
+- **Training Dataset:** [roneneldan/TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)
 
 ## License
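
The updated training details (batch size 12, sequence length 512, 20,000 iterations, ~2M TinyStories tokens) can be sanity-checked with a few lines of arithmetic. This is a rough sketch, not part of the commit; it assumes each iteration consumes one full batch of sequences and that the ~2M-token corpus is cycled repeatedly:

```python
# Sanity-check of the training budget described in the updated README.
# Assumption (not stated in the commit): every iteration processes
# batch_size full-length sequences.
batch_size = 12        # sequences per batch
seq_len = 512          # tokens per sequence
iterations = 20_000    # total training iterations

tokens_per_iter = batch_size * seq_len            # tokens consumed per step
total_tokens_seen = tokens_per_iter * iterations  # tokens seen over the full run

dataset_tokens = 2_000_000                        # ~2M tokens (from the README)
approx_epochs = total_tokens_seen / dataset_tokens

print(tokens_per_iter, total_tokens_seen, round(approx_epochs, 1))
```

Under these assumptions the run sees roughly 123M tokens, i.e. on the order of 60 passes over the ~2M-token TinyStories subset, which is plausible for a small corpus but worth noting when comparing against the earlier FineWebEdu description.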
246