Fix dataset info: FineWebEdu -> TinyStories
README.md CHANGED

@@ -10,7 +10,7 @@ tags:
   - pre-ln
   - causal-lm
 datasets:
-  -
+  - roneneldan/TinyStories
 library_name: transformers
 pipeline_tag: text-generation
 metrics:
@@ -26,7 +26,7 @@ widget:
 
 # NanoGPT 53M - Pre-LN Transformer
 
-A 53-million parameter GPT model trained from scratch on
+A 53-million parameter GPT model trained from scratch on the TinyStories dataset. This model implements a **Pre-LayerNorm (Pre-LN) transformer architecture** and serves as a demonstration of efficient training on Apple Silicon using the MLX framework.
 
 > **Model Format:** PyTorch (cross-platform compatible)
 > **Training Framework:** Apple MLX (exported to PyTorch for universal compatibility)
@@ -47,7 +47,7 @@ A 53-million parameter GPT model trained from scratch on 10M tokens of FineWebEd
 
 ### Training
 - **Framework:** Apple MLX (training), PyTorch (export)
-- **Dataset:**
+- **Dataset:** TinyStories - Simple children's stories for language learning
 - **Training Hardware:** Apple M2 Pro (16GB unified memory)
 - **Checkpoint:** 20000 iterations
 - **Training Method:** Base pretraining from scratch
@@ -70,8 +70,8 @@ Pre-LN provides better training stability and is used in modern transformers (GP
 
 ## Training Details
 
-- **Dataset:**
-- **Training Tokens:** ~
+- **Dataset:** TinyStories (simple children's stories)
+- **Training Tokens:** ~2M training tokens
 - **Total Iterations:** 20,000
 - **Batch Size:** 12 sequences/batch
 - **Sequence Length:** 512 tokens
@@ -240,7 +240,7 @@ If you use this model, please cite:
 
 - **GitHub Repository:** [JackSuuu/nanoGPT-on-MLX](https://github.com/JackSuuu/nanoGPT-on-MLX)
 - **MLX Framework:** [ml-explore/mlx](https://github.com/ml-explore/mlx)
-- **Training Dataset:** [
+- **Training Dataset:** [roneneldan/TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)
 
 ## License
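The updated Training Details can be sanity-checked with a quick calculation (a minimal sketch: the iteration, batch-size, and sequence-length figures are taken from the diff above, and the epoch estimate assumes the stated ~2M-token dataset size):

```python
# Token throughput implied by the training configuration in this diff:
# 20,000 iterations, 12 sequences/batch, 512 tokens/sequence.
iterations = 20_000
batch_size = 12   # sequences per batch
seq_len = 512     # tokens per sequence

tokens_per_iteration = batch_size * seq_len              # 6,144 tokens
total_tokens_processed = tokens_per_iteration * iterations

print(f"tokens/iteration:       {tokens_per_iteration:,}")
print(f"total tokens processed: {total_tokens_processed:,}")

# Relative to the ~2M unique training tokens stated above, this is
# roughly total_tokens_processed / 2e6 ~= 61 passes over the data.
epochs = total_tokens_processed / 2_000_000
print(f"approx. epochs:         {epochs:.1f}")
```

This shows the 20,000-iteration run touches ~123M tokens in total, i.e. many passes over the small TinyStories subset rather than a single epoch.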