# Small Language Model (SLM) - TinyStories GPT

A compact GPT-style language model trained from scratch on the TinyStories dataset, designed for generating simple, coherent stories suitable for children.

## Model Description

This is a small-scale transformer language model built with the following architecture:
- **Model Type**: GPT (Generative Pre-trained Transformer)
- **Parameters**: ~22M
- **Context Length**: 128 tokens
- **Vocabulary Size**: 50,257 (GPT-2 tokenizer)

### Architecture Details
- **Layers**: 6 transformer blocks
- **Attention Heads**: 6
- **Hidden Size**: 384
- **Feed-forward Size**: 1536 (4 × hidden_size)
- **Dropout**: 0.1
- **Activation**: GELU
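
These hyperparameters correspond one-to-one to the fields of the `GPTConfig` object used in the Usage examples below. As a point of reference, here is a minimal nanoGPT-style sketch of such a config; the actual class definition lives in `model.py`, so treat this as illustrative rather than the repository's source:

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # Illustrative nanoGPT-style config; field names match the Usage examples below.
    block_size: int = 128     # context length in tokens
    vocab_size: int = 50257   # GPT-2 BPE vocabulary
    n_layer: int = 6          # transformer blocks
    n_head: int = 6           # attention heads per block
    n_embd: int = 384         # hidden size (feed-forward width is 4 * n_embd = 1536)
    dropout: float = 0.1      # training value; set to 0.0 for inference
    bias: bool = True         # bias terms in Linear and LayerNorm layers
```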

## Training Details

### Dataset
- **Training Data**: [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) dataset
- **Tokenizer**: GPT-2 tokenizer (tiktoken)
- **Training Examples**: ~2.1M stories
- **Validation Examples**: ~22K stories
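
For illustration, the corpus can be pulled from the Hub and tokenized with the same GPT-2 BPE encoding. This is a generic sketch (the exact preprocessing script used for this model is not included here), and the `datasets` package is an extra dependency:

```python
import numpy as np
import tiktoken
from datasets import load_dataset  # pip install datasets

enc = tiktoken.get_encoding("gpt2")
stories = load_dataset("roneneldan/TinyStories")  # provides 'train' and 'validation' splits

def tokenize_split(split, limit=1000):
    # Flatten tokenized stories into one array of token ids,
    # separating stories with the GPT-2 end-of-text token.
    ids = []
    for example in stories[split].select(range(limit)):
        ids.extend(enc.encode_ordinary(example["text"]))
        ids.append(enc.eot_token)
    return np.array(ids, dtype=np.uint16)

train_ids = tokenize_split("train")
print(f"tokens in first 1000 stories: {len(train_ids):,}")
```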

### Training Configuration
- **Optimizer**: AdamW (lr=1e-4, betas=(0.9, 0.95), weight_decay=0.1)
- **Learning Rate Schedule**: Linear warmup (1,000 steps) followed by cosine annealing
- **Batch Size**: 32
- **Gradient Accumulation Steps**: 32 (effective batch of 1,024 sequences per optimizer step)
- **Training Steps**: 20,000
- **Mixed Precision**: bfloat16/float16
- **Gradient Clipping**: 0.5
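
As a rough sketch of how these settings fit together in plain PyTorch (an illustrative reconstruction from the values above, not the original training script):

```python
import math
import torch

# Stand-in module; in practice this is the GPT model from the Usage section below.
model = torch.nn.Linear(8, 8)

warmup_steps, max_steps, max_lr = 1_000, 20_000, 1e-4
optimizer = torch.optim.AdamW(model.parameters(), lr=max_lr,
                              betas=(0.9, 0.95), weight_decay=0.1)

def lr_at(step):
    # Linear warmup to max_lr, then cosine annealing towards zero.
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return 0.5 * max_lr * (1.0 + math.cos(math.pi * progress))

# Inside the training loop, before each optimizer step:
#   for group in optimizer.param_groups:
#       group["lr"] = lr_at(step)
#   torch.nn.utils.clip_grad_norm_(model.parameters(), 0.5)
```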

### Training Results
- **Final Training Loss**: ~2.39
- **Final Validation Loss**: ~2.39
- **Best Validation Loss**: ~2.39 (achieved around step 19,000)

The model shows good convergence, with training and validation losses closely aligned, indicating minimal overfitting.

## Usage

### Requirements
```bash
pip install torch tiktoken numpy
```

### Quick Start
```python
import torch
import tiktoken
from model import GPT, GPTConfig  # Your model implementation

# Load tokenizer
enc = tiktoken.get_encoding("gpt2")

# Model configuration
config = GPTConfig(
    vocab_size=50257,
    block_size=128,
    n_layer=6,
    n_head=6,
    n_embd=384,
    dropout=0.0,  # Set to 0 for inference
    bias=True
)

# Load model
model = GPT(config)
model.load_state_dict(torch.load('pytorch_model.bin', map_location='cpu'))
model.eval()
```

### Alternative: Using Hugging Face Hub
```python
import json

import torch
import tiktoken
from huggingface_hub import hf_hub_download
from model import GPT, GPTConfig  # Your model implementation

# Download model files
model_path = hf_hub_download(repo_id="abhilash88/tinystories-slm-gpt", filename="pytorch_model.bin")
config_path = hf_hub_download(repo_id="abhilash88/tinystories-slm-gpt", filename="config.json")

# Load tokenizer
enc = tiktoken.get_encoding("gpt2")

# Load configuration and model
with open(config_path, 'r') as f:
    config_dict = json.load(f)

# Keep only the fields GPTConfig is expected to accept; config.json may also carry
# Hub metadata such as "architectures" or "model_type".
fields = {"vocab_size", "block_size", "n_layer", "n_head", "n_embd", "dropout", "bias"}
config = GPTConfig(**{k: v for k, v in config_dict.items() if k in fields})

model = GPT(config)
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()
```

### Text Generation
```python
# Generate text
def generate_story(prompt, max_tokens=200, temperature=1.0, top_k=None):
    context = torch.tensor(enc.encode_ordinary(prompt)).unsqueeze(0)

    with torch.no_grad():
        generated = model.generate(
            context,
            max_new_tokens=max_tokens,
            temperature=temperature,
            top_k=top_k
        )

    return enc.decode(generated.squeeze().tolist())

# Example usage
story = generate_story("Once upon a time there was a pumpkin.")
print(story)
```
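
Temperature and top-k sampling trade diversity against coherence. Continuing from the block above, illustrative (not tuned) values might look like this:

```python
# Lower temperature plus top-k filtering usually gives more conservative text.
story = generate_story("A little girl went to the woods",
                       max_tokens=150, temperature=0.8, top_k=40)
print(story)
```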

### Sample Outputs

**Prompt**: "Once upon a time there was a pumpkin."
```
Once upon a time there was a pumpkin. The pumpkin was very much. No one was upon a better. The egg was missing. The windows were okay. The bee put the seeds away.

Then one day, the pumpkin and the sun went on a lunch. As the sun went flying to the beach, the Baby was sad...
```

**Prompt**: "A little girl went to the woods"
```
A little girl went to the woods and saw some big, colourful flowers. She jumped over and reached for a key. Suddenly, there was a small sock! The girl picked up the tie and started to growled...
```

## Model Performance

### Capabilities
- ✅ Generates coherent short stories
- ✅ Maintains simple narrative structure
- ✅ Uses child-friendly vocabulary
- ✅ Fast inference due to small size
- ✅ Good for educational purposes and experimentation

### Limitations
- ❌ Limited context window (128 tokens)
- ❌ Simple vocabulary and concepts
- ❌ May generate repetitive or nonsensical content
- ❌ Not suitable for complex reasoning tasks
- ❌ Grammar and coherence issues in longer texts

## Technical Specifications

| Specification | Value |
|---------------|-------|
| Model Size | ~22M parameters |
| Architecture | GPT (decoder-only transformer) |
| Context Length | 128 tokens |
| Vocabulary | 50,257 tokens |
| Precision | Mixed (bfloat16/float16) |
| Framework | PyTorch |

## Files Structure
```
├── config.json          # Model configuration
├── pytorch_model.bin    # Trained model weights
├── model.py             # Model architecture implementation
├── tokenizer.json       # Tokenizer configuration (optional)
├── README.md            # This file
└── requirements.txt     # Dependencies
```

### Required Files for HuggingFace Upload

**1. config.json** - Model configuration file:
```json
{
  "architectures": ["GPT"],
  "vocab_size": 50257,
  "n_positions": 128,
  "n_embd": 384,
  "n_layer": 6,
  "n_head": 6,
  "block_size": 128,
  "dropout": 0.1,
  "bias": true,
  "model_type": "gpt",
  "torch_dtype": "float32",
  "transformers_version": "4.21.0"
}
```

**2. pytorch_model.bin** - The converted model weights (a plain PyTorch `state_dict`, loadable with `load_state_dict` as shown above)

**3. model.py** - The model implementation (must include the `GPT` and `GPTConfig` classes imported in the Usage examples)
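
Once these files exist locally, they can be pushed to the Hub with `huggingface_hub`. This is a generic sketch: it assumes the files sit in the current directory and that you are authenticated (via `huggingface-cli login` or an `HF_TOKEN` environment variable):

```python
from huggingface_hub import HfApi

api = HfApi()
repo_id = "abhilash88/tinystories-slm-gpt"  # target model repository

# Upload each required file into the repository root.
for filename in ["config.json", "pytorch_model.bin", "model.py", "README.md"]:
    api.upload_file(
        path_or_fileobj=filename,  # local path
        path_in_repo=filename,     # destination path inside the repo
        repo_id=repo_id,
        repo_type="model",
    )
```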

## Training Infrastructure
- **Hardware**: NVIDIA Tesla T4 GPU
- **Environment**: Kaggle Notebook
- **Training Time**: ~3.5 hours
- **Memory Usage**: ~15GB GPU memory

## Evaluation Metrics

The model was evaluated using perplexity on the validation set:
- **Best Validation Perplexity**: ~10.9 (exp(2.39))
- **Training Convergence**: Achieved stable loss around step 15,000
- **Overfitting**: Minimal (train/val loss difference < 0.01)
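
Perplexity is the exponential of the mean cross-entropy loss, so the figure above follows directly from the reported validation loss:

```python
import math

val_loss = 2.39                  # mean cross-entropy (nats per token) on the validation set
perplexity = math.exp(val_loss)  # ≈ 10.9
print(f"validation perplexity ~ {perplexity:.1f}")
```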

## Use Cases
- Educational tool for understanding transformer architecture
- Story generation for children's content
- Baseline model for NLP experiments
- Demonstration of training small language models
- Research into efficient model architectures

## Citation

If you use this model in your research, please cite:
```bibtex
@misc{tinystories-slm-2025,
  title={Small Language Model trained on TinyStories},
  author={Abhilash},
  year={2025},
  howpublished={HuggingFace Model Hub},
  url={https://huggingface.co/abhilash88/tinystories-slm-gpt}
}
```

## License

This model is released under the MIT License. The TinyStories dataset follows its original licensing terms.

## Acknowledgments
- [TinyStories Dataset](https://huggingface.co/datasets/roneneldan/TinyStories) by Ronen Eldan et al.
- [nanoGPT](https://github.com/karpathy/nanoGPT) by Andrej Karpathy for architecture inspiration
- OpenAI for the GPT-2 tokenizer

## Contact

For questions or issues, please open an issue in the repository.

---

*Model trained and uploaded on July 31, 2025*