abhilash88 committed (verified) · commit e83d515 · parent 6f8d7e7
Upload README.md with huggingface_hub

Files changed (1): README.md added (+242 −0)
# Small Language Model (SLM) - TinyStories GPT

A compact GPT-style language model trained from scratch on the TinyStories dataset, designed to generate simple, coherent stories suitable for children.

## Model Description

This is a small-scale transformer language model with the following architecture:
- **Model Type**: GPT (Generative Pre-trained Transformer)
- **Parameters**: ~22M
- **Context Length**: 128 tokens
- **Vocabulary Size**: 50,257 (GPT-2 tokenizer)

### Architecture Details
- **Layers**: 6 transformer blocks
- **Attention Heads**: 6
- **Hidden Size**: 384
- **Feed-forward Size**: 1536 (4 × hidden_size)
- **Dropout**: 0.1
- **Activation**: GELU

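For illustration, the sketch below shows one transformer block with these dimensions in PyTorch. It is a minimal reconstruction based on the numbers above (384-d hidden, 6 heads, 1536-d feed-forward, GELU, dropout 0.1), not the repository's `model.py`; details such as pre-norm placement are assumptions.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-norm transformer block with the dimensions listed above (illustrative only)."""
    def __init__(self, n_embd=384, n_head=6, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),   # 384 -> 1536
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),   # 1536 -> 384
            nn.Dropout(dropout),
        )

    def forward(self, x):
        # Causal mask: each position may only attend to itself and earlier positions
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a
        x = x + self.mlp(self.ln2(x))
        return x
```

The full model stacks six such blocks on top of 384-dimensional token and position embeddings, followed by a language-modeling head over the 50,257-token vocabulary.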
## Training Details

### Dataset
- **Training Data**: [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) dataset
- **Tokenizer**: GPT-2 tokenizer (tiktoken)
- **Training Examples**: ~2.1M stories
- **Validation Examples**: ~22K stories

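As a rough sketch of how this data could be prepared (not necessarily the exact preprocessing used for this model), the snippet below loads TinyStories from the Hub and encodes it with the GPT-2 tokenizer via tiktoken, appending the end-of-text token after each story. The `datasets` dependency and the `"text"` column name are assumptions from the public dataset.

```python
import numpy as np
import tiktoken
from datasets import load_dataset  # pip install datasets

enc = tiktoken.get_encoding("gpt2")
ds = load_dataset("roneneldan/TinyStories")  # splits: "train" (~2.1M) and "validation" (~22K)

def encode_split(split):
    ids = []
    for example in ds[split]:
        tokens = enc.encode_ordinary(example["text"])  # plain BPE encoding, no special tokens
        tokens.append(enc.eot_token)                   # mark the end of each story
        ids.extend(tokens)
    return np.array(ids, dtype=np.uint16)              # GPT-2 token ids fit in uint16

train_ids = encode_split("train")
val_ids = encode_split("validation")
```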
### Training Configuration
- **Optimizer**: AdamW (lr=1e-4, betas=(0.9, 0.95), weight_decay=0.1)
- **Learning Rate Schedule**: Linear warmup (1000 steps) + cosine annealing
- **Batch Size**: 32
- **Gradient Accumulation Steps**: 32
- **Training Steps**: 20,000
- **Mixed Precision**: bfloat16/float16
- **Gradient Clipping**: 0.5

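One way to reproduce the reported schedule (a sketch, not taken from the training script itself, assuming `model` is the GPT instance from the Usage section below) is AdamW plus a `LambdaLR` that ramps the learning rate linearly for 1,000 steps and then decays it along a cosine curve over the remaining steps:

```python
import math
import torch

optimizer = torch.optim.AdamW(
    model.parameters(), lr=1e-4, betas=(0.9, 0.95), weight_decay=0.1
)

warmup_steps, max_steps = 1_000, 20_000

def lr_lambda(step):
    if step < warmup_steps:
        return step / warmup_steps                        # linear warmup to the base lr
    progress = min(1.0, (step - warmup_steps) / (max_steps - warmup_steps))
    return 0.5 * (1.0 + math.cos(math.pi * progress))     # cosine decay toward 0

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Inside the training loop (sketch): accumulate 32 micro-batches, clip, then step.
# loss.backward()
# torch.nn.utils.clip_grad_norm_(model.parameters(), 0.5)
# optimizer.step(); scheduler.step(); optimizer.zero_grad(set_to_none=True)
```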
### Training Results
- **Final Training Loss**: ~2.39
- **Final Validation Loss**: ~2.39
- **Best Validation Loss**: ~2.39 (achieved around step 19,000)

The model shows good convergence, with training and validation losses closely aligned, indicating minimal overfitting.

## Usage

### Requirements
```bash
pip install torch tiktoken numpy
```

### Quick Start
```python
import torch
import tiktoken
from model import GPT, GPTConfig  # your model implementation

# Load tokenizer
enc = tiktoken.get_encoding("gpt2")

# Model configuration
config = GPTConfig(
    vocab_size=50257,
    block_size=128,
    n_layer=6,
    n_head=6,
    n_embd=384,
    dropout=0.0,  # set to 0 for inference
    bias=True
)

# Load model
model = GPT(config)
model.load_state_dict(torch.load('pytorch_model.bin', map_location='cpu'))
model.eval()
```
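As a quick sanity check after loading, you can count the parameters and compare against the figure quoted above (the exact number depends on implementation details such as weight tying):

```python
# Total parameter count in millions
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")
```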

### Alternative: Using Hugging Face Hub
```python
import json

import torch
import tiktoken
from huggingface_hub import hf_hub_download
from model import GPT, GPTConfig  # your model implementation

# Download model files
model_path = hf_hub_download(repo_id="abhilash88/tinystories-slm-gpt", filename="pytorch_model.bin")
config_path = hf_hub_download(repo_id="abhilash88/tinystories-slm-gpt", filename="config.json")

# Load tokenizer
enc = tiktoken.get_encoding("gpt2")

# Load configuration and model
with open(config_path, 'r') as f:
    config_dict = json.load(f)

config = GPTConfig(**config_dict)
model = GPT(config)
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()
```

### Text Generation
```python
# Generate text
def generate_story(prompt, max_tokens=200, temperature=1.0, top_k=None):
    context = torch.tensor(enc.encode_ordinary(prompt)).unsqueeze(0)

    with torch.no_grad():
        generated = model.generate(
            context,
            max_new_tokens=max_tokens,
            temperature=temperature,
            top_k=top_k
        )

    return enc.decode(generated.squeeze().tolist())

# Example usage
story = generate_story("Once upon a time there was a pumpkin.")
print(story)
```
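Lowering the temperature and setting a finite `top_k` generally makes the sampling more conservative; for example:

```python
# More focused sampling: keep only the 50 most likely tokens and soften the logits
story = generate_story("Once upon a time there was a pumpkin.", temperature=0.8, top_k=50)
print(story)
```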

### Sample Outputs

**Prompt**: "Once upon a time there was a pumpkin."
```
Once upon a time there was a pumpkin. The pumpkin was very much. No one was upon a better. The egg was missing. The windows were okay. The bee put the seeds away.

Then one day, the pumpkin and the sun went on a lunch. As the sun went flying to the beach, the Baby was sad...
```

**Prompt**: "A little girl went to the woods"
```
A little girl went to the woods and saw some big, colourful flowers. She jumped over and reached for a key. Suddenly, there was a small sock! The girl picked up the tie and started to growled...
```

## Model Performance

### Capabilities
- ✅ Generates coherent short stories
- ✅ Maintains simple narrative structure
- ✅ Uses child-friendly vocabulary
- ✅ Fast inference due to small size
- ✅ Good for educational purposes and experimentation

### Limitations
- ❌ Limited context window (128 tokens)
- ❌ Simple vocabulary and concepts
- ❌ May generate repetitive or nonsensical content
- ❌ Not suitable for complex reasoning tasks
- ❌ Grammar and coherence issues in longer texts

## Technical Specifications

| Specification | Value |
|---------------|-------|
| Model Size | ~22M parameters |
| Architecture | GPT (decoder-only transformer) |
| Context Length | 128 tokens |
| Vocabulary | 50,257 tokens |
| Precision | Mixed (bfloat16/float16) |
| Framework | PyTorch |

## File Structure
```
├── config.json          # Model configuration
├── pytorch_model.bin    # Trained model weights
├── model.py             # Model architecture implementation
├── tokenizer.json       # Tokenizer configuration (optional)
├── README.md            # This file
└── requirements.txt     # Dependencies
```

### Required Files for HuggingFace Upload

**1. config.json** - Model configuration file:
```json
{
  "architectures": ["GPT"],
  "vocab_size": 50257,
  "n_positions": 128,
  "n_embd": 384,
  "n_layer": 6,
  "n_head": 6,
  "block_size": 128,
  "dropout": 0.1,
  "bias": true,
  "model_type": "gpt",
  "torch_dtype": "float32",
  "transformers_version": "4.21.0"
}
```

**2. pytorch_model.bin** - Your converted model weights

**3. model.py** - Your model implementation (should include the GPT and GPTConfig classes)

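For reference, these files can be pushed to the Hub with `huggingface_hub` (a sketch; adjust the repo id and file paths to your own repository):

```python
from huggingface_hub import HfApi

api = HfApi()  # uses the token stored by `huggingface-cli login`
for filename in ["config.json", "pytorch_model.bin", "model.py", "README.md"]:
    api.upload_file(
        path_or_fileobj=filename,
        path_in_repo=filename,
        repo_id="abhilash88/tinystories-slm-gpt",
        repo_type="model",
    )
```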

## Training Infrastructure
- **Hardware**: NVIDIA Tesla T4 GPU
- **Environment**: Kaggle Notebook
- **Training Time**: ~3.5 hours
- **Memory Usage**: ~15GB GPU memory

## Evaluation Metrics
The model was evaluated using perplexity on the validation set:
- **Best Validation Perplexity**: ~10.9 (exp(2.39))
- **Training Convergence**: Achieved stable loss around step 15,000
- **Overfitting**: Minimal (train/val loss difference < 0.01)

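Perplexity here is simply the exponential of the mean validation cross-entropy loss. A minimal way to compute it over batches of token ids is sketched below, assuming a nanoGPT-style forward pass that returns `(logits, loss)`:

```python
import math
import torch

@torch.no_grad()
def validation_perplexity(model, batches):
    """batches: iterable of (x, y) LongTensor pairs of shape (B, 128)."""
    model.eval()
    losses = []
    for x, y in batches:
        _, loss = model(x, y)  # assumes the model returns (logits, loss) given inputs and targets
        losses.append(loss.item())
    return math.exp(sum(losses) / len(losses))

# e.g. a mean validation loss of 2.39 corresponds to exp(2.39) ≈ 10.9
```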

## Use Cases
- Educational tool for understanding transformer architecture
- Story generation for children's content
- Baseline model for NLP experiments
- Demonstration of training small language models
- Research into efficient model architectures

## Citation
If you use this model in your research, please cite:
```bibtex
@misc{tinystories-slm-2025,
  title={Small Language Model trained on TinyStories},
  author={Abhilash},
  year={2025},
  howpublished={HuggingFace Model Hub},
  url={https://huggingface.co/abhilash88/tinystories-slm-gpt}
}
```

## License
This model is released under the MIT License. The TinyStories dataset follows its original licensing terms.

## Acknowledgments
- [TinyStories Dataset](https://huggingface.co/datasets/roneneldan/TinyStories) by Ronen Eldan et al.
- [nanoGPT](https://github.com/karpathy/nanoGPT) by Andrej Karpathy for architecture inspiration
- OpenAI for the GPT-2 tokenizer

## Contact
For questions or issues, please open an issue in the repository.

---

*Model trained and uploaded on July 31, 2025*