moashmawy/tinystories-gpt-small

+---
+tags:
+- causal-lm
+- text-generation
+- pre-trained
+- pytorch
+---
+# tinystories-gpt-small
+This is a custom GPT model **pre-trained from scratch on the TinyStories dataset**.
+It is a small base language model intended for demonstrating next-token prediction and simple story-style text generation.
+## Model Details
+* **Architecture:** Custom GPT
+    * `n_layer`: 8
+    * `n_head`: 8
+    * `n_embd`: 512
+    * `block_size`: 1024
+    * `vocab_size`: 50257
+    * `dropout`: 0.1
+* **Pre-training Dataset:** TinyStories (a synthetic dataset of short, simple stories designed to teach language models basic reasoning and coherence).
+* **Purpose:** This is a base language model: it has learned to predict the next token in a sequence from the patterns in the TinyStories dataset. It is suitable for demonstrating basic text generation and as a starting point for fine-tuning on downstream tasks (e.g., question answering or chat).
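
As a rough illustration of what "predict the next token" means for training data, the sketch below splits a token stream into input/target windows, where each target is the input shifted by one position. The function name `next_token_pairs` and the toy token ids are hypothetical; the actual data pipeline used for this model is not described in this card.

```python
def next_token_pairs(token_ids, block_size):
    """Split a token stream into (input, target) windows for next-token prediction.

    Each target window is the input window shifted one position to the right,
    so position i of the target is the token that follows position i of the input.
    """
    pairs = []
    # Step through the stream in non-overlapping windows of block_size tokens,
    # stopping while a full shifted target window is still available.
    for start in range(0, len(token_ids) - block_size, block_size):
        x = token_ids[start : start + block_size]
        y = token_ids[start + 1 : start + block_size + 1]
        pairs.append((x, y))
    return pairs


# Toy token ids standing in for an encoded story (hypothetical values).
tokens = [10, 11, 12, 13, 14, 15, 16]
pairs = next_token_pairs(tokens, block_size=3)
for x, y in pairs:
    print(x, "->", y)
```

With this model's settings, `block_size` would be 1024 and the token ids would come from the GPT-2 `tiktoken` encoding.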
+## How to Use (Inference)
+This model uses the GPT-2 encoding from `tiktoken`, so you need to load the tokenizer explicitly with `tiktoken`. The `GPT` and `GPTConfig` classes must also be importable, for example from a local `model.py`.
+```python
+import torch
+import tiktoken
+from model import GPT, GPTConfig  # assumes model.py (defining GPT and GPTConfig) is on the import path
+# 1. Define the model configuration. It must match the values the model was trained with
+# (listed under Model Details); load them from config.json instead if you saved one.
+config = GPTConfig(
+    vocab_size=50257,
+    block_size=1024,
+    n_layer=8,
+    n_head=8,
+    n_embd=512,
+    dropout=0.1,
+    bias=True
+)
+# 2. Initialize the model and load weights
+model = GPT(config)
+state_dict = torch.load("pytorch_model.bin", map_location="cpu")  # replace with the path to the downloaded weights
+model.load_state_dict(state_dict)
+device = "cuda" if torch.cuda.is_available() else "cpu"
+model.to(device)
+model.eval()  # evaluation mode: disables dropout
+# 3. Load the tiktoken tokenizer
+tokenizer = tiktoken.get_encoding("gpt2")
+EOT_TOKEN_ID = tokenizer.eot_token
+# 4. Prepare your prompt for text generation
+prompt_text = "Once upon a time there was a pumpkin."
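+# Encode the prompt. allowed_special="all" lets special tokens such as <|endoftext|>
+# appear in the prompt and be encoded as single special tokens instead of raising an error.
+input_ids = tokenizer.encode(prompt_text, allowed_special="all")
+input_ids_tensor = torch.tensor([input_ids], dtype=torch.long).to(device)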
+# 5. Generate text
+# Adjust max_new_tokens, temperature, and top_k as needed
+with torch.no_grad():  # no gradients are needed for inference
+    generated_output_ids = model.generate(
+        idx=input_ids_tensor,
+        max_new_tokens=100,  # maximum number of tokens to generate after the prompt
+        temperature=0.7,
+        top_k=50
+    )
+# Decode the generated text (excluding the prompt part)
+generated_text_ids = generated_output_ids[0, len(input_ids):].tolist()
+generated_text = tokenizer.decode(generated_text_ids)
+# Clean up any leftover EOT tokens from generation
+generated_text = generated_text.replace(tokenizer.decode([EOT_TOKEN_ID]), "").strip()
+print(f"Generated Text: {generated_text}")
+```
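
The `temperature` and `top_k` arguments control how the next token is sampled. The model's own `generate` method is not shown in this card, so the following is only a minimal pure-Python sketch of the common approach (temperature scaling, then top-k filtering, then softmax sampling); the function name `sample_top_k` and the example logits are hypothetical.

```python
import math
import random


def sample_top_k(logits, temperature=1.0, top_k=None, rng=random):
    """Sample an index from raw logits using temperature scaling and top-k filtering."""
    # Lower temperature sharpens the distribution; higher temperature flattens it.
    scaled = [x / temperature for x in logits]
    # Keep only the top_k highest-scoring indices (all of them if top_k is None).
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    kept = order[:top_k] if top_k is not None else order
    # Softmax weights over the kept logits, subtracting the max for numerical stability.
    peak = max(scaled[i] for i in kept)
    weights = [math.exp(scaled[i] - peak) for i in kept]
    # Draw one index in proportion to its weight.
    return rng.choices(kept, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for a 4-token vocabulary
picked = sample_top_k(logits, temperature=0.7, top_k=2, rng=random.Random(0))
print(picked)  # always one of the two highest-scoring indices
```

Under this scheme, a smaller `top_k` or a lower `temperature` gives more predictable text, while larger values give more varied but less coherent output.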
+## Limitations and Bias
+* This model is a relatively small GPT (50.95M parameters), and its generative capabilities are limited by its size and by the simplicity of the TinyStories dataset.
+* It is a base language model and has not been instruction-tuned or fine-tuned for specific tasks like complex question answering or dialogue. Therefore, its responses may be incoherent or non-factual for out-of-distribution prompts.
+* Like all language models, it may generate biased or incorrect information based on its training data.
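
The 50.95M parameter figure can be reproduced from the hyperparameters under Model Details, assuming a GPT-2-style layout: fused q/k/v projection, a 4x-wide MLP, two LayerNorms per block plus a final one, bias terms enabled, output head tied to the token embeddings, and position embeddings excluded from the reported count. These layout details are assumptions rather than something this card states, and `count_gpt_params` is a hypothetical helper.

```python
def count_gpt_params(n_layer, n_embd, vocab_size, block_size, bias=True):
    """Count parameters for an assumed GPT-2-style layout with tied embeddings."""
    b = 1 if bias else 0
    wte = vocab_size * n_embd  # token embeddings (assumed shared with the output head)
    wpe = block_size * n_embd  # learned position embeddings
    attn = n_embd * 3 * n_embd + b * 3 * n_embd  # fused q, k, v projection
    attn += n_embd * n_embd + b * n_embd         # attention output projection
    mlp = n_embd * 4 * n_embd + b * 4 * n_embd   # MLP expansion to 4x width
    mlp += 4 * n_embd * n_embd + b * n_embd      # MLP contraction back to n_embd
    norms = 2 * (n_embd + b * n_embd)            # two LayerNorms per block
    final_norm = n_embd + b * n_embd             # LayerNorm after the last block
    total = wte + wpe + n_layer * (attn + mlp + norms) + final_norm
    # Return the full count and the count without position embeddings.
    return total, total - wpe


total, without_pos = count_gpt_params(
    n_layer=8, n_embd=512, vocab_size=50257, block_size=1024, bias=True
)
print(f"{total:,} total, {without_pos:,} excluding position embeddings")
# prints "51,475,968 total, 50,951,680 excluding position embeddings"
```

Roughly half of these parameters (50257 x 512, about 25.7M) sit in the token embedding table, so the transformer blocks themselves account for only about 25.2M.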
+## License
+Apache 2.0