---
tags:
- causal-lm
- text-generation
- pre-trained
- pytorch
---

# tinystories-gpt-small

This is a custom GPT model **pre-trained from scratch on the TinyStories dataset**. It demonstrates foundational language-modeling capabilities and can be used for text generation.

## Model Details

* **Architecture:** Custom GPT
  * `n_layer`: 8
  * `n_head`: 8
  * `n_embd`: 512
  * `block_size`: 1024
  * `vocab_size`: 50257
  * `dropout`: 0.1
* **Pre-training Dataset:** TinyStories, a synthetic dataset of short, simple stories designed to teach language models basic reasoning and coherence.
* **Purpose:** This is a base language model: it has learned to predict the next token in a sequence from the patterns in the TinyStories dataset. It is suitable for demonstrating basic generative-text capabilities and serves as a foundation for further fine-tuning on downstream tasks (e.g., question answering or dialogue); a minimal fine-tuning sketch is included at the end of this card.

## How to Use (Inference)

```python
import torch
import tiktoken
from model import GPT, GPTConfig  # assumes model.py (or its classes) is available

# 1. Define the model configuration (must match the trained model's config.json).
#    You can load this from config.json if you saved it, or define it manually.
config = GPTConfig(
    vocab_size=50257,
    block_size=1024,
    n_layer=8,
    n_head=8,
    n_embd=512,
    dropout=0.1,
    bias=True,
)

# 2. Initialize the model and load the weights
model = GPT(config)
state_dict = torch.load("pytorch_model.bin", map_location="cpu")  # path to the downloaded weights
model.load_state_dict(state_dict)
model.eval()  # set to evaluation mode
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# 3. Load the tiktoken tokenizer
tokenizer = tiktoken.get_encoding("gpt2")
EOT_TOKEN_ID = tokenizer.eot_token

# 4. Prepare and encode the prompt
prompt_text = "Once upon a time there was a pumpkin."
input_ids = tokenizer.encode(prompt_text, allowed_special="all")
input_ids_tensor = torch.tensor([input_ids], dtype=torch.long).to(device)

# 5. Generate text; adjust max_new_tokens, temperature, and top_k as needed
generated_output_ids = model.generate(
    idx=input_ids_tensor,
    max_new_tokens=100,   # number of new tokens to generate (not total length)
    temperature=0.7,
    top_k=50,
)

# Decode the generated tokens, excluding the prompt
generated_text_ids = generated_output_ids[0, len(input_ids):].tolist()
generated_text = tokenizer.decode(generated_text_ids)

# Clean up any leftover EOT tokens from generation
generated_text = generated_text.replace(tokenizer.decode([EOT_TOKEN_ID]), "").strip()

print(f"Generated Text: {generated_text}")
```

## Limitations and Bias

* This is a relatively small GPT (50.95M parameters), so its generative capabilities are limited by its size and by the simplicity of the TinyStories dataset.
* It is a base language model and has not been instruction-tuned or fine-tuned for specific tasks such as complex question answering or dialogue; its responses may be incoherent or non-factual for out-of-distribution prompts.
* Like all language models, it may reproduce biases or generate incorrect information present in its training data.

## License

Apache 2.0
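
## Fine-Tuning Sketch

Since this is a base model, adapting it to a downstream task usually means continuing next-token training on task-specific text. The loop below is a minimal sketch, assuming the same nanoGPT-style `model.py` as the inference example, where `model(x, y)` returns `(logits, loss)` when targets are provided. The corpus path, batch size, learning rate, and step count are illustrative placeholders, not values used to train this model.

```python
import torch
import tiktoken
from model import GPT, GPTConfig  # same assumption as the inference example

# Rebuild the config and load the pre-trained weights (see the inference example)
config = GPTConfig(vocab_size=50257, block_size=1024, n_layer=8, n_head=8,
                   n_embd=512, dropout=0.1, bias=True)
model = GPT(config)
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.train()

# Tokenize a task-specific corpus into one long stream of token IDs
# ("my_finetune_corpus.txt" is a hypothetical file)
tokenizer = tiktoken.get_encoding("gpt2")
with open("my_finetune_corpus.txt") as f:
    data = torch.tensor(tokenizer.encode(f.read()), dtype=torch.long)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)  # illustrative LR
block_size, batch_size = config.block_size, 8

for step in range(1000):  # illustrative step count
    # Sample random contiguous windows; targets are the inputs shifted by one token
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i : i + block_size] for i in ix]).to(device)
    y = torch.stack([data[i + 1 : i + 1 + block_size] for i in ix]).to(device)

    _, loss = model(x, y)  # assumes forward returns (logits, loss) given targets
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()

torch.save(model.state_dict(), "pytorch_model_finetuned.bin")
```

After fine-tuning, the resulting checkpoint can be loaded exactly as shown in the inference example above.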