---
license: mit
language:
- en
tags:
- pytorch
- causal-lm
- gpt2
- text-generation
- transformers
library_name: pytorch
pipeline_tag: text-generation
---

# GPT-2 Style Language Model

This is a GPT-2 style autoregressive language model trained from scratch using PyTorch.

## Model Description

This model implements the GPT-2 architecture with a causal self-attention mechanism for next-token prediction. It has been trained on custom text data to learn language patterns and generate coherent text sequences.

### Model Architecture

- **Model Type**: Causal Language Model (Decoder-only Transformer)
- **Architecture**: GPT-2
- **Framework**: PyTorch
- **Parameters**:
  - Number of Layers: {model.config.n_layer}
  - Number of Attention Heads: {model.config.n_head}
  - Embedding Dimension: {model.config.n_embd}
  - Vocabulary Size: {model.config.vocab_size}
  - Maximum Sequence Length: {model.config.block_size} tokens
  - Total Parameters: ~{sum(p.numel() for p in model.parameters()) / 1e6:.2f}M

### Training Details

- **Training Steps**: 500
- **Batch Size**: 4
- **Sequence Length**: 32 tokens
- **Optimizer**: AdamW
- **Learning Rate**: 3e-4
- **Final Training Loss**: {loss.item():.4f}
- **Tokenizer**: GPT-2 BPE tokenizer (tiktoken)

### Intended Use

This model is intended for:

- Text generation tasks
- Educational purposes and research
- Experimentation with language model fine-tuning
- Understanding transformer architectures

### Limitations

- Trained on a limited dataset for only 500 steps
- May not generalize well to all text domains
- Can produce biased or nonsensical outputs
- Not suitable for production use without further training
- Limited context window of {model.config.block_size} tokens

## Usage

### Requirements

```bash
pip install torch tiktoken huggingface_hub
```

### Loading the Model

```python
import torch
from huggingface_hub import hf_hub_download

# Download the model checkpoint
model_path = hf_hub_download(repo_id="{repo_id}", filename="model.pt")

# Load the checkpoint
checkpoint = torch.load(model_path, map_location='cpu')

# Print model configuration
print("Model Configuration:", checkpoint['config'])

# To use the model, you'll need to define the GPT class
# (see the model architecture code in the repository, or the
# reference sketch after the generation example below)
from dataclasses import dataclass
import torch.nn as nn
from torch.nn import functional as F

@dataclass
class GPTConfig:
    block_size: int = 1024
    vocab_size: int = 50257
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768

# Recreate the model with the saved configuration
config = GPTConfig(**checkpoint['config'])
model = GPT(config)  # You'll need the full GPT class definition
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

print(f"Model loaded successfully with {sum(p.numel() for p in model.parameters()):,} parameters")
```

### Text Generation

```python
import tiktoken

# Initialize tokenizer
enc = tiktoken.get_encoding('gpt2')

# Prepare input
prompt = "Once upon a time"
tokens = enc.encode(prompt)
x = torch.tensor(tokens).unsqueeze(0)  # Add batch dimension

# Generate text
model.eval()
max_length = 50

with torch.no_grad():
    while x.size(1) < max_length:
        logits, _ = model(x)               # The model returns (logits, loss)
        logits = logits[:, -1, :]          # Get logits for the last position
        probs = F.softmax(logits, dim=-1)

        # Top-k sampling (k = 50)
        topk_probs, topk_indices = torch.topk(probs, 50, dim=-1)
        ix = torch.multinomial(topk_probs, 1)
        xcol = torch.gather(topk_indices, -1, ix)
        x = torch.cat((x, xcol), dim=1)

# Decode and print the result
generated_tokens = x[0].tolist()
generated_text = enc.decode(generated_tokens)
print(generated_text)
```
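### Reference Model Definition

The loading and generation snippets above assume the `GPT` class from the training repository. If that code is not at hand, the minimal nanoGPT-style sketch below is consistent with the `GPTConfig` fields and the `(logits, loss)` return signature used above. This is an illustrative assumption, not the repository's definition: `load_state_dict` will only succeed if the attribute names here (`transformer.wte`, `transformer.wpe`, `transformer.h`, `ln_f`, `lm_head`) match the keys stored in `model_state_dict`, so prefer the class definition from the repository when available.

```python
import torch
import torch.nn as nn
from torch.nn import functional as F


class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask."""

    def __init__(self, config):
        super().__init__()
        assert config.n_embd % config.n_head == 0
        # Single projection producing queries, keys, and values
        self.c_attn = nn.Linear(config.n_embd, 3 * config.n_embd)
        self.c_proj = nn.Linear(config.n_embd, config.n_embd)
        self.n_head = config.n_head
        self.n_embd = config.n_embd

    def forward(self, x):
        B, T, C = x.size()
        q, k, v = self.c_attn(x).split(self.n_embd, dim=2)
        # Reshape to (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # Causal attention (requires PyTorch >= 2.0)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)


class MLP(nn.Module):
    """Position-wise feed-forward network with GELU activation."""

    def __init__(self, config):
        super().__init__()
        self.c_fc = nn.Linear(config.n_embd, 4 * config.n_embd)
        self.gelu = nn.GELU(approximate='tanh')
        self.c_proj = nn.Linear(4 * config.n_embd, config.n_embd)

    def forward(self, x):
        return self.c_proj(self.gelu(self.c_fc(x)))


class Block(nn.Module):
    """Transformer block: pre-norm attention followed by pre-norm MLP."""

    def __init__(self, config):
        super().__init__()
        self.ln_1 = nn.LayerNorm(config.n_embd)
        self.attn = CausalSelfAttention(config)
        self.ln_2 = nn.LayerNorm(config.n_embd)
        self.mlp = MLP(config)

    def forward(self, x):
        x = x + self.attn(self.ln_1(x))
        x = x + self.mlp(self.ln_2(x))
        return x


class GPT(nn.Module):
    """Minimal GPT-2 style decoder-only language model."""

    def __init__(self, config):
        super().__init__()
        self.config = config
        self.transformer = nn.ModuleDict(dict(
            wte=nn.Embedding(config.vocab_size, config.n_embd),  # token embeddings
            wpe=nn.Embedding(config.block_size, config.n_embd),  # position embeddings
            h=nn.ModuleList([Block(config) for _ in range(config.n_layer)]),
            ln_f=nn.LayerNorm(config.n_embd),
        ))
        self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)
        # Weight tying between token embeddings and the output head
        self.transformer.wte.weight = self.lm_head.weight

    def forward(self, idx, targets=None):
        B, T = idx.size()
        pos = torch.arange(0, T, dtype=torch.long, device=idx.device)
        x = self.transformer.wte(idx) + self.transformer.wpe(pos)
        for block in self.transformer.h:
            x = block(x)
        x = self.transformer.ln_f(x)
        logits = self.lm_head(x)
        loss = None
        if targets is not None:
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        return logits, loss
```

With such a definition in scope, the `GPT(config)` call in the loading snippet and the sampling loop in the generation example run as written.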
## Training Data

The model was trained on custom text data using the GPT-2 tokenizer. Please refer to the training script for specific dataset details.

## Evaluation

This model checkpoint represents an early training stage (500 steps) and should be considered experimental. For production use, significantly more training is recommended.

## Citation

If you use this model, please cite:

```bibtex
@misc{gpt2_tokeniser_2025,
  author       = {agileabhi},
  title        = {GPT-2 Style Language Model},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/{repo_id}}}
}
```

## Model Card Authors

- agileabhi

## License

MIT License - See LICENSE file for details