---
license: mit
language:
- en
tags:
- pytorch
- causal-lm
- gpt2
- text-generation
- transformers
library_name: pytorch
pipeline_tag: text-generation
---
# GPT-2 Style Language Model

This is a GPT-2 style autoregressive language model trained from scratch in PyTorch.

## Model Description

This model implements the GPT-2 architecture with a causal self-attention mechanism for next-token prediction. It was trained on custom text data to learn language patterns and generate coherent text sequences.
### Model Architecture

- **Model Type**: Causal Language Model (Decoder-only Transformer)
- **Architecture**: GPT-2
- **Framework**: PyTorch
- **Parameters**:
  - Number of Layers: {model.config.n_layer}
  - Number of Attention Heads: {model.config.n_head}
  - Embedding Dimension: {model.config.n_embd}
  - Vocabulary Size: {model.config.vocab_size}
  - Maximum Sequence Length: {model.config.block_size} tokens
  - Total Parameters: ~{sum(p.numel() for p in model.parameters()) / 1e6:.2f}M
### Training Details

- **Training Steps**: 500
- **Batch Size**: 4
- **Sequence Length**: 32 tokens
- **Optimizer**: AdamW
- **Learning Rate**: 3e-4
- **Final Training Loss**: {loss.item():.4f}
- **Tokenizer**: GPT-2 BPE tokenizer (tiktoken)
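For reference, these hyperparameters correspond to a training loop along the following lines. This is a minimal, hypothetical sketch rather than the actual training script: `GPT`, `GPTConfig`, and the raw `text` corpus are assumptions (the `GPT` class is the one referenced in the Usage section below).

```python
import torch
import tiktoken

# Hypothetical sketch of the training loop using the hyperparameters listed above.
# `GPT`, `GPTConfig`, and `text` (the raw training corpus) are assumptions.
enc = tiktoken.get_encoding('gpt2')
data = torch.tensor(enc.encode(text), dtype=torch.long)

batch_size, block_size = 4, 32

def get_batch():
    # Sample batch_size random windows of block_size tokens plus their shifted targets.
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])
    return x, y

model = GPT(GPTConfig())  # assumes the GPT class from this repository
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

model.train()
for step in range(500):
    xb, yb = get_batch()
    logits, loss = model(xb, yb)  # model returns (logits, loss) when targets are given
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
```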
### Intended Use

This model is intended for:

- Text generation tasks
- Educational purposes and research
- Experimentation with language model fine-tuning
- Understanding transformer architectures

### Limitations

- Trained on a limited dataset for only 500 steps
- May not generalize well to all text domains
- Can produce biased or nonsensical outputs
- Not suitable for production use without further training
- Limited context window of {model.config.block_size} tokens (see the cropping note below)
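Because of the limited window, any prompt (or running generation) longer than the context length has to be cropped to the most recent `block_size` tokens before each forward pass. A minimal sketch, assuming the `model` and input tensor `x` created as in the Usage section below:

```python
# Keep only the most recent block_size tokens before each forward pass.
block_size = model.config.block_size  # assumes the config is exposed as model.config
x_cond = x if x.size(1) <= block_size else x[:, -block_size:]
logits, _ = model(x_cond)
```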
## Usage

### Requirements

```bash
pip install torch tiktoken huggingface_hub
```
### Loading the Model

```python
import torch
import torch.nn as nn
from torch.nn import functional as F
from dataclasses import dataclass
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hub
# (replace "{repo_id}" with the id of this repository)
model_path = hf_hub_download(repo_id="{repo_id}", filename="model.pt")

# Load the checkpoint
checkpoint = torch.load(model_path, map_location='cpu')

# Print the saved model configuration
print("Model Configuration:", checkpoint['config'])

# To use the model, you'll need to define the GPT class
# (see the model architecture code in the repository)
@dataclass
class GPTConfig:
    block_size: int = 1024
    vocab_size: int = 50257
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768

# Recreate the model with the saved configuration
config = GPTConfig(**checkpoint['config'])
model = GPT(config)  # requires the full GPT class definition from the repository
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

print(f"Model loaded successfully with {sum(p.numel() for p in model.parameters()):,} parameters")
```
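Note: if the saved `config` is stored as a `GPTConfig` object rather than a plain dict, newer PyTorch releases (where `torch.load` defaults to `weights_only=True`) may refuse to unpickle it; in that case pass `weights_only=False` to `torch.load`, and only do so for checkpoints you trust.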
### Text Generation

```python
import torch
import tiktoken
from torch.nn import functional as F

# Initialize the GPT-2 BPE tokenizer
enc = tiktoken.get_encoding('gpt2')

# Prepare the input prompt
prompt = "Once upon a time"
tokens = enc.encode(prompt)
x = torch.tensor(tokens).unsqueeze(0)  # add batch dimension

# Generate text with top-k sampling
model.eval()
max_length = 50
with torch.no_grad():
    while x.size(1) < max_length:
        logits, _ = model(x)
        logits = logits[:, -1, :]  # logits for the last position
        probs = F.softmax(logits, dim=-1)
        # Top-k sampling: keep the 50 most probable tokens and sample among them
        topk_probs, topk_indices = torch.topk(probs, 50, dim=-1)
        ix = torch.multinomial(topk_probs, 1)
        xcol = torch.gather(topk_indices, -1, ix)
        x = torch.cat((x, xcol), dim=1)

# Decode and print the generated sequence
generated_tokens = x[0].tolist()
generated_text = enc.decode(generated_tokens)
print(generated_text)
```
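The loop above uses top-k sampling with k = 50: at each step only the 50 most probable tokens are kept and renormalized before a token is drawn. Lowering k makes the output more conservative; raising it, or dividing the logits by a temperature before the softmax, makes it more diverse.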
## Training Data

The model was trained on custom text data using the GPT-2 tokenizer. Please refer to the training script for specific dataset details.

## Evaluation

This model checkpoint represents an early training stage (500 steps) and should be considered experimental. For production use, significantly more training is recommended.

## Citation

If you use this model, please cite:
```bibtex
@misc{gpt2_tokeniser_2025,
  author = {agileabhi},
  title = {GPT-2 Style Language Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/{repo_id}}}
}
```
## Model Card Authors

- agileabhi

## License

MIT License - See LICENSE file for details