---
license: mit
language:
  - en
tags:
  - pytorch
  - causal-lm
  - gpt2
  - text-generation
  - transformers
library_name: pytorch
pipeline_tag: text-generation
---
# GPT-2 Style Language Model
This is a GPT-2 style autoregressive language model trained from scratch using PyTorch.
## Model Description
This model implements the GPT-2 architecture with causal self-attention mechanism for next-token prediction. It has been trained on custom text data to learn language patterns and generate coherent text sequences.
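The causal self-attention mechanism mentioned above can be sketched as follows. This is an illustrative, minimal PyTorch module in the standard GPT-2 formulation, not the repository's exact code; the class and attribute names are assumptions.

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

class CausalSelfAttention(nn.Module):
    """Minimal multi-head self-attention with a causal mask (illustrative sketch)."""
    def __init__(self, n_embd: int, n_head: int, block_size: int):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)  # joint Q, K, V projection
        self.c_proj = nn.Linear(n_embd, n_embd)      # output projection
        # lower-triangular mask: position t may only attend to positions <= t
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.size()
        q, k, v = self.c_attn(x).split(C, dim=2)
        # reshape to (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # scaled dot-product attention with future positions masked out
        att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)
        att = att.masked_fill(self.mask[:T, :T] == 0, float('-inf'))
        att = F.softmax(att, dim=-1)
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)
```

The causal mask is what makes the model autoregressive: each position's output depends only on earlier tokens, so the model can be trained on next-token prediction.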
## Model Architecture
- Model Type: Causal Language Model (Decoder-only Transformer)
- Architecture: GPT-2
- Framework: PyTorch
- Parameters:
  - Number of Layers: {model.config.n_layer}
  - Number of Attention Heads: {model.config.n_head}
  - Embedding Dimension: {model.config.n_embd}
  - Vocabulary Size: {model.config.vocab_size}
  - Maximum Sequence Length: {model.config.block_size} tokens
  - Total Parameters: ~{sum(p.numel() for p in model.parameters()) / 1e6:.2f}M
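As a sanity check, the parameter count for the GPT-2 small defaults used later in this card (12 layers, 12 heads, 768-dim embeddings, 50257 vocab, 1024 context) can be reproduced by hand. This assumes the standard GPT-2 layout with tied input/output embeddings; the actual trained configuration may differ.

```python
n_layer, n_head, n_embd = 12, 12, 768
vocab_size, block_size = 50257, 1024

wte = vocab_size * n_embd   # token embeddings (tied with the LM head)
wpe = block_size * n_embd   # learned position embeddings
per_block = (
    2 * n_embd                           # ln_1 (weight + bias)
    + n_embd * 3 * n_embd + 3 * n_embd   # c_attn: joint Q/K/V projection
    + n_embd * n_embd + n_embd           # c_proj: attention output projection
    + 2 * n_embd                         # ln_2
    + n_embd * 4 * n_embd + 4 * n_embd   # mlp c_fc (4x expansion)
    + 4 * n_embd * n_embd + n_embd       # mlp c_proj
)
ln_f = 2 * n_embd                        # final layer norm
total = wte + wpe + n_layer * per_block + ln_f
print(f"~{total / 1e6:.2f}M parameters")  # ~124.44M
```

This matches the well-known ~124M figure for GPT-2 small.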
## Training Details
- Training Steps: 500
- Batch Size: 4
- Sequence Length: 32 tokens
- Optimizer: AdamW
- Learning Rate: 3e-4
- Final Training Loss: {loss.item():.4f}
- Tokenizer: GPT-2 BPE tokenizer (tiktoken)
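To put the scale of this run in perspective, the total number of tokens seen during training follows directly from the hyperparameters above:

```python
steps, batch_size, seq_len = 500, 4, 32
tokens_seen = steps * batch_size * seq_len
print(tokens_seen)  # 64000
```

64,000 tokens is tiny compared to the billions of tokens used to train the original GPT-2, which is why the checkpoint should be treated as experimental (see Limitations below).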
## Intended Use
This model is intended for:
- Text generation tasks
- Educational purposes and research
- Experimentation with language model fine-tuning
- Understanding transformer architectures
## Limitations
- Trained on a limited dataset with only 500 steps
- May not generalize well to all text domains
- Can produce biased or nonsensical outputs
- Not suitable for production use without further training
- Limited context window of {model.config.block_size} tokens
## Usage

### Requirements

```bash
pip install torch tiktoken huggingface_hub
```
### Loading the Model

```python
import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hub
model_path = hf_hub_download(repo_id="{repo_id}", filename="model.pt")

# Load the checkpoint on CPU
checkpoint = torch.load(model_path, map_location='cpu')
print("Model Configuration:", checkpoint['config'])

# To use the model, you'll need to define the GPT class
# (see the model architecture code in the repository)
from dataclasses import dataclass
import torch.nn as nn
from torch.nn import functional as F

@dataclass
class GPTConfig:
    block_size: int = 1024
    vocab_size: int = 50257
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768

# Recreate the model with the saved configuration
config = GPTConfig(**checkpoint['config'])
model = GPT(config)  # requires the full GPT class definition
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

print(f"Model loaded with {sum(p.numel() for p in model.parameters()):,} parameters")
```
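The full `GPT` class lives in the repository and is not reproduced in this card. A minimal decoder-only sketch compatible with the `GPTConfig` above might look like the following. This is an illustrative reconstruction, not the repository's exact code: module names are assumptions, and the checkpoint's `state_dict` keys must match the repository's actual attribute names for `load_state_dict` to succeed, so treat this only as a starting point.

```python
import torch
import torch.nn as nn
from torch.nn import functional as F
from dataclasses import dataclass

@dataclass
class GPTConfig:
    block_size: int = 1024
    vocab_size: int = 50257
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768

class CausalSelfAttention(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.n_head = config.n_head
        self.c_attn = nn.Linear(config.n_embd, 3 * config.n_embd)
        self.c_proj = nn.Linear(config.n_embd, config.n_embd)

    def forward(self, x):
        B, T, C = x.size()
        q, k, v = self.c_attn(x).split(C, dim=2)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # causal (masked) attention; requires PyTorch >= 2.0
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)

class Block(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.ln_1 = nn.LayerNorm(config.n_embd)
        self.attn = CausalSelfAttention(config)
        self.ln_2 = nn.LayerNorm(config.n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(config.n_embd, 4 * config.n_embd),
            nn.GELU(),
            nn.Linear(4 * config.n_embd, config.n_embd),
        )

    def forward(self, x):
        x = x + self.attn(self.ln_1(x))   # pre-norm residual attention
        x = x + self.mlp(self.ln_2(x))    # pre-norm residual MLP
        return x

class GPT(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.wte = nn.Embedding(config.vocab_size, config.n_embd)
        self.wpe = nn.Embedding(config.block_size, config.n_embd)
        self.blocks = nn.ModuleList(Block(config) for _ in range(config.n_layer))
        self.ln_f = nn.LayerNorm(config.n_embd)
        self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)
        self.lm_head.weight = self.wte.weight  # weight tying

    def forward(self, idx, targets=None):
        B, T = idx.size()
        pos = torch.arange(T, device=idx.device)
        x = self.wte(idx) + self.wpe(pos)
        for block in self.blocks:
            x = block(x)
        logits = self.lm_head(self.ln_f(x))
        loss = None
        if targets is not None:
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        return logits, loss
```

The `(logits, loss)` return signature matches how the generation snippet below calls the model (`logits, _ = model(x)`).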
### Text Generation

```python
import tiktoken
import torch

# Initialize the GPT-2 BPE tokenizer
enc = tiktoken.get_encoding('gpt2')

# Encode the prompt and add a batch dimension
prompt = "Once upon a time"
tokens = enc.encode(prompt)
x = torch.tensor(tokens).unsqueeze(0)

# Autoregressively sample until max_length tokens
# (assumes `model` and `F` from the loading example above)
model.eval()
max_length = 50
with torch.no_grad():
    while x.size(1) < max_length:
        logits, _ = model(x)
        logits = logits[:, -1, :]            # logits for the last position
        probs = F.softmax(logits, dim=-1)
        # Top-k sampling: draw only from the 50 most likely tokens
        topk_probs, topk_indices = torch.topk(probs, 50, dim=-1)
        ix = torch.multinomial(topk_probs, 1)
        xcol = torch.gather(topk_indices, -1, ix)
        x = torch.cat((x, xcol), dim=1)

# Decode and print the generated sequence
generated_text = enc.decode(x[0].tolist())
print(generated_text)
```
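The top-k step above can be illustrated without torch: keep only the k highest-probability tokens, then draw from them with their original probabilities as weights. A toy stdlib version with made-up numbers:

```python
import random

def top_k_sample(probs, k, rng=random.Random(0)):
    """Toy version of the torch.topk + torch.multinomial pattern above."""
    # indices of the k largest probabilities
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in top]
    # weighted draw restricted to those k candidates
    return rng.choices(top, weights=weights, k=1)[0]

probs = [0.02, 0.40, 0.05, 0.30, 0.03, 0.20]  # toy distribution
idx = top_k_sample(probs, k=3)
print(idx)  # always one of the 3 most likely tokens: 1, 3, or 5
```

Restricting to the top k prevents the sampler from ever picking a very unlikely token, which keeps short generations from small models noticeably more coherent.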
## Training Data
The model was trained on custom text data using the GPT-2 tokenizer. Please refer to the training script for specific dataset details.
## Evaluation
This model checkpoint represents an early training stage (500 steps) and should be considered experimental. For production use, significantly more training is recommended.
## Citation
If you use this model, please cite:
```bibtex
@misc{gpt2_tokeniser_2025,
  author       = {agileabhi},
  title        = {GPT-2 Style Language Model},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/{repo_id}}}
}
```
## Model Card Authors
- agileabhi
## License
MIT License - See LICENSE file for details