---
license: mit
language:
- en
tags:
- pytorch
- causal-lm
- gpt2
- text-generation
- transformers
library_name: pytorch
pipeline_tag: text-generation
---

GPT-2 Style Language Model

This is a GPT-2 style autoregressive language model trained from scratch using PyTorch.

Model Description

This model implements the GPT-2 architecture with a causal self-attention mechanism for next-token prediction. It was trained on custom text data to learn language patterns and generate coherent text sequences.
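
"Causal" here means each position may attend only to itself and earlier tokens. A minimal illustration of the masking idea, using a hypothetical sequence length (not taken from this model's configuration):

import torch

T = 5  # hypothetical sequence length
scores = torch.randn(T, T)                   # raw attention scores (illustrative)
mask = torch.tril(torch.ones(T, T))          # lower-triangular: 1 = may attend
scores = scores.masked_fill(mask == 0, float('-inf'))  # hide future positions
weights = torch.softmax(scores, dim=-1)      # each row sums to 1 over past tokens only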

Model Architecture

  • Model Type: Causal Language Model (Decoder-only Transformer)
  • Architecture: GPT-2
  • Framework: PyTorch
  • Parameters:
    • Number of Layers: {model.config.n_layer}
    • Number of Attention Heads: {model.config.n_head}
    • Embedding Dimension: {model.config.n_embd}
    • Vocabulary Size: {model.config.vocab_size}
    • Maximum Sequence Length: {model.config.block_size} tokens
    • Total Parameters: ~{sum(p.numel() for p in model.parameters()) / 1e6:.2f}M
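
The exact values above can be read directly from the saved checkpoint; a minimal sketch, assuming the checkpoint layout shown in the Usage section below:

import torch

checkpoint = torch.load("model.pt", map_location="cpu")
print("Config:", checkpoint['config'])
# note: if the token embedding and lm_head weights are tied, the shared tensor
# appears under two state-dict keys and is counted twice here; counting
# model.parameters() after loading (see Usage) avoids the double count
n_params = sum(t.numel() for t in checkpoint['model_state_dict'].values())
print(f"~{n_params / 1e6:.2f}M parameters")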

Training Details

  • Training Steps: 500
  • Batch Size: 4
  • Sequence Length: 32 tokens
  • Optimizer: AdamW
  • Learning Rate: 3e-4
  • Final Training Loss: {loss.item():.4f}
  • Tokenizer: GPT-2 BPE tokenizer (tiktoken)
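
The training script itself is not reproduced on this card; below is a minimal sketch of a loop consistent with the hyperparameters above. The corpus file (input.txt), the random-window batching, and the model construction are assumptions for illustration:

import torch
import tiktoken

enc = tiktoken.get_encoding('gpt2')                     # GPT-2 BPE tokenizer, as above
text = open('input.txt').read()                         # hypothetical training corpus
data = torch.tensor(enc.encode(text), dtype=torch.long)

B, T = 4, 32                                            # batch size / sequence length from above
model.train()                                           # assumes the GPT model from the Usage section
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(500):
    ix = torch.randint(0, len(data) - T - 1, (B,))      # random window starts
    x = torch.stack([data[i:i+T] for i in ix])          # inputs
    y = torch.stack([data[i+1:i+T+1] for i in ix])      # targets, shifted by one token
    logits, loss = model(x, y)                          # forward returns (logits, loss)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()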

Intended Use

This model is intended for:

  • Text generation tasks
  • Educational purposes and research
  • Experimentation with language model fine-tuning
  • Understanding transformer architectures

Limitations

  • Trained on a limited dataset for only 500 steps
  • May not generalize well to all text domains
  • Can produce biased or nonsensical outputs
  • Not suitable for production use without further training
  • Limited context window of {model.config.block_size} tokens

Usage

Requirements

pip install torch tiktoken huggingface_hub

Loading the Model

import torch
from huggingface_hub import hf_hub_download

# Download the model
model_path = hf_hub_download(repo_id="agileabhi/gpt_tokeniser", filename="model.pt")

# Load the checkpoint
checkpoint = torch.load(model_path, map_location='cpu')

# Print model configuration
print("Model Configuration:", checkpoint['config'])

# To use the model, you'll need the GPT class definition
# (the authoritative version lives in the repository; a minimal sketch follows below)
from dataclasses import dataclass
import torch.nn as nn
from torch.nn import functional as F

@dataclass
class GPTConfig:
    block_size: int = 1024
    vocab_size: int = 50257
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768
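
# The GPT class itself is not defined in this snippet. Below is a minimal sketch
# in the nanoGPT style; the module names (transformer.wte, attn.c_attn, mlp.c_fc,
# ...) are assumptions and must match the keys the training script saved in
# model_state_dict, or load_state_dict will fail. Requires PyTorch >= 2.0 for
# F.scaled_dot_product_attention.

class CausalSelfAttention(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.c_attn = nn.Linear(config.n_embd, 3 * config.n_embd)  # fused q, k, v projection
        self.c_proj = nn.Linear(config.n_embd, config.n_embd)      # output projection
        self.n_head = config.n_head
        self.n_embd = config.n_embd

    def forward(self, x):
        B, T, C = x.size()
        q, k, v = self.c_attn(x).split(self.n_embd, dim=2)
        # reshape to (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # causal attention: each position attends only to itself and earlier tokens
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)

class MLP(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.c_fc = nn.Linear(config.n_embd, 4 * config.n_embd)
        self.gelu = nn.GELU(approximate='tanh')
        self.c_proj = nn.Linear(4 * config.n_embd, config.n_embd)

    def forward(self, x):
        return self.c_proj(self.gelu(self.c_fc(x)))

class Block(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.ln_1 = nn.LayerNorm(config.n_embd)
        self.attn = CausalSelfAttention(config)
        self.ln_2 = nn.LayerNorm(config.n_embd)
        self.mlp = MLP(config)

    def forward(self, x):
        x = x + self.attn(self.ln_1(x))  # pre-norm residual attention
        x = x + self.mlp(self.ln_2(x))   # pre-norm residual MLP
        return x

class GPT(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.transformer = nn.ModuleDict(dict(
            wte=nn.Embedding(config.vocab_size, config.n_embd),  # token embeddings
            wpe=nn.Embedding(config.block_size, config.n_embd),  # position embeddings
            h=nn.ModuleList([Block(config) for _ in range(config.n_layer)]),
            ln_f=nn.LayerNorm(config.n_embd),
        ))
        self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)
        self.transformer.wte.weight = self.lm_head.weight  # weight tying, as in GPT-2

    def forward(self, idx, targets=None):
        B, T = idx.size()
        pos = torch.arange(T, device=idx.device)
        x = self.transformer.wte(idx) + self.transformer.wpe(pos)
        for block in self.transformer.h:
            x = block(x)
        x = self.transformer.ln_f(x)
        logits = self.lm_head(x)
        loss = None
        if targets is not None:
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        return logits, loss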

# Recreate the model with the saved configuration
config = GPTConfig(**checkpoint['config'])
model = GPT(config)  # uses the GPT class defined above (or the repository's full version)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

print(f"Model loaded successfully with {{sum(p.numel() for p in model.parameters()):,}} parameters")

Text Generation

import tiktoken

# Initialize tokenizer
enc = tiktoken.get_encoding('gpt2')

# Prepare input
prompt = "Once upon a time"
tokens = enc.encode(prompt)
x = torch.tensor(tokens).unsqueeze(0)  # Add batch dimension

# Generate text
model.eval()
max_length = 50

with torch.no_grad():
    while x.size(1) < max_length:
        logits, _ = model(x)
        logits = logits[:, -1, :]  # Get last token logits
        probs = F.softmax(logits, dim=-1)

        # Top-k sampling
        topk_probs, topk_indices = torch.topk(probs, 50, dim=-1)
        ix = torch.multinomial(topk_probs, 1)
        xcol = torch.gather(topk_indices, -1, ix)
        x = torch.cat((x, xcol), dim=1)

# Decode and print
generated_tokens = x[0].tolist()
generated_text = enc.decode(generated_tokens)
print(generated_text)
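
For generations that approach the model's context limit, the input must be cropped to the last block_size tokens before each forward pass, since the position embedding table only covers block_size positions. A minimal sketch of the change inside the sampling loop:

# inside the loop, replace `logits, _ = model(x)` with:
x_cond = x if x.size(1) <= config.block_size else x[:, -config.block_size:]
logits, _ = model(x_cond)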

Training Data

The model was trained on custom text data using the GPT-2 tokenizer. Please refer to the training script for specific dataset details.

Evaluation

This model checkpoint represents an early training stage (500 steps) and should be considered experimental. For production use, significantly more training is recommended.

Citation

If you use this model, please cite:

@misc{gpt2_tokeniser_2025,
  author = {agileabhi},
  title = {GPT-2 Style Language Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/agileabhi/gpt_tokeniser}}
}

Model Card Authors

  • agileabhi

License

MIT License - See LICENSE file for details
