---
license: mit
language:
- en
tags:
- pytorch
- causal-lm
- gpt2
- text-generation
- transformers
library_name: pytorch
pipeline_tag: text-generation
---

GPT-2 Style Language Model

This is a GPT-2 style autoregressive language model trained from scratch using PyTorch.

Model Description

This model implements the GPT-2 architecture with a causal self-attention mechanism for next-token prediction. It was trained on custom text data to learn language patterns and generate coherent text sequences.
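
"Causal" here means each position may attend only to itself and earlier tokens. A minimal illustration of the masking idea, using a hypothetical sequence length (not taken from this model's configuration):

import torch

T = 5  # hypothetical sequence length
scores = torch.randn(T, T)                   # raw attention scores (illustrative)
mask = torch.tril(torch.ones(T, T))          # lower-triangular: 1 = may attend
scores = scores.masked_fill(mask == 0, float('-inf'))  # hide future positions
weights = torch.softmax(scores, dim=-1)      # each row sums to 1 over past tokens only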

Model Architecture

  • Model Type: Causal Language Model (Decoder-only Transformer)
  • Architecture: GPT-2
  • Framework: PyTorch
  • Parameters:
    • Number of Layers: {model.config.n_layer}
    • Number of Attention Heads: {model.config.n_head}
    • Embedding Dimension: {model.config.n_embd}
    • Vocabulary Size: {model.config.vocab_size}
    • Maximum Sequence Length: {model.config.block_size} tokens
    • Total Parameters: ~{sum(p.numel() for p in model.parameters()) / 1e6:.2f}M
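
The exact values above can be read directly from the saved checkpoint; a minimal sketch, assuming the checkpoint layout shown in the Usage section below:

import torch

checkpoint = torch.load("model.pt", map_location="cpu")
print("Config:", checkpoint['config'])
# note: if the token embedding and lm_head weights are tied, the shared tensor
# appears under two state-dict keys and is counted twice here; counting
# model.parameters() after loading (see Usage) avoids the double count
n_params = sum(t.numel() for t in checkpoint['model_state_dict'].values())
print(f"~{n_params / 1e6:.2f}M parameters")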

Training Details

  • Training Steps: 500
  • Batch Size: 4
  • Sequence Length: 32 tokens
  • Optimizer: AdamW
  • Learning Rate: 3e-4
  • Final Training Loss: {loss.item():.4f}
  • Tokenizer: GPT-2 BPE tokenizer (tiktoken)
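
The training script itself is not reproduced on this card; below is a minimal sketch of a loop consistent with the hyperparameters above. The corpus file (input.txt), the random-window batching, and the model construction are assumptions for illustration:

import torch
import tiktoken

enc = tiktoken.get_encoding('gpt2')                     # GPT-2 BPE tokenizer, as above
text = open('input.txt').read()                         # hypothetical training corpus
data = torch.tensor(enc.encode(text), dtype=torch.long)

B, T = 4, 32                                            # batch size / sequence length from above
model.train()                                           # assumes the GPT model from the Usage section
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(500):
    ix = torch.randint(0, len(data) - T - 1, (B,))      # random window starts
    x = torch.stack([data[i:i+T] for i in ix])          # inputs
    y = torch.stack([data[i+1:i+T+1] for i in ix])      # targets, shifted by one token
    logits, loss = model(x, y)                          # forward returns (logits, loss)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()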

Intended Use

This model is intended for:

  • Text generation tasks
  • Educational purposes and research
  • Experimentation with language model fine-tuning
  • Understanding transformer architectures

Limitations

  • Trained on a limited dataset for only 500 steps
  • May not generalize well to all text domains
  • Can produce biased or nonsensical outputs
  • Not suitable for production use without further training
  • Limited context window of {model.config.block_size} tokens

Usage

Requirements

pip install torch tiktoken huggingface_hub

Loading the Model

import torch
from huggingface_hub import hf_hub_download

# Download the model
model_path = hf_hub_download(repo_id="agileabhi/gpt_tokeniser", filename="model.pt")

# Load the checkpoint
checkpoint = torch.load(model_path, map_location='cpu')

# Print model configuration
print("Model Configuration:", checkpoint['config'])

# To use the model, you'll need the GPT class definition
# (the authoritative version lives in the repository; a minimal sketch follows below)
from dataclasses import dataclass
import torch.nn as nn
from torch.nn import functional as F

@dataclass
class GPTConfig:
    block_size: int = 1024
    vocab_size: int = 50257
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768
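
# The GPT class itself is not defined in this snippet. Below is a minimal sketch
# in the nanoGPT style; the module names (transformer.wte, attn.c_attn, mlp.c_fc,
# ...) are assumptions and must match the keys the training script saved in
# model_state_dict, or load_state_dict will fail. Requires PyTorch >= 2.0 for
# F.scaled_dot_product_attention.

class CausalSelfAttention(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.c_attn = nn.Linear(config.n_embd, 3 * config.n_embd)  # fused q, k, v projection
        self.c_proj = nn.Linear(config.n_embd, config.n_embd)      # output projection
        self.n_head = config.n_head
        self.n_embd = config.n_embd

    def forward(self, x):
        B, T, C = x.size()
        q, k, v = self.c_attn(x).split(self.n_embd, dim=2)
        # reshape to (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # causal attention: each position attends only to itself and earlier tokens
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)

class MLP(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.c_fc = nn.Linear(config.n_embd, 4 * config.n_embd)
        self.gelu = nn.GELU(approximate='tanh')
        self.c_proj = nn.Linear(4 * config.n_embd, config.n_embd)

    def forward(self, x):
        return self.c_proj(self.gelu(self.c_fc(x)))

class Block(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.ln_1 = nn.LayerNorm(config.n_embd)
        self.attn = CausalSelfAttention(config)
        self.ln_2 = nn.LayerNorm(config.n_embd)
        self.mlp = MLP(config)

    def forward(self, x):
        x = x + self.attn(self.ln_1(x))  # pre-norm residual attention
        x = x + self.mlp(self.ln_2(x))   # pre-norm residual MLP
        return x

class GPT(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.transformer = nn.ModuleDict(dict(
            wte=nn.Embedding(config.vocab_size, config.n_embd),  # token embeddings
            wpe=nn.Embedding(config.block_size, config.n_embd),  # position embeddings
            h=nn.ModuleList([Block(config) for _ in range(config.n_layer)]),
            ln_f=nn.LayerNorm(config.n_embd),
        ))
        self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)
        self.transformer.wte.weight = self.lm_head.weight  # weight tying, as in GPT-2

    def forward(self, idx, targets=None):
        B, T = idx.size()
        pos = torch.arange(T, device=idx.device)
        x = self.transformer.wte(idx) + self.transformer.wpe(pos)
        for block in self.transformer.h:
            x = block(x)
        x = self.transformer.ln_f(x)
        logits = self.lm_head(x)
        loss = None
        if targets is not None:
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        return logits, loss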

# Recreate the model with the saved configuration
config = GPTConfig(**checkpoint['config'])
model = GPT(config)  # uses the GPT class defined above (or the repository's full version)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

print(f"Model loaded successfully with {{sum(p.numel() for p in model.parameters()):,}} parameters")

Text Generation

import tiktoken

# Initialize tokenizer
enc = tiktoken.get_encoding('gpt2')

# Prepare input
prompt = "Once upon a time"
tokens = enc.encode(prompt)
x = torch.tensor(tokens).unsqueeze(0)  # Add batch dimension

# Generate text
model.eval()
max_length = 50

with torch.no_grad():
    while x.size(1) < max_length:
        logits, _ = model(x)
        logits = logits[:, -1, :]  # Get last token logits
        probs = F.softmax(logits, dim=-1)

        # Top-k sampling
        topk_probs, topk_indices = torch.topk(probs, 50, dim=-1)
        ix = torch.multinomial(topk_probs, 1)
        xcol = torch.gather(topk_indices, -1, ix)
        x = torch.cat((x, xcol), dim=1)

# Decode and print
generated_tokens = x[0].tolist()
generated_text = enc.decode(generated_tokens)
print(generated_text)
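
For generations that approach the model's context limit, the input must be cropped to the last block_size tokens before each forward pass, since the position embedding table only covers block_size positions. A minimal sketch of the change inside the sampling loop:

# inside the loop, replace `logits, _ = model(x)` with:
x_cond = x if x.size(1) <= config.block_size else x[:, -config.block_size:]
logits, _ = model(x_cond)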

Training Data

The model was trained on custom text data using the GPT-2 tokenizer. Please refer to the training script for specific dataset details.

Evaluation

This model checkpoint represents an early training stage (500 steps) and should be considered experimental. For production use, significantly more training is recommended.

Citation

If you use this model, please cite:

@misc{gpt2_tokeniser_2025,
  author = {agileabhi},
  title = {GPT-2 Style Language Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/agileabhi/gpt_tokeniser}}
}

Model Card Authors

  • agileabhi

License

MIT License - See LICENSE file for details
