---
license: mit
language:
- en
tags:
- pytorch
- causal-lm
- gpt2
- text-generation
- transformers
library_name: pytorch
pipeline_tag: text-generation
---
# GPT-2 Style Language Model
This is a GPT-2 style autoregressive language model trained from scratch using PyTorch.
## Model Description
This model implements the GPT-2 architecture with a causal self-attention mechanism for next-token prediction. It was trained on custom text data to learn language patterns and generate coherent text sequences.
### Model Architecture
- **Model Type**: Causal Language Model (Decoder-only Transformer)
- **Architecture**: GPT-2
- **Framework**: PyTorch
- **Parameters**:
  - Number of Layers: 12
  - Number of Attention Heads: 12
  - Embedding Dimension: 768
  - Vocabulary Size: 50,257
  - Maximum Sequence Length: 1,024 tokens
  - Total Parameters: ~124M

These values are the `GPTConfig` defaults used in the loading example below (the GPT-2 small configuration); the authoritative values for this checkpoint are stored in `checkpoint['config']`.
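For reference, the parameter count follows directly from these defaults: token embeddings contribute 50,257 × 768 ≈ 38.6M (tied with the output head), position embeddings 1,024 × 768 ≈ 0.8M, and each of the 12 transformer blocks roughly 12 × 768² ≈ 7.1M (4 × 768² for the attention projections plus 8 × 768² for the MLP), giving 12 × 7.1M ≈ 85M and ≈ 124M in total.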
### Training Details
- **Training Steps**: 500
- **Batch Size**: 4
- **Sequence Length**: 32 tokens
- **Optimizer**: AdamW
- **Learning Rate**: 3e-4
- **Final Training Loss**: not reported
- **Tokenizer**: GPT-2 BPE tokenizer (tiktoken)
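For concreteness, the hyperparameters above can be assembled into a minimal training loop. This is a hedged reconstruction, not the actual training script: `train.txt` stands in for the unspecified custom text data, and `GPT`/`GPTConfig` are the classes defined in the usage section below.

```python
import torch
import tiktoken

# Hypothetical reconstruction of the training run described above.
# 'train.txt' is a placeholder for the unspecified custom text data;
# GPT and GPTConfig are the classes from the usage section below.
enc = tiktoken.get_encoding('gpt2')
with open('train.txt') as f:
    data = torch.tensor(enc.encode(f.read()), dtype=torch.long)

B, T = 4, 32  # batch size and sequence length from the card
config = GPTConfig()
model = GPT(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(500):
    # Sample B random windows; targets are inputs shifted one token right.
    ix = torch.randint(0, len(data) - T - 1, (B,))
    x = torch.stack([data[i:i + T] for i in ix])
    y = torch.stack([data[i + 1:i + T + 1] for i in ix])
    optimizer.zero_grad()
    logits, loss = model(x, y)
    loss.backward()
    optimizer.step()

# Save in the format the loading example below expects.
torch.save({'model_state_dict': model.state_dict(),
            'config': config.__dict__}, 'model.pt')
```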
### Intended Use
This model is intended for:
- Text generation tasks
- Educational purposes and research
- Experimentation with language model fine-tuning
- Understanding transformer architectures
### Limitations
- Trained on a limited dataset with only 500 steps
- May not generalize well to all text domains
- Can produce biased or nonsensical outputs
- Not suitable for production use without further training
- Limited context window (1,024 tokens with the default configuration)
## Usage
### Requirements
```bash
pip install torch tiktoken huggingface_hub
```
### Loading the Model
```python
import torch
from huggingface_hub import hf_hub_download
# Download the model
model_path = hf_hub_download(repo_id="agileabhi/gpt_tokeniser", filename="model.pt")
# Load the checkpoint
checkpoint = torch.load(model_path, map_location='cpu')
# Print model configuration
print("Model Configuration:", checkpoint['config'])
# To use the model, you'll need to define the GPT class
# (See the model architecture code in the repository)
from dataclasses import dataclass
import torch.nn as nn
from torch.nn import functional as F
@dataclass
class GPTConfig:
    block_size: int = 1024
    vocab_size: int = 50257
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768
# Recreate the model with the saved configuration
config = GPTConfig(**checkpoint['config'])
model = GPT(config)  # requires the GPT class definition (a sketch is given below)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()
print(f"Model loaded successfully with {{sum(p.numel() for p in model.parameters()):,}} parameters")
```
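The checkpoint stores only the configuration and weights, so a `GPT` class must be supplied; the definition in the repository is authoritative. What follows is a minimal nanoGPT-style sketch consistent with the `GPTConfig` above (standard pre-LayerNorm GPT-2 blocks with tied embeddings). If the saved `state_dict` uses different module names, `load_state_dict` will fail and the sketch must be adapted.

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, config):
        super().__init__()
        assert config.n_embd % config.n_head == 0
        self.c_attn = nn.Linear(config.n_embd, 3 * config.n_embd)  # fused q,k,v projection
        self.c_proj = nn.Linear(config.n_embd, config.n_embd)      # output projection
        self.n_head = config.n_head
        self.n_embd = config.n_embd

    def forward(self, x):
        B, T, C = x.size()
        q, k, v = self.c_attn(x).split(self.n_embd, dim=2)
        # reshape to (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # causal attention: future positions are masked out
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)

class MLP(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.c_fc = nn.Linear(config.n_embd, 4 * config.n_embd)
        self.gelu = nn.GELU(approximate='tanh')
        self.c_proj = nn.Linear(4 * config.n_embd, config.n_embd)

    def forward(self, x):
        return self.c_proj(self.gelu(self.c_fc(x)))

class Block(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.ln_1 = nn.LayerNorm(config.n_embd)
        self.attn = CausalSelfAttention(config)
        self.ln_2 = nn.LayerNorm(config.n_embd)
        self.mlp = MLP(config)

    def forward(self, x):
        x = x + self.attn(self.ln_1(x))  # pre-norm residual attention
        x = x + self.mlp(self.ln_2(x))   # pre-norm residual MLP
        return x

class GPT(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.transformer = nn.ModuleDict(dict(
            wte=nn.Embedding(config.vocab_size, config.n_embd),  # token embeddings
            wpe=nn.Embedding(config.block_size, config.n_embd),  # position embeddings
            h=nn.ModuleList([Block(config) for _ in range(config.n_layer)]),
            ln_f=nn.LayerNorm(config.n_embd),
        ))
        self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)
        self.lm_head.weight = self.transformer.wte.weight  # weight tying

    def forward(self, idx, targets=None):
        B, T = idx.size()
        pos = torch.arange(0, T, dtype=torch.long, device=idx.device)
        x = self.transformer.wte(idx) + self.transformer.wpe(pos)
        for block in self.transformer.h:
            x = block(x)
        x = self.transformer.ln_f(x)
        logits = self.lm_head(x)
        loss = None
        if targets is not None:
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        return logits, loss
```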
### Text Generation
```python
import torch
import tiktoken
from torch.nn import functional as F

# Initialize the GPT-2 BPE tokenizer
enc = tiktoken.get_encoding('gpt2')
# Prepare input
prompt = "Once upon a time"
tokens = enc.encode(prompt)
x = torch.tensor(tokens).unsqueeze(0) # Add batch dimension
# Generate text
model.eval()
max_length = 50
with torch.no_grad():
    while x.size(1) < max_length:
        logits, _ = model(x)                       # model returns (logits, loss)
        logits = logits[:, -1, :]                  # logits for the last position only
        probs = F.softmax(logits, dim=-1)
        # Top-k sampling: sample from the 50 most likely tokens
        topk_probs, topk_indices = torch.topk(probs, 50, dim=-1)
        ix = torch.multinomial(topk_probs, 1)      # multinomial accepts unnormalized weights
        xcol = torch.gather(topk_indices, -1, ix)  # map back to vocabulary ids
        x = torch.cat((x, xcol), dim=1)            # append the sampled token
# Decode and print
generated_tokens = x[0].tolist()
generated_text = enc.decode(generated_tokens)
print(generated_text)
```
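The loop uses top-k sampling with k = 50: only the 50 highest-probability tokens are eligible at each step, which avoids sampling from the long tail of unlikely tokens. A common variation, not part of the snippet above, is temperature scaling before the softmax:

```python
# A common variation (not in the original snippet): temperature scaling.
# Values below 1.0 sharpen the distribution; values above 1.0 flatten it.
temperature = 0.8
probs = F.softmax(logits / temperature, dim=-1)
```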
## Training Data
The model was trained on custom text data using the GPT-2 tokenizer. Please refer to the training script for specific dataset details.
## Evaluation
This model checkpoint represents an early training stage (500 steps) and should be considered experimental. For production use, significantly more training is recommended.
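For a quantitative check, a simple held-out evaluation can be run with the same tokenizer. This is an illustrative sketch, assuming a `val.txt` file of held-out text (a hypothetical filename) and the model loaded as shown above:

```python
import math
import torch
import tiktoken

# Illustrative held-out evaluation: average next-token loss -> perplexity.
# 'val.txt' is a placeholder for any held-out text.
enc = tiktoken.get_encoding('gpt2')
with open('val.txt') as f:
    data = torch.tensor(enc.encode(f.read()), dtype=torch.long)

T = 32  # match the training sequence length
losses = []
model.eval()
with torch.no_grad():
    for i in range(0, len(data) - T - 1, T):
        x = data[i:i + T].unsqueeze(0)
        y = data[i + 1:i + T + 1].unsqueeze(0)
        _, loss = model(x, y)
        losses.append(loss.item())

avg = sum(losses) / len(losses)
print(f"val loss {avg:.4f}, perplexity {math.exp(avg):.2f}")
```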
## Citation
If you use this model, please cite:
```bibtex
@misc{gpt2_tokeniser_2025,
  author       = {agileabhi},
  title        = {GPT-2 Style Language Model},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/agileabhi/gpt_tokeniser}}
}
```
## Model Card Authors
- agileabhi
## License
MIT License. See the LICENSE file for details.