# gpt2-wikitext-finetuned

GPT-2 model fine-tuned on WikiText-2 as part of the LLM-Study project, a multi-platform GPT-2 training framework.
## Model Description
This model was trained using a multi-platform GPT-2 training framework, which supports:
- 🖥️ MacBook M3 (FP16 training with Metal Performance Shaders)
- 🎮 Dual GTX 1080 (Multi-GPU DDP with PyTorch)
- ☁️ Google Colab (Cloud training with Tensor Cores)
## Model Details
| Attribute | Value |
|---|---|
| Base Model | GPT-2 |
| Parameters | 124M (124,439,808) |
| Training Platform | Dual GTX 1080 (Multi-GPU DDP) |
| Precision | FP32 |
| Context Length | 1024 tokens |
| Vocabulary Size | 50,257 |
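The 124M parameter count can be reproduced from the architecture constants. Only the vocabulary size and context length appear in the table; the hidden size, layer count, and MLP width below are the standard GPT-2 small values, assumed here for the check:

```python
# GPT-2 small architecture constants (hidden size, layer count, and MLP
# width are the standard values, assumed since the table omits them)
vocab_size, n_ctx, d_model, n_layer = 50_257, 1_024, 768, 12
d_ff = 4 * d_model          # MLP inner dimension
d_qkv = 3 * d_model         # fused query/key/value projection width

embeddings = vocab_size * d_model + n_ctx * d_model   # token + position tables

per_block = (
    2 * d_model                       # ln_1 (gain + bias)
    + d_model * d_qkv + d_qkv         # attention qkv projection
    + d_model * d_model + d_model     # attention output projection
    + 2 * d_model                     # ln_2
    + d_model * d_ff + d_ff           # MLP up-projection
    + d_ff * d_model + d_model        # MLP down-projection
)

final_ln = 2 * d_model
total = embeddings + n_layer * per_block + final_ln   # lm_head is tied to wte

print(total)  # 124439808, matching the table
```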
## Training Details
| Hyperparameter | Value |
|---|---|
| Dataset | WikiText-2 |
| Training Steps | 1719 |
| Epochs | 3 |
| Learning Rate | 5e-5 |
| Effective Batch Size | 64 |
| LR Scheduler | Cosine with warmup |
| Warmup Steps | 500 |
| Optimizer | AdamW |
| Weight Decay | 0.01 |
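The schedule in the table (linear warmup for 500 steps, then cosine decay over the remaining steps) can be written out as a plain function. This mirrors what `transformers.get_cosine_schedule_with_warmup` computes, sketched here for clarity with the hyperparameters from the table:

```python
import math

def lr_at(step, peak_lr=5e-5, warmup=500, total=1719):
    """Linear warmup to peak_lr, then cosine decay to 0 (values from the table above)."""
    if step < warmup:
        return peak_lr * step / warmup
    progress = (step - warmup) / (total - warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0))      # 0.0
print(lr_at(500))    # 5e-05 (peak, end of warmup)
print(lr_at(1719))   # 0.0  (fully decayed)
```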
## Training Results
- Final Training Loss: 17.67
- Evaluation Loss: 3.26
- HellaSwag Accuracy: 28.96% (zero-shot)
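Evaluation loss and perplexity are two views of the same quantity: perplexity is the exponential of the mean per-token cross-entropy. A minimal conversion helper (the values below are illustrative, not re-derived from this model's checkpoints):

```python
import math

def perplexity(mean_ce_loss: float) -> float:
    """Perplexity = exp(mean per-token cross-entropy loss, natural log)."""
    return math.exp(mean_ce_loss)

# A loss of 0 means the model assigns probability 1 to every token:
print(perplexity(0.0))            # 1.0
# Lower loss -> lower perplexity (better language model):
print(round(perplexity(3.0), 2))  # 20.09
```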
## Usage

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer, pipeline
import torch

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("abhinema/gpt2-wikitext-finetuned")
tokenizer = GPT2Tokenizer.from_pretrained("abhinema/gpt2-wikitext-finetuned")

# Option 1: Using pipeline (easy)
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
output = generator("The future of artificial intelligence", max_length=100, num_return_sequences=1)
print(output[0]["generated_text"])

# Option 2: Direct generation (more control)
inputs = tokenizer("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=100,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
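The `temperature=0.7` argument rescales the logits before sampling; values below 1 sharpen the distribution toward high-probability tokens, values above 1 flatten it. A small self-contained illustration of the effect on toy logits (no model involved):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits scaled by 1/temperature, as used in sampling-based generation."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 1.0))   # moderately peaked
print(softmax_with_temperature(logits, 0.7))   # sharper: top token gains probability
```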
## Framework & Environment
| Component | Version |
|---|---|
| Framework | PyTorch 2.7.1 |
| Transformers | 4.28.1 |
| Accelerate | 0.25.0 |
| CUDA | 11.8 |
| GPU | 2× NVIDIA GTX 1080 (8GB each) |
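An environment matching the table can be pinned with standard PyPI package names (the versions are exactly as reported above; the CUDA build of torch may additionally need the appropriate index URL for your setup):

```bash
pip install torch==2.7.1 transformers==4.28.1 accelerate==0.25.0
```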
## Limitations
⚠️ Important Considerations:
- English only - Trained exclusively on English text
- Potential biases - May generate biased or inappropriate content
- Not production-ready - Requires additional filtering and safety measures
- Domain specific - Optimized for encyclopedic/Wikipedia-style text
## Intended Use
✅ Recommended:
- Research and experimentation
- Learning about LLM training and fine-tuning
- Text generation in controlled environments
- Further fine-tuning for specific domains
❌ Not Recommended:
- Production applications without safety filters
- Generation of factual or medical information
- Use cases requiring high reliability
## Ethical Considerations
This model inherits biases from:
- The original GPT-2 pretraining data
- WikiText dataset (derived from Wikipedia)
Users should implement appropriate content filtering for any real-world applications.
## Citation

If you use this model, please reference the LLM-Study project:

```bibtex
@misc{llm-study-gpt2-2026,
  author       = {Abhishek Nema},
  title        = {GPT-2 Multi-Platform Training - gpt2-wikitext-finetuned},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/abhinema/gpt2-wikitext-finetuned}}
}
```
## Contact
- Author: Abhishek Nema
- Email: abhinema@gmail.com
- Project: LLM-Study on GitHub
## License
MIT License - Free for academic and commercial use with attribution.
## Evaluation Results

- Perplexity on WikiText-2: 17.670 (self-reported)