gpt2-wikitext-finetuned

A GPT-2 model fine-tuned on WikiText-2 as part of the LLM-Study project, a multi-platform GPT-2 training framework.

Model Description

This model was trained using a multi-platform GPT-2 training framework, which supports:

  • 🖥️ MacBook M3 (FP16 training with Metal Performance Shaders)
  • 🎮 Dual GTX 1080 (Multi-GPU DDP with PyTorch)
  • ☁️ Google Colab (Cloud training with Tensor Cores)

Model Details

| Attribute | Value |
|-----------|-------|
| Base Model | GPT-2 |
| Parameters | 124M (124,439,808) |
| Training Platform | Dual GTX 1080 (Multi-GPU DDP) |
| Precision | FP32 |
| Context Length | 1024 tokens |
| Vocabulary Size | 50,257 |
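The parameter count in the table can be reproduced from the standard GPT-2 small architecture constants (12 layers, hidden size 768, FFN size 3072, LM head tied to the token embedding). A quick sanity check:

```python
# Reproduce the GPT-2 small parameter count from its architecture constants.
# (Standard GPT-2 values; the LM head shares weights with the token embedding.)
vocab, ctx, d, ffn, layers = 50257, 1024, 768, 3072, 12

embeddings = vocab * d + ctx * d       # token + position embeddings
per_block = (
    2 * d                              # ln_1 (weight + bias)
    + d * 3 * d + 3 * d                # attention qkv projection
    + d * d + d                        # attention output projection
    + 2 * d                            # ln_2
    + d * ffn + ffn                    # MLP up-projection
    + ffn * d + d                      # MLP down-projection
)
total = embeddings + layers * per_block + 2 * d  # + final layer norm

print(total)  # 124439808, matching the table
```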

Training Details

| Hyperparameter | Value |
|----------------|-------|
| Dataset | WikiText-2 |
| Training Steps | 1719 |
| Epochs | 3 |
| Learning Rate | 5e-5 |
| Effective Batch Size | 64 |
| LR Scheduler | Cosine with warmup |
| Warmup Steps | 500 |
| Optimizer | AdamW |
| Weight Decay | 0.01 |
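These hyperparameters are internally consistent: 1719 optimizer steps over 3 epochs at an effective batch of 64 implies roughly 36.7k training sequences per epoch. The cosine-with-warmup schedule can be sketched as follows (illustrative; the exact scheduler implementation used for this run is not part of this card):

```python
import math

total_steps, epochs, effective_batch = 1719, 3, 64
warmup_steps, peak_lr = 500, 5e-5

# Implied dataset size: ~573 optimizer steps per epoch, ~36.7k sequences.
steps_per_epoch = total_steps // epochs
sequences_per_epoch = steps_per_epoch * effective_batch

def lr_at(step: int) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(steps_per_epoch, sequences_per_epoch)  # 573 36672
print(lr_at(500))                            # peak learning rate: 5e-05
```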

Training Results

  • Final Training Loss: 17.67
  • Evaluation Loss: 3.26
  • HellaSwag Accuracy: 28.96% (zero-shot)
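For context, an evaluation loss of 3.26 corresponds to a perplexity of about 26, assuming the reported loss is the mean per-token cross-entropy (HellaSwag is four-way multiple choice, so 28.96% sits modestly above the 25% random baseline):

```python
import math

# Perplexity is the exponential of the mean per-token cross-entropy loss.
eval_loss = 3.26
perplexity = math.exp(eval_loss)
print(f"{perplexity:.2f}")  # ≈ 26.05
```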

Usage

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer, pipeline

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("abhinema/gpt2-wikitext-finetuned")
tokenizer = GPT2Tokenizer.from_pretrained("abhinema/gpt2-wikitext-finetuned")

# Option 1: pipeline (simple)
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
output = generator("The future of artificial intelligence", max_length=100, num_return_sequences=1)
print(output[0]["generated_text"])

# Option 2: direct generation (more control)
inputs = tokenizer("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=100,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; this avoids a warning
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Framework & Environment

| Component | Version |
|-----------|---------|
| Framework | PyTorch 2.7.1 |
| Transformers | 4.28.1 |
| Accelerate | 0.25.0 |
| CUDA | 11.8 |
| GPU | 2× NVIDIA GTX 1080 (8 GB each) |
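The table above can be turned into a pip pin as follows; the version combination is copied verbatim from the card, and mutual compatibility of this exact PyTorch/Transformers pair is an assumption, not something verified here:

```shell
# Pin the versions listed in the environment table.
pip install torch==2.7.1 transformers==4.28.1 accelerate==0.25.0
```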

Limitations

⚠️ Important Considerations:

  • English only - Trained exclusively on English text
  • Potential biases - May generate biased or inappropriate content
  • Not production-ready - Requires additional filtering and safety measures
  • Domain specific - Optimized for encyclopedic/Wikipedia-style text

Intended Use

Recommended:

  • Research and experimentation
  • Learning about LLM training and fine-tuning
  • Text generation in controlled environments
  • Further fine-tuning for specific domains

Not Recommended:

  • Production applications without safety filters
  • Generation of factual or medical information
  • Use cases requiring high reliability

Ethical Considerations

This model inherits biases from:

  1. The original GPT-2 pretraining data
  2. WikiText dataset (derived from Wikipedia)

Users should implement appropriate content filtering for any real-world applications.

Citation

If you use this model, please reference the LLM-Study project:

```bibtex
@misc{llm-study-gpt2-2026,
  author = {Abhishek Nema},
  title = {GPT-2 Multi-Platform Training - gpt2-wikitext-finetuned},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/abhinema/gpt2-wikitext-finetuned}}
}
```

License

MIT License - Free for academic and commercial use with attribution.
