gpt2-wikitext-finetuned

A GPT-2 model fine-tuned on WikiText-2 as part of the LLM-Study project, a multi-platform GPT-2 training framework.

Model Description

This model was trained using a multi-platform GPT-2 training framework, which supports:

  • 🖥️ MacBook M3 (FP16 training with Metal Performance Shaders)
  • 🎮 Dual GTX 1080 (Multi-GPU DDP with PyTorch)
  • ☁️ Google Colab (Cloud training with Tensor Cores)

Model Details

| Attribute | Value |
|-----------|-------|
| Base Model | GPT-2 |
| Parameters | 124M (124,439,808) |
| Training Platform | Dual GTX 1080 (Multi-GPU DDP) |
| Precision | FP32 |
| Context Length | 1024 tokens |
| Vocabulary Size | 50,257 |
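The parameter count in the table can be reproduced from the standard GPT-2 small architecture constants (12 layers, hidden size 768, FFN size 3072, LM head tied to the token embedding). A quick sanity check:

```python
# Reproduce the GPT-2 small parameter count from its architecture constants.
# (Standard GPT-2 values; the LM head shares weights with the token embedding.)
vocab, ctx, d, ffn, layers = 50257, 1024, 768, 3072, 12

embeddings = vocab * d + ctx * d       # token + position embeddings
per_block = (
    2 * d                              # ln_1 (weight + bias)
    + d * 3 * d + 3 * d                # attention qkv projection
    + d * d + d                        # attention output projection
    + 2 * d                            # ln_2
    + d * ffn + ffn                    # MLP up-projection
    + ffn * d + d                      # MLP down-projection
)
total = embeddings + layers * per_block + 2 * d  # + final layer norm

print(total)  # 124439808, matching the table
```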

Training Details

| Hyperparameter | Value |
|----------------|-------|
| Dataset | WikiText-2 |
| Training Steps | 1719 |
| Epochs | 3 |
| Learning Rate | 5e-5 |
| Effective Batch Size | 64 |
| LR Scheduler | Cosine with warmup |
| Warmup Steps | 500 |
| Optimizer | AdamW |
| Weight Decay | 0.01 |
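These hyperparameters are internally consistent: 1719 optimizer steps over 3 epochs at an effective batch of 64 implies roughly 36.7k training sequences per epoch. The cosine-with-warmup schedule can be sketched as follows (illustrative; the exact scheduler implementation used for this run is not part of this card):

```python
import math

total_steps, epochs, effective_batch = 1719, 3, 64
warmup_steps, peak_lr = 500, 5e-5

# Implied dataset size: ~573 optimizer steps per epoch, ~36.7k sequences.
steps_per_epoch = total_steps // epochs
sequences_per_epoch = steps_per_epoch * effective_batch

def lr_at(step: int) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(steps_per_epoch, sequences_per_epoch)  # 573 36672
print(lr_at(500))                            # peak learning rate: 5e-05
```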

Training Results

  • Final Training Loss: 17.67
  • Evaluation Loss: 3.26
  • HellaSwag Accuracy: 28.96% (zero-shot)
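For context, an evaluation loss of 3.26 corresponds to a perplexity of about 26, assuming the reported loss is the mean per-token cross-entropy (HellaSwag is four-way multiple choice, so 28.96% sits modestly above the 25% random baseline):

```python
import math

# Perplexity is the exponential of the mean per-token cross-entropy loss.
eval_loss = 3.26
perplexity = math.exp(eval_loss)
print(f"{perplexity:.2f}")  # ≈ 26.05
```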

Usage

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer, pipeline

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("abhinema/gpt2-wikitext-finetuned")
tokenizer = GPT2Tokenizer.from_pretrained("abhinema/gpt2-wikitext-finetuned")

# Option 1: pipeline (simple)
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
output = generator("The future of artificial intelligence", max_length=100, num_return_sequences=1)
print(output[0]["generated_text"])

# Option 2: direct generation (more control)
inputs = tokenizer("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=100,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; this avoids a warning
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Framework & Environment

| Component | Version |
|-----------|---------|
| Framework | PyTorch 2.7.1 |
| Transformers | 4.28.1 |
| Accelerate | 0.25.0 |
| CUDA | 11.8 |
| GPU | 2× NVIDIA GTX 1080 (8 GB each) |
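The table above can be turned into a pip pin as follows; the version combination is copied verbatim from the card, and mutual compatibility of this exact PyTorch/Transformers pair is an assumption, not something verified here:

```shell
# Pin the versions listed in the environment table.
pip install torch==2.7.1 transformers==4.28.1 accelerate==0.25.0
```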

Limitations

⚠️ Important Considerations:

  • English only - Trained exclusively on English text
  • Potential biases - May generate biased or inappropriate content
  • Not production-ready - Requires additional filtering and safety measures
  • Domain specific - Optimized for encyclopedic/Wikipedia-style text

Intended Use

Recommended:

  • Research and experimentation
  • Learning about LLM training and fine-tuning
  • Text generation in controlled environments
  • Further fine-tuning for specific domains

Not Recommended:

  • Production applications without safety filters
  • Generation of factual or medical information
  • Use cases requiring high reliability

Ethical Considerations

This model inherits biases from:

  1. The original GPT-2 pretraining data
  2. WikiText dataset (derived from Wikipedia)

Users should implement appropriate content filtering for any real-world applications.

Citation

If you use this model, please reference the LLM-Study project:

```bibtex
@misc{llm-study-gpt2-2026,
  author = {Abhishek Nema},
  title = {GPT-2 Multi-Platform Training - gpt2-wikitext-finetuned},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/abhinema/gpt2-wikitext-finetuned}}
}
```

License

MIT License - Free for academic and commercial use with attribution.
