GPT-2 Model - Trained from Scratch
This is a GPT-2 model trained from scratch on [your dataset].
Model Details
- Architecture: GPT-2
- Parameters: 124M
- Training Steps: 19073
- Validation Loss: 4.21
- HellaSwag Accuracy: 25.58%
Training Configuration
- Batch Size: 32
- Sequence Length: 1024
- Learning Rate: 6e-4
- Total Tokens: ~1B
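The 124M parameter count corresponds to the standard GPT-2 "small" dimensions. A minimal config sketch for reference (field names are assumptions matching common from-scratch GPT-2 implementations, not necessarily this checkpoint's exact `config` object):

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # Standard GPT-2 124M ("small") hyperparameters
    block_size: int = 1024   # sequence length, matches training config above
    vocab_size: int = 50257  # GPT-2 BPE vocabulary size
    n_layer: int = 12        # transformer blocks
    n_head: int = 12         # attention heads per block
    n_embd: int = 768        # embedding / hidden dimension
```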
Usage
import torch

# Load checkpoint (current saved checkpoint; will be updated once training completes)
checkpoint = torch.load('model_hf_09000.pt', map_location='cpu')
config = checkpoint['config']

# Initialize model and load weights
from your_model_file import GPT  # your GPT class
model = GPT(config)
model.load_state_dict(checkpoint['model'])
model.eval()
# Generate text
# ... your generation code ...
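The generation placeholder above can be filled with a standard autoregressive sampling loop. A minimal sketch, assuming the model's forward pass returns logits of shape `(B, T, vocab_size)` (or a `(logits, loss)` tuple, as in Karpathy-style implementations); you would tokenize the prompt with the GPT-2 BPE tokenizer and decode the result:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens=50, temperature=1.0, top_k=50):
    """Sample tokens autoregressively. idx: (B, T) tensor of prompt token ids."""
    for _ in range(max_new_tokens):
        # Crop context to the model's block size (1024 for GPT-2)
        idx_cond = idx[:, -1024:]
        logits = model(idx_cond)
        if isinstance(logits, tuple):  # some GPT classes return (logits, loss)
            logits = logits[0]
        # Take logits at the last position and apply temperature
        logits = logits[:, -1, :] / temperature
        if top_k is not None:
            # Mask everything below the k-th largest logit
            v, _ = torch.topk(logits, min(top_k, logits.size(-1)))
            logits[logits < v[:, [-1]]] = -float('inf')
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat((idx, next_id), dim=1)
    return idx
```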
Training Details
Trained using a custom GPT-2 implementation following Andrej Karpathy's approach.