---
language: en
license: mit
tags:
  - gpt2
  - language-model
  - pytorch
---

# GPT-2 Model - Trained from Scratch

This is a GPT-2 model trained from scratch on [your dataset].

## Model Details

- Architecture: GPT-2
- Parameters: 124M
- Training Steps: 19073
- Validation Loss: 4.21
- HellaSwag Accuracy: 25.58%

## Training Configuration

- Batch Size: 32
- Sequence Length: 1024
- Learning Rate: 6e-4
- Total Tokens: ~1B
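For reference, the per-step token throughput implied by these settings can be sketched as follows. The `train_config` dict and variable names below are illustrative, not taken from the training script, and the ~1B total would additionally depend on any gradient-accumulation setting not listed here:

```python
# Illustrative config mirroring the hyperparameters above (names are assumptions)
train_config = {
    "batch_size": 32,       # sequences per micro-batch
    "block_size": 1024,     # sequence length in tokens
    "learning_rate": 6e-4,
}

# Tokens consumed per optimizer step, before any gradient accumulation
tokens_per_step = train_config["batch_size"] * train_config["block_size"]
print(tokens_per_step)  # 32768
```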

## Usage

```python
import torch

# Load the checkpoint (CPU-safe; move the model to GPU afterwards if available)
checkpoint = torch.load('model_hf_09000.pt', map_location='cpu')  # current checkpoint; to be updated after full training
config = checkpoint['config']

# Initialize the model and restore its weights
from your_model_file import GPT  # your GPT class
model = GPT(config)
model.load_state_dict(checkpoint['model'])
model.eval()

# Generate text
# ... your generation code ...
```
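As a starting point for the generation step, here is a minimal top-k sampling helper. This is a sketch under the assumption that the model returns next-token logits; `sample_next_token` is an illustrative name, not part of the checkpoint or the GPT class:

```python
import torch

def sample_next_token(logits, top_k=50, temperature=1.0):
    """Sample one token id from the final position's logits (shape: vocab_size)."""
    logits = logits / temperature
    # Keep only the top_k highest-scoring candidate tokens
    topk_vals, topk_idx = torch.topk(logits, top_k)
    # Renormalize over the kept candidates and draw one of them
    probs = torch.softmax(topk_vals, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)
    return topk_idx[choice].item()
```

In a generation loop you would repeatedly run the model on the growing token sequence, pass the last position's logits to this helper, and append the sampled id until reaching the desired length.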

## Training Details

Trained using a custom GPT-2 implementation following Andrej Karpathy's approach.