---
language: en
license: mit
tags:
- gpt2
- language-model
- pytorch
---
# GPT-2 Model - Trained from Scratch
This is a GPT-2 model trained from scratch on [your dataset].
## Model Details
- **Architecture**: GPT-2
- **Parameters**: 124M
- **Training Steps**: 19073
- **Validation Loss**: 4.21
- **HellaSwag Accuracy**: 25.58%
## Training Configuration
- Batch Size: 32
- Sequence Length: 1024
- Learning Rate: 6e-4
- Total Tokens: ~1B tokens
## Usage
```python
import torch

# Load checkpoint (currently the step-9000 checkpoint; to be updated after full training).
# weights_only=False is needed because the checkpoint stores a config object, not just tensors.
checkpoint = torch.load('model_hf_09000.pt', map_location='cpu', weights_only=False)
config = checkpoint['config']

# Initialize model
from your_model_file import GPT  # your GPT class
model = GPT(config)
model.load_state_dict(checkpoint['model'])
model.eval()

# Generate text
# ... your generation code ...
```
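The elided generation step usually samples from the model's next-token distribution with top-k filtering. A pure-Python sketch of that sampling step (in real code you would apply it to the logits at the last position of the model's output, e.g. via `torch.topk` and `torch.multinomial`):

```python
import math
import random

def sample_top_k(logits, k=50, temperature=1.0, rng=random):
    """Sample a token index from `logits` (a list of floats),
    keeping only the k most likely tokens."""
    # Indices of the k largest logits; everything else is masked out
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits, with temperature scaling
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index according to the resulting probabilities
    r = rng.random()
    acc = 0.0
    for idx, p in zip(top, probs):
        acc += p
        if r <= acc:
            return idx
    return top[-1]
```

A generation loop would then repeatedly feed the running token sequence to the model, call this helper on the final-position logits, and append the sampled token until a length limit or end-of-text token is reached.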
## Training Details
Trained from scratch using a custom GPT-2 implementation, following Andrej Karpathy's approach.