---
language: en
license: mit
tags:
- gpt2
- language-model
- pytorch
---

# GPT-2 Model - Trained from Scratch

This is a GPT-2 model trained from scratch on [your dataset].

## Model Details

- **Architecture**: GPT-2
- **Parameters**: 124M
- **Training Steps**: 19073
- **Validation Loss**: 4.21
- **HellaSwag Accuracy**: 25.58%
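The 124M figure corresponds to the standard GPT-2 "small" configuration (12 layers, 12 heads, 768-dim embeddings, 50257-token vocabulary — assumed here, since the card does not spell them out). A quick sketch of where that count comes from, assuming tied input/output embeddings:

```python
# Standard GPT-2 small hyperparameters (assumed; not stated in the card above)
n_layer, n_embd, vocab, block = 12, 768, 50257, 1024

wte = vocab * n_embd              # token embedding (tied with the output head)
wpe = block * n_embd              # learned position embedding
per_layer = (
    n_embd * 3 * n_embd + 3 * n_embd    # attention qkv projection (+ bias)
    + n_embd * n_embd + n_embd          # attention output projection (+ bias)
    + n_embd * 4 * n_embd + 4 * n_embd  # MLP up-projection (+ bias)
    + 4 * n_embd * n_embd + n_embd      # MLP down-projection (+ bias)
    + 4 * n_embd                        # two LayerNorms (weight + bias each)
)
final_ln = 2 * n_embd
total = wte + wpe + n_layer * per_layer + final_ln
print(f"{total / 1e6:.1f}M parameters")  # ≈ 124.4M
```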

## Training Configuration

- Batch Size: 32
- Sequence Length: 1024
- Learning Rate: 6e-4
- Total Tokens: ~1B
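A peak learning rate of 6e-4 is typically paired with linear warmup and cosine decay in Karpathy-style GPT-2 training runs. As a hedged sketch (the warmup length and minimum learning rate are assumptions, not stated in this card):

```python
import math

MAX_LR = 6e-4          # peak learning rate from the configuration above
MIN_LR = MAX_LR * 0.1  # assumed floor (a common choice, not stated above)
WARMUP_STEPS = 715     # assumption; such runs often warm up for a few hundred steps
MAX_STEPS = 19073      # total training steps from the model details

def get_lr(step):
    """Linear warmup followed by cosine decay to MIN_LR."""
    if step < WARMUP_STEPS:
        return MAX_LR * (step + 1) / WARMUP_STEPS
    if step >= MAX_STEPS:
        return MIN_LR
    ratio = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    coeff = 0.5 * (1.0 + math.cos(math.pi * ratio))  # decays from 1 to 0
    return MIN_LR + coeff * (MAX_LR - MIN_LR)
```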

## Usage

```python
import torch

# Load the saved checkpoint (current checkpoint; to be updated after training completes)
checkpoint = torch.load('model_hf_09000.pt', map_location='cpu')
config = checkpoint['config']

# Initialize the model and restore its weights
from your_model_file import GPT  # your GPT class
model = GPT(config)
model.load_state_dict(checkpoint['model'])
model.eval()

# Generate text
# ... your generation code ...
```
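The generation step is left open above. As a hedged sketch (it assumes, in the style of Karpathy's GPT implementations, that `model(idx)` returns logits of shape `(batch, seq, vocab)` — adapt the forward call if yours returns a `(logits, loss)` tuple), autoregressive sampling with top-k filtering typically looks like:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size=1024, top_k=50):
    """Autoregressively sample new tokens, restricting each draw to the top-k logits."""
    for _ in range(max_new_tokens):
        # Crop the context to the model's maximum sequence length
        idx_cond = idx[:, -block_size:]
        logits = model(idx_cond)
        # Keep only the logits for the last position
        logits = logits[:, -1, :]
        # Mask everything below the k-th largest logit
        v, _ = torch.topk(logits, min(top_k, logits.size(-1)))
        logits[logits < v[:, [-1]]] = -float('inf')
        # Sample the next token and append it to the sequence
        probs = F.softmax(logits, dim=-1)
        next_tok = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_tok], dim=1)
    return idx
```

With a tokenizer such as `tiktoken.get_encoding('gpt2')` (an assumption about the training setup), encode a prompt into a `(1, T)` long tensor for `idx`, then decode the returned token ids back to text.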

## Training Details

Trained using a custom GPT-2 implementation following Andrej Karpathy's approach.