---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
---

## GPT-2 style model

This is a custom PyTorch GPT-2 model with 124M parameters, trained on the [nikolina-p/gutenberg_flat](https://huggingface.co/datasets/nikolina-p/gutenberg_flat) (3.6B tokens) and [nikolina-p/fineweb_10BT_tokenized](https://huggingface.co/datasets/nikolina-p/fineweb_10BT_tokenized) (10B tokens) datasets, for one epoch each. The code is available in [this GitHub repository](https://github.com/nikolina-p/gpt2base).

### Model parameters

- vocabulary size: 50304
- context length: 1024
- embedding dimension: 768
- number of heads: 12
- number of layers: 12
- dropout rate: 0.1

### Loss

- Final training loss: 3.2248
- Final validation loss: 3.1318
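
### Parameter count

As a sanity check, the configuration above reproduces the stated 124M parameter figure. This sketch assumes a standard GPT-2 architecture (learned positional embeddings, biased linear layers, a 4x MLP expansion, and weight tying between the token embedding and the output head); the actual implementation in the repository may differ in these details.

```python
# Approximate parameter count for a GPT-2 style model, assuming the
# standard architecture: learned positional embeddings, fused QKV
# projection with biases, 4x MLP expansion, and a tied output head.
# These architectural details are assumptions, not confirmed by the repo.

VOCAB_SIZE = 50304
CONTEXT_LENGTH = 1024
EMB_DIM = 768
N_LAYERS = 12


def gpt2_param_count(vocab: int, ctx: int, d: int, n_layers: int) -> int:
    tok_emb = vocab * d                  # token embedding (tied with output head)
    pos_emb = ctx * d                    # learned positional embedding
    per_layer = (
        3 * d * d + 3 * d                # fused QKV projection (weights + biases)
        + d * d + d                      # attention output projection
        + 2 * (2 * d)                    # two LayerNorms (scale + shift each)
        + d * 4 * d + 4 * d              # MLP up-projection to 4*d
        + 4 * d * d + d                  # MLP down-projection back to d
    )
    final_ln = 2 * d                     # final LayerNorm
    return tok_emb + pos_emb + n_layers * per_layer + final_ln


print(gpt2_param_count(VOCAB_SIZE, CONTEXT_LENGTH, EMB_DIM, N_LAYERS))
# → 124475904 (~124M)
```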