---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
---
## GPT-2 style model
This is a custom PyTorch GPT-2-style model with 124M parameters, trained for one epoch each on [nikolina-p/gutenberg_flat](https://huggingface.co/datasets/nikolina-p/gutenberg_flat) (3.6B tokens) and [nikolina-p/fineweb_10BT_tokenized](https://huggingface.co/datasets/nikolina-p/fineweb_10BT_tokenized) (10B tokens).
The training code is available in [this GitHub repository](https://github.com/nikolina-p/gpt2base).
### Model parameters
- vocabulary size: 50304
- context length: 1024
- embedding dimension: 768
- number of heads: 12
- number of layers: 12
- dropout rate: 0.1
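Assuming the standard GPT-2 architecture (learned positional embeddings, 4x MLP expansion, biases throughout, and input/output embeddings tied), these hyperparameters account for the quoted 124M parameters:

```python
vocab_size, ctx_len, emb_dim, n_layers = 50304, 1024, 768, 12

tok_emb = vocab_size * emb_dim              # token embeddings (tied output head)
pos_emb = ctx_len * emb_dim                 # learned positional embeddings
per_layer = (
    3 * emb_dim * emb_dim + 3 * emb_dim     # QKV projection + bias
    + emb_dim * emb_dim + emb_dim           # attention output projection + bias
    + emb_dim * 4 * emb_dim + 4 * emb_dim   # MLP up-projection + bias
    + 4 * emb_dim * emb_dim + emb_dim       # MLP down-projection + bias
    + 2 * 2 * emb_dim                       # two LayerNorms (scale + shift)
)
final_ln = 2 * emb_dim                      # final LayerNorm

total = tok_emb + pos_emb + n_layers * per_layer + final_ln
print(f"{total:,} parameters")  # 124,475,904, i.e. ~124M
```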
### Loss
- Final training loss: 3.2248
- Final validation loss: 3.1318