---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
---

## GPT-2 style model

This is a custom PyTorch GPT-2 model with 124M parameters, trained on the [nikolina-p/gutenberg_flat](https://huggingface.co/datasets/nikolina-p/gutenberg_flat) (3.6B tokens) and [nikolina-p/fineweb_10BT_tokenized](https://huggingface.co/datasets/nikolina-p/fineweb_10BT_tokenized) (10B tokens) datasets, for one epoch each. The code is available in [this GitHub repository](https://github.com/nikolina-p/gpt2base).

### Model parameters

- vocabulary size: 50304
- context length: 1024
- embedding dimension: 768
- number of heads: 12
- number of layers: 12
- dropout rate: 0.1

### Loss

- Final training loss: 3.2248
- Final validation loss: 3.1318
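
### Parameter count

As a sanity check, the configuration above reproduces the stated 124M parameter figure. This sketch assumes a standard GPT-2 architecture (learned positional embeddings, biased linear layers, a 4x MLP expansion, and weight tying between the token embedding and the output head); the actual implementation in the repository may differ in these details.

```python
# Approximate parameter count for a GPT-2 style model, assuming the
# standard architecture: learned positional embeddings, fused QKV
# projection with biases, 4x MLP expansion, and a tied output head.
# These architectural details are assumptions, not confirmed by the repo.

VOCAB_SIZE = 50304
CONTEXT_LENGTH = 1024
EMB_DIM = 768
N_LAYERS = 12


def gpt2_param_count(vocab: int, ctx: int, d: int, n_layers: int) -> int:
    tok_emb = vocab * d                  # token embedding (tied with output head)
    pos_emb = ctx * d                    # learned positional embedding
    per_layer = (
        3 * d * d + 3 * d                # fused QKV projection (weights + biases)
        + d * d + d                      # attention output projection
        + 2 * (2 * d)                    # two LayerNorms (scale + shift each)
        + d * 4 * d + 4 * d              # MLP up-projection to 4*d
        + 4 * d * d + d                  # MLP down-projection back to d
    )
    final_ln = 2 * d                     # final LayerNorm
    return tok_emb + pos_emb + n_layers * per_layer + final_ln


print(gpt2_param_count(VOCAB_SIZE, CONTEXT_LENGTH, EMB_DIM, N_LAYERS))
# → 124475904 (~124M)
```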