---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
---

## GPT-2 style model

This is a custom 124M-parameter PyTorch GPT-2 model, trained for one epoch each on the [nikolina-p/gutenberg_flat](https://huggingface.co/datasets/nikolina-p/gutenberg_flat) (3.6B tokens) and [nikolina-p/fineweb_10BT_tokenized](https://huggingface.co/datasets/nikolina-p/fineweb_10BT_tokenized) (10B tokens) datasets.
The code is available in [this GitHub repository](https://github.com/nikolina-p/gpt2base).

### Model parameters

- vocabulary size: 50304
- context length: 1024
- embedding dimension: 768
- number of heads: 12
- number of layers: 12
- dropout rate: 0.1

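As a sanity check, the parameters above do imply roughly 124M weights. The sketch below is a back-of-the-envelope estimate assuming a standard GPT-2 architecture (learned positional embeddings, a 4x-width MLP per block, and the output head tied to the token embedding); the dictionary key names are illustrative, not taken from the linked repository.

```python
# Hypothetical config dict mirroring the parameter list above.
cfg = {
    "vocab_size": 50304,
    "context_length": 1024,
    "emb_dim": 768,
    "n_heads": 12,
    "n_layers": 12,
    "drop_rate": 0.1,
}

d = cfg["emb_dim"]
tok_emb = cfg["vocab_size"] * d       # token embedding (tied with the output head)
pos_emb = cfg["context_length"] * d   # learned positional embedding
attn = 4 * (d * d + d)                # q, k, v and output projections, with biases
mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)  # the two linear layers of the 4x MLP
ln = 2 * (2 * d)                      # two layer norms per block (scale + shift)
per_layer = attn + mlp + ln
total = tok_emb + pos_emb + cfg["n_layers"] * per_layer + 2 * d  # + final layer norm

print(f"{total:,}")  # ~124M parameters
```

Dropout adds no parameters, and tying the output head to the token embedding is what keeps the total near 124M rather than ~163M.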
### Loss
- Final training loss: 3.2248
- Final validation loss: 3.1318