GPT-2 style model

This is a custom PyTorch GPT-2 style model with 124M parameters, trained for one epoch each on the nikolina-p/gutenberg_flat (3.6B tokens) and nikolina-p/fineweb_10BT_tokenized (10B tokens) datasets. The code can be found in the accompanying GitHub repository.

Model parameters

  • vocabulary size: 50304
  • context length: 1024
  • embedding dimension: 768
  • number of heads: 12
  • number of layers: 12
  • dropout rate: 0.1
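
The hyperparameters above can be collected into a plain config dictionary, as is common for custom PyTorch GPT implementations. This is an illustrative sketch, not code from the actual repository; the key names and the helper function are assumptions. The rough parameter count it computes (embeddings plus 12 transformer blocks, with the output head tied to the token embedding) lands near the stated 124M:

```python
# Illustrative config dict mirroring the listed hyperparameters.
# Key names are assumptions, not taken from the actual repository.
GPT_CONFIG_124M = {
    "vocab_size": 50304,     # padded vocabulary size
    "context_length": 1024,
    "emb_dim": 768,
    "n_heads": 12,
    "n_layers": 12,
    "drop_rate": 0.1,
}

def approx_param_count(cfg):
    """Rough GPT-2 style parameter count, assuming weight tying
    (output head shares the token embedding matrix)."""
    d = cfg["emb_dim"]
    tok_emb = cfg["vocab_size"] * d          # token embedding
    pos_emb = cfg["context_length"] * d      # learned positional embedding
    # Per transformer block:
    attn = 4 * d * d + 4 * d                 # Q, K, V, out projections + biases
    mlp = 8 * d * d + 5 * d                  # d -> 4d -> d, with biases
    norms = 4 * d                            # two LayerNorms (scale + shift)
    blocks = cfg["n_layers"] * (attn + mlp + norms)
    final_norm = 2 * d                       # final LayerNorm
    return tok_emb + pos_emb + blocks + final_norm

print(approx_param_count(GPT_CONFIG_124M))  # about 124.5M
```

With this accounting, the embeddings contribute roughly 39M parameters and the 12 transformer blocks roughly 85M, which is why the card rounds the model to 124M (displayed as 0.1B).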

Loss

  • Final training loss: 3.2248
  • Final validation loss: 3.1318
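
Assuming these are mean cross-entropy losses in nats (the usual convention for language-model training), the validation loss can be converted to perplexity, which is often easier to interpret:

```python
import math

# Final losses reported above, assumed to be mean cross-entropy in nats.
train_loss = 3.2248
val_loss = 3.1318

train_ppl = math.exp(train_loss)  # about 25.1
val_ppl = math.exp(val_loss)      # about 22.9
```

A validation perplexity around 23 means the model is, on average, about as uncertain as a uniform choice among 23 tokens at each step.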
Weights

  • Format: Safetensors
  • Model size: 0.1B params
  • Tensor type: F32