---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
---

## GPT-2 style model

This is a custom 124M-parameter PyTorch GPT-2 model, trained for one epoch each on the [nikolina-p/gutenberg_flat](https://huggingface.co/datasets/nikolina-p/gutenberg_flat) (3.6B tokens) and [nikolina-p/fineweb_10BT_tokenized](https://huggingface.co/datasets/nikolina-p/fineweb_10BT_tokenized) (10B tokens) datasets.
The code is available in [this GitHub repository](https://github.com/nikolina-p/gpt2base).

### Model parameters

- vocabulary size: 50304
- context length: 1024
- embedding dimension: 768
- number of heads: 12
- number of layers: 12
- dropout rate: 0.1

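As a sanity check, the parameters above do imply roughly 124M weights. The sketch below is a back-of-the-envelope estimate assuming a standard GPT-2 architecture (learned positional embeddings, a 4x-width MLP per block, and the output head tied to the token embedding); the dictionary key names are illustrative, not taken from the linked repository.

```python
# Hypothetical config dict mirroring the parameter list above.
cfg = {
    "vocab_size": 50304,
    "context_length": 1024,
    "emb_dim": 768,
    "n_heads": 12,
    "n_layers": 12,
    "drop_rate": 0.1,
}

d = cfg["emb_dim"]
tok_emb = cfg["vocab_size"] * d       # token embedding (tied with the output head)
pos_emb = cfg["context_length"] * d   # learned positional embedding
attn = 4 * (d * d + d)                # q, k, v and output projections, with biases
mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)  # the two linear layers of the 4x MLP
ln = 2 * (2 * d)                      # two layer norms per block (scale + shift)
per_layer = attn + mlp + ln
total = tok_emb + pos_emb + cfg["n_layers"] * per_layer + 2 * d  # + final layer norm

print(f"{total:,}")  # ~124M parameters
```

Dropout adds no parameters, and tying the output head to the token embedding is what keeps the total near 124M rather than ~163M.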
### Loss
- Final training loss: 3.2248
- Final validation loss: 3.1318