jonasknobloch/gpt2_m100_tiny-stories_1024_dpos

	---
	tags:
	- generated_from_trainer
	datasets:
	- roneneldan/TinyStories
	metrics:
	- accuracy
	model-index:
	- name: gpt2_m100_tiny-stories_1024_dpos
	  results:
	  - task:
	      name: Causal Language Modeling
	      type: text-generation
	    dataset:
	      name: roneneldan/TinyStories
	      type: roneneldan/TinyStories
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 0.6900624734182258
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/scads-nlp/morph-gpt_gpt2_tiny-stories_dpos/runs/y12q0b1d)
	# gpt2_m100_tiny-stories_1024_dpos

	This model was trained with the Trainer on the [roneneldan/TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) dataset. The card does not record a base checkpoint.
	It achieves the following results on the evaluation set:
	- Loss: 1.1579
	- Accuracy: 0.6901
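
The evaluation loss can be converted into perplexity, which is often easier to compare across language models. This is a minimal sketch, assuming the loss is the mean per-token cross-entropy in nats (the usual convention for a causal language modeling loss, though this card does not state it). The function name `perplexity_from_loss` is illustrative only.

```python
import math


def perplexity_from_loss(loss: float) -> float:
    """Convert a mean per-token cross-entropy (assumed to be in nats) to perplexity."""
    return math.exp(loss)


# Evaluation loss reported in this card.
eval_loss = 1.1579
eval_perplexity = perplexity_from_loss(eval_loss)
print(f"perplexity ~ {eval_perplexity:.2f}")
```

Under that assumption the loss corresponds to a perplexity of roughly 3.18.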

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 32
	- eval_batch_size: 32
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 1.0
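
To illustrate what `lr_scheduler_type: linear` implies, the sketch below computes a learning rate that decays linearly from its initial value to zero. It assumes no warmup steps, since none are listed above. It also takes the total step count as roughly 19,760, an estimate derived from the step and epoch columns of the training results rather than a value stated in this card. `linear_lr` is a hypothetical helper, not a library function.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 5e-05) -> float:
    """Linearly decay base_lr to zero over total_steps (no warmup assumed)."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps


# Assumption: estimated as step / epoch from the training results table.
TOTAL_STEPS = 19_760

for step in (0, 5_000, 10_000, 19_000):
    print(step, f"{linear_lr(step, TOTAL_STEPS):.2e}")
```

If the run did use warmup, the early part of the curve would differ from this sketch.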

	### Training results

	| Training Loss | Epoch  | Step  | Validation Loss | Accuracy |
	|:-------------:|:------:|:-----:|:---------------:|:--------:|
	| 2.801         | 0.0506 | 1000  | 2.3554          | 0.4658   |
	| 1.8977        | 0.1012 | 2000  | 1.7248          | 0.5834   |
	| 1.658         | 0.1518 | 3000  | 1.5498          | 0.6145   |
	| 1.5426        | 0.2024 | 4000  | 1.4542          | 0.6321   |
	| 1.4721        | 0.2530 | 5000  | 1.3930          | 0.6435   |
	| 1.4237        | 0.3036 | 6000  | 1.3497          | 0.6517   |
	| 1.387         | 0.3543 | 7000  | 1.3162          | 0.6580   |
	| 1.3537        | 0.4049 | 8000  | 1.2899          | 0.6633   |
	| 1.3306        | 0.4555 | 9000  | 1.2683          | 0.6676   |
	| 1.3127        | 0.5061 | 10000 | 1.2474          | 0.6716   |
	| 1.2925        | 0.5567 | 11000 | 1.2326          | 0.6745   |
	| 1.2779        | 0.6073 | 12000 | 1.2171          | 0.6778   |
	| 1.262         | 0.6579 | 13000 | 1.2051          | 0.6802   |
	| 1.2502        | 0.7085 | 14000 | 1.1949          | 0.6823   |
	| 1.2413        | 0.7591 | 15000 | 1.1852          | 0.6843   |
	| 1.2354        | 0.8097 | 16000 | 1.1773          | 0.6857   |
	| 1.2254        | 0.8603 | 17000 | 1.1699          | 0.6874   |
	| 1.2186        | 0.9109 | 18000 | 1.1639          | 0.6887   |
	| 1.2155        | 0.9615 | 19000 | 1.1597          | 0.6897   |
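
The step and epoch columns allow a rough estimate of how many optimizer steps make up one epoch. The sketch below uses the first and last rows of the table. The sequences-per-epoch figure additionally assumes a single device and no gradient accumulation, neither of which this card states.

```python
# (epoch, step) pairs copied from the first and last rows of the table.
rows = [(0.0506, 1000), (0.9615, 19000)]


def steps_per_epoch(epoch: float, step: int) -> float:
    """Extrapolate the number of steps in one full epoch from a single row."""
    return step / epoch


estimates = [steps_per_epoch(epoch, step) for epoch, step in rows]
mean_steps = sum(estimates) / len(estimates)

# train_batch_size from the hyperparameters; assumed to be the effective batch size.
TRAIN_BATCH_SIZE = 32
approx_sequences = mean_steps * TRAIN_BATCH_SIZE

print(f"steps per epoch ~ {mean_steps:.0f}")
print(f"training sequences per epoch ~ {approx_sequences:.0f}")
```

Both rows give an estimate close to 19,760 steps, so the epoch column appears consistent across the run.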


	### Framework versions

	- Transformers 4.42.3
	- PyTorch 2.2.2+cu121
	- Datasets 2.20.0
	- Tokenizers 0.19.1
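
When reproducing results, it can help to compare the local environment against the versions listed above. The following is a sketch using only the standard library. The PyPI distribution names (in particular `torch` for PyTorch) and the helper names are assumptions made for illustration, and matching versions exactly is not necessarily required to use the model.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.42.3",
    "torch": "2.2.2+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def compare_versions(expected):
    """Map each package to (version in card, installed version, exact match?)."""
    report = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        report[package] = (wanted, found, found == wanted)
    return report


for package, (wanted, found, ok) in compare_versions(EXPECTED).items():
    status = "ok" if ok else "differs"
    print(f"{package}: card={wanted} installed={found} -> {status}")
```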