rbelanec
/

train_math_qa_123_1760637721

Text Generation

Model card Files Files and versions

train_math_qa_123_1760637721 / README.md

rbelanec's picture

End of training

107f0f2 verified 3 months ago

|

history blame contribute delete

3.07 kB

	---
	library_name: peft
	license: llama3
	base_model: meta-llama/Meta-Llama-3-8B-Instruct
	tags:
	- base_model:adapter:meta-llama/Meta-Llama-3-8B-Instruct
	- llama-factory
	- transformers
	pipeline_tag: text-generation
	model-index:
	- name: train_math_qa_123_1760637721
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# train_math_qa_123_1760637721

	This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the math_qa dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.7989
	- Num Input Tokens Seen: 77961608

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.03
	- train_batch_size: 4
	- eval_batch_size: 4
	- seed: 123
	- optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 20

	### Training results

	\| Training Loss \| Epoch \| Step \| Validation Loss \| Input Tokens Seen \|
	\|:-------------:\|:-----:\|:------:\|:---------------:\|:-----------------:\|
	\| 0.8045 \| 1.0 \| 6714 \| 0.8024 \| 3894688 \|
	\| 0.8422 \| 2.0 \| 13428 \| 0.8020 \| 7789256 \|
	\| 0.792 \| 3.0 \| 20142 \| 0.8027 \| 11683856 \|
	\| 0.7935 \| 4.0 \| 26856 \| 0.8017 \| 15585872 \|
	\| 0.8031 \| 5.0 \| 33570 \| 0.8017 \| 19482128 \|
	\| 0.815 \| 6.0 \| 40284 \| 0.8029 \| 23376776 \|
	\| 0.8242 \| 7.0 \| 46998 \| 0.8019 \| 27278720 \|
	\| 0.7873 \| 8.0 \| 53712 \| 0.8020 \| 31180904 \|
	\| 0.8132 \| 9.0 \| 60426 \| 0.8010 \| 35077032 \|
	\| 0.8102 \| 10.0 \| 67140 \| 0.8017 \| 38976336 \|
	\| 0.7992 \| 11.0 \| 73854 \| 0.8003 \| 42875264 \|
	\| 0.7982 \| 12.0 \| 80568 \| 0.8000 \| 46772480 \|
	\| 0.8272 \| 13.0 \| 87282 \| 0.7996 \| 50673016 \|
	\| 0.8153 \| 14.0 \| 93996 \| 0.7991 \| 54573896 \|
	\| 0.8029 \| 15.0 \| 100710 \| 0.8003 \| 58472760 \|
	\| 0.7927 \| 16.0 \| 107424 \| 0.7989 \| 62371472 \|
	\| 0.7752 \| 17.0 \| 114138 \| 0.8002 \| 66268336 \|
	\| 0.81 \| 18.0 \| 120852 \| 0.7994 \| 70167432 \|
	\| 0.8225 \| 19.0 \| 127566 \| 0.7995 \| 74065984 \|
	\| 0.8063 \| 20.0 \| 134280 \| 0.7992 \| 77961608 \|


	### Framework versions

	- PEFT 0.17.1
	- Transformers 4.51.3
	- Pytorch 2.9.0+cu128
	- Datasets 4.0.0
	- Tokenizers 0.21.4