mtzig/v3c_mistral_lora_lastn

	---
	library_name: peft
	base_model: peiyi9979/math-shepherd-mistral-7b-prm
	tags:
	- generated_from_trainer
	metrics:
	- accuracy
	- precision
	- recall
	- f1
	model-index:
	- name: v3c_mistral_lora_lastn
	  results: []
	---


	# v3c_mistral_lora_lastn

	This model is a fine-tuned version of [peiyi9979/math-shepherd-mistral-7b-prm](https://huggingface.co/peiyi9979/math-shepherd-mistral-7b-prm) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.3067
	- Accuracy: 0.8592
	- Precision: 0.8580
	- Recall: 0.5968
	- F1: 0.7040
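The reported F1 can be cross-checked against the precision and recall above, since F1 is the harmonic mean of the two. The following is a minimal sketch in plain Python; the variable names are illustrative and are not taken from the training code.

```python
# Values copied from the evaluation results listed above.
precision = 0.8580
recall = 0.5968

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.4f}")
```

The computed value agrees with the reported F1 of 0.7040 to four decimal places. Precision is much higher than recall here, so F1 sits closer to the recall figure.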

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-05
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 765837
	- distributed_type: multi-GPU
	- num_devices: 4
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 128
	- total_eval_batch_size: 32
	- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 1
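The total batch sizes and the warmup length follow from the other hyperparameters, as the sketch below shows in plain Python. Two things in it are assumptions rather than facts from this card. First, the total step count of 404 is estimated from the results table (step 400 at epoch 0.9901). Second, `lr_at` is a hypothetical helper that implements the common linear-warmup plus cosine-decay shape, which may not match the Trainer's scheduler in every detail.

```python
import math

# Hyperparameters copied from the list above.
learning_rate = 2e-05
train_batch_size = 8   # per device
eval_batch_size = 8    # per device
num_devices = 4
gradient_accumulation_steps = 4
warmup_ratio = 0.1

# Effective batch sizes: per-device size x devices (x accumulation for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# ASSUMPTION: the card does not state the total number of optimizer steps.
# 404 is estimated from the results table (step 400 at epoch 0.9901).
total_steps = 404
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Hypothetical helper: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_eval_batch_size, warmup_steps)
```

Under these assumptions the learning rate peaks at 2e-05 once warmup ends and decays to zero by the final step.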

	### Training results

	| Training Loss | Epoch  | Step | Validation Loss | Accuracy | Precision | Recall | F1     |
	|:-------------:|:------:|:----:|:---------------:|:--------:|:---------:|:------:|:------:|
	| No log        | 0      | 0    | 0.6026          | 0.7339   | 0.6       | 0.1542 | 0.2453 |
	| 0.5522        | 0.0495 | 20   | 0.5768          | 0.7395   | 0.5682    | 0.2964 | 0.3896 |
	| 0.4818        | 0.0990 | 40   | 0.4859          | 0.7761   | 0.6835    | 0.3755 | 0.4847 |
	| 0.3892        | 0.1485 | 60   | 0.4218          | 0.7982   | 0.6766    | 0.5375 | 0.5991 |
	| 0.2916        | 0.1980 | 80   | 0.3747          | 0.8237   | 0.7701    | 0.5296 | 0.6276 |
	| 0.2191        | 0.2475 | 100  | 0.3538          | 0.8304   | 0.7778    | 0.5534 | 0.6467 |
	| 0.2189        | 0.2970 | 120  | 0.3754          | 0.8248   | 0.88      | 0.4348 | 0.5820 |
	| 0.1841        | 0.3465 | 140  | 0.3427          | 0.8415   | 0.8438    | 0.5336 | 0.6538 |
	| 0.2144        | 0.3960 | 160  | 0.3301          | 0.8404   | 0.8303    | 0.5415 | 0.6555 |
	| 0.2638        | 0.4455 | 180  | 0.3202          | 0.8470   | 0.8485    | 0.5534 | 0.6699 |
	| 0.2032        | 0.4950 | 200  | 0.3125          | 0.8570   | 0.8370    | 0.6087 | 0.7048 |
	| 0.1703        | 0.5446 | 220  | 0.3295          | 0.8337   | 0.8552    | 0.4901 | 0.6231 |
	| 0.175         | 0.5941 | 240  | 0.3116          | 0.8503   | 0.8471    | 0.5692 | 0.6809 |
	| 0.1927        | 0.6436 | 260  | 0.3218          | 0.8459   | 0.8654    | 0.5336 | 0.6601 |
	| 0.1848        | 0.6931 | 280  | 0.3069          | 0.8647   | 0.8659    | 0.6126 | 0.7176 |
	| 0.222         | 0.7426 | 300  | 0.3036          | 0.8581   | 0.8613    | 0.5889 | 0.6995 |
	| 0.1693        | 0.7921 | 320  | 0.3096          | 0.8525   | 0.8614    | 0.5652 | 0.6826 |
	| 0.1752        | 0.8416 | 340  | 0.3108          | 0.8503   | 0.8554    | 0.5613 | 0.6778 |
	| 0.2353        | 0.8911 | 360  | 0.3072          | 0.8592   | 0.8580    | 0.5968 | 0.7040 |
	| 0.1984        | 0.9406 | 380  | 0.3078          | 0.8603   | 0.8629    | 0.5968 | 0.7056 |
	| 0.2194        | 0.9901 | 400  | 0.3067          | 0.8592   | 0.8580    | 0.5968 | 0.7040 |
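The final checkpoint (step 400) is not the best row on every metric, which matters when choosing which step to keep. A small sketch in plain Python scans the table for the lowest validation loss and the highest F1. The `rows` list is transcribed from the table above; the selection logic is illustrative and does not describe how this checkpoint was actually chosen.

```python
# (step, validation_loss, f1) transcribed from the training results table.
rows = [
    (0, 0.6026, 0.2453), (20, 0.5768, 0.3896), (40, 0.4859, 0.4847),
    (60, 0.4218, 0.5991), (80, 0.3747, 0.6276), (100, 0.3538, 0.6467),
    (120, 0.3754, 0.5820), (140, 0.3427, 0.6538), (160, 0.3301, 0.6555),
    (180, 0.3202, 0.6699), (200, 0.3125, 0.7048), (220, 0.3295, 0.6231),
    (240, 0.3116, 0.6809), (260, 0.3218, 0.6601), (280, 0.3069, 0.7176),
    (300, 0.3036, 0.6995), (320, 0.3096, 0.6826), (340, 0.3108, 0.6778),
    (360, 0.3072, 0.7040), (380, 0.3078, 0.7056), (400, 0.3067, 0.7040),
]

# Lowest validation loss and highest F1 can point at different steps.
best_by_loss = min(rows, key=lambda r: r[1])
best_by_f1 = max(rows, key=lambda r: r[2])

print(f"lowest validation loss: step {best_by_loss[0]} ({best_by_loss[1]:.4f})")
print(f"highest F1:             step {best_by_f1[0]} ({best_by_f1[2]:.4f})")
```

On this table the lowest validation loss is at step 300 (0.3036) and the highest F1 is at step 280 (0.7176). The reported final metrics correspond to step 400.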


	### Framework versions

	- PEFT 0.13.2
	- Transformers 4.46.0
	- Pytorch 2.5.1+cu124
	- Datasets 3.1.0
	- Tokenizers 0.20.3