mazzaqq/DPO_davide


	---
	library_name: peft
	tags:
	- trl
	- dpo
	- generated_from_trainer
	base_model: meta-llama/Llama-2-7b-chat-hf
	model-index:
	- name: sigmoid_lr2e-05_b0.1
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# sigmoid_lr2e-05_b0.1

	This model is a fine-tuned version of [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.1587
	- Rewards/chosen: 1.0566
	- Rewards/rejected: -3.7222
	- Rewards/accuracies: 0.9348
	- Rewards/margins: 4.7788
	- Logps/rejected: -100.3634
	- Logps/chosen: -68.2028
	- Logits/rejected: -1.2118
	- Logits/chosen: -1.1884
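The reward metrics above are related by simple arithmetic, which the following sketch works through. It rests on two assumptions that the card does not state: that `beta = 0.1` (inferred only from the `b0.1` suffix of the run name), and that rewards follow the usual DPO definition `reward = beta * (policy_logp - reference_logp)`. Under those assumptions the mean reference log-probabilities can be backed out from the reported values.

```python
import math

# Final evaluation metrics copied from the list above.
reward_chosen = 1.0566
reward_rejected = -3.7222
logps_chosen = -68.2028
logps_rejected = -100.3634
reported_loss = 0.1587

# Assumption: beta = 0.1, inferred from the "b0.1" suffix of the run name.
BETA = 0.1

# The margin is the gap between the mean chosen and mean rejected rewards.
margin = reward_chosen - reward_rejected

# Assumption: reward = beta * (policy_logp - reference_logp), so the mean
# reference log-probabilities follow by rearranging that definition.
ref_logps_chosen = logps_chosen - reward_chosen / BETA
ref_logps_rejected = logps_rejected - reward_rejected / BETA

# Sigmoid DPO loss, -log(sigmoid(margin)), evaluated at the mean margin.
# This is not the reported loss: the loss is convex in the margin, so the
# mean per-pair loss can only be greater than or equal to this value.
loss_at_mean_margin = math.log1p(math.exp(-margin))

print(f"margin:              {margin:.4f}")
print(f"ref logps (chosen):  {ref_logps_chosen:.4f}")
print(f"ref logps (rejected): {ref_logps_rejected:.4f}")
print(f"loss at mean margin: {loss_at_mean_margin:.4f} (reported mean loss: {reported_loss})")
```

The gap between the loss at the mean margin and the reported mean loss is what one would expect when a minority of pairs (here about 6.5%, given the 0.9348 accuracy) are ranked incorrectly and contribute comparatively large per-pair losses.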

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-05
	- train_batch_size: 8
	- eval_batch_size: 4
	- seed: 42
	- gradient_accumulation_steps: 2
	- total_train_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 1
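As a rough illustration of how these settings combine, the sketch below computes the effective batch size and a learning-rate curve with linear warmup followed by half-cosine decay. It is a plain-Python approximation, not the Trainer's own scheduler code. `TOTAL_STEPS = 3410` is an assumption extrapolated from the evaluation log, which records one evaluation every 341 steps per 0.1 epoch; `num_devices = 1` is likewise an assumption, chosen because 8 × 2 already equals the reported total of 16.

```python
import math

# Values taken from the hyperparameter list above.
LEARNING_RATE = 2e-05
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 2
WARMUP_RATIO = 0.1

# Assumption: roughly 3410 optimizer steps in the single epoch, extrapolated
# from evaluations logged every 341 steps (each 0.1 epoch).
TOTAL_STEPS = 3410


def effective_batch_size(per_device: int, accum: int, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer step."""
    return per_device * accum * num_devices


def lr_at(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / max(1, warmup_steps)
    # Half-cosine decay from the peak down to 0 at the final step.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for step in (0, 170, 341, 1705, 3069, 3410):
        print(f"step {step:>4}: lr = {lr_at(step):.3e}")
```

Under these assumptions the warmup would end at about the first logged evaluation, and the learning rate would already be close to zero by the last logged step.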

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 0.1582 | 0.1 | 341 | 0.3412 | 1.4613 | -0.6760 | 0.8442 | 2.1373 | -69.9019 | -64.1564 | -0.9916 | -0.9274 |
	| 0.2165 | 0.2 | 682 | 0.2655 | 1.8031 | -1.3141 | 0.8714 | 3.1172 | -76.2827 | -60.7382 | -1.0115 | -0.9525 |
	| 0.0864 | 0.3 | 1023 | 0.2379 | 0.6173 | -3.1475 | 0.8877 | 3.7648 | -94.6172 | -72.5967 | -1.0623 | -1.0198 |
	| 0.3192 | 0.4 | 1364 | 0.2003 | 1.3681 | -2.3819 | 0.9185 | 3.7500 | -86.9604 | -65.0880 | -1.1691 | -1.1334 |
	| 0.5707 | 0.5 | 1705 | 0.1831 | 1.2028 | -3.2640 | 0.9293 | 4.4667 | -95.7812 | -66.7415 | -1.2287 | -1.1992 |
	| 0.0427 | 0.6 | 2046 | 0.1718 | 1.3838 | -3.1327 | 0.9312 | 4.5166 | -94.4690 | -64.9309 | -1.1900 | -1.1566 |
	| 0.1956 | 0.7 | 2387 | 0.1608 | 1.0344 | -3.7242 | 0.9366 | 4.7586 | -100.3841 | -68.4254 | -1.2044 | -1.1795 |
	| 0.0319 | 0.8 | 2728 | 0.1595 | 1.0398 | -3.7445 | 0.9348 | 4.7843 | -100.5868 | -68.3711 | -1.2077 | -1.1849 |
	| 0.0173 | 0.9 | 3069 | 0.1587 | 1.0566 | -3.7222 | 0.9348 | 4.7788 | -100.3634 | -68.2028 | -1.2118 | -1.1884 |
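The table can be sanity-checked with a few lines of Python. The sketch below copies a subset of the columns by hand, confirms that each reported margin equals chosen minus rejected reward (up to rounding in the last digit), and picks out the checkpoints with the lowest validation loss and the highest reward accuracy. The helper names are illustrative and not part of any library.

```python
# Columns: (step, validation_loss, rewards_chosen, rewards_rejected,
#           rewards_accuracy, rewards_margin), copied from the table above.
ROWS = [
    (341, 0.3412, 1.4613, -0.6760, 0.8442, 2.1373),
    (682, 0.2655, 1.8031, -1.3141, 0.8714, 3.1172),
    (1023, 0.2379, 0.6173, -3.1475, 0.8877, 3.7648),
    (1364, 0.2003, 1.3681, -2.3819, 0.9185, 3.7500),
    (1705, 0.1831, 1.2028, -3.2640, 0.9293, 4.4667),
    (2046, 0.1718, 1.3838, -3.1327, 0.9312, 4.5166),
    (2387, 0.1608, 1.0344, -3.7242, 0.9366, 4.7586),
    (2728, 0.1595, 1.0398, -3.7445, 0.9348, 4.7843),
    (3069, 0.1587, 1.0566, -3.7222, 0.9348, 4.7788),
]


def margin_mismatches(rows, tol=5e-4):
    """Return the steps where margin differs from chosen - rejected by more than tol."""
    return [
        step
        for step, _loss, chosen, rejected, _acc, margin in rows
        if abs((chosen - rejected) - margin) > tol
    ]


def is_strictly_decreasing(values):
    """True if every value is smaller than the one before it."""
    return all(later < earlier for earlier, later in zip(values, values[1:]))


best_by_loss = min(ROWS, key=lambda row: row[1])
best_by_accuracy = max(ROWS, key=lambda row: row[4])

if __name__ == "__main__":
    print("margin mismatches:", margin_mismatches(ROWS))
    print("validation loss strictly decreasing:",
          is_strictly_decreasing([row[1] for row in ROWS]))
    print("lowest validation loss at step", best_by_loss[0])
    print("highest reward accuracy at step", best_by_accuracy[0])
```

The tolerance of `5e-4` allows for the four-decimal rounding of the logged values; two rows differ from the recomputed margin by about `1e-4`.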


	### Framework versions

	- PEFT 0.7.1
	- Transformers 4.36.2
	- Pytorch 2.1.2
	- Datasets 2.15.0
	- Tokenizers 0.15.0
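Since the card lists `peft` as the library and `meta-llama/Llama-2-7b-chat-hf` as the base model, loading would plausibly look like the sketch below. It has not been verified against this repository: the adapter repo id is taken from the page title, the adapter weights are assumed to sit at the repository root, and `load_adapter` is a hypothetical helper name. The base model is gated, so access to the Llama 2 weights is required before the call can succeed.

```python
BASE_MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"  # from the card's base_model field
ADAPTER_REPO_ID = "mazzaqq/DPO_davide"  # assumption: adapter files are at the repo root


def load_adapter(base_model_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_REPO_ID):
    """Load the base model, attach the PEFT adapter, and return (model, tokenizer)."""
    # Imported lazily so that defining this helper does not require the
    # heavy dependencies to be installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    # torch_dtype="auto" keeps the dtype stored in the checkpoint instead of
    # upcasting to float32.
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype="auto")
    # Wrap the base model with the adapter weights from the Hub.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return model, tokenizer


if __name__ == "__main__":
    model, tokenizer = load_adapter()
    print(type(model).__name__)
```

Matching the framework versions listed above is the safest choice when reproducing results, although newer releases may also work.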