TrevorJS/mtg-phi-1_5-2-dpo

	---
	license: other
	base_model: microsoft/phi-1_5
	tags:
	- generated_from_trainer
	model-index:
	- name: dpo
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# dpo

	This model is a fine-tuned version of [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5). The Trainer recorded the training dataset as `None`, so the dataset is not documented in this card.
	It achieves the following results on the evaluation set:
	- Loss: 0.0000
	- Rewards/chosen: -8.4849
	- Rewards/rejected: -25.9483
	- Rewards/accuracies: 1.0
	- Rewards/margins: 17.4633
	- Logps/rejected: -293.3352
	- Logps/chosen: -152.1862
	- Logits/rejected: -0.9014
	- Logits/chosen: -0.4994
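
The reward figures above are related by simple arithmetic, which the sketch below reproduces. It assumes the metrics follow the usual DPO convention, where the reported rewards are already scaled by the DPO temperature and the per-pair loss is `-log(sigmoid(rewards_chosen - rewards_rejected))`. The card does not state this, so treat it as an assumption. The helper name `dpo_loss_from_margin` is hypothetical.

```python
import math

# Final evaluation values copied from the list above.
rewards_chosen = -8.4849
rewards_rejected = -25.9483


def dpo_loss_from_margin(margin: float) -> float:
    """Return -log(sigmoid(margin)) in a numerically stable form.

    Assumption: `margin` is the already-scaled reward difference
    between the chosen and the rejected completion.
    """
    if margin >= 0:
        # -log(sigmoid(m)) == log(1 + exp(-m))
        return math.log1p(math.exp(-margin))
    # For negative margins, avoid overflow in exp(-m).
    return -margin + math.log1p(math.exp(margin))


margin = rewards_chosen - rewards_rejected
loss = dpo_loss_from_margin(margin)

print(f"margin = {margin:.4f}")
print(f"loss   = {loss:.4f}")
```

The computed margin can differ from the reported `Rewards/margins` in the last digit because the inputs are rounded. The reported loss is also an average over evaluation pairs rather than the loss at the average margin. At a margin this large the loss is very close to zero, which is consistent with the reported `Loss: 0.0000`.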

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0005
	- train_batch_size: 4
	- eval_batch_size: 1
	- seed: 42
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_steps: 100
	- training_steps: 2500
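
As a rough illustration of how these settings combine, the sketch below derives the effective batch size and evaluates a cosine schedule with linear warmup at a few steps. It assumes a single device (the card does not say how many were used) and the common half-cosine decay to zero, which is the default shape of `get_cosine_schedule_with_warmup` in Transformers. The function name `lr_at` is hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 5e-4
train_batch_size = 4
gradient_accumulation_steps = 4
warmup_steps = 100
training_steps = 2500

# Assumption: one device; the card does not state the device count.
num_devices = 1

total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step under linear warmup + cosine decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return learning_rate * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, training_steps - warmup_steps)
    # Half a cosine period: peak at progress 0, zero at progress 1.
    return learning_rate * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print("total_train_batch_size:", total_train_batch_size)
for step in (0, 50, 100, 1300, 2500):
    print(step, f"{lr_at(step):.6f}")
```

Under these assumptions the rate peaks at step 100, falls to half the peak at step 1300 (the midpoint of the decay phase), and reaches zero at step 2500.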

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 0.0318 | 0.07 | 100 | 0.0384 | -0.3956 | -7.7708 | 0.9835 | 7.3753 | -111.5607 | -71.2923 | 1.1941 | 1.0925 |
	| 0.0187 | 0.15 | 200 | 0.0196 | -2.0328 | -10.9862 | 0.9922 | 8.9535 | -143.7145 | -87.6645 | -0.8539 | -0.9067 |
	| 0.0101 | 0.22 | 300 | 0.0351 | -2.7345 | -12.1219 | 0.9896 | 9.3874 | -155.0717 | -94.6821 | 0.4420 | 0.5220 |
	| 0.046 | 0.29 | 400 | 0.0199 | -6.6027 | -18.5556 | 0.9922 | 11.9529 | -219.4086 | -133.3638 | -2.3908 | -2.0500 |
	| 0.0005 | 0.36 | 500 | 0.0101 | -6.4299 | -20.5496 | 0.9965 | 14.1197 | -239.3484 | -131.6356 | -1.0029 | -0.6334 |
	| 0.0003 | 0.44 | 600 | 0.0092 | -9.0181 | -23.0513 | 0.9965 | 14.0332 | -264.3652 | -157.5181 | -1.6334 | -1.1488 |
	| 0.0004 | 0.51 | 700 | 0.0043 | -5.7377 | -21.3127 | 0.9991 | 15.5749 | -246.9788 | -124.7142 | -0.8477 | -0.4037 |
	| 0.0001 | 0.58 | 800 | 0.0040 | -8.9021 | -23.9436 | 0.9991 | 15.0415 | -273.2885 | -156.3581 | 0.2782 | 0.8244 |
	| 0.0001 | 0.66 | 900 | 0.0031 | -9.3191 | -24.3563 | 0.9991 | 15.0371 | -277.4149 | -160.5282 | -0.7279 | -0.2168 |
	| 0.002 | 0.73 | 1000 | 0.0066 | -6.8680 | -23.5822 | 0.9974 | 16.7142 | -269.6745 | -136.0172 | -0.6629 | 0.2962 |
	| 0.0002 | 0.8 | 1100 | 0.0015 | -9.1417 | -27.6276 | 0.9991 | 18.4859 | -310.1280 | -158.7536 | -1.2030 | -0.5215 |
	| 0.0823 | 0.87 | 1200 | 0.0057 | -4.4568 | -18.4378 | 0.9974 | 13.9810 | -218.2306 | -111.9051 | 0.2236 | 0.7934 |
	| 0.0 | 0.95 | 1300 | 0.0171 | -8.1530 | -25.5603 | 0.9983 | 17.4073 | -289.4550 | -148.8665 | -1.2413 | -0.9611 |
	| 0.0007 | 1.02 | 1400 | 0.0019 | -7.9402 | -25.1905 | 0.9983 | 17.2503 | -285.7569 | -146.7384 | -1.2325 | -0.8924 |
	| 0.0002 | 1.09 | 1500 | 0.0010 | -8.1543 | -25.2960 | 0.9991 | 17.1417 | -286.8122 | -148.8794 | -1.0005 | -0.6261 |
	| 0.0 | 1.17 | 1600 | 0.0010 | -8.4019 | -25.6275 | 0.9991 | 17.2256 | -290.1275 | -151.3556 | -1.0850 | -0.7170 |
	| 0.0 | 1.24 | 1700 | 0.0011 | -8.8691 | -26.2284 | 0.9991 | 17.3593 | -296.1366 | -156.0278 | -1.1426 | -0.7830 |
	| 0.0 | 1.31 | 1800 | 0.0010 | -9.2896 | -26.9277 | 0.9991 | 17.6381 | -303.1297 | -160.2331 | -1.1169 | -0.7512 |
	| 0.0001 | 1.39 | 1900 | 0.0011 | -9.2869 | -26.9301 | 0.9991 | 17.6432 | -303.1532 | -160.2053 | -1.1213 | -0.7560 |
	| 0.0 | 1.46 | 2000 | 0.0008 | -8.4453 | -25.9094 | 0.9991 | 17.4641 | -292.9459 | -151.7894 | -0.8854 | -0.4791 |
	| 0.0 | 1.53 | 2100 | 0.0007 | -8.4600 | -25.9284 | 0.9991 | 17.4684 | -293.1361 | -151.9364 | -0.8893 | -0.4835 |
	| 0.0 | 1.6 | 2200 | 0.0000 | -8.4501 | -25.9071 | 1.0 | 17.4569 | -292.9228 | -151.8381 | -0.8823 | -0.4759 |
	| 0.0 | 1.68 | 2300 | 0.0000 | -8.4800 | -25.9444 | 1.0 | 17.4644 | -293.2967 | -152.1372 | -0.8982 | -0.4964 |
	| 0.0 | 1.75 | 2400 | 0.0000 | -8.4864 | -25.9459 | 1.0 | 17.4596 | -293.3117 | -152.2005 | -0.9013 | -0.4999 |
	| 0.0 | 1.82 | 2500 | 0.0000 | -8.4849 | -25.9483 | 1.0 | 17.4633 | -293.3352 | -152.1862 | -0.9014 | -0.4994 |
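
The Epoch and Step columns allow a rough back-of-the-envelope estimate of the training set size, which the card does not report. The sketch below is only an estimate. It assumes every optimizer step consumed exactly 16 examples (the `total_train_batch_size` listed under the hyperparameters) and that the epoch value 1.82 is rounded to two decimals.

```python
# Values copied from the final table row and the hyperparameter list.
total_train_batch_size = 16
final_step = 2500
final_epoch = 1.82  # assumption: rounded to two decimals in the card

# Examples consumed over the whole run, assuming full batches throughout.
samples_seen = final_step * total_train_batch_size

# Point estimate plus bounds implied by the rounding of the epoch value.
estimate = samples_seen / final_epoch
low = samples_seen / (final_epoch + 0.005)
high = samples_seen / (final_epoch - 0.005)

print(f"samples seen: {samples_seen}")
print(f"estimated training pairs: {estimate:.0f} (between {low:.0f} and {high:.0f})")
```

Under these assumptions the training set would contain on the order of 22,000 preference pairs. This is an inference from the table, not a documented figure.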


	### Framework versions

	- Transformers 4.33.2
	- Pytorch 2.0.1+cu118
	- Datasets 2.14.5
	- Tokenizers 0.13.3