llama3-8b-mypo3_sim-full-beta10.0-lr4e-7

This model is a fine-tuned version of princeton-nlp/Llama-3-Base-8B-SFT on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set (a minimal usage sketch follows the metrics):

  • Loss: 1.3724
  • Rewards/chosen: 0.0743
  • Rewards/rejected: -0.3491
  • Rewards/accuracies: 0.7659
  • Rewards/margins: 0.4234
  • Logps/rejected: -1.5243
  • Logps/chosen: -1.2637
  • Logits/rejected: -1.0913
  • Logits/chosen: -1.0660
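
As noted above, here is a minimal sketch of loading the checkpoint for text generation with the transformers library. The prompt and generation settings are illustrative only, not recommendations from the model authors, and no chat template is assumed.

```python
# Minimal, illustrative usage sketch (not an official example from the model authors).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aaaalongaa/llama3-8b-mypo3_sim-full-beta10.0-lr4e-7"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in bfloat16
    device_map="auto",
)

prompt = "Explain the difference between supervised fine-tuning and preference optimization."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```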

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 4e-07
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
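
The total train batch size of 32 is the product of 4 examples per device × 4 GPUs × 2 gradient-accumulation steps. As a rough guide, the sketch below expresses the listed values as a transformers.TrainingArguments configuration; it is an illustration, not the original training script, and the preference-optimization loss itself (presumably SimPO-style with beta = 10.0, going by the model name) is not reproduced here.

```python
# Sketch of the listed hyperparameters as transformers.TrainingArguments.
# This is NOT the original training code; the preference loss lives in a
# separate trainer that is not part of this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-mypo3_sim-full-beta10.0-lr4e-7",
    learning_rate=4e-7,
    per_device_train_batch_size=4,   # train_batch_size
    per_device_eval_batch_size=8,    # eval_batch_size
    gradient_accumulation_steps=2,   # 4 GPUs x 4 x 2 = 32 total train batch size
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # checkpoint is saved in bfloat16
)
```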

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.3853        | 0.0523 | 100  | 1.3812          | -0.0333        | -0.1018          | 0.6448             | 0.0685          | -1.4996        | -1.2745      | -1.0407         | -1.0093       |
| 1.4035        | 0.1047 | 200  | 1.3794          | -0.0391        | -0.2185          | 0.7083             | 0.1794          | -1.5113        | -1.2751      | -1.0427         | -1.0131       |
| 1.3833        | 0.1570 | 300  | 1.3844          | 0.0092         | -0.2547          | 0.7063             | 0.2639          | -1.5149        | -1.2702      | -1.0703         | -1.0412       |
| 1.3934        | 0.2094 | 400  | 1.3933          | 0.0824         | -0.2364          | 0.7262             | 0.3188          | -1.5130        | -1.2629      | -1.0709         | -1.0425       |
| 1.4106        | 0.2617 | 500  | 1.4109          | 0.1926         | -0.1813          | 0.7202             | 0.3739          | -1.5075        | -1.2519      | -1.0638         | -1.0374       |
| 1.4054        | 0.3141 | 600  | 1.3984          | 0.0015         | -0.3629          | 0.7361             | 0.3644          | -1.5257        | -1.2710      | -1.0834         | -1.0564       |
| 1.3595        | 0.3664 | 700  | 1.3980          | -0.0178        | -0.4030          | 0.7282             | 0.3853          | -1.5297        | -1.2729      | -1.0677         | -1.0416       |
| 1.4312        | 0.4187 | 800  | 1.3940          | 0.0198         | -0.3635          | 0.7321             | 0.3833          | -1.5258        | -1.2692      | -1.0938         | -1.0670       |
| 1.3978        | 0.4711 | 900  | 1.3915          | 0.0711         | -0.3160          | 0.7440             | 0.3871          | -1.5210        | -1.2640      | -1.0902         | -1.0633       |
| 1.3815        | 0.5234 | 1000 | 1.3852          | 0.1178         | -0.2864          | 0.7520             | 0.4042          | -1.5180        | -1.2594      | -1.0909         | -1.0657       |
| 1.378         | 0.5758 | 1100 | 1.3877          | 0.1649         | -0.2299          | 0.7440             | 0.3947          | -1.5124        | -1.2547      | -1.0936         | -1.0682       |
| 1.3868        | 0.6281 | 1200 | 1.3771          | 0.0725         | -0.3378          | 0.7480             | 0.4103          | -1.5232        | -1.2639      | -1.0760         | -1.0512       |
| 1.3653        | 0.6805 | 1300 | 1.3791          | 0.0379         | -0.3772          | 0.7460             | 0.4152          | -1.5271        | -1.2674      | -1.0568         | -1.0335       |
| 1.3524        | 0.7328 | 1400 | 1.3824          | 0.1389         | -0.2862          | 0.7440             | 0.4250          | -1.5180        | -1.2573      | -1.0780         | -1.0533       |
| 1.3716        | 0.7851 | 1500 | 1.3744          | 0.0827         | -0.3438          | 0.7480             | 0.4265          | -1.5238        | -1.2629      | -1.0928         | -1.0670       |
| 1.3846        | 0.8375 | 1600 | 1.3734          | 0.0947         | -0.3299          | 0.7520             | 0.4246          | -1.5224        | -1.2617      | -1.0883         | -1.0631       |
| 1.3631        | 0.8898 | 1700 | 1.3721          | 0.0610         | -0.3667          | 0.7619             | 0.4277          | -1.5261        | -1.2650      | -1.0636         | -1.0404       |
| 1.3646        | 0.9422 | 1800 | 1.3719          | 0.0800         | -0.3489          | 0.7639             | 0.4289          | -1.5243        | -1.2631      | -1.0865         | -1.0616       |
| 1.3606        | 0.9945 | 1900 | 1.3722          | 0.0718         | -0.3524          | 0.7560             | 0.4241          | -1.5246        | -1.2640      | -1.0887         | -1.0636       |
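
For readers unfamiliar with these columns: in the usual DPO/SimPO-style logging convention (assumed here, and consistent with the numbers above), the reward margin is the chosen reward minus the rejected reward, and the reward accuracy is the fraction of pairs whose chosen reward exceeds the rejected one. A small sketch of that relationship:

```python
# Sketch of how the reward columns relate (standard preference-optimization
# logging convention; the exact training code for this run is not published here).
import torch

def reward_metrics(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor) -> dict:
    # Per-pair margin: chosen reward minus rejected reward.
    margins = chosen_rewards - rejected_rewards
    return {
        "rewards/chosen": chosen_rewards.mean().item(),
        "rewards/rejected": rejected_rewards.mean().item(),
        "rewards/margins": margins.mean().item(),
        "rewards/accuracies": (chosen_rewards > rejected_rewards).float().mean().item(),
    }

# Example with the final evaluation values reported above:
# 0.0743 - (-0.3491) = 0.4234, matching Rewards/margins.
```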

Framework versions

  • Transformers 4.43.1
  • PyTorch 2.1.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.19.1