llama3-8b-mypo3_sim-full-beta10.0-lr4e-7
This model is a fine-tuned version of princeton-nlp/Llama-3-Base-8B-SFT on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:
- Loss: 1.3724
- Rewards/chosen: 0.0743
- Rewards/rejected: -0.3491
- Rewards/accuracies: 0.7659
- Rewards/margins: 0.4234
- Logps/rejected: -1.5243
- Logps/chosen: -1.2637
- Logits/rejected: -1.0913
- Logits/chosen: -1.0660
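Note that the reward margin is the difference between the chosen and rejected rewards (0.0743 - (-0.3491) = 0.4234). Since the card does not yet include usage instructions, below is a minimal inference sketch that loads this checkpoint with Transformers; the dtype, device placement, prompt, and generation settings are illustrative assumptions rather than part of the original card.

```python
# Minimal usage sketch (assumptions: bf16 weights, a single GPU or automatic
# device placement, greedy decoding). Loads the checkpoint and generates text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aaaalongaa/llama3-8b-mypo3_sim-full-beta10.0-lr4e-7"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 is sufficient for inference
    device_map="auto",
)

prompt = "Explain the difference between supervised fine-tuning and preference optimization."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Print only the newly generated tokens, without echoing the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```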
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a rough equivalent configuration is sketched after the list):
- learning_rate: 4e-07
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
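As a hedged sketch only, the hyperparameters above map onto a standard `transformers.TrainingArguments` configuration roughly as follows. The output directory and bf16 precision are assumptions, and the preference-optimization objective itself (the "mypo3_sim" loss with beta=10.0 implied by the model name) is not reproduced here.

```python
# Hedged sketch, not the authors' training script: a TrainingArguments setup
# mirroring the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-mypo3_sim-full-beta10.0-lr4e-7",  # hypothetical path
    learning_rate=4e-7,
    per_device_train_batch_size=4,   # 4 per device x 4 GPUs x 2 accumulation = 32 total
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # assumption: bf16 mixed precision on multi-GPU
)
```

Run across 4 devices, this reproduces the effective train and eval batch sizes of 32 reported above.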
Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.3853 | 0.0523 | 100 | 1.3812 | -0.0333 | -0.1018 | 0.6448 | 0.0685 | -1.4996 | -1.2745 | -1.0407 | -1.0093 |
| 1.4035 | 0.1047 | 200 | 1.3794 | -0.0391 | -0.2185 | 0.7083 | 0.1794 | -1.5113 | -1.2751 | -1.0427 | -1.0131 |
| 1.3833 | 0.1570 | 300 | 1.3844 | 0.0092 | -0.2547 | 0.7063 | 0.2639 | -1.5149 | -1.2702 | -1.0703 | -1.0412 |
| 1.3934 | 0.2094 | 400 | 1.3933 | 0.0824 | -0.2364 | 0.7262 | 0.3188 | -1.5130 | -1.2629 | -1.0709 | -1.0425 |
| 1.4106 | 0.2617 | 500 | 1.4109 | 0.1926 | -0.1813 | 0.7202 | 0.3739 | -1.5075 | -1.2519 | -1.0638 | -1.0374 |
| 1.4054 | 0.3141 | 600 | 1.3984 | 0.0015 | -0.3629 | 0.7361 | 0.3644 | -1.5257 | -1.2710 | -1.0834 | -1.0564 |
| 1.3595 | 0.3664 | 700 | 1.3980 | -0.0178 | -0.4030 | 0.7282 | 0.3853 | -1.5297 | -1.2729 | -1.0677 | -1.0416 |
| 1.4312 | 0.4187 | 800 | 1.3940 | 0.0198 | -0.3635 | 0.7321 | 0.3833 | -1.5258 | -1.2692 | -1.0938 | -1.0670 |
| 1.3978 | 0.4711 | 900 | 1.3915 | 0.0711 | -0.3160 | 0.7440 | 0.3871 | -1.5210 | -1.2640 | -1.0902 | -1.0633 |
| 1.3815 | 0.5234 | 1000 | 1.3852 | 0.1178 | -0.2864 | 0.7520 | 0.4042 | -1.5180 | -1.2594 | -1.0909 | -1.0657 |
| 1.378 | 0.5758 | 1100 | 1.3877 | 0.1649 | -0.2299 | 0.7440 | 0.3947 | -1.5124 | -1.2547 | -1.0936 | -1.0682 |
| 1.3868 | 0.6281 | 1200 | 1.3771 | 0.0725 | -0.3378 | 0.7480 | 0.4103 | -1.5232 | -1.2639 | -1.0760 | -1.0512 |
| 1.3653 | 0.6805 | 1300 | 1.3791 | 0.0379 | -0.3772 | 0.7460 | 0.4152 | -1.5271 | -1.2674 | -1.0568 | -1.0335 |
| 1.3524 | 0.7328 | 1400 | 1.3824 | 0.1389 | -0.2862 | 0.7440 | 0.4250 | -1.5180 | -1.2573 | -1.0780 | -1.0533 |
| 1.3716 | 0.7851 | 1500 | 1.3744 | 0.0827 | -0.3438 | 0.7480 | 0.4265 | -1.5238 | -1.2629 | -1.0928 | -1.0670 |
| 1.3846 | 0.8375 | 1600 | 1.3734 | 0.0947 | -0.3299 | 0.7520 | 0.4246 | -1.5224 | -1.2617 | -1.0883 | -1.0631 |
| 1.3631 | 0.8898 | 1700 | 1.3721 | 0.0610 | -0.3667 | 0.7619 | 0.4277 | -1.5261 | -1.2650 | -1.0636 | -1.0404 |
| 1.3646 | 0.9422 | 1800 | 1.3719 | 0.0800 | -0.3489 | 0.7639 | 0.4289 | -1.5243 | -1.2631 | -1.0865 | -1.0616 |
| 1.3606 | 0.9945 | 1900 | 1.3722 | 0.0718 | -0.3524 | 0.7560 | 0.4241 | -1.5246 | -1.2640 | -1.0887 | -1.0636 |
Framework versions
- Transformers 4.43.1
- Pytorch 2.1.2+cu121
- Datasets 2.18.0
- Tokenizers 0.19.1
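To reproduce the evaluation numbers closely, it can help to match these versions. The snippet below is a small convenience check, assumed rather than taken from the original card, that compares the installed packages against the versions listed above.

```python
# Hedged sketch: compare the local environment against the framework versions
# listed in this card.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.43.1",
    "torch": "2.1.2+cu121",
    "datasets": "2.18.0",
    "tokenizers": "0.19.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    have = installed[name]
    status = "OK" if have == want else f"mismatch (card lists {want})"
    print(f"{name}: {have} {status}")
```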