# model_hh_shp3_400
This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 3.1493
- Rewards/chosen: -4.1445
- Rewards/rejected: -6.0670
- Rewards/accuracies: 0.5400
- Rewards/margins: 1.9226
- Logps/rejected: -259.9296
- Logps/chosen: -239.0303
- Logits/rejected: -0.7892
- Logits/chosen: -0.7615
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
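Two of the values above are derived rather than independent: the total train batch size is the per-device batch size times the gradient accumulation steps, and the cosine scheduler ramps the learning rate up linearly over the warmup steps before decaying it. A minimal sketch of that schedule, re-implemented in plain Python from the hyperparameters listed here (the `lr_at` helper is hypothetical, not part of the training code):

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 0.0005
WARMUP_STEPS = 100
TRAINING_STEPS = 1000

def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer steps:
    linear warmup to LEARNING_RATE, then cosine decay to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size: per-device batch * gradient accumulation steps.
TOTAL_TRAIN_BATCH_SIZE = 4 * 4  # = 16, matching total_train_batch_size above
```

The peak learning rate is reached exactly at step 100 and decays to zero at step 1000, which lines up with the eval checkpoints in the results table below.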
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0216        | 4.0   | 100  | 2.2837          | -3.0069        | -3.4434          | 0.5400             | 0.4365          | -257.0145      | -237.7664    | -0.7901         | -0.8063       |
| 0.1476        | 8.0   | 200  | 3.5175          | -3.9480        | -5.1618          | 0.5300             | 1.2138          | -258.9238      | -238.8121    | -0.8303         | -0.8463       |
| 0.0108        | 12.0  | 300  | 3.2066          | -1.3603        | -2.7278          | 0.5600             | 1.3674          | -256.2194      | -235.9369    | -0.7824         | -0.7443       |
| 0.0           | 16.0  | 400  | 3.1558          | -4.1573        | -6.0643          | 0.5300             | 1.9070          | -259.9266      | -239.0446    | -0.7891         | -0.7612       |
| 0.0           | 20.0  | 500  | 3.1564          | -4.1409        | -6.0450          | 0.5400             | 1.9041          | -259.9052      | -239.0264    | -0.7894         | -0.7613       |
| 0.0           | 24.0  | 600  | 3.1533          | -4.1925        | -6.0561          | 0.5300             | 1.8636          | -259.9174      | -239.0837    | -0.7890         | -0.7614       |
| 0.0           | 28.0  | 700  | 3.1650          | -4.1547        | -6.0212          | 0.5300             | 1.8665          | -259.8788      | -239.0417    | -0.7892         | -0.7614       |
| 0.0           | 32.0  | 800  | 3.1593          | -4.1704        | -6.0572          | 0.5400             | 1.8868          | -259.9187      | -239.0591    | -0.7891         | -0.7619       |
| 0.0           | 36.0  | 900  | 3.1711          | -4.1626        | -6.0504          | 0.5400             | 1.8879          | -259.9112      | -239.0504    | -0.7892         | -0.7614       |
| 0.0           | 40.0  | 1000 | 3.1493          | -4.1445        | -6.0670          | 0.5400             | 1.9226          | -259.9296      | -239.0303    | -0.7892         | -0.7615       |
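The margins column is internally consistent with the other reward columns: under the usual preference-tuning convention (an assumption here, since the card does not name the trainer), the margin is simply the chosen reward minus the rejected reward. A quick check on the final row:

```python
# Values from the step-1000 evaluation row above.
rewards_chosen = -4.1445
rewards_rejected = -6.0670

# Assumed convention: margin = chosen reward - rejected reward.
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # agrees with the reported 1.9226 up to rounding
```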
### Framework versions
- PEFT 0.10.0
- Transformers 4.39.3
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2