Commit f388d56 · verified · 1 Parent(s): bfe5059
Author: thorirhrafn

End of training

Files changed (1): README.md (+12 −11)
README.md CHANGED

@@ -18,15 +18,15 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.1762
- - Rewards/chosen: 0.3518
- - Rewards/rejected: -1.3902
+ - Loss: 0.2506
+ - Rewards/chosen: 0.2764
+ - Rewards/rejected: -1.0388
  - Rewards/accuracies: 1.0
- - Rewards/margins: 1.7421
- - Logps/rejected: -198.1086
- - Logps/chosen: -155.2773
- - Logits/rejected: -1.0539
- - Logits/chosen: -0.8615
+ - Rewards/margins: 1.3152
+ - Logps/rejected: -194.5943
+ - Logps/chosen: -156.0318
+ - Logits/rejected: -1.0532
+ - Logits/chosen: -0.8577
 
  ## Model description
 
@@ -45,7 +45,7 @@ More information needed
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
- - learning_rate: 1e-06
+ - learning_rate: 5e-07
  - train_batch_size: 1
  - eval_batch_size: 1
  - seed: 42
@@ -53,13 +53,14 @@ The following hyperparameters were used during training:
  - total_train_batch_size: 8
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
- - num_epochs: 1
+ - num_epochs: 2
 
  ### Training results
 
  | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
  |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
- | 0.184 | 0.79 | 200 | 0.1762 | 0.3518 | -1.3902 | 1.0 | 1.7421 | -198.1086 | -155.2773 | -1.0539 | -0.8615 |
+ | 0.3358 | 0.79 | 200 | 0.3244 | 0.2277 | -0.7696 | 1.0 | 0.9973 | -191.9022 | -156.5185 | -1.0547 | -0.8590 |
+ | 0.2428 | 1.59 | 400 | 0.2506 | 0.2764 | -1.0388 | 1.0 | 1.3152 | -194.5943 | -156.0318 | -1.0532 | -0.8577 |
 
 
  ### Framework versions
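
As a sanity check on the updated results table: in preference-optimization (DPO-style) training logs like this one, `Rewards/margins` is typically the difference `Rewards/chosen - Rewards/rejected`. A minimal sketch verifying that the two eval rows added in this commit are internally consistent (values copied from the diff; the margin-definition assumption is ours, not stated in the model card):

```python
from math import isclose

# Eval rows from the updated Training results table:
# (rewards_chosen, rewards_rejected, rewards_margins)
rows = [
    (0.2277, -0.7696, 0.9973),  # epoch 0.79, step 200
    (0.2764, -1.0388, 1.3152),  # epoch 1.59, step 400
]

for chosen, rejected, margin in rows:
    # Values are logged to 4 decimals, so allow a small tolerance.
    assert isclose(chosen - rejected, margin, abs_tol=1e-4)
```

Both rows check out, which is consistent with the table's margin column being derived from the chosen/rejected reward columns rather than logged independently.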