femT-data committed
Commit f24b8bd · verified · 1 parent: 7d71c19

End of training

Files changed (2)
  1. README.md +16 -13
  2. adapter_model.bin +1 -1
README.md CHANGED
@@ -51,10 +51,10 @@ lora_fan_in_fan_out:
 
 gradient_accumulation_steps: 4
 micro_batch_size: 2
-num_epochs: 1
+num_epochs: 3
 optimizer: adamw_bnb_8bit
 lr_scheduler: cosine
-learning_rate: 0.0002
+learning_rate: 0.00002
 
 train_on_inputs: false
 group_by_length: false
@@ -71,7 +71,7 @@ xformers_attention:
 flash_attention: true
 s2_attention:
 
-warmup_steps: 10
+warmup_ratio: 0.04
 evals_per_epoch: 1
 eval_table_size:
 eval_max_new_tokens: 128
@@ -92,7 +92,7 @@ special_tokens:
 
 This model is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.0369
+- Loss: 0.0467
 
 ## Model description
 
@@ -111,25 +111,28 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 0.0002
+- learning_rate: 2e-05
 - train_batch_size: 2
 - eval_batch_size: 2
 - seed: 42
 - distributed_type: multi-GPU
-- num_devices: 4
+- num_devices: 8
 - gradient_accumulation_steps: 4
-- total_train_batch_size: 32
-- total_eval_batch_size: 8
+- total_train_batch_size: 64
+- total_eval_batch_size: 16
 - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 10
-- num_epochs: 1
+- lr_scheduler_warmup_steps: 17
+- num_epochs: 3
 
 ### Training results
 
-| Training Loss | Epoch | Step | Validation Loss |
-|:-------------:|:-----:|:----:|:---------------:|
-| 0.0366 | 1.0 | 297 | 0.0369 |
+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 0.3341        | 0.0067 | 1    | 0.3710          |
+| 0.061         | 0.9966 | 148  | 0.0574          |
+| 0.0413        | 1.9933 | 296  | 0.0476          |
+| 0.0453        | 2.9899 | 444  | 0.0467          |
 
 
 ### Framework versions
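
The updated hyperparameters are internally consistent, and the arithmetic is easy to check. Below is a minimal sketch of that check, assuming the usual conventions (effective batch size = per-device batch × gradient accumulation × devices); the `int(ratio * total_steps)` warmup conversion is an assumption about how the 17 warmup steps were derived, since ceiling rounding would give 18.

```python
# Sanity-check the updated hyperparameters against the reported values.
micro_batch_size = 2   # micro_batch_size / train_batch_size per device
eval_batch_size = 2
grad_accum = 4         # gradient_accumulation_steps
num_devices = 8
num_epochs = 3
total_steps = 444      # final Step in the results table

# Effective batch sizes (standard convention: per-device batch x accum x devices).
assert micro_batch_size * grad_accum * num_devices == 64  # total_train_batch_size
assert eval_batch_size * num_devices == 16                # total_eval_batch_size

# warmup_ratio -> warmup steps; truncation reproduces the reported value
# (assumption about the conversion, not confirmed by this repo).
warmup_ratio = 0.04
assert int(warmup_ratio * total_steps) == 17              # lr_scheduler_warmup_steps

# Steps per epoch implied by 444 steps over 3 epochs.
assert total_steps // num_epochs == 148                   # matches eval rows 148/296/444
```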
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:647907a67d19244734cc698d4946e0a5a8534f921aa7a9d5016ad091d705392b
+oid sha256:cbcbea02c7645350192e3d11f3dcec3f419bc7f070baa78cceb909228c19b9b4
 size 335706186
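
The adapter_model.bin change swaps in the retrained LoRA weights (same size, new content hash). For reference, a minimal sketch of applying such an adapter to the base model with PEFT; the adapter repo id below is a placeholder, not this repo's actual path.

```python
# Load the LoRA adapter onto the base model with PEFT.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.1-8B-Instruct"
adapter_id = "femT-data/ADAPTER_REPO"  # placeholder: substitute the real repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_id)  # reads adapter weights + config
model.eval()
```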