minpeter committed
Commit 8251d5e · verified · 1 parent: 79d4269

End of training

Files changed (1): README.md (+29 -16)
README.md CHANGED
@@ -111,21 +111,21 @@ save_steps: 200
  warmup_steps: 20
  eval_steps: 200
 
- sequence_len: 512
+ sequence_len: 2048
 
- # false for exp
+ # <<<< experimental settings <<<<
  sample_packing: false
- # true for exp
  train_on_inputs: true
+ # >>>> experimental settings >>>
 
  pad_to_sequence_len: true
 
  gradient_accumulation_steps: 4
- micro_batch_size: 64
+ micro_batch_size: 16
 
  optimizer: paged_adamw_8bit
  lr_scheduler: cosine
- learning_rate: 2e-5
+ learning_rate: 1e-3
 
  bf16: auto
  tf32: false
@@ -157,7 +157,7 @@ weight_decay: 0.0
 
  This model is a fine-tuned version of [minpeter/pretrained-tiny-ko](https://huggingface.co/minpeter/pretrained-tiny-ko) on the lemon-mint/Korean-FineTome-100k, the lemon-mint/smol-koreantalk, the heegyu/open-korean-instructions-v20231020, the FreedomIntelligence/evol-instruct-korean, the FreedomIntelligence/alpaca-gpt4-korean, the FreedomIntelligence/sharegpt-korean, the coastral/korean-writing-style-instruct and the devngho/korean-instruction-mix datasets.
  It achieves the following results on the evaluation set:
- - Loss: 2.1993
+ - Loss: 1.5699
 
  ## Model description
 
@@ -176,31 +176,44 @@ More information needed
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 64
- - eval_batch_size: 64
+ - learning_rate: 0.001
+ - train_batch_size: 16
+ - eval_batch_size: 16
  - seed: 42
  - distributed_type: multi-GPU
  - num_devices: 2
  - gradient_accumulation_steps: 4
- - total_train_batch_size: 512
- - total_eval_batch_size: 128
+ - total_train_batch_size: 128
+ - total_eval_batch_size: 32
  - optimizer: Use OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_steps: 20
- - training_steps: 387
+ - training_steps: 2972
 
  ### Training results
 
  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:------:|:----:|:---------------:|
- | 4.2885 | 0.0078 | 1 | 4.3118 |
- | 2.1552 | 1.5504 | 200 | 2.1993 |
+ | 2.8061 | 0.0010 | 1 | 2.8887 |
+ | 1.9625 | 0.2019 | 200 | 1.9494 |
+ | 1.8455 | 0.4037 | 400 | 1.8601 |
+ | 1.7395 | 0.6056 | 600 | 1.8045 |
+ | 1.7769 | 0.8075 | 800 | 1.7490 |
+ | 1.5135 | 1.0091 | 1000 | 1.7116 |
+ | 1.5928 | 1.2110 | 1200 | 1.6860 |
+ | 1.5322 | 1.4128 | 1400 | 1.6517 |
+ | 1.4939 | 1.6147 | 1600 | 1.6218 |
+ | 1.4406 | 1.8166 | 1800 | 1.5939 |
+ | 1.3999 | 2.0182 | 2000 | 1.5841 |
+ | 1.3449 | 2.2200 | 2200 | 1.5770 |
+ | 1.2352 | 2.4219 | 2400 | 1.5723 |
+ | 1.3043 | 2.6238 | 2600 | 1.5702 |
+ | 1.3467 | 2.8256 | 2800 | 1.5699 |
 
 
  ### Framework versions
 
- - Transformers 4.51.3
+ - Transformers 4.52.3
  - Pytorch 2.6.0+cu124
- - Datasets 3.5.1
+ - Datasets 3.6.0
  - Tokenizers 0.21.1
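
The updated hyperparameters compose as follows: total_train_batch_size 128 = micro_batch_size 16 × gradient_accumulation_steps 4 × 2 devices, and total_eval_batch_size 32 = 16 × 2 devices. The sketch below is a minimal, illustrative check of that arithmetic and of the cosine-with-warmup schedule (20 warmup steps, 2972 training steps, peak learning rate 1e-3) using the generic `transformers` scheduler helper; it is not the actual training code, which ran from the config above with `paged_adamw_8bit` (plain `AdamW` and a dummy parameter are stand-ins here).

```python
# Illustrative sketch only: recomputes the effective batch sizes reported in the
# card and approximates the cosine-with-warmup LR schedule. The real run used
# paged_adamw_8bit; AdamW and the dummy parameter below are stand-ins.
import torch
from transformers import get_cosine_schedule_with_warmup

micro_batch_size = 16             # per-device train/eval batch size
gradient_accumulation_steps = 4
num_devices = 2

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = micro_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)  # 128 32

# Cosine schedule with warmup, peaking at the configured learning_rate of 1e-3.
dummy = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.AdamW([dummy], lr=1e-3)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=20, num_training_steps=2972
)

for step in range(2972):
    if step in (0, 19, 200, 1486, 2971):
        print(step, scheduler.get_last_lr()[0])
    optimizer.step()
    scheduler.step()
```

Under this schedule the learning rate rises linearly to 1e-3 over the first 20 steps, then decays along a cosine curve toward zero by step 2972.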