minpeter committed · ca09cf7 · verified · 1 parent: 95caa39

End of training

Files changed (1):
  1. README.md +26 -21
README.md CHANGED
@@ -32,7 +32,7 @@ chat_template: chatml
 datasets:
   - path: lemon-mint/Korean-FineTome-100k
     type: chat_template
-    split: train[:20%]
+    split: train[:10%]
     field_messages: messages
     message_property_mappings:
       role: role
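Every dataset hunk in this commit makes the same change, halving each slice from `train[:20%]` to `train[:10%]`. The slice string is standard Hugging Face `datasets` split syntax, which axolotl presumably forwards as-is when loading; a minimal sketch of the equivalent call:

```python
# Minimal sketch of the slice the `split:` key requests, expressed directly
# with the Hugging Face `datasets` library. The only assumption here is that
# axolotl forwards the split string unchanged.
from datasets import load_dataset

subset = load_dataset("lemon-mint/Korean-FineTome-100k", split="train[:10%]")
print(len(subset))  # roughly 10,000 rows for a ~100k-row corpus
```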
@@ -40,7 +40,7 @@ datasets:
 
   - path: lemon-mint/smol-koreantalk
     type: chat_template
-    split: train[:20%]
+    split: train[:10%]
     field_messages: messages
     message_property_mappings:
       role: role
@@ -48,7 +48,7 @@ datasets:
 
   - path: heegyu/open-korean-instructions-v20231020
     type: chat_template
-    split: train[:20%]
+    split: train[:10%]
     field_messages: conversations
     message_property_mappings:
       role: from
@@ -61,7 +61,7 @@ datasets:
 # NOTE: https://github.com/FreedomIntelligence/MultilingualSIFT
   - path: FreedomIntelligence/evol-instruct-korean
     type: chat_template
-    split: train[:20%]
+    split: train[:10%]
     field_messages: conversations
     message_property_mappings:
       role: from
@@ -69,7 +69,7 @@ datasets:
 
   - path: FreedomIntelligence/alpaca-gpt4-korean
     type: chat_template
-    split: train[:20%]
+    split: train[:10%]
     field_messages: conversations
     message_property_mappings:
       role: from
@@ -77,7 +77,7 @@ datasets:
 
   - path: FreedomIntelligence/sharegpt-korean
     type: chat_template
-    split: train[:20%]
+    split: train[:10%]
     field_messages: conversations
     message_property_mappings:
       role: from
@@ -85,7 +85,7 @@ datasets:
 
   - path: coastral/korean-writing-style-instruct
     type: chat_template
-    split: train[:20%]
+    split: train[:10%]
     field_messages: conversations
     message_property_mappings:
       role: from
@@ -93,7 +93,7 @@ datasets:
 
   - path: devngho/korean-instruction-mix
     type: chat_template
-    split: train[:20%]
+    split: train[:10%]
     field_messages: messages
     message_property_mappings:
       role: from
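The entries that pair `field_messages: conversations` with `role: from` hold ShareGPT-style turns of the form `{"from": ..., "value": ...}`, and `message_property_mappings` renames those keys to the chatml `role`/`content` schema. A hedged sketch of the renaming (the content mapping line is cut off in the hunks above, so the `value` source key is an assumption):

```python
# Hypothetical helper showing what message_property_mappings amounts to;
# role <- from appears in the config above, content <- value is assumed.
def remap_message(msg: dict, mappings: dict) -> dict:
    return {target: msg[source] for target, source in mappings.items()}

turn = {"from": "human", "value": "안녕하세요"}
print(remap_message(turn, {"role": "from", "content": "value"}))
# {'role': 'human', 'content': '안녕하세요'}
```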
@@ -108,15 +108,20 @@ wandb_project: "axolotl"
 wandb_entity: "kasfiekfs-e"
 
 save_steps: 200
-warmup_steps: 100
+warmup_steps: 20
 eval_steps: 200
 
 sequence_len: 512
-sample_packing: true
+
+# false for exp
+sample_packing: false
+# true for exp
+train_on_inputs: true
+
 pad_to_sequence_len: true
 
 gradient_accumulation_steps: 4
-micro_batch_size: 56
+micro_batch_size: 64
 
 optimizer: paged_adamw_8bit
 lr_scheduler: cosine
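Disabling `sample_packing` means each 512-token row now carries a single padded example rather than several concatenated ones, which is consistent with the step count rising below even though the data slices shrank. A conceptual sketch of what packing does when it is on (hypothetical helper, not axolotl's actual implementation):

```python
# Conceptual sketch only: greedily concatenate tokenized examples into rows
# of at most sequence_len tokens instead of padding each one separately.
# Assumes every individual example fits within sequence_len.
def pack_examples(token_lists, sequence_len=512):
    packs, current = [], []
    for tokens in token_lists:
        if current and len(current) + len(tokens) > sequence_len:
            packs.append(current)
            current = []
        current = current + tokens
    if current:
        packs.append(current)
    return packs
```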
@@ -152,7 +157,7 @@ weight_decay: 0.0
 
 This model is a fine-tuned version of [minpeter/pretrained-tiny-ko](https://huggingface.co/minpeter/pretrained-tiny-ko) on the lemon-mint/Korean-FineTome-100k, the lemon-mint/smol-koreantalk, the heegyu/open-korean-instructions-v20231020, the FreedomIntelligence/evol-instruct-korean, the FreedomIntelligence/alpaca-gpt4-korean, the FreedomIntelligence/sharegpt-korean, the coastral/korean-writing-style-instruct and the devngho/korean-instruction-mix datasets.
 It achieves the following results on the evaluation set:
-- Loss: 2.0944
+- Loss: 2.1993
 
 ## Model description
 
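The batch totals reported in the next hunk follow directly from the per-device settings; a quick worked check using only numbers that appear in this diff:

```python
micro_batch_size = 64             # per-device train batch size
gradient_accumulation_steps = 4
num_devices = 2

# Matches the totals in the hyperparameter list below.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = micro_batch_size * num_devices  # no accumulation at eval
print(total_train_batch_size, total_eval_batch_size)    # 512 128
```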
@@ -172,25 +177,25 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 2e-05
-- train_batch_size: 56
-- eval_batch_size: 56
+- train_batch_size: 64
+- eval_batch_size: 64
 - seed: 42
 - distributed_type: multi-GPU
-- num_devices: 4
+- num_devices: 2
 - gradient_accumulation_steps: 4
-- total_train_batch_size: 896
-- total_eval_batch_size: 224
+- total_train_batch_size: 512
+- total_eval_batch_size: 128
 - optimizer: Use OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 100
-- training_steps: 264
+- lr_scheduler_warmup_steps: 20
+- training_steps: 387
 
 ### Training results
 
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 3.362         | 0.0114 | 1    | 3.3719          |
-| 2.1121        | 2.2727 | 200  | 2.0944          |
+| 4.2885        | 0.0078 | 1    | 4.3118          |
+| 2.1552        | 1.5504 | 200  | 2.1993          |
 
 
 ### Framework versions
 