End of training
README.md CHANGED
@@ -32,7 +32,7 @@ chat_template: chatml
 datasets:
   - path: lemon-mint/Korean-FineTome-100k
     type: chat_template
-    split: train[:
+    split: train[:10%]
     field_messages: messages
     message_property_mappings:
       role: role
@@ -40,7 +40,7 @@ datasets:
 
   - path: lemon-mint/smol-koreantalk
     type: chat_template
-    split: train[:
+    split: train[:10%]
     field_messages: messages
     message_property_mappings:
       role: role
@@ -48,7 +48,7 @@ datasets:
 
   - path: heegyu/open-korean-instructions-v20231020
     type: chat_template
-    split: train[:
+    split: train[:10%]
     field_messages: conversations
     message_property_mappings:
       role: from
@@ -61,7 +61,7 @@ datasets:
   # NOTE: https://github.com/FreedomIntelligence/MultilingualSIFT
   - path: FreedomIntelligence/evol-instruct-korean
     type: chat_template
-    split: train[:
+    split: train[:10%]
     field_messages: conversations
     message_property_mappings:
       role: from
@@ -69,7 +69,7 @@ datasets:
 
   - path: FreedomIntelligence/alpaca-gpt4-korean
     type: chat_template
-    split: train[:
+    split: train[:10%]
     field_messages: conversations
     message_property_mappings:
       role: from
@@ -77,7 +77,7 @@ datasets:
 
   - path: FreedomIntelligence/sharegpt-korean
     type: chat_template
-    split: train[:
+    split: train[:10%]
     field_messages: conversations
     message_property_mappings:
       role: from
@@ -85,7 +85,7 @@ datasets:
 
   - path: coastral/korean-writing-style-instruct
     type: chat_template
-    split: train[:
+    split: train[:10%]
     field_messages: conversations
     message_property_mappings:
       role: from
@@ -93,7 +93,7 @@ datasets:
 
   - path: devngho/korean-instruction-mix
     type: chat_template
-    split: train[:
+    split: train[:10%]
     field_messages: messages
     message_property_mappings:
       role: from
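All eight dataset entries change the same way: `split` now takes only the first 10% of `train`, using the Hugging Face `datasets` split-slicing syntax. A minimal sketch of what that slice does when the first dataset is loaded directly (illustrative only; axolotl performs the loading itself):

```python
# Illustrative sketch (not part of the config): how `split: train[:10%]`
# behaves when loaded directly with the Hugging Face `datasets` library.
from datasets import load_dataset

# `train[:10%]` is standard split-slicing syntax: it deterministically takes
# the first 10% of the train split, with no shuffling involved.
subset = load_dataset("lemon-mint/Korean-FineTome-100k", split="train[:10%]")
full = load_dataset("lemon-mint/Korean-FineTome-100k", split="train")

print(len(subset), len(full))  # subset holds roughly one tenth of the rows
```

Because the slice is positional rather than random, every run trains on the same subset, which keeps the experiment reproducible.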
@@ -108,15 +108,20 @@ wandb_project: "axolotl"
 wandb_entity: "kasfiekfs-e"
 
 save_steps: 200
-warmup_steps:
+warmup_steps: 20
 eval_steps: 200
 
 sequence_len: 512
-
+
+# false for exp
+sample_packing: false
+# true for exp
+train_on_inputs: true
+
 pad_to_sequence_len: true
 
 gradient_accumulation_steps: 4
-micro_batch_size:
+micro_batch_size: 64
 
 optimizer: paged_adamw_8bit
 lr_scheduler: cosine
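The `micro_batch_size: 64` added above combines with `gradient_accumulation_steps: 4` and the two GPUs reported in the model card below to give the effective batch sizes. A quick sanity check, with all values copied from this diff:

```python
# Effective batch sizes implied by the config (values from this diff;
# num_devices comes from the model-card section further down).
micro_batch_size = 64            # per-device batch per forward pass
gradient_accumulation_steps = 4  # forward passes per optimizer step
num_devices = 2                  # multi-GPU data parallelism

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = micro_batch_size * num_devices  # no accumulation at eval

assert total_train_batch_size == 512  # matches the model card below
assert total_eval_batch_size == 128   # matches the model card below
```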
@@ -152,7 +157,7 @@ weight_decay: 0.0
 
 This model is a fine-tuned version of [minpeter/pretrained-tiny-ko](https://huggingface.co/minpeter/pretrained-tiny-ko) on the lemon-mint/Korean-FineTome-100k, the lemon-mint/smol-koreantalk, the heegyu/open-korean-instructions-v20231020, the FreedomIntelligence/evol-instruct-korean, the FreedomIntelligence/alpaca-gpt4-korean, the FreedomIntelligence/sharegpt-korean, the coastral/korean-writing-style-instruct and the devngho/korean-instruction-mix datasets.
 It achieves the following results on the evaluation set:
-- Loss: 2.
+- Loss: 2.1993
 
 ## Model description
 
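Assuming the reported eval loss is mean per-token cross-entropy in nats (the usual convention for causal-LM trainers), it corresponds to a perplexity of roughly 9:

```python
import math

eval_loss = 2.1993          # evaluation loss reported above
print(math.exp(eval_loss))  # ≈ 9.02, the implied perplexity (if loss is in nats)
```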
@@ -172,25 +177,25 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 2e-05
-- train_batch_size:
+- train_batch_size: 64
-- eval_batch_size:
+- eval_batch_size: 64
 - seed: 42
 - distributed_type: multi-GPU
-- num_devices:
+- num_devices: 2
 - gradient_accumulation_steps: 4
-- total_train_batch_size:
+- total_train_batch_size: 512
-- total_eval_batch_size:
+- total_eval_batch_size: 128
 - optimizer: Use OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps:
+- lr_scheduler_warmup_steps: 20
-- training_steps:
+- training_steps: 387
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-
+| 4.2885        | 0.0078 | 1    | 4.3118          |
-| 2.
+| 2.1552        | 1.5504 | 200  | 2.1993          |
 
 
 ### Framework versions
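The schedule numbers above are mutually consistent. A quick cross-check, with the step/epoch pair copied from the results table and `training_steps` from the hyperparameter list:

```python
# Cross-checking the training schedule against the results table.
steps, epochs_at_step = 200, 1.5504       # from the results table
steps_per_epoch = steps / epochs_at_step  # ≈ 129 optimizer steps per epoch

training_steps = 387                      # from the hyperparameter list
print(training_steps / steps_per_epoch)   # ≈ 3.0, i.e. about three epochs

# At an effective batch of 512, one epoch covers roughly
# 129 * 512 ≈ 66k samples, consistent with the 10% dataset slices.
```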