minpeter/tiny-ko-sft

@@ -7,6 +7,7 @@ tags:
 datasets:
 - lemon-mint/Korean-FineTome-100k
 - lemon-mint/smol-koreantalk
 model-index:
 - name: ko-tiny-exp
   results: []
@@ -31,6 +32,7 @@ datasets:
     message_property_mappings:
       role: role
       content: content
   - path: lemon-mint/smol-koreantalk
     type: chat_template
     split: train[:20%]
@@ -38,6 +40,15 @@ datasets:
     message_property_mappings:
       role: role
       content: content
 dataset_prepared_path: last_run_prepared
 val_set_size: 0.05
@@ -50,12 +61,12 @@ save_steps: 200
 warmup_steps: 100
 eval_steps: 200
-sequence_len: 1024
 sample_packing: true
 pad_to_sequence_len: true
 gradient_accumulation_steps: 4
-micro_batch_size: 32
 optimizer: paged_adamw_8bit
 lr_scheduler: cosine
@@ -69,7 +80,7 @@ added_tokens_overrides:
   128002: "<|im_start|>"
 special_tokens:
-  bos_token: <|begin_of_text|>
   eos_token: <|im_end|>
   pad_token: <|im_end|>
@@ -80,7 +91,7 @@ resume_from_checkpoint:
 logging_steps: 1
 flash_attention: true
-num_epochs: 2
 weight_decay: 0.0
 ```
@@ -89,9 +100,9 @@ weight_decay: 0.0
 # ko-tiny-exp
-This model is a fine-tuned version of [minpeter/pretrained-tiny-ko](https://huggingface.co/minpeter/pretrained-tiny-ko) on the lemon-mint/Korean-FineTome-100k and the lemon-mint/smol-koreantalk datasets.
 It achieves the following results on the evaluation set:
-- Loss: 3.6038
 ## Model description
@@ -111,24 +122,24 @@ More information needed
 The following hyperparameters were used during training:
 - learning_rate: 2e-05
-- train_batch_size: 32
-- eval_batch_size: 32
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 4
 - gradient_accumulation_steps: 4
-- total_train_batch_size: 512
-- total_eval_batch_size: 128
 - optimizer: Use OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 100
-- training_steps: 102
 ### Training results
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 3.5674        | 0.0193 | 1    | 3.6038          |
 ### Framework versions

 datasets:
 - lemon-mint/Korean-FineTome-100k
 - lemon-mint/smol-koreantalk
+- FreedomIntelligence/alpaca-gpt4-korean
 model-index:
 - name: ko-tiny-exp
   results: []
     message_property_mappings:
       role: role
       content: content
   - path: lemon-mint/smol-koreantalk
     type: chat_template
     split: train[:20%]
     message_property_mappings:
       role: role
       content: content
+  - path: FreedomIntelligence/alpaca-gpt4-korean
+    type: chat_template
+    split: train[:20%]
+    field_messages: conversations
+    message_property_mappings:
+      role: from
+      content: value
 dataset_prepared_path: last_run_prepared
 val_set_size: 0.05
 warmup_steps: 100
 eval_steps: 200
+sequence_len: 512
 sample_packing: true
 pad_to_sequence_len: true
 gradient_accumulation_steps: 4
+micro_batch_size: 56
 optimizer: paged_adamw_8bit
 lr_scheduler: cosine
   128002: "<|im_start|>"
 special_tokens:
+  bos_token: <|im_start|>
   eos_token: <|im_end|>
   pad_token: <|im_end|>
 logging_steps: 1
 flash_attention: true
+num_epochs: 4
 weight_decay: 0.0
 ```
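The dataset entry added in this change sets `field_messages: conversations` and maps `role` to `from` and `content` to `value`. As a rough illustration of what that mapping implies, here is a minimal Python sketch. The helper `remap_messages` and the sample record are hypothetical and are not taken from the training framework or from the dataset itself.

```python
# Hypothetical helper illustrating the field mapping in the config;
# this is NOT the training framework's actual implementation.
def remap_messages(record, field_messages="conversations", mappings=None):
    """Return the turns of `record` with keys renamed per `mappings`.

    `mappings` follows the config layout: target property -> source key.
    """
    if mappings is None:
        mappings = {"role": "from", "content": "value"}
    return [
        {target: turn[source] for target, source in mappings.items()}
        for turn in record[field_messages]
    ]


# Made-up record in a from/value layout, for illustration only.
sample = {
    "conversations": [
        {"from": "human", "value": "안녕하세요"},
        {"from": "gpt", "value": "안녕하세요! 무엇을 도와드릴까요?"},
    ]
}

messages = remap_messages(sample)
for message in messages:
    print(message["role"], "->", message["content"])
```

The sketch only renames keys. Role values such as `human` or `gpt` are left unchanged here; how they are translated into the chat template's roles depends on the training framework.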
 # ko-tiny-exp
+This model is a fine-tuned version of [minpeter/pretrained-tiny-ko](https://huggingface.co/minpeter/pretrained-tiny-ko) on the lemon-mint/Korean-FineTome-100k, the lemon-mint/smol-koreantalk and the FreedomIntelligence/alpaca-gpt4-korean datasets.
 It achieves the following results on the evaluation set:
+- Loss: 3.5174
 ## Model description
 The following hyperparameters were used during training:
 - learning_rate: 2e-05
+- train_batch_size: 56
+- eval_batch_size: 56
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 4
 - gradient_accumulation_steps: 4
+- total_train_batch_size: 896
+- total_eval_batch_size: 224
 - optimizer: Use OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 100
+- training_steps: 112
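The total batch sizes in the list above follow from the per-device batch size, the device count, and the gradient accumulation steps. The sketch below reproduces that arithmetic for both the old and the new settings. The function names are illustrative, and the tokens-per-step figure assumes every packed sequence is padded to exactly `sequence_len` tokens, as `sample_packing: true` and `pad_to_sequence_len: true` suggest.

```python
def effective_batch_sizes(micro_batch_size, num_devices, gradient_accumulation_steps):
    """Return (total_train_batch_size, total_eval_batch_size)."""
    total_train = micro_batch_size * num_devices * gradient_accumulation_steps
    # Evaluation does not accumulate gradients, so only devices multiply.
    total_eval = micro_batch_size * num_devices
    return total_train, total_eval


def tokens_per_optimizer_step(total_train_batch_size, sequence_len):
    """Token count per optimizer step, assuming every sequence is
    packed/padded to exactly sequence_len tokens."""
    return total_train_batch_size * sequence_len


old_train, old_eval = effective_batch_sizes(32, 4, 4)
new_train, new_eval = effective_batch_sizes(56, 4, 4)

print("old:", old_train, old_eval, tokens_per_optimizer_step(old_train, 1024))
print("new:", new_train, new_eval, tokens_per_optimizer_step(new_train, 512))
```

Under that padding assumption, the new configuration processes fewer tokens per optimizer step (896 × 512 = 458,752) than the old one (512 × 1024 = 524,288), even though the sequence count per step is higher.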
 ### Training results
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
+| 3.5354        | 0.0351 | 1    | 3.5174          |
 ### Framework versions