minpeter committed commit 30a7914 (verified, parent: 206cd4d)

End of training

Files changed (1): README.md (+23 −12)
````diff
@@ -7,6 +7,7 @@ tags:
 datasets:
 - lemon-mint/Korean-FineTome-100k
 - lemon-mint/smol-koreantalk
+- FreedomIntelligence/alpaca-gpt4-korean
 model-index:
 - name: ko-tiny-exp
   results: []
@@ -31,6 +32,7 @@ datasets:
     message_property_mappings:
       role: role
       content: content
+
   - path: lemon-mint/smol-koreantalk
     type: chat_template
     split: train[:20%]
@@ -38,6 +40,15 @@ datasets:
     message_property_mappings:
       role: role
       content: content
+
+  - path: FreedomIntelligence/alpaca-gpt4-korean
+    type: chat_template
+    split: train[:20%]
+    field_messages: conversations
+    message_property_mappings:
+      role: from
+      content: value
+
 dataset_prepared_path: last_run_prepared
 val_set_size: 0.05
@@ -50,12 +61,12 @@ save_steps: 200
 warmup_steps: 100
 eval_steps: 200
 
-sequence_len: 1024
+sequence_len: 512
 sample_packing: true
 pad_to_sequence_len: true
 
 gradient_accumulation_steps: 4
-micro_batch_size: 32
+micro_batch_size: 56
 
 optimizer: paged_adamw_8bit
 lr_scheduler: cosine
@@ -69,7 +80,7 @@ added_tokens_overrides:
   128002: "<|im_start|>"
 
 special_tokens:
-  bos_token: <|begin_of_text|>
+  bos_token: <|im_start|>
   eos_token: <|im_end|>
   pad_token: <|im_end|>
 
@@ -80,7 +91,7 @@ resume_from_checkpoint:
 logging_steps: 1
 flash_attention: true
 
-num_epochs: 2
+num_epochs: 4
 weight_decay: 0.0
 
 ```
@@ -89,9 +100,9 @@ weight_decay: 0.0
 
 # ko-tiny-exp
 
-This model is a fine-tuned version of [minpeter/pretrained-tiny-ko](https://huggingface.co/minpeter/pretrained-tiny-ko) on the lemon-mint/Korean-FineTome-100k and the lemon-mint/smol-koreantalk datasets.
+This model is a fine-tuned version of [minpeter/pretrained-tiny-ko](https://huggingface.co/minpeter/pretrained-tiny-ko) on the lemon-mint/Korean-FineTome-100k, the lemon-mint/smol-koreantalk and the FreedomIntelligence/alpaca-gpt4-korean datasets.
 It achieves the following results on the evaluation set:
-- Loss: 3.6038
+- Loss: 3.5174
 
 ## Model description
 
@@ -111,24 +122,24 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 2e-05
-- train_batch_size: 32
-- eval_batch_size: 32
+- train_batch_size: 56
+- eval_batch_size: 56
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 4
 - gradient_accumulation_steps: 4
-- total_train_batch_size: 512
-- total_eval_batch_size: 128
+- total_train_batch_size: 896
+- total_eval_batch_size: 224
 - optimizer: Use OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 100
-- training_steps: 102
+- training_steps: 112
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 3.5674 | 0.0193 | 1 | 3.6038 |
+| 3.5354 | 0.0351 | 1 | 3.5174 |
 
 
 ### Framework versions
````
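The updated batch totals in the model card follow directly from the config: with 4 GPUs, `micro_batch_size: 56` and `gradient_accumulation_steps: 4`, the effective train batch is 56 × 4 × 4 = 896, and the eval batch (which does not accumulate gradients) is 56 × 4 = 224. A minimal arithmetic check, assuming the standard micro-batch × accumulation × devices formula:

```python
# Effective batch sizes implied by the updated axolotl config.
micro_batch_size = 56
gradient_accumulation_steps = 4
num_devices = 4

# Training: each optimizer step accumulates gradients over
# `gradient_accumulation_steps` micro-batches on every device.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices

# Evaluation: no gradient accumulation, one micro-batch per device.
total_eval_batch_size = micro_batch_size * num_devices

print(total_train_batch_size)  # 896, matching the model card
print(total_eval_batch_size)   # 224, matching the model card
```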
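The new FreedomIntelligence/alpaca-gpt4-korean entry needs `field_messages: conversations` plus the `role: from` / `content: value` mapping because that dataset stores each turn as `{"from": ..., "value": ...}` rather than the `{"role": ..., "content": ...}` schema the other two datasets use. A hypothetical sketch of the normalization this mapping declares (illustrative only, not axolotl's actual implementation; the sample row is made up):

```python
# Hypothetical sketch of what message_property_mappings declares:
# rename per-turn keys so every dataset yields the same chat schema.
def remap_messages(example, field_messages, mappings):
    """Normalize one dataset row to [{'role': ..., 'content': ...}, ...]."""
    return [
        {target: turn[source] for target, source in mappings.items()}
        for turn in example[field_messages]
    ]

# A row shaped like FreedomIntelligence/alpaca-gpt4-korean (invented example).
row = {"conversations": [
    {"from": "human", "value": "안녕하세요"},
    {"from": "gpt", "value": "안녕하세요! 무엇을 도와드릴까요?"},
]}

messages = remap_messages(row, "conversations", {"role": "from", "content": "value"})
print(messages[0])  # {'role': 'human', 'content': '안녕하세요'}
```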
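The results table is also roughly self-consistent with the new step count: `training_steps: 112` over `num_epochs: 4` implies about 28 optimizer steps per epoch, so step 1 should sit near epoch 1/28 ≈ 0.0357, close to the 0.0351 reported (sample packing makes the per-epoch step count approximate). A quick check of that arithmetic:

```python
# Rough consistency check between training_steps, num_epochs,
# and the Epoch value the results table reports at step 1.
training_steps = 112
num_epochs = 4

steps_per_epoch = training_steps / num_epochs  # 28.0
epoch_at_step_1 = 1 / steps_per_epoch

print(round(epoch_at_step_1, 4))  # 0.0357, vs 0.0351 in the table
```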