minpeter committed on
Commit 2eb873e · verified · 1 Parent(s): 6abd0b9

End of training

Files changed (1): README.md (+16 -9)
README.md CHANGED
@@ -6,6 +6,7 @@ tags:
 - generated_from_trainer
 datasets:
 - lemon-mint/Korean-FineTome-100k
+- lemon-mint/smol-koreantalk
 model-index:
 - name: ko-tiny-exp
   results: []
@@ -30,6 +31,13 @@ datasets:
     message_property_mappings:
       role: role
       content: content
+  - path: lemon-mint/smol-koreantalk
+    type: chat_template
+    split: train[:20%]
+    field_messages: messages
+    message_property_mappings:
+      role: role
+      content: content
 dataset_prepared_path: last_run_prepared
 val_set_size: 0.05
@@ -45,7 +53,6 @@ pad_to_sequence_len: true
 gradient_accumulation_steps: 4
 micro_batch_size: 16
 
-num_epochs: 2
 optimizer: paged_adamw_8bit
 lr_scheduler: cosine
 learning_rate: 2e-5
@@ -61,6 +68,7 @@ logging_steps: 1
 flash_attention: true
 
 warmup_steps: 100
+num_epochs: 2
 evals_per_epoch: 2
 saves_per_epoch: 1
 weight_decay: 0.0
@@ -71,9 +79,9 @@ weight_decay: 0.0
 
 # ko-tiny-exp
 
-This model is a fine-tuned version of [minpeter/pretrained-tiny-ko](https://huggingface.co/minpeter/pretrained-tiny-ko) on the lemon-mint/Korean-FineTome-100k dataset.
+This model is a fine-tuned version of [minpeter/pretrained-tiny-ko](https://huggingface.co/minpeter/pretrained-tiny-ko) on the lemon-mint/Korean-FineTome-100k and the lemon-mint/smol-koreantalk datasets.
 It achieves the following results on the evaluation set:
-- Loss: 2.9471
+- Loss: 2.8226
 
 ## Model description
 
@@ -101,17 +109,16 @@ The following hyperparameters were used during training:
 - optimizer: Use OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 100
-- training_steps: 176
+- training_steps: 1498
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 3.0538 | 0.0113 | 1 | 3.0136 |
-| 2.9454 | 0.4972 | 44 | 3.0099 |
-| 2.9657 | 0.9944 | 88 | 2.9874 |
-| 3.0426 | 1.4859 | 132 | 2.9506 |
-| 2.9403 | 1.9831 | 176 | 2.9471 |
+| 3.0001 | 0.0013 | 1 | 2.9904 |
+| 2.8288 | 0.5002 | 375 | 2.8669 |
+| 2.8188 | 1.0 | 750 | 2.8255 |
+| 2.8012 | 1.5002 | 1125 | 2.8226 |
 
 
 ### Framework versions
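
For reference, a minimal sketch of how the config added in this commit maps onto `datasets`/`transformers` calls. This is an illustration only, not part of the commit: the checkpoint id `minpeter/ko-tiny-exp` is an assumption based on the model-index name, and the dataset loading simply mirrors the `split: train[:20%]` and `type: chat_template` settings shown in the diff.

```python
# Sketch only. The repo id "minpeter/ko-tiny-exp" is assumed from the
# model-index name in this commit and is not confirmed anywhere in the diff.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# The same 20% slice of the newly added dataset that the axolotl config
# requests via `split: train[:20%]`.
smol = load_dataset("lemon-mint/smol-koreantalk", split="train[:20%]")
print(smol[0]["messages"])  # `field_messages: messages` in the config

# Load the fine-tuned checkpoint (repo id assumed, see note above).
tokenizer = AutoTokenizer.from_pretrained("minpeter/ko-tiny-exp")
model = AutoModelForCausalLM.from_pretrained("minpeter/ko-tiny-exp")

# Both datasets were formatted with `type: chat_template`, so inference
# prompts should go through the tokenizer's chat template as well.
messages = [{"role": "user", "content": "안녕하세요! 자기소개 해주세요."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Assuming the reported loss is mean token-level cross-entropy in nats, the new final validation loss of 2.8226 corresponds to a perplexity of roughly exp(2.8226) ≈ 16.8, down from exp(2.9471) ≈ 19.0 before the second dataset was added.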