CocoRoF committed · Commit ddeca54 · verified · 1 Parent(s): 9260c4b

cc-100_0-2 Done

Files changed (1): README.md (+27, −10)
README.md CHANGED
@@ -1,7 +1,7 @@
 ---
 library_name: transformers
 license: apache-2.0
-base_model: answerdotai/ModernBERT-base
+base_model: CocoRoF/KoModernBERT
 tags:
 - generated_from_trainer
 model-index:
@@ -14,7 +14,9 @@ should probably proofread and complete it, then remove this comment. -->
 
 # KoModernBERT
 
-This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on an unknown dataset.
+This model is a fine-tuned version of [CocoRoF/KoModernBERT](https://huggingface.co/CocoRoF/KoModernBERT) on the None dataset.
+It achieves the following results on the evaluation set:
+- Loss: 2.3473
 
 ## Model description
 
@@ -33,24 +35,39 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 0.0002
+- learning_rate: 1e-05
 - train_batch_size: 8
 - eval_batch_size: 8
 - seed: 42
-- gradient_accumulation_steps: 2
-- total_train_batch_size: 16
+- distributed_type: multi-GPU
+- num_devices: 8
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 512
+- total_eval_batch_size: 64
 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 7500
-- num_epochs: 1
+- lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 1.0
 
 ### Training results
 
+| Training Loss | Epoch  | Step  | Validation Loss |
+|:-------------:|:------:|:-----:|:---------------:|
+| 26.6178       | 0.0928 | 5000  | 3.3099          |
+| 23.887        | 0.1856 | 10000 | 2.9665          |
+| 22.3186       | 0.2784 | 15000 | 2.7910          |
+| 21.6275       | 0.3711 | 20000 | 2.6757          |
+| 20.7564       | 0.4639 | 25000 | 2.5967          |
+| 20.0201       | 0.5567 | 30000 | 2.5263          |
+| 19.7037       | 0.6495 | 35000 | 2.4709          |
+| 19.2119       | 0.7423 | 40000 | 2.4196          |
+| 19.053        | 0.8351 | 45000 | 2.3825          |
+| 18.7262       | 0.9279 | 50000 | 2.3473          |
 
 
 ### Framework versions
 
-- Transformers 4.48.0.dev0
-- Pytorch 2.5.1+cu121
-- Datasets 3.1.0
+- Transformers 4.48.1
+- Pytorch 2.5.1+cu124
+- Datasets 3.2.0
 - Tokenizers 0.21.0
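
Since the base model is a masked language model, a minimal usage sketch for readers of this commit follows. It is not part of the commit itself; the `[MASK]` token format is an assumption based on ModernBERT's BERT-style tokenizer.

```python
# Minimal usage sketch (not from the commit): load the checkpoint for
# masked-token prediction via the transformers pipeline API.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="CocoRoF/KoModernBERT")

# Korean for "The capital of South Korea is [MASK]."; the mask string must
# match the tokenizer's mask token (assumed to be "[MASK]" here).
for pred in fill_mask("대한민국의 수도는 [MASK]이다."):
    print(pred["token_str"], round(pred["score"], 4))
```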
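The hyperparameter list in the updated card maps onto transformers' `TrainingArguments` roughly as follows. This is a sketch under stated assumptions, not the training script behind the commit: `output_dir` is hypothetical, dataset loading and the `Trainer` call are omitted, and the 8-device setup comes from the launcher rather than from the arguments.

```python
# Sketch of the TrainingArguments implied by the card's hyperparameter list.
# Only the numeric values come from the card; everything else is assumed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="komodernbert-cc100",  # hypothetical path
    learning_rate=1e-5,
    per_device_train_batch_size=8,    # card: train_batch_size: 8
    per_device_eval_batch_size=8,     # card: eval_batch_size: 8
    gradient_accumulation_steps=8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,                 # card: lr_scheduler_warmup_ratio: 0.1
    num_train_epochs=1.0,
    seed=42,
    optim="adamw_torch",              # betas=(0.9, 0.999), eps=1e-08 are the defaults
)
# num_devices: 8 comes from the launcher (e.g. torchrun --nproc_per_node=8),
# giving an effective train batch of 8 per device * 8 devices * 8 accumulation
# steps = 512, which matches the card's total_train_batch_size.
```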