harun27 committed
Commit aaa64e1 · verified · 1 Parent(s): 75f4900

Training in progress, epoch 1

Files changed (5):
  1. README.md +12 -11
  2. config.json +1 -1
  3. model.safetensors +1 -1
  4. tokenizer_config.json +1 -1
  5. training_args.bin +2 -2
README.md CHANGED
@@ -16,8 +16,8 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [answerdotai/ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large) on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.1836
- - Classification Report: {'0': {'precision': 0.9434650455927052, 'recall': 0.9748743718592965, 'f1-score': 0.9589125733704047, 'support': 1592.0}, '1': {'precision': 0.8067632850241546, 'recall': 0.6423076923076924, 'f1-score': 0.715203426124197, 'support': 260.0}, 'accuracy': 0.9281857451403888, 'macro avg': {'precision': 0.8751141653084299, 'recall': 0.8085910320834944, 'f1-score': 0.8370579997473009, 'support': 1852.0}, 'weighted avg': {'precision': 0.9242736537202305, 'recall': 0.9281857451403888, 'f1-score': 0.9246985462192091, 'support': 1852.0}}
+ - Loss: 0.2290
+ - Classification Report: {'0': {'precision': 0.9048991354466859, 'recall': 0.9861809045226131, 'f1-score': 0.9437932070934776, 'support': 1592.0}, '1': {'precision': 0.811965811965812, 'recall': 0.36538461538461536, 'f1-score': 0.5039787798408488, 'support': 260.0}, 'accuracy': 0.8990280777537797, 'macro avg': {'precision': 0.858432473706249, 'recall': 0.6757827599536143, 'f1-score': 0.7238859934671632, 'support': 1852.0}, 'weighted avg': {'precision': 0.891852340573561, 'recall': 0.8990280777537797, 'f1-score': 0.8820482011076873, 'support': 1852.0}}
 
  ## Model description
 
@@ -37,27 +37,28 @@ More information needed
 
  The following hyperparameters were used during training:
  - learning_rate: 5e-06
- - train_batch_size: 64
- - eval_batch_size: 64
+ - train_batch_size: 22
+ - eval_batch_size: 22
  - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 4
+ - total_train_batch_size: 88
+ - total_eval_batch_size: 88
  - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  - lr_scheduler_type: linear
- - num_epochs: 5
+ - num_epochs: 2
 
  ### Training results
 
  | Training Loss | Epoch | Step | Validation Loss | Classification Report |
  |:-------------:|:-----:|:----:|:---------------:|:---------------------:|
- | No log | 1.0 | 98 | 0.2189 | {'0': {'precision': 0.9206631142687981, 'recall': 0.9767587939698492, 'f1-score': 0.9478817433709235, 'support': 1592.0}, '1': {'precision': 0.7730061349693251, 'recall': 0.4846153846153846, 'f1-score': 0.5957446808510638, 'support': 260.0}, 'accuracy': 0.9076673866090713, 'macro avg': {'precision': 0.8468346246190617, 'recall': 0.7306870892926169, 'f1-score': 0.7718132121109936, 'support': 1852.0}, 'weighted avg': {'precision': 0.8999337327256756, 'recall': 0.9076673866090713, 'f1-score': 0.8984456546802305, 'support': 1852.0}} |
- | No log | 2.0 | 196 | 0.2076 | {'0': {'precision': 0.9115606936416185, 'recall': 0.9905778894472361, 'f1-score': 0.9494280553883203, 'support': 1592.0}, '1': {'precision': 0.8770491803278688, 'recall': 0.4115384615384615, 'f1-score': 0.5602094240837696, 'support': 260.0}, 'accuracy': 0.9092872570194385, 'macro avg': {'precision': 0.8943049369847437, 'recall': 0.7010581754928489, 'f1-score': 0.754818739736045, 'support': 1852.0}, 'weighted avg': {'precision': 0.9067156647746775, 'recall': 0.9092872570194385, 'f1-score': 0.8947861309071199, 'support': 1852.0}} |
- | No log | 3.0 | 294 | 0.1875 | {'0': {'precision': 0.9410692588092345, 'recall': 0.9729899497487438, 'f1-score': 0.9567634342186535, 'support': 1592.0}, '1': {'precision': 0.7912621359223301, 'recall': 0.6269230769230769, 'f1-score': 0.6995708154506438, 'support': 260.0}, 'accuracy': 0.9244060475161987, 'macro avg': {'precision': 0.8661656973657823, 'recall': 0.7999565133359103, 'f1-score': 0.8281671248346487, 'support': 1852.0}, 'weighted avg': {'precision': 0.9200380212549175, 'recall': 0.9244060475161987, 'f1-score': 0.9206564791000345, 'support': 1852.0}} |
- | No log | 4.0 | 392 | 0.1924 | {'0': {'precision': 0.9565772669220945, 'recall': 0.9409547738693468, 'f1-score': 0.9487017099430018, 'support': 1592.0}, '1': {'precision': 0.6713286713286714, 'recall': 0.7384615384615385, 'f1-score': 0.7032967032967034, 'support': 260.0}, 'accuracy': 0.9125269978401728, 'macro avg': {'precision': 0.813952969125383, 'recall': 0.8397081561654427, 'f1-score': 0.8259992066198526, 'support': 1852.0}, 'weighted avg': {'precision': 0.9165315677567111, 'recall': 0.9125269978401728, 'f1-score': 0.9142496031784026, 'support': 1852.0}} |
- | No log | 5.0 | 490 | 0.1836 | {'0': {'precision': 0.9434650455927052, 'recall': 0.9748743718592965, 'f1-score': 0.9589125733704047, 'support': 1592.0}, '1': {'precision': 0.8067632850241546, 'recall': 0.6423076923076924, 'f1-score': 0.715203426124197, 'support': 260.0}, 'accuracy': 0.9281857451403888, 'macro avg': {'precision': 0.8751141653084299, 'recall': 0.8085910320834944, 'f1-score': 0.8370579997473009, 'support': 1852.0}, 'weighted avg': {'precision': 0.9242736537202305, 'recall': 0.9281857451403888, 'f1-score': 0.9246985462192091, 'support': 1852.0}} |
+ | No log | 1.0 | 71 | 0.2510 | {'0': {'precision': 0.8783185840707964, 'recall': 0.9974874371859297, 'f1-score': 0.9341176470588235, 'support': 1592.0}, '1': {'precision': 0.9090909090909091, 'recall': 0.15384615384615385, 'f1-score': 0.2631578947368421, 'support': 260.0}, 'accuracy': 0.8790496760259179, 'macro avg': {'precision': 0.8937047465808527, 'recall': 0.5756667955160417, 'f1-score': 0.5986377708978328, 'support': 1852.0}, 'weighted avg': {'precision': 0.8826386728965142, 'recall': 0.8790496760259179, 'f1-score': 0.839922433449906, 'support': 1852.0}} |
+ | No log | 2.0 | 142 | 0.2290 | {'0': {'precision': 0.9048991354466859, 'recall': 0.9861809045226131, 'f1-score': 0.9437932070934776, 'support': 1592.0}, '1': {'precision': 0.811965811965812, 'recall': 0.36538461538461536, 'f1-score': 0.5039787798408488, 'support': 260.0}, 'accuracy': 0.8990280777537797, 'macro avg': {'precision': 0.858432473706249, 'recall': 0.6757827599536143, 'f1-score': 0.7238859934671632, 'support': 1852.0}, 'weighted avg': {'precision': 0.891852340573561, 'recall': 0.8990280777537797, 'f1-score': 0.8820482011076873, 'support': 1852.0}} |
 
 
  ### Framework versions
 
- - Transformers 4.52.3
+ - Transformers 4.53.1
  - Pytorch 2.6.0+cu124
  - Datasets 3.5.0
  - Tokenizers 0.21.1
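The classification reports above use the key layout produced by scikit-learn's `classification_report(..., output_dict=True)`. As a sanity check, the final (epoch 2) report can be rebuilt from its implied confusion matrix; the counts below (1570/22/165/95) are back-derived from the published support, recall, and precision values, not taken from the training logs, and the helper is a dependency-free sketch rather than the trainer's actual metric code:

```python
# Rebuild the epoch-2 evaluation report from its implied confusion matrix.
# Key layout mirrors scikit-learn's classification_report(output_dict=True);
# the per-cell counts are back-derived from the published numbers (assumption).

def classification_report_dict(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    report = {}
    for lab in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == lab)
        pred_pos = sum(1 for p in y_pred if p == lab)   # predicted as `lab`
        true_pos = sum(1 for t in y_true if t == lab)   # support of `lab`
        precision = tp / pred_pos if pred_pos else 0.0
        recall = tp / true_pos if true_pos else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[str(lab)] = {"precision": precision, "recall": recall,
                            "f1-score": f1, "support": float(true_pos)}
    report["accuracy"] = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    report["macro avg"] = {                      # unweighted mean over classes
        m: sum(report[str(lab)][m] for lab in labels) / len(labels)
        for m in ("precision", "recall", "f1-score")}
    report["weighted avg"] = {                   # support-weighted mean
        m: sum(report[str(lab)][m] * report[str(lab)]["support"] for lab in labels) / n
        for m in ("precision", "recall", "f1-score")}
    report["macro avg"]["support"] = report["weighted avg"]["support"] = float(n)
    return report

# Confusion matrix implied by the epoch-2 report:
# true 0 -> pred 0: 1570, true 0 -> pred 1: 22, true 1 -> pred 0: 165, true 1 -> pred 1: 95
y_true = [0] * 1592 + [1] * 260
y_pred = [0] * 1570 + [1] * 22 + [0] * 165 + [1] * 95
report = classification_report_dict(y_true, y_pred)

assert abs(report["accuracy"] - 0.8990280777537797) < 1e-9
assert abs(report["1"]["f1-score"] - 0.5039787798408488) < 1e-9
assert abs(report["macro avg"]["precision"] - 0.858432473706249) < 1e-9
```

The large gap between the macro and weighted averages reflects the class imbalance (1592 vs. 260 eval samples): the minority class's weak recall (0.365) drags the macro figures down while barely moving the weighted ones.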
config.json CHANGED
@@ -41,6 +41,6 @@
  "sparse_pred_ignore_index": -100,
  "sparse_prediction": false,
  "torch_dtype": "float32",
- "transformers_version": "4.52.3",
+ "transformers_version": "4.53.1",
  "vocab_size": 50368
  }
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:8e11275e932cf1148cdc3be90c594c5864b45c095ec1c39e6a4debcc9d14bdcd
+ oid sha256:7d9a1a52d08e3671e0e666f294082dbf7f3691c8d5118f26f86050e1a0a66188
  size 1583351632
tokenizer_config.json CHANGED
@@ -941,6 +941,6 @@
  "model_max_length": 8192,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
- "tokenizer_class": "PreTrainedTokenizer",
+ "tokenizer_class": "PreTrainedTokenizerFast",
  "unk_token": "[UNK]"
  }
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:838dec00c6aad5d5fb35d949ec98cc7898efe3412e0591b0e5ff4c1f75bd2a1c
- size 5304
+ oid sha256:dfe6221e093968623f6fb49acb97218e0318cb9e8a9689cf44fb28e975fce84b
+ size 5368
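The step counts in the two READMEs can be cross-checked against the batch sizes: the previous card listed train_batch_size 64 with no distributed section (presumably a single device), this commit uses 22 per device across 4 GPUs. Assuming steps_per_epoch = ceil(n_samples / total_batch_size), both runs should admit a common training-set size; a quick check of that arithmetic:

```python
# Check that both runs' steps-per-epoch are consistent with a single,
# unchanged training set. Assumes steps_per_epoch = ceil(n / total_batch)
# and that the previous run used one device (total batch 64) -- an assumption.
def compatible_sizes(steps, total_batch):
    # all n satisfying ceil(n / total_batch) == steps
    return range(total_batch * (steps - 1) + 1, total_batch * steps + 1)

old = compatible_sizes(98, 64)       # previous card: 98 steps per epoch
new = compatible_sizes(71, 22 * 4)   # this commit: 71 steps, 22 per GPU x 4 GPUs
overlap = range(max(old.start, new.start), min(old.stop, new.stop))

assert 22 * 4 == 88        # matches total_train_batch_size in the card
assert len(overlap) > 0    # both runs fit a training set of 6209-6248 samples
```

So the reduced step count (490 total before vs. 142 now) comes from the larger effective batch and the drop from 5 epochs to 2, not from a smaller dataset.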