Training in progress, epoch 1
Files changed:
- README.md +12 -11
- config.json +1 -1
- model.safetensors +1 -1
- tokenizer_config.json +1 -1
- training_args.bin +2 -2
README.md
CHANGED
@@ -16,8 +16,8 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [answerdotai/ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.
-- Classification Report: {'0': {'precision': 0.
+- Loss: 0.2290
+- Classification Report: {'0': {'precision': 0.9048991354466859, 'recall': 0.9861809045226131, 'f1-score': 0.9437932070934776, 'support': 1592.0}, '1': {'precision': 0.811965811965812, 'recall': 0.36538461538461536, 'f1-score': 0.5039787798408488, 'support': 260.0}, 'accuracy': 0.8990280777537797, 'macro avg': {'precision': 0.858432473706249, 'recall': 0.6757827599536143, 'f1-score': 0.7238859934671632, 'support': 1852.0}, 'weighted avg': {'precision': 0.891852340573561, 'recall': 0.8990280777537797, 'f1-score': 0.8820482011076873, 'support': 1852.0}}
 
 ## Model description

@@ -37,27 +37,28 @@
 
 The following hyperparameters were used during training:
 - learning_rate: 5e-06
-- train_batch_size:
-- eval_batch_size:
+- train_batch_size: 22
+- eval_batch_size: 22
 - seed: 42
+- distributed_type: multi-GPU
+- num_devices: 4
+- total_train_batch_size: 88
+- total_eval_batch_size: 88
 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
-- num_epochs:
+- num_epochs: 2
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss | Classification Report |
 |:-------------:|:-----:|:----:|:---------------:|:---------------------:|
-| No log | 1.0 |
-| No log | 2.0 |
-| No log | 3.0 | 294 | 0.1875 | {'0': {'precision': 0.9410692588092345, 'recall': 0.9729899497487438, 'f1-score': 0.9567634342186535, 'support': 1592.0}, '1': {'precision': 0.7912621359223301, 'recall': 0.6269230769230769, 'f1-score': 0.6995708154506438, 'support': 260.0}, 'accuracy': 0.9244060475161987, 'macro avg': {'precision': 0.8661656973657823, 'recall': 0.7999565133359103, 'f1-score': 0.8281671248346487, 'support': 1852.0}, 'weighted avg': {'precision': 0.9200380212549175, 'recall': 0.9244060475161987, 'f1-score': 0.9206564791000345, 'support': 1852.0}} |
-| No log | 4.0 | 392 | 0.1924 | {'0': {'precision': 0.9565772669220945, 'recall': 0.9409547738693468, 'f1-score': 0.9487017099430018, 'support': 1592.0}, '1': {'precision': 0.6713286713286714, 'recall': 0.7384615384615385, 'f1-score': 0.7032967032967034, 'support': 260.0}, 'accuracy': 0.9125269978401728, 'macro avg': {'precision': 0.813952969125383, 'recall': 0.8397081561654427, 'f1-score': 0.8259992066198526, 'support': 1852.0}, 'weighted avg': {'precision': 0.9165315677567111, 'recall': 0.9125269978401728, 'f1-score': 0.9142496031784026, 'support': 1852.0}} |
-| No log | 5.0 | 490 | 0.1836 | {'0': {'precision': 0.9434650455927052, 'recall': 0.9748743718592965, 'f1-score': 0.9589125733704047, 'support': 1592.0}, '1': {'precision': 0.8067632850241546, 'recall': 0.6423076923076924, 'f1-score': 0.715203426124197, 'support': 260.0}, 'accuracy': 0.9281857451403888, 'macro avg': {'precision': 0.8751141653084299, 'recall': 0.8085910320834944, 'f1-score': 0.8370579997473009, 'support': 1852.0}, 'weighted avg': {'precision': 0.9242736537202305, 'recall': 0.9281857451403888, 'f1-score': 0.9246985462192091, 'support': 1852.0}} |
+| No log | 1.0 | 71 | 0.2510 | {'0': {'precision': 0.8783185840707964, 'recall': 0.9974874371859297, 'f1-score': 0.9341176470588235, 'support': 1592.0}, '1': {'precision': 0.9090909090909091, 'recall': 0.15384615384615385, 'f1-score': 0.2631578947368421, 'support': 260.0}, 'accuracy': 0.8790496760259179, 'macro avg': {'precision': 0.8937047465808527, 'recall': 0.5756667955160417, 'f1-score': 0.5986377708978328, 'support': 1852.0}, 'weighted avg': {'precision': 0.8826386728965142, 'recall': 0.8790496760259179, 'f1-score': 0.839922433449906, 'support': 1852.0}} |
+| No log | 2.0 | 142 | 0.2290 | {'0': {'precision': 0.9048991354466859, 'recall': 0.9861809045226131, 'f1-score': 0.9437932070934776, 'support': 1592.0}, '1': {'precision': 0.811965811965812, 'recall': 0.36538461538461536, 'f1-score': 0.5039787798408488, 'support': 260.0}, 'accuracy': 0.8990280777537797, 'macro avg': {'precision': 0.858432473706249, 'recall': 0.6757827599536143, 'f1-score': 0.7238859934671632, 'support': 1852.0}, 'weighted avg': {'precision': 0.891852340573561, 'recall': 0.8990280777537797, 'f1-score': 0.8820482011076873, 'support': 1852.0}} |
 
 ### Framework versions
 
-- Transformers 4.
+- Transformers 4.53.1
 - Pytorch 2.6.0+cu124
 - Datasets 3.5.0
 - Tokenizers 0.21.1
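The hyperparameter list and the nested classification-report dicts above map onto a standard `transformers` `Trainer` run. Below is a minimal sketch, not the author's actual script (the commit does not include one): the report format matches scikit-learn's `classification_report(..., output_dict=True)`, the `TrainingArguments` values mirror the list above, and `output_dir` is a placeholder.

```python
# Hedged reconstruction, not the author's training script.
import numpy as np
from sklearn.metrics import classification_report
from transformers import TrainingArguments

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # output_dict=True yields {'0': {...}, '1': {...}, 'accuracy': ...,
    # 'macro avg': {...}, 'weighted avg': {...}} as shown in the card.
    return {"classification_report": classification_report(labels, preds, output_dict=True)}

args = TrainingArguments(
    output_dir="modernbert-large-finetuned",  # placeholder name
    learning_rate=5e-6,
    per_device_train_batch_size=22,  # x 4 GPUs = total_train_batch_size 88
    per_device_eval_batch_size=22,   # x 4 GPUs = total_eval_batch_size 88
    num_train_epochs=2,
    lr_scheduler_type="linear",
    optim="adamw_torch",  # AdamW with default betas=(0.9, 0.999), eps=1e-8
    seed=42,
)
```

Note that the per-device batch size of 22 across 4 devices accounts for the reported totals of 88, and the verbose optimizer line corresponds to `adamw_torch` at its defaults.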
config.json
CHANGED
@@ -41,6 +41,6 @@
 "sparse_pred_ignore_index": -100,
 "sparse_prediction": false,
 "torch_dtype": "float32",
-"transformers_version": "4.
+"transformers_version": "4.53.1",
 "vocab_size": 50368
 }
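The only substantive change here is the pinned `transformers_version`. A small sketch (not part of the repo) for comparing the recorded version against the local install before loading:

```python
# Sketch: compare the version recorded in config.json with the local install.
import json
import transformers

with open("config.json") as f:
    saved = json.load(f)["transformers_version"]  # "4.53.1" after this commit
print(f"checkpoint saved with transformers {saved}, running {transformers.__version__}")
```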
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:7d9a1a52d08e3671e0e666f294082dbf7f3691c8d5118f26f86050e1a0a66188
 size 1583351632
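The weights are stored as a Git LFS pointer (version line, sha256 oid, byte size). A small verification sketch, not part of the repo, that checks a locally downloaded file against the pointer above:

```python
# Sketch: verify a downloaded file against a Git LFS pointer's oid and size.
import hashlib
import os

def verify_lfs_object(path: str, expected_oid: str, expected_size: int) -> bool:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # stream in 1 MiB chunks
            digest.update(chunk)
    return digest.hexdigest() == expected_oid and os.path.getsize(path) == expected_size

# Usage with the values from the new pointer:
# verify_lfs_object("model.safetensors",
#                   "7d9a1a52d08e3671e0e666f294082dbf7f3691c8d5118f26f86050e1a0a66188",
#                   1583351632)
```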
tokenizer_config.json
CHANGED
@@ -941,6 +941,6 @@
 "model_max_length": 8192,
 "pad_token": "[PAD]",
 "sep_token": "[SEP]",
-"tokenizer_class": "
+"tokenizer_class": "PreTrainedTokenizerFast",
 "unk_token": "[UNK]"
 }
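With `tokenizer_class` now set to `PreTrainedTokenizerFast`, `AutoTokenizer` resolves to the fast tokenizer. A hedged usage sketch; `your-org/your-model` is a placeholder, since this page does not name the published repo:

```python
# Sketch: loading the checkpoint for inference (repo id is a placeholder).
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo = "your-org/your-model"
tokenizer = AutoTokenizer.from_pretrained(repo)  # PreTrainedTokenizerFast per tokenizer_config.json
model = AutoModelForSequenceClassification.from_pretrained(repo)

# model_max_length is 8192, so long inputs survive truncation mostly intact.
inputs = tokenizer("example input", return_tensors="pt", truncation=True)
pred = model(**inputs).logits.argmax(-1).item()  # 0 or 1, matching the report's labels
```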
training_args.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:dfe6221e093968623f6fb49acb97218e0318cb9e8a9689cf44fb28e975fce84b
-size
+size 5368
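training_args.bin is the pickled `TrainingArguments` object that `Trainer` saves alongside the model. A quick inspection sketch; `weights_only=False` unpickles arbitrary code, so only do this for checkpoints you trust:

```python
# Sketch: inspect the pickled TrainingArguments (pickle can execute code; trust required).
import torch

args = torch.load("training_args.bin", weights_only=False)
print(args.learning_rate, args.per_device_train_batch_size, args.num_train_epochs)
```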