ryanzhangofficial/modernbert-llm-router

@@ -19,9 +19,9 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.4498
-- F1: 0.3440
-- Accuracy: 0.4433
 ## Model description
@@ -41,8 +41,8 @@ More information needed
 The following hyperparameters were used during training:
 - learning_rate: 5e-05
-- train_batch_size: 32
-- eval_batch_size: 16
 - seed: 42
 - optimizer: Use OptimizerNames.SGD and the args are:
 No additional optimizer arguments
@@ -53,11 +53,11 @@ No additional optimizer arguments
 | Training Loss | Epoch | Step | Validation Loss | F1     | Accuracy |
 |:-------------:|:-----:|:----:|:---------------:|:------:|:--------:|
-| 1.6817        | 1.0   | 466  | 1.6321          | 0.2854 | 0.2797   |
-| 1.5403        | 2.0   | 932  | 1.5267          | 0.3303 | 0.3708   |
-| 1.4888        | 3.0   | 1398 | 1.4770          | 0.3452 | 0.4213   |
-| 1.4701        | 4.0   | 1864 | 1.4553          | 0.3449 | 0.4401   |
-| 1.454         | 5.0   | 2330 | 1.4498          | 0.3440 | 0.4433   |
 ### Framework versions

 This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 1.1575
+- F1: 0.5932
+- Accuracy: 0.6911
 ## Model description
 The following hyperparameters were used during training:
 - learning_rate: 5e-05
+- train_batch_size: 16
+- eval_batch_size: 8
 - seed: 42
 - optimizer: Use OptimizerNames.SGD and the args are:
 No additional optimizer arguments
 | Training Loss | Epoch | Step | Validation Loss | F1     | Accuracy |
 |:-------------:|:-----:|:----:|:---------------:|:------:|:--------:|
+| 1.4075        | 1.0   | 164  | 1.2845          | 0.5763 | 0.6284   |
+| 1.2963        | 2.0   | 328  | 1.2212          | 0.5962 | 0.6743   |
+| 1.2658        | 3.0   | 492  | 1.1842          | 0.6031 | 0.6896   |
+| 1.2241        | 4.0   | 656  | 1.1646          | 0.5929 | 0.6896   |
+| 1.2063        | 5.0   | 820  | 1.1575          | 0.5932 | 0.6911   |
 ### Framework versions
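
A rough cross-check of the two training logs above: the steps per epoch and `train_batch_size` bound the number of training examples in each run. The sketch below assumes a single device, no gradient accumulation, and a final partial batch that is kept rather than dropped. None of these are confirmed by the README lines shown, and `example_range` is a hypothetical helper.

```python
from typing import Tuple


def example_range(steps_per_epoch: int, batch_size: int) -> Tuple[int, int]:
    """Return the (min, max) number of training examples consistent with
    a given number of optimizer steps per epoch.

    Assumes one device, gradient_accumulation_steps == 1, and that the
    last (possibly partial) batch is kept.
    """
    upper = steps_per_epoch * batch_size            # every batch is full
    lower = (steps_per_epoch - 1) * batch_size + 1  # last batch holds one example
    return lower, upper


# Values read from the "Step" column at epoch 1.0 and from train_batch_size.
old_run = example_range(steps_per_epoch=466, batch_size=32)
new_run = example_range(steps_per_epoch=164, batch_size=16)

print("old run examples per epoch:", old_run)
print("new run examples per epoch:", new_run)
```

Under those assumptions the earlier run saw roughly 14.9k examples per epoch and the new run roughly 2.6k, so the change in metrics may reflect a different training set as well as the smaller batch size.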

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:803d867e52c722f3b151cca1ae59b9c5ff187d66d10f7f93f2c93261cb095a9e
 size 598449012

 version https://git-lfs.github.com/spec/v1
+oid sha256:5fe4c91582e71dc34110b6bde9b582443b49db0f5ce6c93084ed5f8cfffbde61
 size 598449012
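
The two pointer files above follow the Git LFS pointer layout of `version`, `oid`, and `size` lines. As a minimal sketch, the snippet below parses both pointers and compares them; `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, hash algorithm,
    digest, and size in bytes."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>".
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


old_pointer = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:803d867e52c722f3b151cca1ae59b9c5ff187d66d10f7f93f2c93261cb095a9e
size 598449012
""")

new_pointer = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:5fe4c91582e71dc34110b6bde9b582443b49db0f5ce6c93084ed5f8cfffbde61
size 598449012
""")

content_changed = old_pointer["digest"] != new_pointer["digest"]
size_unchanged = old_pointer["size"] == new_pointer["size"]
print("content changed:", content_changed)
print("size unchanged:", size_unchanged)
```

A new digest with an identical byte size is what one would expect when the weights are retrained without changing the architecture, though the pointer alone does not prove that.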

tokenizer.json CHANGED

@@ -2,13 +2,13 @@
   "version": "1.0",
   "truncation": {
     "direction": "Right",
-    "max_length": 512,
     "strategy": "LongestFirst",
     "stride": 0
   },
   "padding": {
     "strategy": {
-      "Fixed": 512
     },
     "direction": "Right",
     "pad_to_multiple_of": null,

   "version": "1.0",
   "truncation": {
     "direction": "Right",
+    "max_length": 1024,
     "strategy": "LongestFirst",
     "stride": 0
   },
   "padding": {
     "strategy": {
+      "Fixed": 1024
     },
     "direction": "Right",
     "pad_to_multiple_of": null,

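The tokenizer.json hunk changes the truncation limit and the fixed padding length together. The sketch below rebuilds only the fields visible in the diff (the rest of the file is not shown here) and checks that the two lengths agree on each side; `length_settings` is a hypothetical helper.

```python
import json


def length_settings(tokenizer_json: dict) -> dict:
    """Extract the truncation limit and fixed padding length from a
    tokenizer.json-style dictionary."""
    return {
        "truncation_max_length": tokenizer_json["truncation"]["max_length"],
        # "Fixed" is only present when padding to a constant length.
        "fixed_padding_length": tokenizer_json["padding"]["strategy"].get("Fixed"),
    }


def fragment(length: int) -> dict:
    """Build the visible part of tokenizer.json for a given length."""
    return json.loads(json.dumps({
        "version": "1.0",
        "truncation": {
            "direction": "Right",
            "max_length": length,
            "strategy": "LongestFirst",
            "stride": 0,
        },
        "padding": {
            "strategy": {"Fixed": length},
            "direction": "Right",
            "pad_to_multiple_of": None,
        },
    }))


old_settings = length_settings(fragment(512))
new_settings = length_settings(fragment(1024))

for name, settings in (("old", old_settings), ("new", new_settings)):
    consistent = settings["truncation_max_length"] == settings["fixed_padding_length"]
    print(name, settings, "consistent:", consistent)
```

With fixed padding, every encoded input is padded to the full length, so moving from 512 to 1024 doubles the token count per example regardless of the actual text length.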
tokenizer_config.json CHANGED

@@ -937,7 +937,7 @@
     "input_ids",
     "attention_mask"
   ],
-  "model_max_length": 512,
   "pad_token": "[PAD]",
   "sep_token": "[SEP]",
   "tokenizer_class": "PreTrainedTokenizerFast",

     "input_ids",
     "attention_mask"
   ],
+  "model_max_length": 1024,
   "pad_token": "[PAD]",
   "sep_token": "[SEP]",
   "tokenizer_class": "PreTrainedTokenizerFast",