DayCardoso/modernbert-base-multi-head-values-context

+---
+library_name: transformers
+license: apache-2.0
+base_model: answerdotai/ModernBERT-base
+tags:
+- generated_from_trainer
+model-index:
+- name: modernbert-base-multi-head-values-context
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# modernbert-base-multi-head-values-context
+This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on an unspecified dataset (the Trainer recorded the dataset name as `None`).
+It achieves the following results on the evaluation set:
+- Loss: 0.2990
+- Subset Accuracy: 0.2753
+- F1 Macro: 0.3032
+- F1 Micro: 0.3876
+- Precision Macro: 0.4109
+- Recall Macro: 0.2499
+- Roc Auc: 0.7910
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 5e-06
+- train_batch_size: 2
+- eval_batch_size: 2
+- seed: 2025
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 16
+- optimizer: OptimizerNames.ADAMW_TORCH with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_ratio: 0.01
+- num_epochs: 33
+- mixed_precision_training: Native AMP
+### Training results
+| Training Loss | Epoch   | Step  | Validation Loss | Subset Accuracy | F1 Macro | F1 Micro | Precision Macro | Recall Macro | Roc Auc |
+|:-------------:|:-------:|:-----:|:---------------:|:---------------:|:--------:|:--------:|:---------------:|:------------:|:-------:|
+| 2.5451        | 0.5002  | 767   | 0.2012          | 0.0027          | 0.0023   | 0.0050   | 0.0718          | 0.0012       | 0.6531  |
+| 1.5075        | 1.0     | 1534  | 0.1838          | 0.0768          | 0.0569   | 0.1319   | 0.2330          | 0.0353       | 0.7437  |
+| 1.4382        | 1.5002  | 2301  | 0.1781          | 0.1437          | 0.1318   | 0.2281   | 0.3534          | 0.0891       | 0.7792  |
+| 1.3858        | 2.0     | 3068  | 0.1710          | 0.1680          | 0.1582   | 0.2615   | 0.4338          | 0.1091       | 0.7962  |
+| 1.3157        | 2.5002  | 3835  | 0.1681          | 0.1822          | 0.1787   | 0.2796   | 0.4967          | 0.1267       | 0.8058  |
+| 1.291         | 3.0     | 4602  | 0.1622          | 0.2229          | 0.2115   | 0.3291   | 0.6302          | 0.1523       | 0.8195  |
+| 1.2388        | 3.5002  | 5369  | 0.1614          | 0.2026          | 0.2201   | 0.3082   | 0.6143          | 0.1536       | 0.8222  |
+| 1.1993        | 4.0     | 6136  | 0.1583          | 0.2445          | 0.2454   | 0.3554   | 0.5956          | 0.1783       | 0.8291  |
+| 1.1415        | 4.5002  | 6903  | 0.1608          | 0.2793          | 0.2883   | 0.3934   | 0.5614          | 0.2220       | 0.8288  |
+| 1.1221        | 5.0     | 7670  | 0.1595          | 0.2384          | 0.2523   | 0.3533   | 0.5982          | 0.1761       | 0.8342  |
+| 1.0726        | 5.5002  | 8437  | 0.1604          | 0.2727          | 0.2930   | 0.3906   | 0.5584          | 0.2178       | 0.8318  |
+| 1.0381        | 6.0     | 9204  | 0.1629          | 0.2599          | 0.2693   | 0.3759   | 0.5421          | 0.2099       | 0.8315  |
+| 0.9957        | 6.5002  | 9971  | 0.1662          | 0.2814          | 0.2856   | 0.4001   | 0.5380          | 0.2223       | 0.8300  |
+| 0.9319        | 7.0     | 10738 | 0.1640          | 0.2604          | 0.2960   | 0.3820   | 0.5431          | 0.2201       | 0.8288  |
+| 0.8279        | 7.5002  | 11505 | 0.1733          | 0.2788          | 0.2953   | 0.3939   | 0.5275          | 0.2307       | 0.8245  |
+| 0.8365        | 8.0     | 12272 | 0.1742          | 0.2757          | 0.3004   | 0.3910   | 0.5030          | 0.2339       | 0.8218  |
+| 0.7168        | 8.5002  | 13039 | 0.1810          | 0.2863          | 0.3063   | 0.4020   | 0.4589          | 0.2499       | 0.8202  |
+| 0.7158        | 9.0     | 13806 | 0.1804          | 0.2758          | 0.3052   | 0.3910   | 0.4622          | 0.2392       | 0.8212  |
+| 0.5827        | 9.5002  | 14573 | 0.1880          | 0.2878          | 0.3166   | 0.4034   | 0.4568          | 0.2584       | 0.8159  |
+| 0.5958        | 10.0    | 15340 | 0.1906          | 0.2788          | 0.3114   | 0.3940   | 0.4912          | 0.2522       | 0.8134  |
+| 0.4641        | 10.5002 | 16107 | 0.1978          | 0.2750          | 0.3104   | 0.3896   | 0.4505          | 0.2501       | 0.8106  |
+| 0.4608        | 11.0    | 16874 | 0.2022          | 0.2724          | 0.3026   | 0.3880   | 0.4840          | 0.2470       | 0.8082  |
+| 0.3546        | 11.5002 | 17641 | 0.2113          | 0.2773          | 0.3120   | 0.3922   | 0.4598          | 0.2556       | 0.8038  |
+| 0.3575        | 12.0    | 18408 | 0.2133          | 0.2834          | 0.3092   | 0.3980   | 0.4361          | 0.2535       | 0.8045  |
+| 0.2601        | 12.5002 | 19175 | 0.2226          | 0.2778          | 0.3104   | 0.3897   | 0.4274          | 0.2559       | 0.8003  |
+| 0.258         | 13.0    | 19942 | 0.2275          | 0.2824          | 0.3176   | 0.3956   | 0.4188          | 0.2643       | 0.8003  |
+| 0.1778        | 13.5002 | 20709 | 0.2375          | 0.2686          | 0.3035   | 0.3815   | 0.4103          | 0.2496       | 0.7994  |
+| 0.1803        | 14.0    | 21476 | 0.2426          | 0.2713          | 0.3083   | 0.3865   | 0.4305          | 0.2522       | 0.7968  |
+| 0.1233        | 14.5002 | 22243 | 0.2501          | 0.2781          | 0.3139   | 0.3906   | 0.4473          | 0.2592       | 0.7970  |
+| 0.1197        | 15.0    | 23010 | 0.2566          | 0.2735          | 0.3081   | 0.3864   | 0.4231          | 0.2519       | 0.7950  |
+| 0.0804        | 15.5002 | 23777 | 0.2653          | 0.2746          | 0.3065   | 0.3839   | 0.4267          | 0.2512       | 0.7941  |
+| 0.0813        | 16.0    | 24544 | 0.2723          | 0.2740          | 0.3078   | 0.3861   | 0.4372          | 0.2505       | 0.7931  |
+| 0.0548        | 16.5002 | 25311 | 0.2813          | 0.2776          | 0.3077   | 0.3922   | 0.4544          | 0.2500       | 0.7927  |
+| 0.0535        | 17.0    | 26078 | 0.2882          | 0.2804          | 0.3093   | 0.3912   | 0.4497          | 0.2528       | 0.7914  |
+| 0.0387        | 17.5002 | 26845 | 0.2990          | 0.2753          | 0.3032   | 0.3876   | 0.4109          | 0.2499       | 0.7910  |
+### Framework versions
+- Transformers 4.53.2
+- Pytorch 2.6.0+cu124
+- Datasets 2.14.4
+- Tokenizers 0.21.2

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ea17f5ce683fe0171a3849de2da18e21708fec4004088fc17ad93817f6bda6d3
+oid sha256:f8353f31351827d8f0cff1fe62eeefa3314bd600c87369483887c1596102f4d0
 size 598556596
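
The hunk above changes only a Git LFS pointer: the `oid` is the SHA-256 of the real weights file and `size` is its length in bytes, which stays at 598556596 while the hash changes. A minimal sketch of how a downloaded `model.safetensors` could be checked against such a pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any Git LFS tooling.

```python
import hashlib

def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into its version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }

def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and digest named by the pointer."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so a ~600 MB weights file is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]

# The new pointer, copied from the diff above.
new_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:f8353f31351827d8f0cff1fe62eeefa3314bd600c87369483887c1596102f4d0\n"
    "size 598556596\n"
)
print(new_pointer["algo"], new_pointer["size"])
```

Calling `matches_pointer("model.safetensors", new_pointer)` on a local download would then confirm whether the file corresponds to this commit rather than the previous one.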

tokenizer_config.json CHANGED

@@ -1126,8 +1126,7 @@
   "mask_token": "[MASK]",
   "model_input_names": [
     "input_ids",
-    "attention_mask",
-    "token_type_ids"
+    "attention_mask"
   ],
   "model_max_length": 8192,
   "pad_token": "[PAD]",
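
The hunk above removes `token_type_ids` from `model_input_names`, so the tokenizer is expected to stop emitting that field by default. The sketch below mimics that effect in plain Python and does not call the transformers library. The `select_model_inputs` helper and the token ids are made up for illustration.

```python
import json

# The two versions of the relevant tokenizer_config.json fragment from the diff.
old_config = json.loads('{"model_input_names": ["input_ids", "attention_mask", "token_type_ids"]}')
new_config = json.loads('{"model_input_names": ["input_ids", "attention_mask"]}')

def select_model_inputs(encoding, config):
    """Keep only the fields that the config lists as model inputs (hypothetical helper)."""
    allowed = config["model_input_names"]
    return {name: encoding[name] for name in allowed if name in encoding}

# A made-up encoding with every field a BERT-style tokenizer could produce.
encoding = {
    "input_ids": [1, 42, 2],
    "attention_mask": [1, 1, 1],
    "token_type_ids": [0, 0, 0],
}

old_inputs = select_model_inputs(encoding, old_config)
new_inputs = select_model_inputs(encoding, new_config)

print(sorted(old_inputs))  # includes token_type_ids
print(sorted(new_inputs))  # token_type_ids is gone
```

Dropping the field keeps the tokenizer output limited to the two inputs that remain listed, which avoids passing an unexpected keyword argument to a model whose forward pass does not accept segment ids.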