eternis/eternis_router_encoder_sft_10Sep

Tags: Transformers, Safetensors, Generated from Trainer


eternis committed on Sep 11, 2025

Commit 1bb1e6d (verified), 1 parent: dea38d1

Model save

Files changed (2)

README.md +24 -48
model.safetensors +1 -1

README.md CHANGED

@@ -16,19 +16,10 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.9091
-- Complexity Accuracy: 0.7714
-- Model Accuracy: 0.3943
-- Overall Accuracy: 0.2892
-- Comp Acc Class 0: 0.9333
-- Comp Acc Class 1: 0.7614
-- Comp Acc Class 2: 0.7030
-- Model Acc Class 0: 0.4166
-- Model Acc Class 1: 0.3263
-- Model Acc Class 2: 0.3706
-- Model Acc Class 3: 0.224
-- Complexity Macro F1: 0.7794
-- Model Macro F1: 0.2693
 ## Model description
@@ -47,50 +38,35 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 0.0005
-- train_batch_size: 16
 - eval_batch_size: 32
 - seed: 42
 - gradient_accumulation_steps: 2
-- total_train_batch_size: 32
 - optimizer: Use adamw_torch_fused with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_ratio: 0.02
 - num_epochs: 10
 ### Training results
-| Training Loss | Epoch  | Step | Validation Loss | Complexity Accuracy | Model Accuracy | Overall Accuracy | Comp Acc Class 0 | Comp Acc Class 1 | Comp Acc Class 2 | Model Acc Class 0 | Model Acc Class 1 | Model Acc Class 2 | Model Acc Class 3 | Complexity Macro F1 | Model Macro F1 |
-|:-------------:|:------:|:----:|:---------------:|:-------------------:|:--------------:|:----------------:|:----------------:|:----------------:|:----------------:|:-----------------:|:-----------------:|:-----------------:|:-----------------:|:-------------------:|:--------------:|
-| 0.9723        | 0.3429 | 300  | 0.9664          | 0.7293              | 0.3019         | 0.2105           | 0.9240           | 0.7113           | 0.6576           | 0.3374            | 0.0096            | 0.2388            | 0.88              | 0.7377              | 0.1978         |
-| 0.8984        | 0.6857 | 600  | 0.9349          | 0.7572              | 0.3106         | 0.2197           | 0.9194           | 0.8587           | 0.4901           | 0.3387            | 0.0077            | 0.3284            | 0.848             | 0.7460              | 0.2052         |
-| 0.8743        | 1.0286 | 900  | 0.9386          | 0.7537              | 0.3183         | 0.2244           | 0.9395           | 0.7604           | 0.6427           | 0.3286            | 0.1228            | 0.3731            | 0.712             | 0.7596              | 0.2343         |
-| 0.8401        | 1.3714 | 1200 | 0.9377          | 0.7649              | 0.3126         | 0.2234           | 0.9705           | 0.7984           | 0.5957           | 0.3498            | 0.0499            | 0.2139            | 0.84              | 0.7603              | 0.2145         |
-| 0.8264        | 1.7143 | 1500 | 0.8737          | 0.7527              | 0.3945         | 0.2717           | 0.9132           | 0.7178           | 0.7294           | 0.4321            | 0.1017            | 0.4303            | 0.608             | 0.7664              | 0.2638         |
-| 0.8233        | 2.0571 | 1800 | 0.8828          | 0.7781              | 0.3885         | 0.2867           | 0.9240           | 0.8508           | 0.5710           | 0.4152            | 0.2342            | 0.3557            | 0.504             | 0.7725              | 0.2724         |
-| 0.8009        | 2.4    | 2100 | 0.9305          | 0.7517              | 0.3484         | 0.2458           | 0.9488           | 0.7108           | 0.7195           | 0.3809            | 0.0672            | 0.3806            | 0.648             | 0.7620              | 0.2374         |
-| 0.7699        | 2.7429 | 2400 | 0.9213          | 0.7367              | 0.3674         | 0.2535           | 0.9473           | 0.6464           | 0.7855           | 0.3657            | 0.2361            | 0.5423            | 0.392             | 0.7546              | 0.2699         |
-| 0.7815        | 3.0857 | 2700 | 0.8605          | 0.7746              | 0.4279         | 0.3146           | 0.9767           | 0.8146           | 0.5957           | 0.4681            | 0.2476            | 0.3831            | 0.368             | 0.7728              | 0.2870         |
-| 0.7628        | 3.4286 | 3000 | 0.8406          | 0.7731              | 0.4620         | 0.3467           | 0.9442           | 0.7702           | 0.6873           | 0.5322            | 0.1593            | 0.3632            | 0.376             | 0.7801              | 0.2865         |
-| 0.7478        | 3.7714 | 3300 | 0.8976          | 0.7746              | 0.3701         | 0.2717           | 0.9426           | 0.8262           | 0.5932           | 0.3920            | 0.1939            | 0.4254            | 0.408             | 0.7721              | 0.2596         |
-| 0.7182        | 4.1143 | 3600 | 0.9086          | 0.7761              | 0.3726         | 0.2712           | 0.9395           | 0.8184           | 0.6139           | 0.3987            | 0.1843            | 0.4204            | 0.384             | 0.7756              | 0.2575         |
-| 0.7226        | 4.4571 | 3900 | 0.8708          | 0.7604              | 0.4182         | 0.2984           | 0.9132           | 0.7382           | 0.7186           | 0.4523            | 0.2169            | 0.4602            | 0.312             | 0.7717              | 0.2808         |
-| 0.7051        | 4.8    | 4200 | 0.9171          | 0.7517              | 0.3833         | 0.2648           | 0.9519           | 0.6719           | 0.7871           | 0.4024            | 0.2438            | 0.4254            | 0.376             | 0.7684              | 0.2700         |
-| 0.6826        | 5.1429 | 4500 | 0.8959          | 0.7626              | 0.3900         | 0.2854           | 0.9147           | 0.7535           | 0.6980           | 0.4102            | 0.2649            | 0.4328            | 0.296             | 0.7723              | 0.2708         |
-| 0.6957        | 5.4857 | 4800 | 0.8982          | 0.7719              | 0.4095         | 0.3014           | 0.8915           | 0.8049           | 0.6493           | 0.4439            | 0.2399            | 0.3930            | 0.352             | 0.7752              | 0.2771         |
-| 0.6686        | 5.8286 | 5100 | 0.8987          | 0.7631              | 0.4095         | 0.2939           | 0.9271           | 0.7437           | 0.7104           | 0.4493            | 0.2284            | 0.3856            | 0.296             | 0.7732              | 0.2716         |
-| 0.6681        | 6.1714 | 5400 | 0.8870          | 0.7694              | 0.4147         | 0.3046           | 0.9566           | 0.7558           | 0.6939           | 0.4580            | 0.2860            | 0.2960            | 0.304             | 0.7789              | 0.2731         |
-| 0.6526        | 6.5143 | 5700 | 0.8983          | 0.7746              | 0.3980         | 0.2892           | 0.9426           | 0.7692           | 0.6947           | 0.4243            | 0.2726            | 0.4055            | 0.272             | 0.7815              | 0.2714         |
-| 0.656         | 6.8571 | 6000 | 0.9080          | 0.7582              | 0.4            | 0.2889           | 0.9364           | 0.7164           | 0.7376           | 0.4216            | 0.2879            | 0.4527            | 0.184             | 0.7712              | 0.2708         |
-| 0.6421        | 7.2    | 6300 | 0.9028          | 0.7768              | 0.3893         | 0.2889           | 0.9349           | 0.8072           | 0.6386           | 0.4068            | 0.3551            | 0.3507            | 0.24              | 0.7791              | 0.2690         |
-| 0.6366        | 7.5429 | 6600 | 0.8946          | 0.7731              | 0.4087         | 0.3019           | 0.9504           | 0.7905           | 0.6477           | 0.4425            | 0.2399            | 0.4254            | 0.256             | 0.7764              | 0.2729         |
-| 0.6463        | 7.8857 | 6900 | 0.8931          | 0.7706              | 0.4092         | 0.2986           | 0.9380           | 0.7581           | 0.7038           | 0.4419            | 0.3186            | 0.3358            | 0.248             | 0.7790              | 0.2740         |
-| 0.6405        | 8.2286 | 7200 | 0.8919          | 0.7706              | 0.4110         | 0.3009           | 0.9380           | 0.7581           | 0.7038           | 0.4415            | 0.2802            | 0.4129            | 0.224             | 0.7784              | 0.2756         |
-| 0.6101        | 8.5714 | 7500 | 0.9054          | 0.7729              | 0.3988         | 0.2939           | 0.9302           | 0.7614           | 0.7096           | 0.4247            | 0.3225            | 0.3607            | 0.224             | 0.7813              | 0.2700         |
-| 0.6366        | 8.9143 | 7800 | 0.9038          | 0.7696              | 0.3995         | 0.2932           | 0.9240           | 0.7567           | 0.7104           | 0.4250            | 0.3186            | 0.3706            | 0.224             | 0.7778              | 0.2711         |
-| 0.6306        | 9.2571 | 8100 | 0.9099          | 0.7704              | 0.3915         | 0.2859           | 0.9240           | 0.7521           | 0.7211           | 0.4125            | 0.3186            | 0.3856            | 0.216             | 0.7794              | 0.2682         |
-| 0.6293        | 9.6    | 8400 | 0.9093          | 0.7716              | 0.3938         | 0.2889           | 0.9333           | 0.7618           | 0.7030           | 0.4159            | 0.3244            | 0.3731            | 0.224             | 0.7795              | 0.2691         |
-| 0.6177        | 9.9429 | 8700 | 0.9091          | 0.7714              | 0.3943         | 0.2892           | 0.9333           | 0.7614           | 0.7030           | 0.4166            | 0.3263            | 0.3706            | 0.224             | 0.7794              | 0.2693         |
 ### Framework versions

 This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.6852
+- Complexity Accuracy: 0.772
+- Model Accuracy: 0.747
+- Overall Accuracy: 0.5793
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 0.0001
+- train_batch_size: 32
 - eval_batch_size: 32
 - seed: 42
 - gradient_accumulation_steps: 2
+- total_train_batch_size: 64
 - optimizer: Use adamw_torch_fused with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.01
 - num_epochs: 10
 ### Training results
+| Training Loss | Epoch  | Step | Validation Loss | Complexity Accuracy | Model Accuracy | Overall Accuracy |
+|:-------------:|:------:|:----:|:---------------:|:-------------------:|:--------------:|:----------------:|
+| 0.8284        | 0.6857 | 300  | 0.7391          | 0.7275              | 0.7475         | 0.5437           |
+| 0.7657        | 1.3703 | 600  | 0.7173          | 0.7408              | 0.7478         | 0.5515           |
+| 0.7398        | 2.0549 | 900  | 0.7099          | 0.7502              | 0.748          | 0.5595           |
+| 0.7161        | 2.7406 | 1200 | 0.7037          | 0.7578              | 0.748          | 0.5645           |
+| 0.7057        | 3.4251 | 1500 | 0.6973          | 0.7635              | 0.7468         | 0.569            |
+| 0.7115        | 4.1097 | 1800 | 0.6927          | 0.764               | 0.748          | 0.5705           |
+| 0.7214        | 4.7954 | 2100 | 0.6896          | 0.7672              | 0.7482         | 0.5755           |
+| 0.7034        | 5.48   | 2400 | 0.6886          | 0.769               | 0.7472         | 0.5777           |
+| 0.6935        | 6.1646 | 2700 | 0.6878          | 0.769               | 0.7478         | 0.577            |
+| 0.7055        | 6.8503 | 3000 | 0.6867          | 0.7722              | 0.7465         | 0.5787           |
+| 0.6983        | 7.5349 | 3300 | 0.6858          | 0.7728              | 0.7465         | 0.5797           |
+| 0.7092        | 8.2194 | 3600 | 0.6849          | 0.774               | 0.747          | 0.5803           |
+| 0.697         | 8.9051 | 3900 | 0.6851          | 0.7718              | 0.747          | 0.5787           |
+| 0.6989        | 9.5897 | 4200 | 0.6852          | 0.772               | 0.747          | 0.5793           |
 ### Framework versions
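
The batch-size lines changed in this README diff are related by simple arithmetic, sketched below. The sketch assumes a single training device (the card does not say how many were used), in which case `total_train_batch_size` equals `train_batch_size` times `gradient_accumulation_steps`. The steps-per-epoch and training-set-size figures are rough estimates inferred from the first logged row of each results table; they are not values stated in the card, and the helper names are hypothetical.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * grad_accum * num_devices


def steps_per_epoch(step: int, epoch: float) -> float:
    """Estimate optimizer steps per epoch from one logged (step, epoch) pair."""
    return step / epoch


# Values taken from the removed (-) and added (+) hyperparameter lines.
old_total = effective_batch_size(per_device=16, grad_accum=2)
new_total = effective_batch_size(per_device=32, grad_accum=2)

# First logged row of each results table: step 300 at epoch 0.3429 (old)
# and step 300 at epoch 0.6857 (new).
old_steps = steps_per_epoch(300, 0.3429)
new_steps = steps_per_epoch(300, 0.6857)

# If both runs used the same training set, the two estimates should agree.
old_examples = old_steps * old_total
new_examples = new_steps * new_total

print(f"old: total batch {old_total}, ~{old_steps:.0f} steps/epoch, ~{old_examples:.0f} examples")
print(f"new: total batch {new_total}, ~{new_steps:.0f} steps/epoch, ~{new_examples:.0f} examples")
```

Under these assumptions both runs imply a training set of roughly 28,000 examples, which is consistent with the new run logging about half as many optimizer steps per epoch as the old one.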

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:10aec3531301f5f7e619f12e219f65b4037d1b191e38e98c3486ff79c92a7893
 size 597632156

 version https://git-lfs.github.com/spec/v1
+oid sha256:ddda65714e2f58101eb9cce10d5d5bdc15236e1a9f7cdf53455428d7a54cde39
 size 597632156
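
The pointer file above records only a SHA-256 digest and a byte size, so a downloaded copy of the weights can be checked against it locally. The following is a minimal sketch of that check using only the Python standard library; the function names and the local path in the usage note are hypothetical, and the parser assumes the three-line `version` / `oid` / `size` layout shown in the diff.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid has the form "<algorithm>:<hex digest>", e.g. "sha256:ddda...".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and digest in `pointer`."""
    file_path = Path(path)
    if file_path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with file_path.open("rb") as handle:
        # Read in chunks so a ~600 MB weights file is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# The new (+) pointer from this commit, copied from the diff above.
NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ddda65714e2f58101eb9cce10d5d5bdc15236e1a9f7cdf53455428d7a54cde39
size 597632156
"""

pointer = parse_lfs_pointer(NEW_POINTER)
```

With a local copy of the weights, `matches_pointer("model.safetensors", pointer)` would return `True` only for the file from this commit. The size is unchanged between the old and new pointers, so only the digest distinguishes the two versions.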